Performance Benchmarks

Real-world performance measurements and comparisons

Benchmark Environment

Hardware

Platform: Ubuntu 24.04 LTS
Kernel: Linux 6.14.0-35-generic
CPU: Standard x86_64

Software

Charl: v0.3.0 (Release build)
Python: 3.12.3
PyTorch: 2.9.0+cpu

Comparative Performance: Charl vs PyTorch

MNIST classification task (784→128→64→10 network, 109,386 parameters, 1,000 synthetic samples, 5 epochs, CPU-only)

Metric                   Charl v0.3.0       PyTorch 2.9.0    Ratio
Total Training Time      414.43 ms          9.255 s          22.33x
Average Time per Epoch   82.89 ms           1.851 s          22.33x
Throughput               12,064 samples/s   540 samples/s    22.33x
Peak Memory Usage        60 MB              200 MB           3.3x
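The parameter count and the derived columns in the table can be cross-checked with a few lines of arithmetic (a sketch; the layer sizes come from the 784→128→64→10 description above, and throughput is truncated to whole samples per second):

```python
# Parameter count for a 784 -> 128 -> 64 -> 10 fully connected network
# (weights + biases per layer).
layers = [784, 128, 64, 10]
params = sum(i * o + o for i, o in zip(layers, layers[1:]))
print(params)  # -> 109386

# Throughput = (epochs * samples) / total training time.
samples = 5 * 1000
charl_s, pytorch_s = 0.41443, 9.255
print(int(samples / charl_s))         # -> 12064
print(int(samples / pytorch_s))       # -> 540
print(round(pytorch_s / charl_s, 2))  # -> 22.33
```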

Implementation Differences

Charl's performance characteristics stem from:

  • Ahead-of-time compiled native code (Rust) vs. an interpreted Python frontend dispatching to C++ kernels
  • Direct memory access without garbage collection overhead
  • Zero-copy tensor operations via bytemuck
  • Static compilation with LLVM optimizations
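The zero-copy point deserves a concrete illustration: bytemuck reinterprets a byte slice as a typed slice without allocating or copying. The same idea can be sketched in Python with numpy's `frombuffer` (a stand-in for the concept, not Charl's actual code):

```python
import numpy as np

raw = np.arange(4, dtype=np.float32).tobytes()  # 16 bytes of raw data
view = np.frombuffer(raw, dtype=np.float32)     # reinterpret in place, no copy

# The view shares the original buffer rather than owning its own memory.
assert not view.flags.owndata
print(view)  # -> [0. 1. 2. 3.]
```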

Note: PyTorch is optimized for GPU workloads and large-scale training. This comparison uses CPU-only execution with a small dataset (1,000 samples), which favors implementations with low runtime overhead.

Training with Autograd (v0.3.0)

XOR problem training - 100 epochs with automatic differentiation

Metric                   Value
Total Time (100 epochs)  < 10 ms
Initial Loss             0.260
Final Loss               0.241
Improvement              7.2%
Parameters               17
Memory Usage             ~5.7 MB

Note: Full gradient computation with backward pass and optimizer updates in under 10 ms for 100 iterations. Memory usage includes computation graph tracking.
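The source does not state the XOR network's topology, but 17 parameters is exactly what a 2→4→1 fully connected network yields (an assumed shape, shown here only as a consistency check):

```python
# Assumed 2 -> 4 -> 1 topology; consistent with the 17-parameter figure.
layers = [2, 4, 1]
params = sum(i * o + o for i, o in zip(layers, layers[1:]))
print(params)  # -> 17  (2*4 + 4 weights+biases, then 4*1 + 1)
```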

Matrix Multiplication Performance

CPU performance on different matrix sizes

Size          Elements    Time (measured)  Memory
128 × 128     16,384      < 1 ms           ~0.5 MB
512 × 512     262,144     ~50 ms           ~8 MB
1024 × 1024   1,048,576   ~700 ms          ~60 MB
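A rough way to reproduce the shape of these measurements outside Charl is a small numpy harness (numpy dispatches matmul to BLAS, so absolute timings will differ substantially from the table; only the size scaling is comparable):

```python
import time
import numpy as np

for n in (128, 512, 1024):
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    t0 = time.perf_counter()
    c = a @ b  # n x n matrix multiplication
    elapsed_ms = (time.perf_counter() - t0) * 1000
    print(f"{n} × {n}: {c.size:,} elements, {elapsed_ms:.2f} ms")
```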

Tensor Operations Performance

Element-wise operations (add, mul, sub) and activations (ReLU, Sigmoid)

Operation Size  Elements    Total Time  Memory
Small           1,000       < 1 ms      ~0.1 MB
Medium          10,000      < 5 ms      ~1 MB
Large           100,000     ~20 ms      ~10 MB
Very Large      1,000,000   ~100 ms     ~62 MB
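The same kind of harness covers the element-wise and activation benchmarks, shown here at the table's largest tier (again a numpy stand-in, not Charl's implementation, so timings are indicative only):

```python
import time
import numpy as np

x = np.random.rand(1_000_000).astype(np.float32)
y = np.random.rand(1_000_000).astype(np.float32)

t0 = time.perf_counter()
added = x + y                       # element-wise add
product = x * y                     # element-wise mul
relu = np.maximum(x, 0.0)           # ReLU activation
sigmoid = 1.0 / (1.0 + np.exp(-x))  # Sigmoid activation
elapsed_ms = (time.perf_counter() - t0) * 1000
print(f"1,000,000 elements, 4 ops: {elapsed_ms:.2f} ms")
```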

Reproducing Benchmarks

All benchmarks are reproducible from the source repository.

Charl vs PyTorch Comparison

# Clone repository
git clone https://github.com/charlcoding-stack/charlcode.git
cd charlcode

# Build Charl benchmark
cargo build --release --bin charl_mnist_bench

# Setup Python environment
python3 -m venv venv
venv/bin/pip install torch numpy

# Run comparison
./benchmarks/pytorch_comparison/mnist/compare.sh

Training Benchmark (XOR)

# Run training benchmark
./target/release/charl run benchmark_training.ch

Matrix Multiplication Benchmark

# Run matrix multiplication benchmark
./target/release/charl run benchmark_matmul.ch

Important Notes

  • All benchmarks conducted on CPU (no GPU acceleration)
  • PyTorch comparison uses synthetic data (1,000 samples)
  • Results may vary based on hardware and system load
  • Memory measurements include peak resident set size