Performance Benchmarks

Real-world performance measurements and comparisons

Benchmark Environment

Hardware

Platform: Ubuntu 24.04 LTS
Kernel: Linux 6.14.0-35-generic
CPU: Standard x86_64

Software

Charl: v0.3.0 (Release build)
Python: 3.12.3
PyTorch: 2.9.0+cpu

Comparative Performance: Charl vs PyTorch

MNIST classification task (784→128→64→10 network, 109,386 parameters, 1,000 synthetic samples, 5 epochs, CPU-only)

Metric                   Charl v0.3.0       PyTorch 2.9.0    Ratio
Total Training Time      414.43 ms          9.255 s          22.33x
Average Time per Epoch   82.89 ms           1.851 s          22.33x
Throughput               12,064 samples/s   540 samples/s    22.33x
Peak Memory Usage        60 MB              200 MB           3.3x
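The parameter count and the derived columns in the table can be cross-checked with a few lines of arithmetic (a sketch; the layer sizes come from the 784→128→64→10 description above, and throughput is truncated to whole samples per second):

```python
# Parameter count for a 784 -> 128 -> 64 -> 10 fully connected network
# (weights + biases per layer).
layers = [784, 128, 64, 10]
params = sum(i * o + o for i, o in zip(layers, layers[1:]))
print(params)  # -> 109386

# Throughput = (epochs * samples) / total training time.
samples = 5 * 1000
charl_s, pytorch_s = 0.41443, 9.255
print(int(samples / charl_s))         # -> 12064
print(int(samples / pytorch_s))       # -> 540
print(round(pytorch_s / charl_s, 2))  # -> 22.33
```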

Implementation Differences

Charl's performance characteristics stem from:

  • Ahead-of-time compiled native code (Rust) vs. an interpreted Python frontend dispatching to C++ kernels
  • Direct memory access without garbage collection overhead
  • Zero-copy tensor operations via bytemuck
  • Static compilation with LLVM optimizations
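The zero-copy point deserves a concrete illustration: bytemuck reinterprets a byte slice as a typed slice without allocating or copying. The same idea can be sketched in Python with numpy's `frombuffer` (a stand-in for the concept, not Charl's actual code):

```python
import numpy as np

raw = np.arange(4, dtype=np.float32).tobytes()  # 16 bytes of raw data
view = np.frombuffer(raw, dtype=np.float32)     # reinterpret in place, no copy

# The view shares the original buffer rather than owning its own memory.
assert not view.flags.owndata
print(view)  # -> [0. 1. 2. 3.]
```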

Note: PyTorch is optimized for GPU workloads and large-scale training. This comparison uses CPU-only execution with a small dataset (1,000 samples), which favors implementations with low runtime overhead.

Training with Autograd (v0.3.0)

XOR problem training - 100 epochs with automatic differentiation

Metric                   Value
Total Time (100 epochs)  < 10 ms
Initial Loss             0.260
Final Loss               0.241
Improvement              7.2%
Parameters               17
Memory Usage             ~5.7 MB

Note: Full gradient computation with backward pass and optimizer updates in under 10 ms for 100 iterations. Memory usage includes computation graph tracking.
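The source does not state the XOR network's topology, but 17 parameters is exactly what a 2→4→1 fully connected network yields (an assumed shape, shown here only as a consistency check):

```python
# Assumed 2 -> 4 -> 1 topology; consistent with the 17-parameter figure.
layers = [2, 4, 1]
params = sum(i * o + o for i, o in zip(layers, layers[1:]))
print(params)  # -> 17  (2*4 + 4 weights+biases, then 4*1 + 1)
```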

Matrix Multiplication Performance

CPU performance on different matrix sizes

Size          Elements    Time (measured)  Memory
128 × 128     16,384      < 1 ms           ~0.5 MB
512 × 512     262,144     ~50 ms           ~8 MB
1024 × 1024   1,048,576   ~700 ms          ~60 MB
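A rough way to reproduce the shape of these measurements outside Charl is a small numpy harness (numpy dispatches matmul to BLAS, so absolute timings will differ substantially from the table; only the size scaling is comparable):

```python
import time
import numpy as np

for n in (128, 512, 1024):
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    t0 = time.perf_counter()
    c = a @ b  # n x n matrix multiplication
    elapsed_ms = (time.perf_counter() - t0) * 1000
    print(f"{n} × {n}: {c.size:,} elements, {elapsed_ms:.2f} ms")
```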

Tensor Operations Performance

Element-wise operations (add, mul, sub) and activations (ReLU, Sigmoid)

Operation Size  Elements    Total Time  Memory
Small           1,000       < 1 ms      ~0.1 MB
Medium          10,000      < 5 ms      ~1 MB
Large           100,000     ~20 ms      ~10 MB
Very Large      1,000,000   ~100 ms     ~62 MB
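The same kind of harness covers the element-wise and activation benchmarks, shown here at the table's largest tier (again a numpy stand-in, not Charl's implementation, so timings are indicative only):

```python
import time
import numpy as np

x = np.random.rand(1_000_000).astype(np.float32)
y = np.random.rand(1_000_000).astype(np.float32)

t0 = time.perf_counter()
added = x + y                       # element-wise add
product = x * y                     # element-wise mul
relu = np.maximum(x, 0.0)           # ReLU activation
sigmoid = 1.0 / (1.0 + np.exp(-x))  # Sigmoid activation
elapsed_ms = (time.perf_counter() - t0) * 1000
print(f"1,000,000 elements, 4 ops: {elapsed_ms:.2f} ms")
```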

Reproducing Benchmarks

All benchmarks are reproducible from the source repository.

Charl vs PyTorch Comparison

# Clone repository
git clone https://github.com/charlcoding-stack/charlcode.git
cd charlcode

# Build Charl benchmark
cargo build --release --bin charl_mnist_bench

# Setup Python environment
python3 -m venv venv
venv/bin/pip install torch numpy

# Run comparison
./benchmarks/pytorch_comparison/mnist/compare.sh

Training Benchmark (XOR)

# Run training benchmark
./target/release/charl run benchmark_training.ch

Matrix Multiplication Benchmark

# Run matrix multiplication benchmark
./target/release/charl run benchmark_matmul.ch

Important Notes

  • All benchmarks conducted on CPU (no GPU acceleration)
  • PyTorch comparison uses synthetic data (1,000 samples)
  • Results may vary based on hardware and system load
  • Memory measurements include peak resident set size