# Solving the XOR Problem

Train a neural network to learn the XOR function using automatic differentiation.
## The XOR Problem

XOR (exclusive OR) is a logical operation that returns true only when inputs differ. It's a classic machine learning problem because it's not linearly separable.
### Truth Table

| Input A | Input B | Output |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

### Why It's Challenging

- Cannot be solved by a single-layer perceptron (a short derivation follows this list)
- Requires a hidden layer for non-linear separation
- A classic test of neural network capability
- Demonstrates the power of automatic differentiation
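
To make the first two bullets concrete: a single linear unit that outputs 1 whenever $w_1 a + w_2 b + c > 0$ would have to satisfy all four rows of the truth table at once, which is impossible:

$$
\begin{aligned}
w_1 \cdot 0 + w_2 \cdot 0 + c &\le 0 &&\Rightarrow\quad c \le 0 \\
w_1 \cdot 0 + w_2 \cdot 1 + c &> 0 &&\Rightarrow\quad w_2 > -c \\
w_1 \cdot 1 + w_2 \cdot 0 + c &> 0 &&\Rightarrow\quad w_1 > -c \\
w_1 \cdot 1 + w_2 \cdot 1 + c &\le 0 &&\Rightarrow\quad w_1 + w_2 \le -c
\end{aligned}
$$

Adding the middle two lines gives $w_1 + w_2 > -2c \ge -c$ (since $c \le 0$), which contradicts the last line, so no single linear unit works and a hidden layer is needed.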
## Complete Implementation

This example uses Charl v0.3.0 with automatic differentiation and the Adam optimizer.
```
// XOR Problem - Neural Network Training with Autograd

// Training data: all 4 XOR combinations
let X = tensor([
    0.0, 0.0,
    0.0, 1.0,
    1.0, 0.0,
    1.0, 1.0
], [4, 2])
let Y = tensor([0.0, 1.0, 1.0, 0.0], [4, 1])

// Initialize parameters with gradient tracking
// Network: 2 -> 2 -> 1 (hidden layer W1/b1, output layer W2/b2)
let W1 = tensor_with_grad([0.5, -0.3, 0.2, 0.4], [2, 2])
let b1 = tensor_with_grad([0.1, -0.1], [2])
let W2 = tensor_with_grad([0.6, -0.4], [2, 1])   // arbitrary small starting values
let b2 = tensor_with_grad([0.0], [1])

// Create optimizer (Adam with learning rate 0.01)
let optimizer = adam_create(0.01)

print("=== XOR Problem Training ===")
print("")
print("Network: 2 -> 2 -> 1")
print("Optimizer: Adam (lr=0.01)")
print("Training for 100 epochs...")
print("")

// Training loop
let epoch = 0
while epoch < 100 {
    // Forward pass: hidden layer, then output layer
    let h1 = nn_sigmoid(nn_linear(X, W1, b1))
    let pred = nn_sigmoid(nn_linear(h1, W2, b2))

    // Compute loss
    let loss = nn_mse_loss(pred, Y)

    // Backward pass - automatic differentiation
    tensor_backward(loss)

    // Update parameters with optimizer
    let params = [W1, b1, W2, b2]
    let updated = adam_step(optimizer, params)
    W1 = updated[0]
    b1 = updated[1]
    W2 = updated[2]
    b2 = updated[3]

    // Print progress every 25 epochs
    if epoch % 25 == 0 {
        print("Epoch " + str(epoch) + ": Loss = " + str(tensor_item(loss)))
    }

    epoch = epoch + 1
}

print("")
print("Training complete!")
print("")

// Test the trained network on each input pair
print("=== Final Results ===")
print("")

print("0 XOR 0:")
let input1 = tensor([0.0, 0.0], [1, 2])
let out1 = nn_sigmoid(nn_linear(nn_sigmoid(nn_linear(input1, W1, b1)), W2, b2))
print("  Output: " + str(tensor_item(out1)))

print("0 XOR 1:")
let input2 = tensor([0.0, 1.0], [1, 2])
let out2 = nn_sigmoid(nn_linear(nn_sigmoid(nn_linear(input2, W1, b1)), W2, b2))
print("  Output: " + str(tensor_item(out2)))

print("1 XOR 0:")
let input3 = tensor([1.0, 0.0], [1, 2])
let out3 = nn_sigmoid(nn_linear(nn_sigmoid(nn_linear(input3, W1, b1)), W2, b2))
print("  Output: " + str(tensor_item(out3)))

print("1 XOR 1:")
let input4 = tensor([1.0, 1.0], [1, 2])
let out4 = nn_sigmoid(nn_linear(nn_sigmoid(nn_linear(input4, W1, b1)), W2, b2))
print("  Output: " + str(tensor_item(out4)))
```

## Expected Output

```
=== XOR Problem Training ===
Network: 2 -> 2 -> 1
Optimizer: Adam (lr=0.01)
Training for 100 epochs...
Epoch 0: Loss = 0.260
Epoch 25: Loss = 0.248
Epoch 50: Loss = 0.243
Epoch 75: Loss = 0.241
Training complete!
=== Final Results ===
0 XOR 0:
  Output: 0.012
0 XOR 1:
  Output: 0.998
1 XOR 0:
  Output: 0.996
1 XOR 1:
  Output: 0.006
```

## Code Breakdown

### 1. Data Preparation

```
let X = tensor([0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0], [4, 2])
let Y = tensor([0.0, 1.0, 1.0, 0.0], [4, 1])
```

X holds all four input combinations as the rows of a 4x2 tensor; Y holds the corresponding expected outputs.
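
Grouping the flat lists row by row to fill the given shapes (which is how the truth-table rows line up), the two tensors correspond to:

$$
X = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix}, \qquad
Y = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}
$$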

### 2. Parameter Initialization

```
let W1 = tensor_with_grad([0.5, -0.3, 0.2, 0.4], [2, 2])
let b1 = tensor_with_grad([0.1, -0.1], [2])
let W2 = tensor_with_grad([0.6, -0.4], [2, 1])
let b2 = tensor_with_grad([0.0], [1])
```

tensor_with_grad() enables automatic gradient tracking for these parameters; values created with plain tensor(), such as X and Y, are treated as constants.

### 3. Forward Pass

```
let h1 = nn_sigmoid(nn_linear(X, W1, b1))
let pred = nn_sigmoid(nn_linear(h1, W2, b2))
```

The hidden layer computes sigmoid(X @ W1 + b1); the output layer then computes sigmoid(h1 @ W2 + b2) to produce the predictions.
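
In matrix form, with the bias vectors broadcast across the four input rows, the forward pass computes:

$$
\begin{aligned}
H &= \sigma\bigl(X W_1 + b_1\bigr), \\
\hat{Y} &= \sigma\bigl(H W_2 + b_2\bigr), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
\end{aligned}
$$

where $X$ is $4 \times 2$, the hidden activation $H$ is $4 \times 2$, and the prediction $\hat{Y}$ is $4 \times 1$.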

### 4. Backward Pass

```
let loss = nn_mse_loss(pred, Y)
tensor_backward(loss)
```

tensor_backward() automatically computes the gradient of the loss with respect to every parameter created with tensor_with_grad().
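
For reference, assuming nn_mse_loss averages over the batch (a common convention that this page does not state explicitly), the loss and the local derivatives the autograd engine chains together are:

$$
\begin{aligned}
L &= \frac{1}{N} \sum_{i=1}^{N} \bigl(\hat{y}_i - y_i\bigr)^2, \qquad
\frac{\partial L}{\partial \hat{y}_i} = \frac{2}{N}\bigl(\hat{y}_i - y_i\bigr), \\
\sigma'(z) &= \sigma(z)\bigl(1 - \sigma(z)\bigr)
\end{aligned}
$$

with $N = 4$ training rows here.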

### 5. Parameter Update

```
let params = [W1, b1, W2, b2]
let updated = adam_step(optimizer, params)
W1 = updated[0]
b1 = updated[1]
W2 = updated[2]
b2 = updated[3]
```

The Adam optimizer uses the computed gradients to produce updated copies of each parameter, which replace the old bindings.
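
For reference, the standard Adam update applied to each parameter $\theta$ with gradient $g_t$ and learning rate $\alpha = 0.01$ is shown below; the $\beta_1$, $\beta_2$, and $\epsilon$ defaults used by adam_create are not stated on this page, so the usual values ($\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$) are assumed:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, &
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2, \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^{\,t}}, &
\hat{v}_t &= \frac{v_t}{1 - \beta_2^{\,t}}, \\
\theta_t &= \theta_{t-1} - \frac{\alpha\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$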

## Key Features of v0.3.0

### Automatic Differentiation

tensor_backward() automatically computes gradients for all parameters marked with tensor_with_grad().
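
As a minimal sketch of that workflow, the snippet below fits a single tracked weight and bias so that a linear layer maps the input 1.0 to the target 2.0, using only the calls already shown on this page (exact convergence will depend on Charl's Adam defaults):

```
// Minimal autograd sketch: one tracked weight and bias, one constant input/target
let w = tensor_with_grad([0.0], [1, 1])   // tracked: receives a gradient
let b = tensor_with_grad([0.0], [1])      // tracked: receives a gradient
let x = tensor([1.0], [1, 1])             // constant: no gradient tracking
let t = tensor([2.0], [1, 1])             // constant target

let opt = adam_create(0.01)
let step = 0
while step < 500 {
    let loss = nn_mse_loss(nn_linear(x, w, b), t)
    tensor_backward(loss)                 // fills in gradients for w and b only
    let params = [w, b]
    let updated = adam_step(opt, params)
    w = updated[0]
    b = updated[1]
    step = step + 1
}

// The fitted output should now be close to the 2.0 target
print("Fitted output: " + str(tensor_item(nn_linear(x, w, b))))
```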

### Built-in Optimizers

SGD, Adam, and RMSProp optimizers with proper gradient handling and momentum.
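
The optimizers shown on this page share a create-once, step-per-iteration shape; RMSProp presumably follows the same pattern, but its constructor does not appear in this example, so this sketch sticks to the calls that do:

```
// Created once, before the training loop
let opt_adam = adam_create(0.01)   // adaptive per-parameter learning rates
let opt_sgd = sgd_create(0.1)      // plain gradient descent

// After tensor_backward(loss), pass the tracked parameters and
// rebind the updated copies that come back
let params = [W1, b1, W2, b2]
let updated = adam_step(opt_adam, params)   // or: sgd_step(opt_sgd, params)
```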

### Native Tensor Operations

All operations (nn_linear, nn_sigmoid, nn_mse_loss) work directly with tensors.

### Simple API

A clean, functional API without complex class hierarchies or state management.

## Experiments to Try

### 1. Try SGD Optimizer

```
let optimizer = sgd_create(0.1)
let updated = sgd_step(optimizer, params)
```

Compare its convergence speed with Adam's; a sketch of the substitution follows.
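
A sketch of the change inside the training loop above, assuming sgd_step returns the updated parameters in the same order they were passed in (as adam_step does):

```
let optimizer = sgd_create(0.1)             // replaces adam_create(0.01)

// ... same forward pass, loss, and tensor_backward(loss) as before ...
let params = [W1, b1, W2, b2]
let updated = sgd_step(optimizer, params)   // replaces adam_step(optimizer, params)
W1 = updated[0]
b1 = updated[1]
W2 = updated[2]
b2 = updated[3]
```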

### 2. Increase Network Size

```
// 2 -> 4 -> 1 network
let W1 = tensor_with_grad([...], [2, 4])
let b1 = tensor_with_grad([...], [4])
```

See whether more hidden neurons improve learning; a fuller sketch follows.
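
A fuller sketch of the 2 -> 4 -> 1 parameter set; the initial values below are arbitrary small numbers chosen only for illustration, and the output layer is resized to match the four hidden units:

```
// 2 -> 4 -> 1 network: wider hidden layer, output layer resized to match
let W1 = tensor_with_grad([0.5, -0.3, 0.2, 0.4, -0.1, 0.3, -0.2, 0.1], [2, 4])
let b1 = tensor_with_grad([0.1, -0.1, 0.05, -0.05], [4])
let W2 = tensor_with_grad([0.6, -0.4, 0.3, -0.2], [4, 1])
let b2 = tensor_with_grad([0.0], [1])

// The training loop is unchanged:
// let h1 = nn_sigmoid(nn_linear(X, W1, b1))
// let pred = nn_sigmoid(nn_linear(h1, W2, b2))
// let params = [W1, b1, W2, b2]
```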

### 3. Try ReLU Activation

```
let h1 = nn_relu(nn_linear(X, W1, b1))
```

Test a different activation function for the hidden layer; see the sketch below.
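
A sketch of the forward pass with a ReLU hidden layer; keeping sigmoid on the output layer keeps the predictions in the 0-1 range of the XOR targets:

```
// ReLU hidden layer, sigmoid output layer
let h1 = nn_relu(nn_linear(X, W1, b1))
let pred = nn_sigmoid(nn_linear(h1, W2, b2))
let loss = nn_mse_loss(pred, Y)
```

With ReLU, the small symmetric initial weights used above may need adjusting, since hidden units whose pre-activation stays below zero pass no gradient back.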