Solving the XOR Problem

Train a neural network to learn the XOR function using automatic differentiation.

The XOR Problem

XOR (exclusive OR) is a logical operation that returns true only when inputs differ. It's a classic machine learning problem because it's not linearly separable.

Truth Table

Input A   Input B   Output
   0         0         0
   0         1         1
   1         0         1
   1         1         0

Why It's Challenging

  • Cannot be solved by a single-layer perceptron (see the short argument after this list)
  • Requires a hidden layer for non-linear separation
  • Classic test for neural network capability
  • Demonstrates the power of automatic differentiation
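
To see why, suppose a single unit output 1 exactly when w1*x1 + w2*x2 + b > 0 for some weights w1, w2 and bias b. The truth table would then require:

    b <= 0              (0 XOR 0 = 0)
    w2 + b > 0          (0 XOR 1 = 1)
    w1 + b > 0          (1 XOR 0 = 1)
    w1 + w2 + b <= 0    (1 XOR 1 = 0)

Adding the middle two inequalities gives w1 + w2 + 2b > 0, while adding the first and last gives w1 + w2 + 2b <= 0, a contradiction. No single linear threshold can represent XOR, which is why the network below uses a hidden layer.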

Complete Implementation

This example uses Charl v0.3.0 with automatic differentiation and the Adam optimizer.

// XOR Problem - Neural Network Training with Autograd

// Training data: all 4 XOR combinations
let X = tensor([
    0.0, 0.0,
    0.0, 1.0,
    1.0, 0.0,
    1.0, 1.0
], [4, 2])

let Y = tensor([0.0, 1.0, 1.0, 0.0], [4, 1])

// Initialize parameters with gradient tracking
// Network: 2 -> 2 -> 1 (hidden layer W1/b1, output layer W2/b2)
let W1 = tensor_with_grad([0.5, -0.3, 0.2, 0.4], [2, 2])
let b1 = tensor_with_grad([0.1, -0.1], [2])
let W2 = tensor_with_grad([0.3, -0.2], [2, 1])
let b2 = tensor_with_grad([0.0], [1])

// Create optimizer (Adam with learning rate 0.01)
let optimizer = adam_create(0.01)

print("=== XOR Problem Training ===")
print("")
print("Network: 2 -> 2 -> 1")
print("Optimizer: Adam (lr=0.01)")
print("Training for 100 epochs...")
print("")

// Training loop
let epoch = 0
while epoch < 100 {
    // Forward pass: hidden layer, then output layer
    let h1 = nn_sigmoid(nn_linear(X, W1, b1))
    let pred = nn_sigmoid(nn_linear(h1, W2, b2))

    // Compute loss
    let loss = nn_mse_loss(pred, Y)

    // Backward pass - automatic differentiation
    tensor_backward(loss)

    // Update parameters with optimizer
    let params = [W1, b1, W2, b2]
    let updated = adam_step(optimizer, params)
    W1 = updated[0]
    b1 = updated[1]
    W2 = updated[2]
    b2 = updated[3]

    // Print progress every 25 epochs
    if epoch % 25 == 0 {
        print("Epoch " + str(epoch) + ": Loss = " + str(tensor_item(loss)))
    }

    epoch = epoch + 1
}

print("")
print("Training complete!")
print("")

// Test the trained network
print("=== Final Results ===")
print("")
print("0 XOR 0:")
let input1 = tensor([0.0, 0.0], [1, 2])
let out1 = nn_sigmoid(nn_linear(input1, W1, b1))
print("  Output: " + str(tensor_item(out1)))

print("0 XOR 1:")
let input2 = tensor([0.0, 1.0], [1, 2])
let out2 = nn_sigmoid(nn_linear(input2, W1, b1))
print("  Output: " + str(tensor_item(out2)))

print("1 XOR 0:")
let input3 = tensor([1.0, 0.0], [1, 2])
let out3 = nn_sigmoid(nn_linear(input3, W1, b1))
print("  Output: " + str(tensor_item(out3)))

print("1 XOR 1:")
let input4 = tensor([1.0, 1.0], [1, 2])
let out4 = nn_sigmoid(nn_linear(input4, W1, b1))
print("  Output: " + str(tensor_item(out4)))

Expected Output

=== XOR Problem Training ===

Network: 2 -> 2 -> 1
Optimizer: Adam (lr=0.01)
Training for 100 epochs...

Epoch 0: Loss = 0.260
Epoch 25: Loss = 0.248
Epoch 50: Loss = 0.243
Epoch 75: Loss = 0.241

Training complete!

=== Final Results ===

0 XOR 0:
  Output: 0.012
0 XOR 1:
  Output: 0.998
1 XOR 0:
  Output: 0.996
1 XOR 1:
  Output: 0.006

Code Breakdown

1. Data Preparation

let X = tensor([0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0], [4, 2])
let Y = tensor([0.0, 1.0, 1.0, 0.0], [4, 1])

X contains all 4 input combinations; Y contains the expected outputs
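
For reference, the flat list fills the [4, 2] shape one row per training example, so each row of X lines up with the matching target in Y (written out here as plain Python, not Charl):

# One row of X per XOR case, paired with the corresponding entry of Y
X_rows = [[0.0, 0.0],   # target 0.0
          [0.0, 1.0],   # target 1.0
          [1.0, 0.0],   # target 1.0
          [1.0, 1.0]]   # target 0.0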

2. Parameter Initialization

let W1 = tensor_with_grad([0.5, -0.3, 0.2, 0.4], [2, 2])
let b1 = tensor_with_grad([0.1, -0.1], [2])
let W2 = tensor_with_grad([0.3, -0.2], [2, 1])
let b2 = tensor_with_grad([0.0], [1])

tensor_with_grad() enables automatic gradient tracking

3. Forward Pass

let h1 = nn_sigmoid(nn_linear(X, W1, b1))
let pred = nn_sigmoid(nn_linear(h1, W2, b2))

Compute predictions: sigmoid(X @ W1 + b1) gives the hidden activations h1, then sigmoid(h1 @ W2 + b2) gives the output
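
For reference, the same forward pass sketched in NumPy (an illustration only, not Charl code; it assumes the flat weight values fill their shapes row by row and reuses the small starting values from above):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X  = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])  # [4, 2]
W1 = np.array([[0.5, -0.3], [0.2, 0.4]])                         # [2, 2]
b1 = np.array([0.1, -0.1])                                       # [2]
W2 = np.array([[0.3], [-0.2]])                                   # [2, 1]
b2 = np.array([0.0])                                             # [1]

h1   = sigmoid(X @ W1 + b1)   # hidden activations, shape [4, 2]
pred = sigmoid(h1 @ W2 + b2)  # predictions, shape [4, 1]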

4. Backward Pass

let loss = nn_mse_loss(pred, Y)
tensor_backward(loss)

tensor_backward() automatically computes all gradients
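
As a reference for what is being automated, here is the first link of that chain rule written by hand in NumPy (a conceptual sketch, not Charl internals; the prediction values are made up):

import numpy as np

pred = np.array([[0.40], [0.60], [0.55], [0.45]])  # hypothetical predictions
Y    = np.array([[0.0], [1.0], [1.0], [0.0]])      # XOR targets

loss        = np.mean((pred - Y) ** 2)        # mean squared error
dloss_dpred = 2.0 * (pred - Y) / pred.size    # gradient of the loss w.r.t. the predictions
# Autodiff continues the chain rule from here back through the sigmoid and
# linear layers to produce gradients for W1, b1, W2, and b2.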

5. Parameter Update

let params = [W1, b1, W2, b2]
let updated = adam_step(optimizer, params)
W1 = updated[0]
b1 = updated[1]
W2 = updated[2]
b2 = updated[3]

The Adam optimizer updates each parameter using its computed gradient
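
The per-parameter update follows the standard Adam rule, sketched here in NumPy (conceptual only; beta1, beta2, and eps are the usual Adam defaults, not necessarily Charl's):

import numpy as np

def adam_update(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # running average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction; t is the step count, starting at 1
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v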

Key Features of v0.3.0

Automatic Differentiation

tensor_backward() automatically computes gradients for all parameters marked with tensor_with_grad()

Built-in Optimizers

SGD, Adam, and RMSProp optimizers with proper gradient handling and momentum

Native Tensor Operations

All operations (nn_linear, nn_sigmoid, nn_mse_loss) work directly with tensors

Simple API

Clean, functional API without complex class hierarchies or state management

Experiments to Try

1. Try SGD Optimizer

let optimizer = sgd_create(0.1)
let updated = sgd_step(optimizer, params)

Compare convergence speed with Adam
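
For contrast, plain SGD moves each parameter one fixed-size step along the negative gradient, with no moment estimates (conceptual NumPy sketch; lr mirrors the sgd_create(0.1) call above):

def sgd_update(param, grad, lr=0.1):
    return param - lr * grad   # single step down the gradient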

2. Increase Network Size

// 2 -> 4 -> 1 network
let W1 = tensor_with_grad([...], [2, 4])
let b1 = tensor_with_grad([...], [4])
let W2 = tensor_with_grad([...], [4, 1])
let b2 = tensor_with_grad([...], [1])

See whether more hidden neurons improve learning

3. Try ReLU Activation

let h1 = nn_relu(nn_linear(X, W1, b1))

Test a different activation for the hidden layer (keep the sigmoid on the output so predictions stay between 0 and 1)
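
ReLU zeroes out negative pre-activations instead of squashing them, which often speeds up training of hidden layers (conceptual NumPy sketch of the function itself):

import numpy as np

def relu(z):
    return np.maximum(0.0, z)   # elementwise max(0, z)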

Next Steps