Building a Neural Network
Complete example: Build and train a multi-layer neural network with automatic differentiation.
Overview
This tutorial demonstrates how to build and train a complete multi-layer neural network in Charl v0.3.0 from scratch, using the following architecture and training setup.
Architecture
- Input layer: 2 features
- Hidden layer 1: 4 neurons + ReLU
- Hidden layer 2: 2 neurons + ReLU
- Output layer: 1 neuron + Sigmoid
Training
- Loss: Mean Squared Error
- Optimizer: SGD
- Learning rate: 0.1
- Automatic differentiation
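Counting the weights and biases implied by these layer sizes gives the total number of trainable parameters:

$$(2 \cdot 4 + 4) + (4 \cdot 2 + 2) + (2 \cdot 1 + 1) = 12 + 10 + 3 = 25$$

All 25 values are updated on every optimizer step.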
Complete Program
This example trains a 3-layer network on the XOR problem using automatic differentiation.
// Multi-Layer Neural Network Training
// Training data
let X = tensor([
    0.0, 0.0,
    0.0, 1.0,
    1.0, 0.0,
    1.0, 1.0
], [4, 2])
let Y = tensor([0.0, 1.0, 1.0, 0.0], [4, 1])
print("=== 3-Layer Neural Network ===")
print("")
print("Architecture: 2 -> 4 -> 2 -> 1")
print("Dataset: XOR (4 samples)")
print("")
// Layer 1: 2 -> 4
let W1 = tensor_with_grad([
    0.5, -0.3, 0.2, 0.4,
    -0.1, 0.6, 0.3, -0.2
], [2, 4])
let b1 = tensor_with_grad([0.1, -0.1, 0.2, -0.2], [4])
// Layer 2: 4 -> 2
let W2 = tensor_with_grad([
    0.4, -0.3,
    0.2, -0.1,
    0.5, 0.2,
    -0.3, 0.4
], [4, 2])
let b2 = tensor_with_grad([0.0, 0.1], [2])
// Layer 3: 2 -> 1
let W3 = tensor_with_grad([0.5, -0.3], [2, 1])
let b3 = tensor_with_grad([0.0], [1])
// Create optimizer
let optimizer = sgd_create(0.1)
print("Training for 200 epochs...")
print("")
// Training loop
let epoch = 0
while epoch < 200 {
    // Forward pass
    // Layer 1
    let h1 = nn_linear(X, W1, b1)
    let a1 = nn_relu(h1)
    // Layer 2
    let h2 = nn_linear(a1, W2, b2)
    let a2 = nn_relu(h2)
    // Layer 3 (output)
    let h3 = nn_linear(a2, W3, b3)
    let pred = nn_sigmoid(h3)
    // Compute loss
    let loss = nn_mse_loss(pred, Y)
    // Backward pass - automatic differentiation
    tensor_backward(loss)
    // Update all parameters with optimizer
    let params = [W1, b1, W2, b2, W3, b3]
    let updated = sgd_step(optimizer, params)
    W1 = updated[0]
    b1 = updated[1]
    W2 = updated[2]
    b2 = updated[3]
    W3 = updated[4]
    b3 = updated[5]
    // Print progress every 50 epochs
    if epoch % 50 == 0 {
        print("Epoch " + str(epoch) + ": Loss = " + str(tensor_item(loss)))
    }
    epoch = epoch + 1
}
print("")
print("Training complete!")
print("")
// Test the trained network
print("=== Final Predictions ===")
print("")
// Test each input
print("Input: [0, 0]")
let test1 = tensor([0.0, 0.0], [1, 2])
let out1_h1 = nn_relu(nn_linear(test1, W1, b1))
let out1_h2 = nn_relu(nn_linear(out1_h1, W2, b2))
let out1 = nn_sigmoid(nn_linear(out1_h2, W3, b3))
print(" Prediction: " + str(tensor_item(out1)) + " (expected: 0.0)")
print("Input: [0, 1]")
let test2 = tensor([0.0, 1.0], [1, 2])
let out2_h1 = nn_relu(nn_linear(test2, W1, b1))
let out2_h2 = nn_relu(nn_linear(out2_h1, W2, b2))
let out2 = nn_sigmoid(nn_linear(out2_h2, W3, b3))
print(" Prediction: " + str(tensor_item(out2)) + " (expected: 1.0)")
print("Input: [1, 0]")
let test3 = tensor([1.0, 0.0], [1, 2])
let out3_h1 = nn_relu(nn_linear(test3, W1, b1))
let out3_h2 = nn_relu(nn_linear(out3_h1, W2, b2))
let out3 = nn_sigmoid(nn_linear(out3_h2, W3, b3))
print(" Prediction: " + str(tensor_item(out3)) + " (expected: 1.0)")
print("Input: [1, 1]")
let test4 = tensor([1.0, 1.0], [1, 2])
let out4_h1 = nn_relu(nn_linear(test4, W1, b1))
let out4_h2 = nn_relu(nn_linear(out4_h1, W2, b2))
let out4 = nn_sigmoid(nn_linear(out4_h2, W3, b3))
print(" Prediction: " + str(tensor_item(out4)) + " (expected: 0.0)")
Expected Output
=== 3-Layer Neural Network ===
Architecture: 2 -> 4 -> 2 -> 1
Dataset: XOR (4 samples)
Training for 200 epochs...
Epoch 0: Loss = 0.256
Epoch 50: Loss = 0.124
Epoch 100: Loss = 0.042
Epoch 150: Loss = 0.015
Training complete!
=== Final Predictions ===
Input: [0, 0]
Prediction: 0.023 (expected: 0.0)
Input: [0, 1]
Prediction: 0.981 (expected: 1.0)
Input: [1, 0]
Prediction: 0.976 (expected: 1.0)
Input: [1, 1]
Prediction: 0.019 (expected: 0.0)
Step-by-Step Explanation
Step 1: Initialize Parameters
// Use tensor_with_grad() to enable automatic differentiation
let W1 = tensor_with_grad([...], [2, 4])
let b1 = tensor_with_grad([...], [4])
All parameters that need gradients must be created with tensor_with_grad(); the training data X and Y use plain tensor() because they are never updated.
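The shapes follow a consistent convention: each weight has shape [in_features, out_features] and each bias has shape [out_features]. Assuming nn_linear computes the affine map X·W + b (consistent with these shapes), layer 1 maps the [4, 2] input batch to a [4, 4] activation:

$$X \in \mathbb{R}^{4 \times 2}, \quad W_1 \in \mathbb{R}^{2 \times 4}, \quad b_1 \in \mathbb{R}^{4} \;\Longrightarrow\; X W_1 + b_1 \in \mathbb{R}^{4 \times 4}$$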
Step 2: Forward Pass
// Layer by layer computation
let h1 = nn_linear(X, W1, b1)
let a1 = nn_relu(h1)
let h2 = nn_linear(a1, W2, b2)
let a2 = nn_relu(h2)
let pred = nn_sigmoid(nn_linear(a2, W3, b3))
Each layer: linear transformation followed by activation
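Written out (using the same X·W + b reading of nn_linear as above), the forward pass is:

$$\begin{aligned} h_1 &= X W_1 + b_1, \qquad & a_1 &= \mathrm{ReLU}(h_1) \\ h_2 &= a_1 W_2 + b_2, \qquad & a_2 &= \mathrm{ReLU}(h_2) \\ h_3 &= a_2 W_3 + b_3, \qquad & \hat{y} &= \sigma(h_3) \end{aligned}$$

where σ is the sigmoid function applied by nn_sigmoid.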
Step 3: Compute Loss
let loss = nn_mse_loss(pred, Y)
Mean Squared Error measures the average squared difference between predictions and targets
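In symbols, for the N = 4 samples here:

$$\mathrm{MSE}(\hat{y}, y) = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$$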
Step 4: Backward Pass
tensor_backward(loss)
Automatically computes gradients for ALL parameters (W1, b1, W2, b2, W3, b3)
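Conceptually, backpropagation applies the chain rule from the loss back through every operation in the graph. For example, the gradient of the loss with respect to W3 factors as:

$$\frac{\partial L}{\partial W_3} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial h_3} \cdot \frac{\partial h_3}{\partial W_3}$$

A single tensor_backward(loss) call performs this bookkeeping for all six parameters at once.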
Step 5: Update Parameters
let params = [W1, b1, W2, b2, W3, b3]
let updated = sgd_step(optimizer, params)
// Reassign updated tensors
W1 = updated[0]
b1 = updated[1]
// ... etc
Optimizer uses computed gradients to update parameters
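The optimizer's internals are not shown in this tutorial, but standard SGD applies the following update to each parameter θ in the list, with the learning rate η = 0.1 passed to sgd_create:

$$\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}$$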
Key Concepts
Automatic Differentiation
Charl automatically builds a computation graph and computes gradients through the entire network with a single tensor_backward() call.
No manual gradient calculations needed!
Multi-Layer Networks
Stack multiple layers to learn complex non-linear patterns. Each layer transforms the input in a different way.
More layers = more representational power
Activation Functions
ReLU and Sigmoid introduce non-linearity, allowing the network to learn complex decision boundaries.
Without activations, multiple layers = one layer
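The two activations used in this tutorial are ReLU and the sigmoid σ:

$$\mathrm{ReLU}(x) = \max(0, x), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}$$

Without them, stacked linear layers collapse into one: ignoring biases, $(X W_1) W_2 = X (W_1 W_2)$, which is just a single linear layer with weight $W_1 W_2$.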
Optimization
SGD optimizer iteratively adjusts parameters to minimize loss. Learning rate controls the step size.
Smaller learning rate = slower but more stable
Experiments to Try
1. Try Different Architectures
// Deeper network: 2 -> 8 -> 4 -> 2 -> 1
let W1 = tensor_with_grad([...], [2, 8])
let W2 = tensor_with_grad([...], [8, 4])
let W3 = tensor_with_grad([...], [4, 2])
let W4 = tensor_with_grad([...], [2, 1])
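// Note: each weight above also needs a matching bias, with shapes [8], [4], [2], and [1]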
2. Try Adam Optimizer
let optimizer = adam_create(0.01)
let updated = adam_step(optimizer, params)
Adam often converges faster than SGD
3. Try Tanh Activation
let a1 = nn_tanh(h1) // Instead of nn_relu(h1)
Different activations can affect learning dynamics
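For reference, tanh squashes its input into the range (-1, 1), whereas ReLU outputs values in [0, ∞):

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$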
4. Vary Learning Rate
let optimizer = sgd_create(0.01) // Slower
// or
let optimizer = sgd_create(0.5) // Faster (may be unstable)