Basic Optimization
Learn gradient descent by minimizing a simple quadratic function: f(x) = (x - 5)²
Problem Overview
In this example, we'll use gradient descent to find the minimum of a simple function. This demonstrates the core concepts used in machine learning optimization.
Goal: Find the value of x that minimizes f(x) = (x - 5)². The answer is x = 5, where f(5) = 0.
Mathematical Background
- Function: f(x) = (x - 5)²
- Derivative: f'(x) = 2(x - 5)
- Minimum: at x = 5 where f(5) = 0
- Update Rule: x ← x - α × f'(x), where α is the learning rate
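To make the update rule concrete, here is a single hand-computed step written in plain Python (a reference sketch only; the tutorial's own language is used in the sections below): starting at x = 10 with α = 0.1, the gradient is 2 × (10 − 5) = 10, so the next value is 10 − 0.1 × 10 = 9.

# Plain-Python sketch of one gradient-descent step on f(x) = (x - 5)^2
def f(x):
    return (x - 5.0) ** 2

def f_prime(x):
    return 2.0 * (x - 5.0)

x = 10.0
alpha = 0.1                       # learning rate
x_next = x - alpha * f_prime(x)   # 10 - 0.1 * 10 = 9.0
print(f(x), f_prime(x), x_next)   # 25.0 10.0 9.0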
Step-by-Step Implementation
Step 1: Setup and Initialization
// Starting point - we'll optimize from here
let x = tensor([10.0]) // Start far from the minimum
// Hyperparameters
let learning_rate = 0.1 // Step size for updates
let max_iterations = 50 // Maximum number of steps
let tolerance = 0.001 // Stop when close enough to minimum
print("Starting optimization...")
print("Initial x: " + str(tensor_get(x, 0)))
print("Target x: 5.0")
print("")
Step 2: Define the Objective Function
// Function to compute f(x) = (x - 5)^2
fn compute_loss(x: Tensor) -> float {
    let target = tensor([5.0])
    let diff = tensor_subtract(x, target)
    let squared = tensor_multiply(diff, diff)
    return tensor_get(squared, 0)
}
// Test the function
let initial_loss = compute_loss(x)
print("Initial loss: " + str(initial_loss)) // Should be 25.0 since (10-5)^2 = 25
Step 3: Compute Gradients
// Function to compute gradient: f'(x) = 2(x - 5)
fn compute_gradient(x: Tensor) -> Tensor {
    let target = tensor([5.0])
    let diff = tensor_subtract(x, target)
    let two = tensor([2.0])
    let gradient = tensor_multiply(two, diff)
    return gradient
}
// Test gradient computation
let grad = compute_gradient(x)
print("Initial gradient: " + str(tensor_get(grad, 0))) // Should be 10.0 = 2*(10-5)
Step 4: Optimization Loop
let iteration = 0
let converged = false
while iteration < max_iterations && !converged {
    // Compute current loss
    let loss = compute_loss(x)
    // Compute gradient
    let gradient = compute_gradient(x)

    // Print progress every 5 iterations
    if iteration % 5 == 0 {
        print("Iteration " + str(iteration))
        print("  x: " + str(tensor_get(x, 0)))
        print("  loss: " + str(loss))
        print("  gradient: " + str(tensor_get(gradient, 0)))
        print("")
    }

    // Check convergence
    if loss < tolerance {
        print("Converged at iteration " + str(iteration))
        converged = true
    } else {
        // Update x using gradient descent: x = x - learning_rate * gradient
        x = optim_sgd_step(x, gradient, learning_rate)
        iteration = iteration + 1
    }
}
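For comparison, here is the same loop written in plain Python (a reference sketch that mirrors the logic above; it is not the tutorial language):

# Plain-Python mirror of the optimization loop
x, learning_rate, max_iterations, tolerance = 10.0, 0.1, 50, 0.001

iteration = 0
while iteration < max_iterations:
    loss = (x - 5.0) ** 2
    if loss < tolerance:
        print("Converged at iteration", iteration)
        break
    gradient = 2.0 * (x - 5.0)
    x = x - learning_rate * gradient   # gradient-descent update
    iteration = iteration + 1

print(x, loss)  # x approaches 5.0 and the loss drops below the tolerance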
Step 5: Results
// Print final results
print("=== Optimization Complete ===")
print("Final x: " + str(tensor_get(x, 0)))
print("Target x: 5.0")
print("Final loss: " + str(compute_loss(x)))
print("Total iterations: " + str(iteration))
// Calculate error
let final_x_value = tensor_get(x, 0)
let error = if final_x_value > 5.0 {
final_x_value - 5.0
} else {
5.0 - final_x_value
}
print("Error from target: " + str(error))
Complete Program
Here's the full program you can run:
// Basic Optimization: Minimize f(x) = (x - 5)^2 using gradient descent
// Compute the objective function
fn compute_loss(x: Tensor) -> float {
    let target = tensor([5.0])
    let diff = tensor_subtract(x, target)
    let squared = tensor_multiply(diff, diff)
    return tensor_get(squared, 0)
}
// Compute gradient: df/dx = 2(x - 5)
fn compute_gradient(x: Tensor) -> Tensor {
    let target = tensor([5.0])
    let diff = tensor_subtract(x, target)
    let two = tensor([2.0])
    return tensor_multiply(two, diff)
}
// Main optimization function
fn optimize() {
    print("=== Basic Optimization Example ===")
    print("Goal: Minimize f(x) = (x - 5)^2")
    print("")

    // Initialize
    let x = tensor([10.0])
    let learning_rate = 0.1
    let max_iterations = 50
    let tolerance = 0.001

    print("Starting x: " + str(tensor_get(x, 0)))
    print("Target x: 5.0")
    print("Learning rate: " + str(learning_rate))
    print("")

    // Optimization loop
    let iteration = 0
    let converged = false
    while iteration < max_iterations && !converged {
        // Forward: compute loss
        let loss = compute_loss(x)
        // Backward: compute gradient
        let gradient = compute_gradient(x)

        // Print progress
        if iteration % 5 == 0 {
            print("Iteration " + str(iteration))
            print("  x = " + str(tensor_get(x, 0)))
            print("  loss = " + str(loss))
            print("  gradient = " + str(tensor_get(gradient, 0)))
        }

        // Check convergence
        if loss < tolerance {
            print("")
            print("Converged! Loss below tolerance.")
            converged = true
        } else {
            // Update: gradient descent step
            x = optim_sgd_step(x, gradient, learning_rate)
            iteration = iteration + 1
        }
    }

    // Final results
    print("")
    print("=== Final Results ===")
    print("Iterations: " + str(iteration))
    print("Final x: " + str(tensor_get(x, 0)))
    print("Final loss: " + str(compute_loss(x)))
    print("Expected x: 5.0")
}
// Run optimization
optimize()
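If you happen to have PyTorch installed, the same problem can be cross-checked with automatic differentiation instead of a hand-derived gradient. This is an optional sketch using an external library, not part of the tutorial language:

# Optional PyTorch cross-check: autograd computes d(loss)/dx automatically
import torch

x = torch.tensor(10.0, requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.1)

for iteration in range(50):
    loss = (x - 5.0) ** 2
    if loss.item() < 0.001:
        print("Converged at iteration", iteration)
        break
    optimizer.zero_grad()
    loss.backward()   # fills x.grad with 2 * (x - 5)
    optimizer.step()  # x <- x - lr * x.grad

print(x.item())  # roughly 5.03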
Expected Output
Running the program should produce output similar to the following (numeric values are rounded):
=== Basic Optimization Example ===
Goal: Minimize f(x) = (x - 5)^2
Starting x: 10.0
Target x: 5.0
Learning rate: 0.1
Iteration 0
  x = 10.0
  loss = 25.0
  gradient = 10.0
Iteration 5
  x = 6.638
  loss = 2.684
  gradient = 3.277
Iteration 10
  x = 5.537
  loss = 0.288
  gradient = 1.074
Iteration 15
  x = 5.176
  loss = 0.031
  gradient = 0.352
Iteration 20
  x = 5.058
  loss = 0.003
  gradient = 0.115
Converged! Loss below tolerance.
=== Final Results ===
Iterations: 23
Final x: 5.030
Final loss: 0.0009
Expected x: 5.0
Understanding the Results
Gradient Descent Behavior
Iteration 0:
- Started at x = 10.0, far from the minimum
- The large gradient (10.0) indicates a steep slope
- The large loss (25.0) shows we are far from the target
Iteration 10:
- x = 5.537, getting close to the minimum
- The gradient (1.074) is much smaller - the slope is gentler
- The loss (0.288) is significantly reduced
Convergence:
- Final x ≈ 5.030, very close to the target of 5.0
- Loss ≈ 0.0009, below the tolerance threshold
- Gradient ≈ 0.059, nearly flat
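The trajectory above also has a simple closed form: each update multiplies the distance from the minimum by (1 - 2α) = 0.8, so x_k = 5 + 5 · 0.8^k and loss_k = 25 · 0.64^k. The plain-Python sketch below uses the same constants as this example to confirm when the loss first drops below the tolerance:

# Closed-form check: the error (x - 5) shrinks by a factor of (1 - 2*alpha) per step
alpha, x0, target, tolerance = 0.1, 10.0, 5.0, 0.001

for k in range(50):
    error = (x0 - target) * (1 - 2 * alpha) ** k   # 5 * 0.8**k
    loss = error ** 2                              # 25 * 0.64**k
    if loss < tolerance:
        print(k, target + error, loss)  # 23, ~5.030, ~0.0009
        break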
Experiments to Try
1. Different Learning Rates
// Too small (slow convergence)
let learning_rate = 0.01 // Will take many iterations
// Too large (unstable)
let learning_rate = 1.5 // Each step overshoots and the error grows - diverges
// Just right
let learning_rate = 0.1 // Good balance of speed and stability
For this particular quadratic, any learning rate between 0 and 1 converges (0.5 even reaches the minimum in a single step); the Python sweep after these experiments shows the behavior for several values.
2. Different Starting Points
// Start close to minimum
let x = tensor([5.5]) // Converges quickly
// Start far away
let x = tensor([100.0]) // Takes more iterations
// Start on the other side
let x = tensor([0.0]) // Still finds minimum at 5.0
3. Different Target Values
// Modify the target in compute_loss() and compute_gradient()
let target = tensor([10.0]) // Find minimum at x = 10
let target = tensor([0.0]) // Find minimum at x = 0
let target = tensor([-5.0]) // Find minimum at x = -5
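To preview these experiments without editing the program, here is a small plain-Python sweep (a sketch; the specific values are only illustrative):

# Sweep learning rates and starting points for f(x) = (x - 5)^2
def run(x, lr, steps=50, target=5.0, tol=1e-3):
    for k in range(steps):
        if (x - target) ** 2 < tol:
            return k, x                # iterations used, final x
        x = x - lr * 2 * (x - target)  # gradient-descent update
    return steps, x

for lr in (0.01, 0.1, 0.5, 1.5):
    print("lr", lr, run(10.0, lr))   # 0.01 is slow, 0.5 hits 5.0 in one step
                                     # for this quadratic, and 1.5 diverges
for x0 in (5.5, 100.0, 0.0):
    print("x0", x0, run(x0, 0.1))    # all starting points reach ~5.0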
Key Concepts Demonstrated
Gradient Descent
Iteratively move in the direction that reduces loss, guided by gradients.
Learning Rate
Controls step size. Too small is slow, too large causes instability.
Convergence
Stop when loss is small enough or max iterations reached.
Gradients
Indicate the direction and magnitude of the steepest increase in the loss; gradient descent moves in the opposite direction.
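For this specific quadratic, these concepts can be summarized by one number: each update multiplies the error (x - 5) by (1 - 2α), so gradient descent converges exactly when |1 - 2α| < 1, i.e. for 0 < α < 1. A short plain-Python illustration (a sketch, not part of the tutorial program):

# Contraction factor for f(x) = (x - 5)^2: the error scales by (1 - 2*alpha) per step
for alpha in (0.01, 0.1, 0.5, 0.9, 1.0, 1.5):
    factor = 1 - 2 * alpha
    if abs(factor) < 1:
        behavior = "converges"
    elif abs(factor) == 1:
        behavior = "oscillates forever"
    else:
        behavior = "diverges"
    print("alpha =", alpha, "factor =", factor, "->", behavior)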