Basic Optimization

Learn gradient descent by minimizing a simple quadratic function: f(x) = (x - 5)²

Problem Overview

In this example, we'll use gradient descent to find the minimum of a simple function. It walks through the same loop used throughout machine learning optimization: compute the loss, compute the gradient, and update the parameters.

Goal: Find the value of x that minimizes f(x) = (x - 5)². The answer should be x = 5 where f(5) = 0.

Mathematical Background

  • Function: f(x) = (x - 5)²
  • Derivative: f'(x) = 2(x - 5)
  • Minimum: at x = 5 where f(5) = 0
  • Update Rule: x ← x - α × f'(x), where α is the learning rate
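
Plugging the starting point into the update rule shows one step by hand: with x = 10 and α = 0.1, the gradient is f'(10) = 2(10 - 5) = 10, so the update gives x ← 10 - 0.1 × 10 = 9. The same step can be checked with the tensor helpers used in the steps below (a minimal sketch, assuming only the functions shown in this example):

// One gradient descent step by hand: start at x = 10 with learning rate 0.1
let x = tensor([10.0])
let gradient = tensor([10.0])             // f'(10) = 2 * (10 - 5) = 10
let x_next = optim_sgd_step(x, gradient, 0.1)
print(str(tensor_get(x_next, 0)))         // 9.0, one step closer to 5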

Step-by-Step Implementation

Step 1: Setup and Initialization

// Starting point - we'll optimize from here
let x = tensor([10.0])  // Start far from the minimum

// Hyperparameters
let learning_rate = 0.1  // Step size for updates
let max_iterations = 50  // Maximum number of steps
let tolerance = 0.001    // Stop when close enough to minimum

print("Starting optimization...")
print("Initial x: " + str(tensor_get(x, 0)))
print("Target x: 5.0")
print("")

Step 2: Define the Objective Function

// Function to compute f(x) = (x - 5)^2
fn compute_loss(x: Tensor) -> float {
    let target = tensor([5.0])
    let diff = tensor_subtract(x, target)
    let squared = tensor_multiply(diff, diff)
    return tensor_get(squared, 0)
}

// Test the function
let initial_loss = compute_loss(x)
print("Initial loss: " + str(initial_loss))  // Should be 25.0 since (10-5)^2 = 25

Step 3: Compute Gradients

// Function to compute gradient: f'(x) = 2(x - 5)
fn compute_gradient(x: Tensor) -> Tensor {
    let target = tensor([5.0])
    let diff = tensor_subtract(x, target)
    let two = tensor([2.0])
    let gradient = tensor_multiply(two, diff)
    return gradient
}

// Test gradient computation
let grad = compute_gradient(x)
print("Initial gradient: " + str(tensor_get(grad, 0)))  // Should be 10.0 = 2*(10-5)

Step 4: Optimization Loop

let iteration = 0
let converged = false

while iteration < max_iterations && !converged {
    // Compute current loss
    let loss = compute_loss(x)

    // Compute gradient
    let gradient = compute_gradient(x)

    // Print progress every 5 iterations
    if iteration % 5 == 0 {
        print("Iteration " + str(iteration))
        print("  x: " + str(tensor_get(x, 0)))
        print("  loss: " + str(loss))
        print("  gradient: " + str(tensor_get(gradient, 0)))
        print("")
    }

    // Check convergence
    if loss < tolerance {
        print("Converged at iteration " + str(iteration))
        // Reassign (no let) so the outer converged flag actually changes
        converged = true
    } else {
        // Update x using gradient descent: x = x - learning_rate * gradient
        // Reassign x and iteration rather than re-declaring them with let
        x = optim_sgd_step(x, gradient, learning_rate)
        iteration = iteration + 1
    }
}

Step 5: Results

// Print final results
print("=== Optimization Complete ===")
print("Final x: " + str(tensor_get(x, 0)))
print("Target x: 5.0")
print("Final loss: " + str(compute_loss(x)))
print("Total iterations: " + str(iteration))

// Calculate error
let final_x_value = tensor_get(x, 0)
let error = if final_x_value > 5.0 {
    final_x_value - 5.0
} else {
    5.0 - final_x_value
}
print("Error from target: " + str(error))

Complete Program

Here's the full program you can run:

// Basic Optimization: Minimize f(x) = (x - 5)^2 using gradient descent

// Compute the objective function
fn compute_loss(x: Tensor) -> float {
    let target = tensor([5.0])
    let diff = tensor_subtract(x, target)
    let squared = tensor_multiply(diff, diff)
    return tensor_get(squared, 0)
}

// Compute gradient: df/dx = 2(x - 5)
fn compute_gradient(x: Tensor) -> Tensor {
    let target = tensor([5.0])
    let diff = tensor_subtract(x, target)
    let two = tensor([2.0])
    return tensor_multiply(two, diff)
}

// Main optimization function
fn optimize() {
    print("=== Basic Optimization Example ===")
    print("Goal: Minimize f(x) = (x - 5)^2")
    print("")

    // Initialize
    let x = tensor([10.0])
    let learning_rate = 0.1
    let max_iterations = 50
    let tolerance = 0.001

    print("Starting x: " + str(tensor_get(x, 0)))
    print("Target x: 5.0")
    print("Learning rate: " + str(learning_rate))
    print("")

    // Optimization loop
    let iteration = 0
    let converged = false

    while iteration < max_iterations && !converged {
        // Forward: compute loss
        let loss = compute_loss(x)

        // Backward: compute gradient
        let gradient = compute_gradient(x)

        // Print progress
        if iteration % 5 == 0 {
            print("Iteration " + str(iteration))
            print("  x = " + str(tensor_get(x, 0)))
            print("  loss = " + str(loss))
            print("  gradient = " + str(tensor_get(gradient, 0)))
            print("")
        }

        // Check convergence
        if loss < tolerance {
            print("Converged! Loss below tolerance.")
            // Reassign (no let) so the loop actually exits
            converged = true
        } else {
            // Update: gradient descent step, reassigning x and iteration in place
            x = optim_sgd_step(x, gradient, learning_rate)
            iteration = iteration + 1
        }
    }

    // Final results
    print("")
    print("=== Final Results ===")
    print("Iterations: " + str(iteration))
    print("Final x: " + str(tensor_get(x, 0)))
    print("Final loss: " + str(compute_loss(x)))
    print("Expected x: 5.0")
}

// Run optimization
optimize()

Expected Output

=== Basic Optimization Example ===
Goal: Minimize f(x) = (x - 5)^2

Starting x: 10.0
Target x: 5.0
Learning rate: 0.1

Iteration 0
  x = 10.0
  loss = 25.0
  gradient = 10.0

Iteration 5
  x = 6.638
  loss = 2.684
  gradient = 3.277

Iteration 10
  x = 5.537
  loss = 0.288
  gradient = 1.074

Iteration 15
  x = 5.176
  loss = 0.031
  gradient = 0.352

Iteration 20
  x = 5.058
  loss = 0.0033
  gradient = 0.115

Converged! Loss below tolerance.

=== Final Results ===
Iterations: 23
Final x: 5.030
Final loss: 0.0009
Expected x: 5.0

Understanding the Results

Gradient Descent Behavior

Iteration 0:

  • Started at x = 10.0, far from the minimum
  • Large gradient (10.0) indicates steep slope
  • Large loss (25.0) shows we're far from target

Iteration 10:

  • x = 5.537, getting close to the minimum
  • Gradient (1.074) much smaller - the slope is gentler
  • Loss (0.288) significantly reduced

Convergence:

  • Final x ≈ 5.030, very close to the target of 5.0
  • Loss ≈ 0.0009, below the tolerance threshold
  • Gradient ≈ 0.059, nearly flat
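
This behavior follows directly from the update rule: each step replaces x - 5 with (1 - 2α)(x - 5) = 0.8 × (x - 5), so after k iterations x = 5 + 5 × 0.8^k. The distance to the minimum shrinks by 20% per step, which is why the loss drops quickly at first and more slowly near the end. A small sketch of this closed-form prediction (assuming ordinary float arithmetic and reassignment, as in the loop above):

// Closed-form prediction: the distance to the minimum shrinks by
// (1 - 2 * learning_rate) = 0.8 each iteration
let distance = 5.0    // starting distance |10 - 5|
let k = 0
while k < 23 {
    distance = distance * 0.8
    k = k + 1
}
print("Predicted x after 23 steps: " + str(5.0 + distance))   // ~5.0295, matching the run above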

Experiments to Try

1. Different Learning Rates

// Too small (slow convergence)
let learning_rate = 0.01  // Will take many iterations

// Too large (overshoots)
let learning_rate = 0.9   // Oscillates around the minimum
let learning_rate = 1.1   // Diverges: each step lands further from x = 5

// Just right
let learning_rate = 0.1   // Good balance
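
These behaviors all come from the same factor: one update replaces x - 5 with (1 - 2 × learning_rate) × (x - 5). At 0.1 that factor is 0.8 (steady shrinking); at 0.5 it is 0, so for this particular quadratic a single step lands exactly on the minimum; between 0.5 and 1.0 it is negative, so x jumps to the other side of 5 each step but still converges; above 1.0 its magnitude exceeds 1 and the iterates grow. A small sketch to watch divergence, reusing compute_gradient and optim_sgd_step from the example above:

// Divergence with an oversized learning rate
let x = tensor([10.0])
let step = 0
while step < 4 {
    let gradient = compute_gradient(x)
    x = optim_sgd_step(x, gradient, 1.1)
    step = step + 1
    print("step " + str(step) + ": x = " + str(tensor_get(x, 0)))
}
// x swings from 10 to -1.0, 12.2, -3.64, 15.37, landing further from 5 every step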

2. Different Starting Points

// Start close to minimum
let x = tensor([5.5])  // Converges quickly

// Start far away
let x = tensor([100.0])  // Takes more iterations

// Start on the other side
let x = tensor([0.0])  // Still finds minimum at 5.0
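
Because the distance to the minimum shrinks by the same factor of 0.8 every iteration, you can estimate the iteration count in advance: the loop stops once (distance × 0.8^k)² drops below the tolerance of 0.001. With the same settings, starting at x = 5.5 takes about 13 iterations, starting at x = 100 about 36, and starting at x = 0 the initial distance is the same 5 units as starting at x = 10, so it converges in the same 23 iterations.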

3. Different Target Values

// Modify the target in compute_loss() and compute_gradient()
let target = tensor([10.0])  // Find minimum at x = 10
let target = tensor([0.0])   // Find minimum at x = 0
let target = tensor([-5.0])  // Find minimum at x = -5
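
Note that the target appears in both compute_loss() and compute_gradient(); if the two copies disagree, the loop will measure one function while following the gradient of another. One way to keep them in sync is to pass the target in as a parameter. A minimal sketch under the same assumptions as the rest of the example (the *_to names are illustrative, not part of the original program):

// Share one target argument so the loss and gradient always agree
fn compute_loss_to(x: Tensor, target: Tensor) -> float {
    let diff = tensor_subtract(x, target)
    let squared = tensor_multiply(diff, diff)
    return tensor_get(squared, 0)
}

fn compute_gradient_to(x: Tensor, target: Tensor) -> Tensor {
    let diff = tensor_subtract(x, target)
    return tensor_multiply(tensor([2.0]), diff)
}

// Example: minimum at x = -5
print(str(compute_loss_to(tensor([0.0]), tensor([-5.0]))))   // 25.0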

Key Concepts Demonstrated

Gradient Descent

Iteratively step in the direction opposite the gradient, which locally reduces the loss the fastest.

Learning Rate

Controls the step size. Too small converges slowly; too large overshoots and can diverge.

Convergence

Stop when loss is small enough or max iterations reached.

Gradients

Indicate the direction and magnitude of the steepest increase in loss.
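
The sign of the gradient tells you which side of the minimum x is on, and subtracting it always moves x toward 5. A quick check with compute_gradient from the example (a sketch, assuming the function is in scope):

print(str(tensor_get(compute_gradient(tensor([10.0])), 0)))   // 10.0: x is above 5, so the update moves it down
print(str(tensor_get(compute_gradient(tensor([0.0])), 0)))    // -10.0: x is below 5, so the update moves it up
print(str(tensor_get(compute_gradient(tensor([5.0])), 0)))    // 0.0: at the minimum, nothing moves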

Next Steps