Loss Functions

Charl provides two loss functions for training neural networks and measuring prediction error.

Overview

Loss functions (also called cost functions or objective functions) measure how well a model's predictions match the actual target values. During training, the goal is to minimize the loss function.

Key Concept: Loss functions return a single scalar value representing the error. Lower values indicate better predictions.

Choosing a Loss Function

Loss Function            Best For                Output Type
nn_mse_loss              Regression tasks        Continuous values
nn_cross_entropy_loss    Classification tasks    Class probabilities

Quick Reference

Function                    Description
nn_mse_loss()               Mean Squared Error for regression
nn_cross_entropy_loss()     Cross-entropy for classification

nn_mse_loss()

nn_mse_loss(predictions: Tensor, targets: Tensor) → float

Calculates the Mean Squared Error (MSE) between predictions and target values. MSE is the average of the squared differences between predicted and actual values.

Formula

MSE = (1/n) × Σ(yᵢ - ŷᵢ)²

where n is the number of samples, yᵢ is the target, and ŷᵢ is the prediction
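
For example, with predictions [2.5, 3.8, 5.1, 7.2] and targets [2.0, 4.0, 5.0, 7.0] (the values used in the first example below), the squared differences are 0.25, 0.04, 0.01, and 0.04, so MSE = 0.34 / 4 = 0.085.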

Parameters

Parameter     Type     Description
predictions   Tensor   Model predictions
targets       Tensor   Ground truth values

Returns

Type    Description
float   Mean squared error (scalar value ≥ 0)

Example: Simple Regression

// Simple linear regression example
let predictions = tensor([2.5, 3.8, 5.1, 7.2])
let targets = tensor([2.0, 4.0, 5.0, 7.0])

// Calculate MSE
let loss = nn_mse_loss(predictions, targets)
print("MSE Loss: " + str(loss))  // 0.0625

// Perfect predictions (loss = 0)
let perfect_pred = tensor([1.0, 2.0, 3.0])
let target_vals = tensor([1.0, 2.0, 3.0])
let zero_loss = nn_mse_loss(perfect_pred, target_vals)
print("Perfect Loss: " + str(zero_loss))  // 0.0

Example: Training Loop

// Training a simple model
let x_train = tensor([1.0, 2.0, 3.0, 4.0, 5.0])
let y_train = tensor([2.0, 4.0, 6.0, 8.0, 10.0])

// Initialize weight and bias
let w = tensor([1.5])
let b = tensor([0.0])

let learning_rate = 0.01
let epochs = 100
let epoch = 0

while epoch < epochs {
    // Forward pass: y = wx + b
    let predictions = tensor_add(
        tensor_multiply(x_train, w),
        b
    )

    // Calculate loss
    let loss = nn_mse_loss(predictions, y_train)

    if epoch % 10 == 0 {
        print("Epoch " + str(epoch) + ", Loss: " + str(loss))
    }

    // Compute gradients (simplified)
    let grad_w = autograd_compute_mse_grad(predictions, y_train, x_train)

    // Update weights
    let w = optim_sgd_step(w, grad_w, learning_rate)

    let epoch = epoch + 1
}

print("Final weight: " + str(tensor_get(w, 0)))

Use Cases

  • Regression: Predicting continuous values (prices, temperatures, distances)
  • Time series: Forecasting future values
  • Image reconstruction: Autoencoder training
  • Function approximation: Learning arbitrary mappings

Note: MSE heavily penalizes large errors due to squaring: an error of 10 contributes 100 to the sum, while an error of 1 contributes only 1. Consider using Mean Absolute Error (MAE) if outliers are a concern (not yet implemented in Charl).

nn_cross_entropy_loss()

nn_cross_entropy_loss(predictions: Tensor, targets: Tensor) → float

Calculates the cross-entropy loss between predicted class probabilities and target labels. This is the standard loss function for classification problems.

Formula

CE = -(1/n) × Σ yᵢ × log(ŷᵢ)

where n is the number of samples, yᵢ is the true label (one-hot), and ŷᵢ is the predicted probability
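
For example, for a single sample with one-hot target y = [1, 0, 0] and predicted probabilities ŷ = [0.7, 0.2, 0.1], only the true class contributes: CE = -log(0.7) ≈ 0.357 (using the natural logarithm).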

Parameters

Parameter     Type     Description
predictions   Tensor   Predicted class probabilities (usually after softmax)
targets       Tensor   True labels (one-hot encoded or class indices)

Returns

Type    Description
float   Cross-entropy loss (scalar value ≥ 0)

Example: Binary Classification

// Binary classification (2 classes)
// Predictions after sigmoid
let predictions = tensor([0.9, 0.1, 0.8, 0.3, 0.7])

// True labels (0 or 1)
let targets = tensor([1.0, 0.0, 1.0, 0.0, 1.0])

// Calculate cross-entropy loss
let loss = nn_cross_entropy_loss(predictions, targets)
print("Cross-Entropy Loss: " + str(loss))

// Perfect predictions (low loss)
let perfect_pred = tensor([1.0, 0.0, 1.0])
let perfect_targets = tensor([1.0, 0.0, 1.0])
let perfect_loss = nn_cross_entropy_loss(perfect_pred, perfect_targets)
print("Perfect Loss: " + str(perfect_loss))  // Very close to 0

Example: Multi-class Classification

// Multi-class classification (3 classes)
// Network output (before softmax)
let logits = tensor([2.0, 1.0, 0.1])

// Apply softmax to get probabilities
let predictions = nn_softmax(logits)

// One-hot encoded target (class 0)
let target = tensor([1.0, 0.0, 0.0])

// Calculate loss
let loss = nn_cross_entropy_loss(predictions, target)
print("Multi-class Loss: " + str(loss))

Example: Training Classifier

// Train a simple classifier
let input_size = 4
let num_classes = 3

// Initialize weights
let weights = tensor_randn([input_size, num_classes])
let bias = tensor_zeros([num_classes])

// Sample data
let x = tensor([0.5, 1.0, 0.3, 0.8])
let target = tensor([1.0, 0.0, 0.0])  // Class 0

let learning_rate = 0.1
let epochs = 50
let epoch = 0

while epoch < epochs {
    // Forward pass
    let logits = nn_linear(x, weights, bias)
    let predictions = nn_softmax(logits)

    // Calculate loss
    let loss = nn_cross_entropy_loss(predictions, target)

    if epoch % 10 == 0 {
        print("Epoch " + str(epoch) + ", Loss: " + str(loss))
    }

    // Compute gradients (simplified)
    let grad = autograd_compute_linear_grad()

    // Update weights
    let weights = optim_sgd_step(weights, grad, learning_rate)

    let epoch = epoch + 1
}

// Final predictions
let final_logits = nn_linear(x, weights, bias)
let final_preds = nn_softmax(final_logits)
print("Final predictions: ")
print("Class 0: " + str(tensor_get(final_preds, 0)))
print("Class 1: " + str(tensor_get(final_preds, 1)))
print("Class 2: " + str(tensor_get(final_preds, 2)))

Use Cases

  • Binary classification: Yes/no, spam/not spam, fraud detection
  • Multi-class classification: Image classification, digit recognition
  • Multi-label classification: Tag prediction, object detection
  • Language modeling: Next word prediction

Best Practice: Always apply softmax to network outputs before computing cross-entropy loss. The loss expects probabilities that sum to 1.
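
A minimal sketch of that pattern, reusing the functions shown in the examples above:

// Convert raw logits to probabilities before computing the loss
let logits = tensor([2.0, 1.0, 0.1])
let target = tensor([1.0, 0.0, 0.0])
let probs = nn_softmax(logits)
let loss = nn_cross_entropy_loss(probs, target)
// Passing raw logits directly would break the probabilities-sum-to-1 assumption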

Comparing Loss Functions

Aspect              MSE                     Cross-Entropy
Problem Type        Regression              Classification
Output Range        Any real number         Probabilities [0, 1]
Last Layer          Linear                  Softmax
Error Scale         Quadratic (squared)     Logarithmic
Perfect Score       0.0                     0.0
Gradient Behavior   Proportional to error   Large when confident & wrong

Best Practices

DO:

  • Use MSE for regression problems (continuous outputs)
  • Use cross-entropy for classification problems
  • Apply softmax before cross-entropy for multi-class
  • Monitor loss during training to check convergence
  • Normalize inputs to help loss converge faster
  • Use appropriate learning rates (too high can increase loss)

DON'T:

  • Use MSE for classification (class labels aren't meaningful as continuous targets)
  • Use cross-entropy for regression
  • Forget to check for NaN or inf values in loss
  • Ignore increasing loss (indicates training problems)
  • Compare loss values across different dataset sizes directly

Related Topics