Loss Functions
Charl provides two loss functions for training neural networks and measuring prediction error.
Overview
Loss functions (also called cost functions or objective functions) measure how well a model's predictions match the actual target values. During training, the goal is to minimize the loss function.
Key Concept: Loss functions return a single scalar value representing the error. Lower values indicate better predictions.
Choosing a Loss Function
| Loss Function | Best For | Output Type |
|---|---|---|
| nn_mse_loss | Regression tasks | Continuous values |
| nn_cross_entropy_loss | Classification tasks | Class probabilities |
Quick Reference
| Function | Description |
|---|---|
| nn_mse_loss() | Mean Squared Error for regression |
| nn_cross_entropy_loss() | Cross-entropy for classification |
nn_mse_loss()
nn_mse_loss(predictions: Tensor, targets: Tensor) → float
Calculates the Mean Squared Error (MSE) between predictions and target values. MSE is the average of the squared differences between predicted and actual values.
Formula
MSE = (1/n) × Σ(yᵢ - ŷᵢ)²
where n is the number of samples, yᵢ is the target, and ŷᵢ is the prediction
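For example, with the predictions [2.5, 3.8, 5.1, 7.2] and targets [2.0, 4.0, 5.0, 7.0] used in the example below:
MSE = (0.5² + (-0.2)² + 0.1² + 0.2²) / 4 = (0.25 + 0.04 + 0.01 + 0.04) / 4 = 0.085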
Parameters
| Parameter | Type | Description |
|---|---|---|
| predictions | Tensor | Model predictions |
| targets | Tensor | Ground truth values |
Returns
| Type | Description |
|---|---|
| float | Mean squared error (scalar value ≥ 0) |
Example: Simple Regression
// Simple linear regression example
let predictions = tensor([2.5, 3.8, 5.1, 7.2])
let targets = tensor([2.0, 4.0, 5.0, 7.0])
// Calculate MSE
let loss = nn_mse_loss(predictions, targets)
print("MSE Loss: " + str(loss)) // 0.0625
// Perfect predictions (loss = 0)
let perfect_pred = tensor([1.0, 2.0, 3.0])
let target_vals = tensor([1.0, 2.0, 3.0])
let zero_loss = nn_mse_loss(perfect_pred, target_vals)
print("Perfect Loss: " + str(zero_loss)) // 0.0
Example: Training Loop
// Training a simple model
let x_train = tensor([1.0, 2.0, 3.0, 4.0, 5.0])
let y_train = tensor([2.0, 4.0, 6.0, 8.0, 10.0])
// Initialize weight and bias
let w = tensor([1.5])
let b = tensor([0.0])
let learning_rate = 0.01
let epochs = 100
let epoch = 0
while epoch < epochs {
// Forward pass: y = wx + b
let predictions = tensor_add(
tensor_multiply(x_train, w),
b
)
// Calculate loss
let loss = nn_mse_loss(predictions, y_train)
if epoch % 10 == 0 {
print("Epoch " + str(epoch) + ", Loss: " + str(loss))
}
// Compute gradients (simplified)
let grad_w = autograd_compute_mse_grad(predictions, y_train, x_train)
// Update weights
let w = optim_sgd_step(w, grad_w, learning_rate)
let epoch = epoch + 1
}
print("Final weight: " + str(tensor_get(w, 0)))
Use Cases
- Regression: Predicting continuous values (prices, temperatures, distances)
- Time series: Forecasting future values
- Image reconstruction: Autoencoder training
- Function approximation: Learning arbitrary mappings
Note: MSE heavily penalizes large errors because they are squared. If outliers are a concern, Mean Absolute Error (MAE) is a common alternative, but it is not yet implemented in Charl.
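To see the effect of squaring, compare the loss for predictions that are all slightly off against predictions where a single value is far off. This is a minimal sketch that uses only the tensor() and nn_mse_loss() calls documented above; the values and expected outputs are illustrative and assume nn_mse_loss follows the formula shown earlier.
// A single large error dominates the mean because it is squared
let targets = tensor([2.0, 4.0, 6.0, 8.0])
let close_preds = tensor([2.1, 3.9, 6.2, 7.8])
let outlier_preds = tensor([2.1, 3.9, 6.2, 18.0])
let small_loss = nn_mse_loss(close_preds, targets)
let outlier_loss = nn_mse_loss(outlier_preds, targets)
print("Small errors: " + str(small_loss))   // ≈ 0.025
print("One outlier:  " + str(outlier_loss)) // ≈ 25.02, dominated by (18 - 8)² = 100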
nn_cross_entropy_loss()
nn_cross_entropy_loss(predictions: Tensor, targets: Tensor) → float
Calculates the cross-entropy loss between predicted class probabilities and target labels. This is the standard loss function for classification problems.
Formula
CE = -(1/n) × Σ yᵢ × log(ŷᵢ)
where n is the number of samples, yᵢ is the true label (one-hot), and ŷᵢ is the predicted probability
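For example, with a one-hot target [1, 0, 0] and predicted probabilities [0.7, 0.2, 0.1], only the true class contributes to the sum:
CE = -log(0.7) ≈ 0.357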
Parameters
| Parameter | Type | Description |
|---|---|---|
| predictions | Tensor | Predicted class probabilities (usually after softmax) |
| targets | Tensor | True labels (one-hot encoded or class indices) |
Returns
| Type | Description |
|---|---|
| float | Cross-entropy loss (scalar value ≥ 0) |
Example: Binary Classification
// Binary classification (2 classes)
// Predictions after sigmoid
let predictions = tensor([0.9, 0.1, 0.8, 0.3, 0.7])
// True labels (0 or 1)
let targets = tensor([1.0, 0.0, 1.0, 0.0, 1.0])
// Calculate cross-entropy loss
let loss = nn_cross_entropy_loss(predictions, targets)
print("Cross-Entropy Loss: " + str(loss))
// Perfect predictions (low loss)
let perfect_pred = tensor([1.0, 0.0, 1.0])
let perfect_targets = tensor([1.0, 0.0, 1.0])
let perfect_loss = nn_cross_entropy_loss(perfect_pred, perfect_targets)
print("Perfect Loss: " + str(perfect_loss)) // Very close to 0
Example: Multi-class Classification
// Multi-class classification (3 classes)
// Network output (before softmax)
let logits = tensor([2.0, 1.0, 0.1])
// Apply softmax to get probabilities
let predictions = nn_softmax(logits)
// One-hot encoded target (class 0)
let target = tensor([1.0, 0.0, 0.0])
// Calculate loss
let loss = nn_cross_entropy_loss(predictions, target)
print("Multi-class Loss: " + str(loss))
Example: Training Classifier
// Train a simple classifier
let input_size = 4
let num_classes = 3
// Initialize weights
let weights = tensor_randn([input_size, num_classes])
let bias = tensor_zeros([num_classes])
// Sample data
let x = tensor([0.5, 1.0, 0.3, 0.8])
let target = tensor([1.0, 0.0, 0.0]) // Class 0
let learning_rate = 0.1
let epochs = 50
let epoch = 0
while epoch < epochs {
// Forward pass
let logits = nn_linear(x, weights, bias)
let predictions = nn_softmax(logits)
// Calculate loss
let loss = nn_cross_entropy_loss(predictions, target)
if epoch % 10 == 0 {
print("Epoch " + str(epoch) + ", Loss: " + str(loss))
}
// Compute gradients (simplified)
let grad = autograd_compute_linear_grad()
// Update weights
let weights = optim_sgd_step(weights, grad, learning_rate)
let epoch = epoch + 1
}
// Final predictions
let final_logits = nn_linear(x, weights, bias)
let final_preds = nn_softmax(final_logits)
print("Final predictions: ")
print("Class 0: " + str(tensor_get(final_preds, 0)))
print("Class 1: " + str(tensor_get(final_preds, 1)))
print("Class 2: " + str(tensor_get(final_preds, 2)))
Use Cases
- Binary classification: Yes/no, spam/not spam, fraud detection
- Multi-class classification: Image classification, digit recognition
- Multi-label classification: Tag prediction, object detection
- Language modeling: Next word prediction
Best Practice: Always apply softmax to network outputs before computing cross-entropy loss. The loss expects probabilities that sum to 1.
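A minimal sketch of that usage pattern, using only the functions documented above; the logit values are illustrative.
// Convert raw logits to probabilities before computing the loss
let logits = tensor([3.2, 0.4, -1.1])
let probs = nn_softmax(logits)              // probabilities that sum to 1
let target = tensor([1.0, 0.0, 0.0])
let loss = nn_cross_entropy_loss(probs, target)
print("Loss: " + str(loss))
// Passing the raw logits instead of probs would feed values outside
// [0, 1] into the log term and produce a meaningless loss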
Comparing Loss Functions
| Aspect | MSE | Cross-Entropy |
|---|---|---|
| Problem Type | Regression | Classification |
| Output Range | Any real number | Probabilities [0, 1] |
| Last Layer | Linear | Softmax |
| Error Scale | Quadratic (squared) | Logarithmic |
| Perfect Score | 0.0 | 0.0 |
| Gradient Behavior | Proportional to error | Large when confident & wrong |
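The last row of the table is easiest to see numerically: cross-entropy grows sharply as the predicted probability of the true class approaches zero. A small sketch with illustrative values, using the functions documented above:
let target = tensor([1.0, 0.0, 0.0])
let unsure_wrong = tensor([0.4, 0.3, 0.3])       // true class gets probability 0.4
let confident_wrong = tensor([0.01, 0.98, 0.01]) // true class gets probability 0.01
let loss_unsure = nn_cross_entropy_loss(unsure_wrong, target)
let loss_confident = nn_cross_entropy_loss(confident_wrong, target)
print("Unsure and wrong:    " + str(loss_unsure))    // ≈ -log(0.4)  ≈ 0.92
print("Confident and wrong: " + str(loss_confident)) // ≈ -log(0.01) ≈ 4.61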
Best Practices
DO:
- Use MSE for regression problems (continuous outputs)
- Use cross-entropy for classification problems
- Apply softmax before cross-entropy for multi-class
- Monitor loss during training to check convergence
- Normalize inputs to help loss converge faster
- Use appropriate learning rates (too high can increase loss)
DON'T:
- Use MSE for classification (squared distance between class labels isn't meaningful)
- Use cross-entropy for regression
- Forget to check for NaN or inf values in loss
- Ignore increasing loss (indicates training problems)
- Compare loss values across different dataset sizes directly