Loss Functions
Charl provides two loss functions for training neural networks and measuring prediction error.
Overview
Loss functions (also called cost functions or objective functions) measure how well a model's predictions match the actual target values. During training, the goal is to minimize the loss function.
Key Concept: Loss functions return a single scalar value representing the error. Lower values indicate better predictions.
Choosing a Loss Function
| Loss Function | Best For | Output Type |
|---|---|---|
| nn_mse_loss | Regression tasks | Continuous values |
| nn_cross_entropy_loss | Classification tasks | Class probabilities |
Quick Reference
| Function | Description |
|---|---|
| nn_mse_loss() | Mean Squared Error for regression |
| nn_cross_entropy_loss() | Cross-entropy for classification |
nn_mse_loss()
nn_mse_loss(predictions: Tensor, targets: Tensor) → float
Calculates the Mean Squared Error (MSE) between predictions and target values. MSE is the average of the squared differences between predicted and actual values.
Formula
MSE = (1/n) × Σ(yᵢ - ŷᵢ)²
where n is the number of samples, yᵢ is the target, and ŷᵢ is the prediction
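For example, with the predictions [2.5, 3.8, 5.1, 7.2] and targets [2.0, 4.0, 5.0, 7.0] used in the example below:
MSE = (0.5² + (-0.2)² + 0.1² + 0.2²) / 4 = (0.25 + 0.04 + 0.01 + 0.04) / 4 = 0.085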
Parameters
| Parameter | Type | Description |
|---|---|---|
| predictions | Tensor | Model predictions |
| targets | Tensor | Ground truth values |
Returns
| Type | Description |
|---|---|
| float | Mean squared error (scalar value ≥ 0) |
Example: Simple Regression
// Simple linear regression example
let predictions = tensor([2.5, 3.8, 5.1, 7.2])
let targets = tensor([2.0, 4.0, 5.0, 7.0])
// Calculate MSE
let loss = nn_mse_loss(predictions, targets)
print("MSE Loss: " + str(loss)) // 0.0625
// Perfect predictions (loss = 0)
let perfect_pred = tensor([1.0, 2.0, 3.0])
let target_vals = tensor([1.0, 2.0, 3.0])
let zero_loss = nn_mse_loss(perfect_pred, target_vals)
print("Perfect Loss: " + str(zero_loss)) // 0.0
Example: Training Loop
// Training a simple model
let x_train = tensor([1.0, 2.0, 3.0, 4.0, 5.0])
let y_train = tensor([2.0, 4.0, 6.0, 8.0, 10.0])
// Initialize weight and bias
let w = tensor([1.5])
let b = tensor([0.0])
let learning_rate = 0.01
let epochs = 100
let epoch = 0
while epoch < epochs {
// Forward pass: y = wx + b
let predictions = tensor_add(
tensor_multiply(x_train, w),
b
)
// Calculate loss
let loss = nn_mse_loss(predictions, y_train)
if epoch % 10 == 0 {
print("Epoch " + str(epoch) + ", Loss: " + str(loss))
}
// Compute gradients (simplified)
let grad_w = autograd_compute_mse_grad(predictions, y_train, x_train)
// Update weights
let w = optim_sgd_step(w, grad_w, learning_rate)
let epoch = epoch + 1
}
print("Final weight: " + str(tensor_get(w, 0)))
Use Cases
- Regression: Predicting continuous values (prices, temperatures, distances)
- Time series: Forecasting future values
- Image reconstruction: Autoencoder training
- Function approximation: Learning arbitrary mappings
Note: MSE heavily penalizes large errors because they are squared. If outliers are a concern, Mean Absolute Error (MAE) is a common alternative, but it is not yet implemented in Charl.
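To see the effect of squaring, compare the loss for predictions that are all slightly off against predictions where a single value is far off. This is a minimal sketch that uses only the tensor() and nn_mse_loss() calls documented above; the values and expected outputs are illustrative and assume nn_mse_loss follows the formula shown earlier.
// A single large error dominates the mean because it is squared
let targets = tensor([2.0, 4.0, 6.0, 8.0])
let close_preds = tensor([2.1, 3.9, 6.2, 7.8])
let outlier_preds = tensor([2.1, 3.9, 6.2, 18.0])
let small_loss = nn_mse_loss(close_preds, targets)
let outlier_loss = nn_mse_loss(outlier_preds, targets)
print("Small errors: " + str(small_loss))   // ≈ 0.025
print("One outlier:  " + str(outlier_loss)) // ≈ 25.02, dominated by (18 - 8)² = 100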
nn_cross_entropy_loss()
nn_cross_entropy_loss(predictions: Tensor, targets: Tensor) → float
Calculates the cross-entropy loss between predicted class probabilities and target labels. This is the standard loss function for classification problems.
Formula
CE = -(1/n) × Σ yᵢ × log(ŷᵢ)
where n is the number of samples, yᵢ is the true label (one-hot), and ŷᵢ is the predicted probability
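For example, with a one-hot target [1, 0, 0] and predicted probabilities [0.7, 0.2, 0.1], only the true class contributes to the sum:
CE = -log(0.7) ≈ 0.357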
Parameters
| Parameter | Type | Description |
|---|---|---|
| predictions | Tensor | Predicted class probabilities (usually after softmax) |
| targets | Tensor | True labels (one-hot encoded or class indices) |
Returns
| Type | Description |
|---|---|
| float | Cross-entropy loss (scalar value ≥ 0) |
Example: Binary Classification
// Binary classification (2 classes)
// Predictions after sigmoid
let predictions = tensor([0.9, 0.1, 0.8, 0.3, 0.7])
// True labels (0 or 1)
let targets = tensor([1.0, 0.0, 1.0, 0.0, 1.0])
// Calculate cross-entropy loss
let loss = nn_cross_entropy_loss(predictions, targets)
print("Cross-Entropy Loss: " + str(loss))
// Perfect predictions (low loss)
let perfect_pred = tensor([1.0, 0.0, 1.0])
let perfect_targets = tensor([1.0, 0.0, 1.0])
let perfect_loss = nn_cross_entropy_loss(perfect_pred, perfect_targets)
print("Perfect Loss: " + str(perfect_loss)) // Very close to 0
Example: Multi-class Classification
// Multi-class classification (3 classes)
// Network output (before softmax)
let logits = tensor([2.0, 1.0, 0.1])
// Apply softmax to get probabilities
let predictions = nn_softmax(logits)
// One-hot encoded target (class 0)
let target = tensor([1.0, 0.0, 0.0])
// Calculate loss
let loss = nn_cross_entropy_loss(predictions, target)
print("Multi-class Loss: " + str(loss))
Example: Training Classifier
// Train a simple classifier
let input_size = 4
let num_classes = 3
// Initialize weights
let weights = tensor_randn([input_size, num_classes])
let bias = tensor_zeros([num_classes])
// Sample data
let x = tensor([0.5, 1.0, 0.3, 0.8])
let target = tensor([1.0, 0.0, 0.0]) // Class 0
let learning_rate = 0.1
let epochs = 50
let epoch = 0
while epoch < epochs {
// Forward pass
let logits = nn_linear(x, weights, bias)
let predictions = nn_softmax(logits)
// Calculate loss
let loss = nn_cross_entropy_loss(predictions, target)
if epoch % 10 == 0 {
print("Epoch " + str(epoch) + ", Loss: " + str(loss))
}
// Compute gradients (simplified)
let grad = autograd_compute_linear_grad()
// Update weights
let weights = optim_sgd_step(weights, grad, learning_rate)
let epoch = epoch + 1
}
// Final predictions
let final_logits = nn_linear(x, weights, bias)
let final_preds = nn_softmax(final_logits)
print("Final predictions: ")
print("Class 0: " + str(tensor_get(final_preds, 0)))
print("Class 1: " + str(tensor_get(final_preds, 1)))
print("Class 2: " + str(tensor_get(final_preds, 2)))
Use Cases
- Binary classification: Yes/no, spam/not spam, fraud detection
- Multi-class classification: Image classification, digit recognition
- Multi-label classification: Tag prediction, object detection
- Language modeling: Next word prediction
Best Practice: Always apply softmax to network outputs before computing cross-entropy loss. The loss expects probabilities that sum to 1.
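A minimal sketch of that usage pattern, using only the functions documented above; the logit values are illustrative.
// Convert raw logits to probabilities before computing the loss
let logits = tensor([3.2, 0.4, -1.1])
let probs = nn_softmax(logits)              // probabilities that sum to 1
let target = tensor([1.0, 0.0, 0.0])
let loss = nn_cross_entropy_loss(probs, target)
print("Loss: " + str(loss))
// Passing the raw logits instead of probs would feed values outside
// [0, 1] into the log term and produce a meaningless loss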
Comparing Loss Functions
| Aspect | MSE | Cross-Entropy |
|---|---|---|
| Problem Type | Regression | Classification |
| Output Range | Any real number | Probabilities [0, 1] |
| Last Layer | Linear | Softmax |
| Error Scale | Quadratic (squared) | Logarithmic |
| Perfect Score | 0.0 | 0.0 |
| Gradient Behavior | Proportional to error | Large when confident & wrong |
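The last row of the table is easiest to see numerically: cross-entropy grows sharply as the predicted probability of the true class approaches zero. A small sketch with illustrative values, using the functions documented above:
let target = tensor([1.0, 0.0, 0.0])
let unsure_wrong = tensor([0.4, 0.3, 0.3])       // true class gets probability 0.4
let confident_wrong = tensor([0.01, 0.98, 0.01]) // true class gets probability 0.01
let loss_unsure = nn_cross_entropy_loss(unsure_wrong, target)
let loss_confident = nn_cross_entropy_loss(confident_wrong, target)
print("Unsure and wrong:    " + str(loss_unsure))    // ≈ -log(0.4)  ≈ 0.92
print("Confident and wrong: " + str(loss_confident)) // ≈ -log(0.01) ≈ 4.61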
Best Practices
DO:
- Use MSE for regression problems (continuous outputs)
- Use cross-entropy for classification problems
- Apply softmax before cross-entropy for multi-class
- Monitor loss during training to check convergence
- Normalize inputs to help loss converge faster
- Use appropriate learning rates (too high can increase loss)
DON'T:
- Use MSE for classification (squared distance between class labels isn't meaningful)
- Use cross-entropy for regression
- Forget to check for NaN or inf values in loss
- Ignore increasing loss (indicates training problems)
- Compare loss values across different dataset sizes directly