Neural Network Functions
5 functions for building neural network layers and activations
Quick Reference
| Function | Description |
|---|---|
| nn_linear() | Fully connected layer: y = xW + b |
| nn_relu() | ReLU activation: max(0, x) |
| nn_sigmoid() | Sigmoid: 1 / (1 + e^-x) |
| nn_tanh() | Hyperbolic tangent |
| nn_softmax() | Softmax for classification |
nn_linear()
nn_linear(input: Tensor, weight: Tensor, bias: Tensor) → Tensor
Applies a linear (fully connected) transformation to the input data: y = xW + b. This is the fundamental building block of neural networks.
Parameters
| Parameter | Type | Shape | Description |
|---|---|---|---|
| input | Tensor | [N] | Input features |
| weight | Tensor | [N, M] | Weight matrix |
| bias | Tensor | [M] | Bias vector |
Returns
Output tensor with shape [M]
Mathematical Definition
y = xW + b
where x is input [N], W is weights [N, M], b is bias [M], and y is output [M]
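A small worked example, with numbers chosen purely for illustration:
x = [1, 2], W = [[1, 3], [2, 4]], b = [0.5, -0.5]
y_1 = 1·1 + 2·2 + 0.5 = 5.5
y_2 = 1·3 + 2·4 - 0.5 = 10.5
y = [5.5, 10.5]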
Examples
// Simple linear layer: 3 inputs -> 5 outputs
let input = tensor([1.0, 2.0, 3.0]) // [3]
let weights = tensor_randn([3, 5]) // [3, 5]
let bias = tensor_zeros([5]) // [5]
let output = nn_linear(input, weights, bias) // [5]
// Multi-layer network
let x = tensor([1.0, 0.5]) // Input [2]
let w1 = tensor_randn([2, 128]) // [2, 128]
let b1 = tensor_zeros([128]) // [128]
let w2 = tensor_randn([128, 10]) // [128, 10]
let b2 = tensor_zeros([10]) // [10]
let hidden = nn_relu(nn_linear(x, w1, b1)) // [128], nonlinearity between layers (see nn_relu below)
let output_layer = nn_linear(hidden, w2, b2) // [10]
Weight Initialization
// Xavier/Glorot (normal) initialization
let fan_in = 784
let fan_out = 128
let std = sqrt(2.0 / (fan_in + fan_out)) // Glorot normal scale
let weights = tensor_mul(tensor_randn([fan_in, fan_out]), std)
let bias = tensor_zeros([fan_out])
Note: Biases are typically initialized to zeros, while weights use random initialization to break symmetry. Because tensor_randn samples from a normal distribution, the scale sqrt(2 / (fan_in + fan_out)) is used here; the limit sqrt(6 / (fan_in + fan_out)) belongs to the uniform variant of Xavier initialization.
nn_relu()
nn_relu(x: Tensor) → Tensor
Applies the Rectified Linear Unit activation function element-wise. ReLU is the most common activation function in modern neural networks.
Mathematical Definition
ReLU(x) = max(0, x)
Output is 0 for negative inputs, and x for positive inputs
Examples
// Apply ReLU to a tensor
let x = tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
let activated = nn_relu(x)
// Result: [0.0, 0.0, 0.0, 1.0, 2.0]
// In a neural network layer
let z = nn_linear(input, weights, bias)
let activation = nn_relu(z) // Apply ReLU after linear layer
Properties
- Non-saturating: No vanishing gradient problem for positive values (see the derivative note after this list)
- Sparse activation: Many neurons output exactly zero
- Computationally efficient: Simple thresholding operation
- Non-linear: Enables learning of complex patterns
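The non-saturating property follows from the derivative:
ReLU'(x) = 1 for x > 0, 0 for x < 0
Gradients therefore pass through active units unscaled instead of shrinking layer by layer.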
nn_sigmoid()
nn_sigmoid(x: Tensor) → Tensor
Applies the sigmoid activation function element-wise. Outputs values between 0 and 1, commonly used for binary classification.
Mathematical Definition
σ(x) = 1 / (1 + e^(-x))
Maps any real number to (0, 1)
Examples
// Binary classification output
let logits = nn_linear(features, weights, bias)
let probability = nn_sigmoid(logits)
// Output is between 0 and 1
// Example values
let x = tensor([-2.0, 0.0, 2.0])
let sig = nn_sigmoid(x)
// Result: [0.119, 0.5, 0.881]
Use Cases
- Binary classification (output layer)
- Multi-label classification
- Gating mechanisms in LSTMs/GRUs
Note: Sigmoid can suffer from vanishing gradients. Use ReLU for hidden layers.
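The vanishing-gradient issue comes from the derivative:
σ'(x) = σ(x) · (1 - σ(x))
which peaks at 0.25 (at x = 0) and approaches 0 for large |x|, so gradients shrink rapidly when several sigmoid layers are stacked.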
nn_tanh()
nn_tanh(x: Tensor) → Tensor
Applies the hyperbolic tangent activation function element-wise. Outputs values between -1 and 1 and, unlike sigmoid, is zero-centered.
Mathematical Definition
tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
Maps input to (-1, 1)
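tanh is a shifted and rescaled sigmoid:
tanh(x) = 2σ(2x) - 1
which is why the two activations share the same S-shape while tanh is zero-centered.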
Examples
let x = tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
let activated = nn_tanh(x)
// Result: [-0.964, -0.762, 0.0, 0.762, 0.964]
Advantages over Sigmoid
- Zero-centered output (helps with gradient flow)
- Stronger gradients than sigmoid (quantified after this list)
- Better for hidden layers in simple networks
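To quantify the gradient advantage:
tanh'(x) = 1 - tanh(x)^2
which peaks at 1 (at x = 0), compared with sigmoid's maximum derivative of 0.25.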
nn_softmax()
nn_softmax(x: Tensor) → Tensor
Applies softmax function to convert logits into a probability distribution. Essential for multi-class classification.
Mathematical Definition
softmax(x_i) = e^(x_i) / Σ_j e^(x_j)
Outputs sum to 1, all values between 0 and 1
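Because e^(x_i) can overflow for large logits, softmax is commonly evaluated in a shifted form; subtracting the same constant from every logit leaves the result unchanged:
softmax(x_i) = e^(x_i - max(x)) / Σ_j e^(x_j - max(x))
If nn_softmax already applies this shift internally, nothing extra is needed; the identity matters mainly when working with raw exponentials yourself.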
Examples
// Multi-class classification (10 classes)
let logits = nn_linear(features, weights, bias) // [10]
let probabilities = nn_softmax(logits) // [10]
// Each element is a probability, sum = 1.0
// Example values
let x = tensor([1.0, 2.0, 3.0])
let probs = nn_softmax(x)
// Result: [0.090, 0.245, 0.665]
// Sum = 1.0
Use Cases
- Multi-class classification output layer
- Attention mechanisms in transformers
- Probability distributions over discrete choices
Best practice: Use with cross-entropy loss for classification tasks.
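For a softmax output p and a one-hot target class c, cross-entropy reduces to
L = -log(p_c)
so the loss directly penalizes assigning low probability to the correct class, which is why the two are paired.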
Complete Network Example
Building a 2-layer neural network:
// Network: 784 -> 128 -> 10 (MNIST classifier)
// Layer 1 parameters
let w1 = tensor_randn([784, 128])
let b1 = tensor_zeros([128])
// Layer 2 parameters
let w2 = tensor_randn([128, 10])
let b2 = tensor_zeros([10])
// Forward pass
fn forward(x: Tensor) -> Tensor {
// Layer 1: Linear + ReLU
let z1 = nn_linear(x, w1, b1)
let a1 = nn_relu(z1)
// Layer 2: Linear + Softmax
let z2 = nn_linear(a1, w2, b2)
let output = nn_softmax(z2)
return output
}
// Use the network
let input = tensor_randn([784]) // Stand-in for a flattened 28x28 image
let predictions = forward(input) // [10] class probabilities
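For actual training, the raw tensor_randn weights above would normally be scaled as described in the Weight Initialization section. A minimal sketch, assuming tensor_mul broadcasts a scalar exactly as in that earlier example:
// Xavier/Glorot (normal) scaling for both layers
let w1 = tensor_mul(tensor_randn([784, 128]), sqrt(2.0 / (784.0 + 128.0)))
let w2 = tensor_mul(tensor_randn([128, 10]), sqrt(2.0 / (128.0 + 10.0)))
The predicted class is then the index of the largest entry in predictions.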