Neural Network Functions

5 functions for building neural network layers and activations

Quick Reference

Function       Description
nn_linear()    Fully connected layer: y = xW + b
nn_relu()      ReLU activation: max(0, x)
nn_sigmoid()   Sigmoid: 1 / (1 + e^(-x))
nn_tanh()      Hyperbolic tangent
nn_softmax()   Softmax for classification

nn_linear()

nn_linear(input: Tensor, weight: Tensor, bias: Tensor) → Tensor

Applies a linear (fully connected) transformation to the input data: y = xW + b. This is the fundamental building block of neural networks.

Parameters

Parameter   Type     Shape    Description
input       Tensor   [N]      Input features
weight      Tensor   [N, M]   Weight matrix
bias        Tensor   [M]      Bias vector

Returns

Output tensor with shape [M]

Mathematical Definition

y = xW + b

where x is input [N], W is weights [N, M], b is bias [M], and y is output [M]
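
A small worked example with concrete values (N = 2, M = 2): for x = [1, 2], W = [[1, 3], [2, 4]] and b = [0.5, -1.0],

y = xW + b = [1·1 + 2·2 + 0.5, 1·3 + 2·4 - 1.0] = [5.5, 10.0]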

Examples

// Simple linear layer: 3 inputs -> 5 outputs
let input = tensor([1.0, 2.0, 3.0])              // [3]
let weights = tensor_randn([3, 5])                // [3, 5]
let bias = tensor_zeros([5])                      // [5]

let output = nn_linear(input, weights, bias)      // [5]

// Multi-layer network
let x = tensor([1.0, 0.5])                        // Input [2]
let w1 = tensor_randn([2, 128])                   // [2, 128]
let b1 = tensor_zeros([128])                      // [128]
let w2 = tensor_randn([128, 10])                  // [128, 10]
let b2 = tensor_zeros([10])                       // [10]

let hidden = nn_linear(x, w1, b1)                 // [128]
let output_layer = nn_linear(hidden, w2, b2)      // [10]

Weight Initialization

// Xavier/Glorot (normal) initialization:
// scale standard-normal samples by sqrt(2 / (fan_in + fan_out))
let fan_in = 784
let fan_out = 128
let std = sqrt(2.0 / (fan_in + fan_out))
let weights = tensor_mul(tensor_randn([fan_in, fan_out]), std)
let bias = tensor_zeros([fan_out])

Note: Biases are typically initialized to zeros, while weights use random initialization to break symmetry.
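
A quick usage sketch of the initialized layer (the input here is a random stand-in, not real data):

// Feed the initialized parameters into a linear layer
let features = tensor_randn([fan_in])              // [784]
let hidden = nn_linear(features, weights, bias)    // [128]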

nn_relu()

nn_relu(x: Tensor) → Tensor

Applies the Rectified Linear Unit activation function element-wise. ReLU is the most common activation function in modern neural networks.

Mathematical Definition

ReLU(x) = max(0, x)

Output is 0 for negative inputs and x for non-negative inputs

Examples

// Apply ReLU to a tensor
let x = tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
let activated = nn_relu(x)
// Result: [0.0, 0.0, 0.0, 1.0, 2.0]

// In a neural network layer
let z = nn_linear(input, weights, bias)
let activation = nn_relu(z)  // Apply ReLU after linear layer

Properties

  • Non-saturating: No vanishing gradient problem for positive values
  • Sparse activation: Many neurons output exactly zero (see the sketch after this list)
  • Computationally efficient: Simple thresholding operation
  • Non-linear: Enables learning of complex patterns
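
A minimal sketch of the sparse-activation property, assuming tensor_randn draws zero-mean samples (as the name suggests):

// With zero-mean random pre-activations, roughly half of the
// ReLU outputs are exactly zero
let z = tensor_randn([8])     // random pre-activations
let a = nn_relu(z)            // negative entries clamped to 0.0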

nn_sigmoid()

nn_sigmoid(x: Tensor) → Tensor

Applies the sigmoid activation function element-wise. Outputs values between 0 and 1, commonly used for binary classification.

Mathematical Definition

σ(x) = 1 / (1 + e^(-x))

Maps any real number to (0, 1)

Examples

// Binary classification output
let logits = nn_linear(features, weights, bias)
let probability = nn_sigmoid(logits)
// Output is between 0 and 1

// Example values
let x = tensor([-2.0, 0.0, 2.0])
let sig = nn_sigmoid(x)
// Result: [0.119, 0.5, 0.881]

Use Cases

  • Binary classification (output layer)
  • Multi-label classification (see the sketch below)
  • Gating mechanisms in LSTMs/GRUs

Note: Sigmoid can suffer from vanishing gradients for large-magnitude inputs; prefer ReLU for hidden layers.
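
A minimal multi-label sketch, assuming each logit scores one independent label:

// One independent probability per label (unlike softmax,
// the outputs need not sum to 1)
let label_logits = tensor([1.5, -0.3, 2.0])
let label_probs = nn_sigmoid(label_logits)
// Each element is an independent probability in (0, 1)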

nn_tanh()

nn_tanh(x: Tensor) → Tensor

Applies the hyperbolic tangent activation element-wise. Outputs values between -1 and 1 and, unlike sigmoid, is zero-centered.

Mathematical Definition

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Maps input to (-1, 1)

Examples

let x = tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
let activated = nn_tanh(x)
// Result: [-0.964, -0.762, 0.0, 0.762, 0.964]

Advantages over Sigmoid

  • Zero-centered output (helps with gradient flow; illustrated below)
  • Stronger gradients than sigmoid
  • Better for hidden layers in simple networks
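
A minimal sketch of the zero-centered property, comparing tanh and sigmoid on the same symmetric input:

let x = tensor([-2.0, 0.0, 2.0])
let t = nn_tanh(x)      // [-0.964, 0.0, 0.964]  (centered around 0)
let s = nn_sigmoid(x)   // [0.119, 0.5, 0.881]   (centered around 0.5)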

nn_softmax()

nn_softmax(x: Tensor) → Tensor

Applies the softmax function to convert logits into a probability distribution. Essential for multi-class classification.

Mathematical Definition

softmax(x_i) = e^(x_i) / Σ(e^(x_j))

Outputs sum to 1, all values between 0 and 1

Examples

// Multi-class classification (10 classes)
let logits = nn_linear(features, weights, bias)  // [10]
let probabilities = nn_softmax(logits)           // [10]
// Each element is a probability, sum = 1.0

// Example values
let x = tensor([1.0, 2.0, 3.0])
let probs = nn_softmax(x)
// Result: [0.090, 0.245, 0.665]
// Sum = 1.0

Use Cases

  • Multi-class classification output layer
  • Attention mechanisms in transformers
  • Probability distributions over discrete choices

Best practice: Use with cross-entropy loss for classification tasks.
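
For instance, with the probabilities [0.090, 0.245, 0.665] from the example above and a true class index of 2, the cross-entropy loss is -ln(0.665) ≈ 0.408; a more confident correct prediction drives the loss toward 0.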

Complete Network Example

Building a 2-layer neural network:

// Network: 784 -> 128 -> 10 (MNIST classifier)

// Layer 1 parameters
let w1 = tensor_randn([784, 128])
let b1 = tensor_zeros([128])

// Layer 2 parameters
let w2 = tensor_randn([128, 10])
let b2 = tensor_zeros([10])

// Forward pass
fn forward(x: Tensor) -> Tensor {
    // Layer 1: Linear + ReLU
    let z1 = nn_linear(x, w1, b1)
    let a1 = nn_relu(z1)

    // Layer 2: Linear + Softmax
    let z2 = nn_linear(a1, w2, b2)
    let output = nn_softmax(z2)

    return output
}

// Use the network
let input = tensor_randn([784])  // Random stand-in for a flattened 28x28 image
let predictions = forward(input)  // [10] class probabilities
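
An optional refinement, reusing the scaling from the Weight Initialization section (a sketch, assuming the same scalar form of tensor_mul shown there):

// Scale the randomly initialized weights to keep early activations
// in a reasonable range (Xavier/Glorot normal)
let w1_scaled = tensor_mul(tensor_randn([784, 128]), sqrt(2.0 / (784 + 128)))
let w2_scaled = tensor_mul(tensor_randn([128, 10]), sqrt(2.0 / (128 + 10)))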

See Also