Neural Operations
Integer-only neural network primitives for encrypted inference. Every activation function, every arithmetic operation – zero floating-point.
Source files: ops/neural.rs, arithmetic/mq_relu.rs, arithmetic/pade_engine.rs, arithmetic/integer_softmax.rs, arithmetic/cyclotomic_phase.rs, arithmetic/mobius_int.rs.
FHENeuralEvaluator
The unified interface for neural network operations on encrypted data.
use nine65::ops::neural::{FHENeuralEvaluator, ActivationType};
let evaluator = FHENeuralEvaluator::new(modulus, plaintext_mod);
// Or with custom softmax scale
let evaluator = FHENeuralEvaluator::with_softmax_scale(modulus, plaintext_mod, 1_000_000_000_000);
ActivationType
pub enum ActivationType {
None, // Linear pass-through
ReLU, // max(0, x) via MQ-ReLU O(1) threshold
LeakyReLU, // Configurable leak coefficient
Sigmoid, // 1/(1+exp(-x)) via Pade [4/4]
Tanh, // (exp(x)-exp(-x))/(exp(x)+exp(-x)) via Pade
Softmax, // Exact sum guarantee
GELU, // x * sigmoid(1.702x) approximation
}
Dense Layer Forward Pass
let output = evaluator.dense_forward(
&input, // &[MobiusInt]
&weights, // &[Vec<MobiusInt>]
&bias, // &[MobiusInt]
ActivationType::ReLU,
);
Computes output[i] = activation(sum_j(weights[i][j] * input[j]) + bias[i]) using MobiusInt arithmetic.
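A minimal sketch of that computation, assuming only the MobiusInt methods shown later in this section (zero, add, mul); the activation closure is a hypothetical stand-in for the evaluator's activation dispatch:

```rust
use nine65::arithmetic::MobiusInt;

// Sketch of the dense forward pass, not the library's implementation.
fn dense_forward_sketch(
    input: &[MobiusInt],
    weights: &[Vec<MobiusInt>],
    bias: &[MobiusInt],
    activation: impl Fn(MobiusInt) -> MobiusInt,
) -> Vec<MobiusInt> {
    weights
        .iter()
        .zip(bias)
        .map(|(row, b)| {
            // acc = sum_j(weights[i][j] * input[j])
            let acc = row
                .iter()
                .zip(input)
                .fold(MobiusInt::zero(), |acc, (w, x)| acc.add(&w.mul(x)));
            activation(acc.add(b)) // + bias[i], then activation
        })
        .collect()
}
```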
MQ-ReLU: O(1) Sign Detection
Defined in arithmetic/mq_relu.rs. Replaces traditional comparison circuits with a simple threshold check: if value >= q/2, it is treated as “negative.”
Coq-verified: MQReLU.v – sign_detection_correct, mq_relu_correct, speedup_is_2000x.
Sign Enum
pub enum Sign {
Positive, // value in [1, q/2)
Negative, // value in [q/2, q)
Zero, // value == 0
}
MQReLU Type
use nine65::arithmetic::MQReLU;
let relu = MQReLU::new(modulus);
// Sign detection (constant-time via subtle crate)
let sign = relu.detect_sign(value); // -> Sign::Positive, Negative, or Zero
// ReLU: max(0, x) -- returns value if positive, 0 otherwise
let activated = relu.apply_scalar(value);
// Apply to entire polynomial
let activated_poly = relu.apply_polynomial(&coeffs);
// Leaky ReLU: positive -> value, negative -> |x| * leak_num / leak_den
let leaky = relu.leaky_relu_scalar(value, 1, 10); // leak = 0.1
let leaky_poly = relu.leaky_relu_polynomial(&coeffs, 1, 10);
// Custom threshold
let custom = MQReLU::with_threshold(modulus, custom_threshold);
Performance: ~20ns per coefficient. The sign detection uses subtle::ConstantTimeLess and subtle::ConstantTimeEq for constant-time operation on the sensitive computation path.
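The underlying rule is small enough to sketch. This is an illustration of the threshold check using subtle's constant-time primitives, not the library's internal code:

```rust
use subtle::{ConditionallySelectable, ConstantTimeEq, ConstantTimeLess};

// ReLU via the q/2 threshold: keep the value only when it lies in
// [1, q/2), otherwise output 0. No comparison circuit, and no
// branching on secret data.
fn mq_relu_sketch(value: u64, q: u64) -> u64 {
    let is_zero = value.ct_eq(&0);
    let is_positive = value.ct_lt(&(q / 2)) & !is_zero;
    u64::conditional_select(&0, &value, is_positive)
}
```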
PadeEngine: Transcendentals via Rational Approximation
Defined in arithmetic/pade_engine.rs. Computes exp, sin, cos, ln, sigmoid, tanh using Pade [4/4] rational functions with integer coefficients.
Coq-verified: PadeEngine.v.
PADE_SCALE Constant
pub const PADE_SCALE: i128 = 1_000_000_000; // 10^9 represents 1.0
All inputs and outputs are scaled integers. A value x in the real domain is represented as x * PADE_SCALE.
Core Functions
use nine65::arithmetic::PadeEngine;
let pade = PadeEngine::default(); // uses PADE_SCALE
// Exponential: input x (scaled), output exp(x/SCALE) * SCALE
let exp_val = pade.exp_integer(500_000_000); // exp(0.5) * 10^9
// Sine and Cosine
let sin_val = pade.sin_integer(x);
let cos_val = pade.cos_integer(x);
// Natural logarithm
let ln_val = pade.ln_integer(x); // x must be > 0
// Sigmoid: 1 / (1 + exp(-x))
let sig = pade.sigmoid_integer(x);
// Tanh: (exp(x) - exp(-x)) / (exp(x) + exp(-x))
let th = pade.tanh_integer(x);
When the exact_rational feature is enabled, the engine delegates to the exact_transcendentals backend for higher accuracy. Otherwise, it uses the built-in Pade [4/4] approximation.
Pade [4/4] Coefficients for exp(x)
P(x) = 1680 + 840x + 180x^2 + 20x^3 + x^4
Q(x) = 1680 - 840x + 180x^2 - 20x^3 + x^4
exp(x) ~ P(x) / Q(x)
Evaluation uses Horner’s method (integer only). Accuracy: error < 10^-8 for |x| < 1 in the scaled domain.
Performance: ~200ns per evaluation, zero drift, fully reproducible.
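A sketch of the scaled Horner evaluation, assuming x is pre-scaled by PADE_SCALE (the real PadeEngine also performs range reduction and argument checks):

```rust
const PADE_SCALE: i128 = 1_000_000_000; // 10^9 represents 1.0

// exp(x) ~ P(x)/Q(x) with the [4/4] coefficients above. Since
// Q(x) = P(-x) for this pair, one Horner routine serves both.
// Dividing by PADE_SCALE at each step keeps every intermediate at
// the same fixed-point scale.
fn exp_pade_sketch(x: i128) -> i128 {
    // P coefficients, lowest degree first: 1680 + 840x + 180x^2 + 20x^3 + x^4
    let p: [i128; 5] = [1680, 840, 180, 20, 1];
    let horner = |x: i128| -> i128 {
        p.iter()
            .rev()
            .fold(0i128, |acc, &c| acc * x / PADE_SCALE + c * PADE_SCALE)
    };
    horner(x) * PADE_SCALE / horner(-x)
}
```

For x = 500_000_000 (0.5 in the scaled domain), this returns roughly 1_648_721_270, matching exp(0.5) ≈ 1.6487 within the stated accuracy.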
Via FHENeuralEvaluator
let sigmoid = evaluator.sigmoid(x); // i128 -> i128
let tanh = evaluator.tanh(x);
let exp = evaluator.exp(x);
let gelu = evaluator.gelu(x); // x * sigmoid(1.702 * x)
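For GELU the composition is straightforward in the scaled domain. A sketch using the sigmoid_integer call shown above, with PADE_SCALE redeclared locally for self-containment:

```rust
use nine65::arithmetic::PadeEngine;

const PADE_SCALE: i128 = 1_000_000_000;

// GELU(x) ~ x * sigmoid(1.702 * x). The constant 1.702 is the scaled
// integer 1_702_000_000; every multiply is followed by a rescale.
fn gelu_sketch(pade: &PadeEngine, x: i128) -> i128 {
    let inner = x * 1_702_000_000 / PADE_SCALE; // 1.702 * x, still scaled
    x * pade.sigmoid_integer(inner) / PADE_SCALE
}
```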
IntegerSoftmax: Exact Sum Guarantee
Defined in arithmetic/integer_softmax.rs. Computes softmax using Pade exp() and guarantees sum(output) == SOFTMAX_SCALE exactly.
Coq-verified: IntegerSoftmax.v.
SOFTMAX_SCALE Constant
pub const SOFTMAX_SCALE: u128 = 1_000_000_000_000; // 10^12
The sum of all softmax outputs equals this value exactly – not approximately.
Usage
use nine65::arithmetic::IntegerSoftmax;
let softmax = IntegerSoftmax::new();
// Input: logits as scaled integers
let logits = vec![1000i128, 2000, 3000];
let probs = softmax.compute(&logits);
// probs[0] + probs[1] + probs[2] == SOFTMAX_SCALE exactly
let sum: u128 = probs.iter().sum();
assert_eq!(sum, 1_000_000_000_000);
// Custom scale
let softmax = IntegerSoftmax::with_scale(1_000_000);
Algorithm
- Find max logit (integer comparison)
- Shift inputs for numerical stability: shifted[i] = logits[i] - max
- Compute exp() via Pade [4/4] for each shifted value
- Sum all exp values
- Divide each by total
- Adjust rounding to guarantee exact sum (the key innovation)
The adjustment step distributes any rounding remainder across the largest output values, ensuring the sum is always exactly SOFTMAX_SCALE.
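One plausible form of that adjustment (a largest-remainder correction; the library's exact tie-breaking may differ):

```rust
const SOFTMAX_SCALE: u128 = 1_000_000_000_000; // 10^12

// Floor division undershoots the target sum by at most n - 1 units,
// so the remainder can be handed out one unit at a time, starting
// with the largest outputs. Assumes a non-empty input.
fn exact_sum_sketch(exps: &[u128]) -> Vec<u128> {
    let total: u128 = exps.iter().sum();
    let mut probs: Vec<u128> = exps.iter().map(|e| e * SOFTMAX_SCALE / total).collect();
    let remainder = SOFTMAX_SCALE - probs.iter().sum::<u128>();

    let mut order: Vec<usize> = (0..probs.len()).collect();
    order.sort_by_key(|&i| std::cmp::Reverse(probs[i])); // largest first
    for &i in order.iter().take(remainder as usize) {
        probs[i] += 1; // distribute the rounding remainder
    }
    probs
}
```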
Via FHENeuralEvaluator
let probs = evaluator.softmax(&logits); // Vec<u128>
CyclotomicPhase: Ring-Native Trigonometry
Defined in arithmetic/cyclotomic_phase.rs. The ring R_q[X]/(X^N + 1) has native trigonometric properties because X^N = -1, so X^k is a phase rotation by k * (pi/N).
Coq-verified: CyclotomicPhase.v.
No polynomial approximation needed – sine and cosine are native to the ring structure.
Types
use nine65::arithmetic::cyclotomic_phase::{CyclotomicRing, CyclotomicPolynomial};
// Create ring with primitive root detection
let ring = CyclotomicRing::new(n, q);
// Fields: ring.n, ring.q, ring.psi (primitive root), ring.psi_inv
// Create polynomial in the ring
let poly = CyclotomicPolynomial::new(coeffs, ring.clone());
let zero = CyclotomicPolynomial::zero(ring.clone());
// Create a pure phase element X^k
let phase = CyclotomicPolynomial::phase(k, ring.clone());
- Odd coefficients of a polynomial represent “sine” components
- Even coefficients represent “cosine” components
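Phase multiplication itself is just a negacyclic rotation of the coefficient vector. A sketch over raw u64 coefficients (the CyclotomicPolynomial type above wraps the same operation):

```rust
// Multiply a polynomial in Z_q[X]/(X^N + 1) by X^k. Writing
// i + k = m*N + r, we get X^(i+k) = (-1)^m * X^r because X^N = -1,
// so a coefficient picks up a sign flip each time it wraps around.
fn phase_rotate_sketch(coeffs: &[u64], k: usize, q: u64) -> Vec<u64> {
    let n = coeffs.len();
    let mut out = vec![0u64; n];
    for (i, &c) in coeffs.iter().enumerate() {
        let j = i + k;
        if (j / n) % 2 == 0 {
            out[j % n] = c % q;
        } else {
            out[j % n] = (q - c % q) % q; // sign flip from X^N = -1
        }
    }
    out
}
```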
Performance: ~50ns for phase extraction (vs ~3ms for traditional polynomial approximation of trig functions).
MobiusInt: Signed Arithmetic Without Threshold Failures
Defined in arithmetic/mobius_int.rs. Separates magnitude from sign to avoid the classic M/2 threshold problem in modular signed arithmetic.
Coq-verified: MobiusInt.v.
The Problem
Traditional approach: if residue > M/2, treat it as negative. After multiple chained operations this check gives wrong answers, because an intermediate result whose true magnitude exceeds M/2 wraps into the opposite half of the range and decodes with the wrong sign, as the toy example below shows.
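A toy illustration with M = 1000 (not library code):

```rust
// -400 is stored as residue 600; 600 > 500, so the threshold rule
// correctly reads it as negative. Now double it:
let m: i64 = 1000;
let r = (((-400i64 * 2) % m) + m) % m; // residue of -800 is 200
assert_eq!(r, 200);
// 200 < 500, so the threshold rule decodes +200, but the true value
// is -800. One multiplication was enough to flip the decoded sign.
```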
The Solution: Polarity Separation
use nine65::arithmetic::{MobiusInt, Polarity};
pub struct MobiusInt {
pub residue: u64, // magnitude (always non-negative)
pub polarity: Polarity, // Plus or Minus
}
pub enum Polarity {
Plus,
Minus,
}
Arithmetic
// Construction
let a = MobiusInt::from_i64(-42);
let b = MobiusInt::from_i64(7);
let zero = MobiusInt::zero();
let one = MobiusInt::one();
// Arithmetic (polarity propagates correctly)
let sum = a.add(&b); // -42 + 7 = -35
let prod = a.mul(&b); // -42 * 7 = -294 (polarity: Minus XOR Plus = Minus)
let neg = a.neg(); // 42 (polarity flipped)
// Polarity operations
let p = Polarity::Plus.xor(Polarity::Minus); // Minus
let q = Polarity::Minus.flip(); // Plus
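Chaining stays exact because sign handling never consults the residue. A toy sketch of the rule, with illustrative types rather than the library's internals:

```rust
#[derive(Clone, Copy, PartialEq)]
enum Polarity { Plus, Minus }

// Multiplication: magnitudes multiply, polarities XOR. No residue is
// ever compared against M/2, so no amount of chaining can misdecode
// a sign.
fn mul_sketch(a: (u64, Polarity), b: (u64, Polarity)) -> (u64, Polarity) {
    let polarity = if a.1 == b.1 { Polarity::Plus } else { Polarity::Minus };
    (a.0 * b.0, polarity)
}
```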
Via FHENeuralEvaluator
// Convert residue to signed representation
let signed = evaluator.to_signed(residue);
// Convert back
let residue = evaluator.from_signed(&signed);
// Batch conversion
let signed_poly = evaluator.poly_to_signed(&coeffs);
let residue_poly = evaluator.poly_from_signed(&signed_poly);
Performance: ~15ns per operation, exact, no threshold errors even after arbitrary chaining.
Performance Summary
| Component | Time | vs Standard FHE |
|---|---|---|
| MQ-ReLU sign detection | ~20ns | 2000x faster than comparison circuit (~2ms) |
| Pade exp/sigmoid/tanh | ~200ns | 250x faster than polynomial approximation (~50ms) |
| CyclotomicPhase sin/cos | ~50ns | 60x faster than Taylor approximation (~3ms) |
| IntegerSoftmax per element | ~200ns | Exact sum (standard FHE cannot guarantee sum = 1.0) |
| MobiusInt arithmetic | ~15ns | 100% accuracy (vs 0% under chained M/2 threshold) |
Combined speedup for neural inference: 1,000x to 100,000x over standard FHE polynomial-approximation approaches.
Where to go next
- Batch and Galois – packing multiple values for SIMD-like operations
- MANA Accelerator – parallel stream processing for neural workloads
- Cookbook – complete FHE neural network example