
PyTorch: The Quiet Revolution in Python Numerical Computing

I replaced NumPy with PyTorch in my daily code. FFT is now 1700× faster. Gradient computation is 7600× faster. Here's what nobody is talking about.

NumPy has been the backbone of numerical Python for decades. But there's a shift happening that few are discussing: PyTorch is becoming a superior general-purpose numerical computing library—not just for machine learning.

The insight is simple but powerful: when you write your numerical code in PyTorch, you get GPU acceleration for free. One line of code—x = x.to('cuda')—and your existing algorithm runs on your GPU.

I ran benchmarks on an NVIDIA RTX PRO 6000 Blackwell (96GB VRAM) to quantify this. The results surprised even me.

The Benchmark Results

2D FFT (4096×4096):         1714× faster   (1032 ms → 0.60 ms)
Autograd vs finite diff:    7633× faster   (1125 ms → 0.15 ms)
Reductions (10M elements):   223× faster   (8.8 ms → 0.04 ms)
Heat equation PDE:            89× faster    (816 ms → 9.1 ms)

These aren't cherry-picked. Here's the full breakdown:

Benchmark                      NumPy (CPU)   PyTorch (GPU)   Speedup
Matrix mult (4096×4096)          46.70 ms        1.69 ms        28×
Element-wise (100M)             157.20 ms        2.98 ms        53×
SVD (2048×2048)                 997.10 ms      167.88 ms         6×
Eigendecomp (2048×2048)         390.46 ms       19.17 ms        20×
1D FFT (4M samples)              88.73 ms        0.05 ms      1743×
2D FFT (4096×4096)             1032.31 ms        0.60 ms      1714×
Batched matmul (1024×256²)      596.04 ms        3.56 ms       167×
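
Timing GPU code fairly takes a little care, because CUDA kernels launch asynchronously. The sketch below shows the kind of loop these numbers come from, with a warm-up pass and an explicit torch.cuda.synchronize() before the clock stops; the matrix-multiply workload and sizes here are illustrative, not the exact benchmark script.

import time
import numpy as np
import torch

def time_ms(fn, *args, repeats=10, sync=False):
    fn(*args)                          # warm-up: exclude one-time allocation/compile costs
    if sync:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    if sync:
        torch.cuda.synchronize()       # wait for queued GPU kernels before reading the clock
    return (time.perf_counter() - start) / repeats * 1000

n = 4096
a_np = np.random.rand(n, n).astype(np.float32)
b_np = np.random.rand(n, n).astype(np.float32)
a_gpu = torch.from_numpy(a_np).to('cuda')   # same data, same dtype, on the GPU
b_gpu = torch.from_numpy(b_np).to('cuda')

print(f"NumPy matmul:   {time_ms(np.matmul, a_np, b_np):.2f} ms")
print(f"PyTorch matmul: {time_ms(torch.matmul, a_gpu, b_gpu, sync=True):.2f} ms")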

The Code is Nearly Identical

Here's a heat equation PDE solver. First, the NumPy version:

import numpy as np

def heat_equation_numpy(u, alpha, dx, dt, steps):
    # Explicit finite-difference update with a five-point Laplacian stencil
    # (np.roll gives periodic boundary conditions).
    factor = alpha * dt / (dx ** 2)

    for _ in range(steps):
        laplacian = (
            np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0) +
            np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1) - 4 * u
        )
        u = u + factor * laplacian

    return u

Now the PyTorch version:

import torch

def heat_equation_pytorch(u, alpha, dx, dt, steps):
    # Identical update; only the namespace and the keyword (axis → dims) change.
    factor = alpha * dt / (dx ** 2)

    for _ in range(steps):
        laplacian = (
            torch.roll(u, 1, dims=0) + torch.roll(u, -1, dims=0) +
            torch.roll(u, 1, dims=1) + torch.roll(u, -1, dims=1) - 4 * u
        )
        u = u + factor * laplacian

    return u

The difference? np → torch, axis → dims. That's it.

And to run it on GPU:

u = u.to('cuda')  # That's it. Same code now runs on GPU.
result = heat_equation_pytorch(u, alpha, dx, dt, steps)

Result: 89× faster on a 2048×2048 grid with 100 time steps.
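
And if you need the result back on the CPU as a NumPy array (for plotting or saving, say), that's one more call; the variable names follow the snippet above:

result_np = result.cpu().numpy()  # copy back to host memory and view as a NumPy array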

The Four Advantages

1. Write Once, Accelerate Anywhere

The same PyTorch code runs on:

CPUs
NVIDIA GPUs (CUDA)
Apple Silicon (MPS)
AMD GPUs (ROCm)

NumPy code? CPU only. Forever.
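
In practice that usually means one device-selection line at the top of a script; everything after it is device-agnostic. A minimal sketch (the fallback order and the solver parameters are illustrative choices):

import torch

# Pick the best available backend once; the rest of the code never mentions hardware.
if torch.cuda.is_available():
    device = torch.device('cuda')      # NVIDIA GPU
elif torch.backends.mps.is_available():
    device = torch.device('mps')       # Apple Silicon
else:
    device = torch.device('cpu')

u = torch.rand(2048, 2048, device=device)
result = heat_equation_pytorch(u, alpha=0.01, dx=1.0, dt=0.1, steps=100)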

2. Automatic Differentiation

This is the hidden superpower. PyTorch gives you exact gradients of any differentiable computation with a single call to y.backward().

With NumPy, you'd need to either derive gradients by hand or use finite differences—which is slow and numerically unstable.

I benchmarked a complex function with 16,384 gradient components: autograd returned the exact gradient in 0.15 ms, while finite differences in NumPy took 1125 ms (the 7633× row in the summary above).
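
Here is a minimal sketch of what the autograd side of that comparison looks like; the test function is a stand-in, not the one from the benchmark:

import torch

x = torch.rand(16_384, requires_grad=True)   # 16,384 parameters, tracked by autograd

def f(x):
    # Any chain of differentiable operations works the same way.
    return (torch.sin(x) * torch.exp(-x ** 2)).sum()

y = f(x)
y.backward()              # exact gradient of f with respect to every element of x
grad = x.grad             # shape: (16384,)

# A central finite-difference estimate would need two extra evaluations of f per
# component (over 32,000 calls here), and its accuracy depends on the step size.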

This enables gradient-based optimization, sensitivity analysis, physics-informed computing, and inverse problems, all essentially for free.

3. Future-Proof

PyTorch is actively developed with massive resources behind it; every release brings faster kernels, compiler improvements, and support for new hardware backends.

Your code today will be faster tomorrow—automatically.

I tested torch.compile() on a simple operation chain, where its main win is fusing the element-wise kernels into fewer launches.
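
Roughly, the test looks like this (the function is illustrative, and the gain you see will depend on your hardware and PyTorch version):

import torch

def chain(x):
    # A few element-wise ops that the compiler can fuse into fewer GPU kernels.
    return torch.sin(x) * torch.cos(x) + torch.sqrt(torch.abs(x))

compiled_chain = torch.compile(chain)    # compiled lazily on first call, cached afterwards

x = torch.rand(10_000_000, device='cuda')
eager = chain(x)
fused = compiled_chain(x)                # first call triggers compilation
print(torch.allclose(eager, fused, atol=1e-6))   # same results, fewer kernel launches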

4. ML-Ready

When you need to add ML to your pipeline, your data is already where it needs to be: the same tensors feed directly into torch.nn models, optimizers, and loss functions.

No conversion friction. Your numerical computations and ML models speak the same language.
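
For example, the output of a numerical routine can feed a model directly; the toy network below is only for illustration:

import torch
import torch.nn as nn

# The field produced by the numerical code (e.g. the heat-equation solver) is already a tensor.
u = torch.rand(2048, 2048, device='cuda')

model = nn.Sequential(
    nn.Linear(2048, 128),
    nn.ReLU(),
    nn.Linear(128, 1),
).to('cuda')

prediction = model(u)            # shape (2048, 1): one prediction per row of the field
loss = prediction.pow(2).mean()
loss.backward()                  # gradients flow through the entire pipeline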

When NumPy Still Wins

To be fair, PyTorch isn't always faster on CPU. In my benchmarks, NumPy came out ahead on several small, CPU-only workloads where per-call overhead dominates the actual math.

NumPy's CPU kernels are highly optimized (often calling into MKL/OpenBLAS). PyTorch's CPU performance is good but not always better.

The key insight: if you have a GPU, PyTorch wins almost everywhere. If you're CPU-only and have small data, NumPy is fine.

The Real Question

The question isn't "should I learn PyTorch for ML?"

It's "why am I still using NumPy for heavy computation?"

Try It Yourself

The full benchmark notebook is available on GitHub: github.com/isztld/pytorch-numpy-benchmarks

Run it on your own hardware. The specific numbers will vary, but the pattern holds: for anything computationally intensive, PyTorch + GPU is in a different league.

Test Configuration