PyTorch: The Quiet Revolution in Python Numerical Computing
I replaced NumPy with PyTorch in my daily code. FFT is now 1700× faster. Gradient computation is 7600× faster. Here's what nobody is talking about.
NumPy has been the backbone of numerical Python for decades. But there's a shift happening that few are discussing: PyTorch is becoming a superior general-purpose numerical computing library—not just for machine learning.
The insight is simple but powerful: when you write your numerical code in PyTorch, you get GPU acceleration for free. One line of code—x = x.to('cuda')—and your existing algorithm runs on your GPU.
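A minimal sketch of that claim (the matrix size here just mirrors the benchmark below, and the GPU line assumes a CUDA-capable card):

```python
import torch

x = torch.randn(4096, 4096)   # lives on the CPU, NumPy-style API
y = x @ x.T                   # runs on the CPU

x = x.to('cuda')              # the one line (requires a CUDA-capable GPU)
y = x @ x.T                   # identical expression, now runs on the GPU
```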
I ran benchmarks on an NVIDIA RTX PRO 6000 Blackwell (96GB VRAM) to quantify this. The results surprised even me.
The Benchmark Results
These aren't cherry-picked. Here's the full breakdown:
| Benchmark | NumPy (CPU) | PyTorch (GPU) | Speedup |
|---|---|---|---|
| Matrix mult (4096×4096) | 46.70 ms | 1.69 ms | 28× |
| Element-wise (100M) | 157.20 ms | 2.98 ms | 53× |
| SVD (2048×2048) | 997.10 ms | 167.88 ms | 6× |
| Eigendecomp (2048×2048) | 390.46 ms | 19.17 ms | 20× |
| 1D FFT (4M samples) | 88.73 ms | 0.05 ms | 1743× |
| 2D FFT (4096×4096) | 1032.31 ms | 0.60 ms | 1714× |
| Batched matmul (batch of 1024, 256×256) | 596.04 ms | 3.56 ms | 167× |
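If you reproduce these numbers, keep in mind that CUDA kernels launch asynchronously, so naive wall-clock timing undercounts GPU work. Here's a minimal timing sketch (the helper below is mine, not from the benchmark notebook):

```python
import time
import torch

def time_gpu(fn, *args, warmup=3, iters=10):
    """Time a GPU operation by synchronizing before and after the timed loop."""
    for _ in range(warmup):          # warm-up runs to absorb one-time setup costs
        fn(*args)
    torch.cuda.synchronize()         # make sure all prior GPU work has finished
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    torch.cuda.synchronize()         # wait for the timed kernels to complete
    return (time.perf_counter() - start) / iters * 1000  # ms per call

a = torch.randn(4096, 4096, device='cuda')
print(f"matmul: {time_gpu(torch.matmul, a, a):.2f} ms")
```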
The Code is Nearly Identical
Here's a heat equation PDE solver. First, the NumPy version:
```python
def heat_equation_numpy(u, alpha, dx, dt, steps):
    factor = alpha * dt / (dx ** 2)
    for _ in range(steps):
        laplacian = (
            np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0) +
            np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1) -
            4 * u
        )
        u = u + factor * laplacian
    return u
```
Now the PyTorch version:
```python
def heat_equation_pytorch(u, alpha, dx, dt, steps):
    factor = alpha * dt / (dx ** 2)
    for _ in range(steps):
        laplacian = (
            torch.roll(u, 1, dims=0) + torch.roll(u, -1, dims=0) +
            torch.roll(u, 1, dims=1) + torch.roll(u, -1, dims=1) -
            4 * u
        )
        u = u + factor * laplacian
    return u
```
The difference? np → torch, axis → dims. That's it.
And to run it on GPU:
```python
u = u.to('cuda')  # That's it. Same code now runs on GPU.
result = heat_equation_pytorch(u, alpha, dx, dt, steps)
```
Result: 89× faster on a 2048×2048 grid with 100 time steps.
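For reference, here is one way to set up such a grid and run both versions side by side (the initial condition and parameter values are my own illustrative choices, not the benchmark's):

```python
import numpy as np
import torch

n, steps = 2048, 100
alpha, dx, dt = 0.01, 1.0, 1.0            # illustrative values (factor well below 0.25, so the scheme stays stable)

u_np = np.zeros((n, n), dtype=np.float32)
u_np[n // 2, n // 2] = 100.0              # single hot spot in the center

u_t = torch.from_numpy(u_np.copy())       # same data as a PyTorch tensor
if torch.cuda.is_available():
    u_t = u_t.to('cuda')

result_np = heat_equation_numpy(u_np, alpha, dx, dt, steps)
result_t = heat_equation_pytorch(u_t, alpha, dx, dt, steps)
```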
The Four Advantages
1. Write Once, Accelerate Anywhere
The same PyTorch code runs on:
- CPU (your laptop)
- NVIDIA GPU (CUDA)
- Apple Silicon (MPS)
- AMD GPU (ROCm)
- Intel GPU (XPU)
- Google TPU
- Future accelerators (as PyTorch adds support)
NumPy code? CPU only. Forever.
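In practice, a common pattern is to pick the best available backend once and keep the rest of the code device-agnostic. A minimal sketch (the MPS branch covers Apple Silicon; ROCm builds of PyTorch expose AMD GPUs under the 'cuda' device name):

```python
import torch

if torch.cuda.is_available():
    device = torch.device('cuda')        # NVIDIA, or AMD via ROCm builds
elif torch.backends.mps.is_available():
    device = torch.device('mps')         # Apple Silicon
else:
    device = torch.device('cpu')

u = torch.zeros(2048, 2048, device=device)   # allocated directly on the target device
```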
2. Automatic Differentiation
This is the hidden superpower. PyTorch gives you exact gradients of any computation with a single call to y.backward().
With NumPy, you'd need to either derive gradients by hand or approximate them with finite differences, which is slow and numerically imprecise.
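A minimal sketch of what autograd looks like in practice (the function itself is an arbitrary example of mine):

```python
import torch

x = torch.linspace(0, 10, 16_384, requires_grad=True)
y = (torch.sin(x) * torch.exp(-0.1 * x)).sum()   # any scalar-valued computation

y.backward()         # exact gradient dy/dx for all 16,384 components
print(x.grad.shape)  # torch.Size([16384])
```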
I benchmarked a complex function with 16,384 gradient components:
- NumPy (finite differences): 1125 ms
- PyTorch autograd: 0.15 ms
- Speedup: 7633×
This enables optimization problems, sensitivity analysis, physics-informed computing, and inverse problems—all for free.
3. Future-Proof
PyTorch is actively developed with massive resources:
- Better kernels every release
- New hardware support
- torch.compile() for automatic optimization
- Quantization, sparsity, mixed precision
Your code today will be faster tomorrow—automatically.
I tested torch.compile() on a simple operation chain:
- Eager mode (CPU): 12.17 ms
- Compiled mode (CPU): 0.32 ms
- 38× faster with one line of code
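The one line in question is just wrapping the function (the operation chain below is an illustrative stand-in, not the exact chain I benchmarked):

```python
import torch

def op_chain(x):
    return torch.sin(x) ** 2 + torch.cos(x) ** 2 * torch.exp(-x)

compiled = torch.compile(op_chain)   # the one line

x = torch.randn(1_000_000)
out = compiled(x)                    # first call compiles; later calls reuse the optimized kernel
```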
4. ML-Ready
When you need to add ML to your pipeline:
- Data already in tensors
- Same device (no CPU↔GPU transfers)
- Seamless integration with models
No conversion friction. Your numerical computations and ML models speak the same language.
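Interop with existing NumPy code is one call in each direction. A minimal sketch (note that torch.from_numpy shares memory with the source array rather than copying it):

```python
import numpy as np
import torch

a = np.random.rand(1024, 1024).astype(np.float32)

t = torch.from_numpy(a)      # zero-copy: shares memory with `a` (CPU)
t = t.to('cuda')             # copy to the GPU when you need acceleration

back = t.cpu().numpy()       # back to NumPy for legacy code paths
```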
When NumPy Still Wins
To be fair, PyTorch isn't always faster on CPU. In my benchmarks, NumPy was faster for:
- Small matrix operations (< 512×512 matmul)
- Some element-wise operations at large scale on CPU
NumPy's CPU kernels are highly optimized (often calling into MKL/OpenBLAS). PyTorch's CPU performance is good but not always better.
The key insight: if you have a GPU, PyTorch wins almost everywhere. If you're CPU-only and have small data, NumPy is fine.
The question isn't "should I learn PyTorch for ML?"
It's "why am I still using NumPy for heavy computation?"
Try It Yourself
The full benchmark notebook is available on GitHub: github.com/isztld/pytorch-numpy-benchmarks
Run it on your own hardware. The specific numbers will vary, but the pattern holds: for anything computationally intensive, PyTorch + GPU is a different league.
Test Configuration
- GPU: NVIDIA RTX PRO 6000 Blackwell (96GB VRAM)
- PyTorch: 2.8.0+cu129
- NumPy: 2.1.2
- CUDA: 12.9
- Python: 3.13.5