// from 2000 GPUs to 384KB of SRAM

About

HPC and AI software engineer since age 15, working with systems at every scale, from supercomputers to microcontrollers.

I started my career at the Swiss National Supercomputing Centre (CSCS), where I learned production systems from the ground up and graduated as the top technical trainee in Canton Ticino. That foundation shaped how I approach engineering: understand the hardware, optimize the stack, measure everything.

I scale distributed AI training to thousands of GPUs and deploy neural networks on devices with kilobytes of memory. During my master's thesis at ZHAW, I architected PyTorch training across thousands of GPU nodes on Piz Daint and across GH200 nodes on ALPS (6th fastest supercomputer globally). I've also shipped ultra-low-power inference on the MAX78002 CNN accelerator, contributing to Maxim's open-source training framework along the way.
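The shape of that work, in a minimal sketch: one process per GPU, launched by torchrun or Slurm, with NCCL carrying the gradient all-reduce. The model, loss, and sizes below are placeholders, not the thesis code.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun/srun set RANK, LOCAL_RANK, and WORLD_SIZE in the
    # environment; NCCL is the backend used on NVIDIA clusters.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and optimizer; real runs wrap the full network.
    model = DDP(torch.nn.Linear(1024, 1024).cuda(local_rank),
                device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(32, 1024, device=local_rank)  # dummy batch
        loss = model(x).pow(2).mean()                 # dummy loss
        opt.zero_grad(set_to_none=True)
        loss.backward()  # NCCL all-reduces gradients across ranks here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched as, e.g., `torchrun --nnodes=$N --nproc_per_node=4 train.py`; FSDP takes over once the model no longer fits replicated on every GPU.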

Currently at Stadtspital Zürich, I own ML infrastructure end-to-end and develop diagnostic models for ophthalmology—including a novel attention architecture that achieved state-of-the-art performance on diabetic retinopathy segmentation.
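For background, the sketch below is the stock attention block that such architectures build on, in generic PyTorch over image-patch embeddings; it is not the novel architecture itself, and the shapes are illustrative.

```python
import torch

# Generic multi-head self-attention over image-patch embeddings;
# a stock baseline block, not the novel architecture itself.
x = torch.randn(1, 196, 256)  # (batch, patches, embed_dim): a 14x14 patch grid
attn = torch.nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
out, weights = attn(x, x, x)  # self-attention: query = key = value
print(out.shape)              # torch.Size([1, 196, 256])
```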

12+ years in HPC · 2000+ GPU nodes scaled · 384KB smallest deployment · published in IEEE

Technical Depth

CUDA · Python · PyTorch distributed (DDP/FSDP) · NCCL optimization · quantization-aware training · Linux kernel development · bare-metal embedded C/C++ · NVIDIA H100/GH200/GB200 · Nsight Systems · MPI · OpenMP
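Quantization-aware training is the kind of technique behind the 384KB deployments: the network trains against fake-quantized weights and activations so the final int8 model loses little accuracy. A minimal sketch using PyTorch's eager-mode QAT API on a toy model; the MAX78002 toolchain ships its own flow, so this only shows the generic pattern.

```python
import torch
from torch.ao.quantization import (QuantStub, DeQuantStub,
                                   get_default_qat_qconfig,
                                   prepare_qat, convert)

class TinyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # marks the float -> int8 boundary
        self.conv = torch.nn.Conv2d(1, 8, 3)
        self.relu = torch.nn.ReLU()
        self.dequant = DeQuantStub()  # marks the int8 -> float boundary

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = TinyNet().train()
model.qconfig = get_default_qat_qconfig("qnnpack")  # ARM-friendly backend
prepare_qat(model, inplace=True)  # insert fake-quant observers

# ... the usual training loop runs here, against fake-quantized ops ...

model.eval()
quantized = convert(model)  # fold observers into real int8 kernels
```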

Contact

isztl.david@gmail.com · LinkedIn · GitHub