About
HPC and AI software engineer. I started at 15 and have worked on systems at every scale, from supercomputers to microcontrollers.
I started my career at the Swiss National Supercomputing Centre (CSCS), where I learned production systems from the ground up and graduated as the top technical trainee in Canton Ticino. That foundation shaped how I approach engineering: understand the hardware, optimize the stack, measure everything.
I scale distributed AI training to thousands of GPUs and deploy neural networks on devices with kilobytes of memory. During my master's thesis at ZHAW, I architected PyTorch training across thousands of GPU nodes on Piz Daint and GH200 nodes on Alps (the 6th-fastest supercomputer globally). I've also shipped ultra-low-power inference on the MAX78002 CNN accelerator, contributing to Maxim's open-source training framework along the way.
Currently at Stadtspital Zürich, I own ML infrastructure end-to-end and develop diagnostic models for ophthalmology—including a novel attention architecture that achieved state-of-the-art performance on diabetic retinopathy segmentation.
Technical Depth
CUDA · Python · PyTorch distributed (DDP/FSDP) · NCCL optimization · quantization-aware training · Linux kernel development · bare-metal embedded C/C++ · NVIDIA H100/GH200/GB200 · Nsight Systems · MPI · OpenMP