Projects

A selection of things I've built.

ML Data Platform at Apple

2022

Designed and built a data management platform for creating, versioning, and governing ML datasets at scale. Includes a large-scale ingestion service handling 100+ TB and 100M assets per job, with optimizations that improved data load performance by 10x.

ML Infrastructure, SDK & APIs, PySpark, Apache Iceberg, Golang, AWS S3, Kubernetes

Pre-Training Data Infrastructure at Luma AI

2025

Co-built the data processing infrastructure for foundation model pre-training across 3,000 GPUs. Developed an internal library for custom data processors and a multithreaded data loader achieving sub-20ms per-batch loading for multimodal datasets.

ML Infrastructure, Systems, Python, Ray, Lance, PyTorch

Windows Kernel Driver Subsystem at Microsoft

2019

Contributed to the Driver Plug and Play (PnP) Subsystem in the Windows Kernel, which handles driver installation, device-to-driver matching, driver upgrades, and device migration across OS upgrades. Built a new PnP diagnostics module to improve debugging and observability of driver installation.

Systems, C/C++, Windows Kernel

Mastering OpenCV Android Application Programming

2015

Co-authored a book on building computer vision applications for Android using OpenCV, published by Packt Publishing.

Book, OpenCV, Android, Java