Currently working as Staff Machine Learning Engineer at Samsung Research America (August 2018 to present).
tl;dr I love to (and will) automate your tedious tasks away. Experienced with on-device AI, big-GPU AI, backend engineering, frontend dev, a little bit of design, and lots of software engineering.
Currently deep in on-device AI. Running AI models on phones + making them fast = happiness. More specifically, I've been drilling into running LLMs 🦙 on mobile 📱. Extremely interested in GPU programming (NVIDIA / Qualcomm).
- ✅ Writing raw OpenCL kernels for on-device ops.
- ✅ Profiling CUDA models to identify bottlenecks at the op level.
- ✅ Hacking llama.cpp to run our custom models.
- ✅ Adding operators to ExecuTorch for mobile NPUs, running LLMs on the NPU.
- ✅ Running LLMs on mobile GPUs with MLC-LLM.
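For a taste of the first item: a minimal element-wise OpenCL kernel of the kind one hand-writes for mobile GPUs. This is a generic, hypothetical sketch (not one of our production kernels); the host-side setup (context, command queue, buffers, enqueue) is omitted.

```c
// Hypothetical element-wise kernel: fused multiply-add over a flat buffer.
// One work-item per element; the bounds check guards against a global
// work size padded up to a multiple of the work-group size.
__kernel void fma_elementwise(__global const float *a,
                              __global const float *b,
                              __global float *out,
                              const float scale,
                              const int n) {
    int i = get_global_id(0);
    if (i < n) {
        out[i] = fma(scale, a[i], b[i]);  // out = scale * a + b
    }
}
```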
Below are some highlights that gave me the most dopamine rush ⚡.
🌍 Product side highlights
- Developed the first version of the training pipeline for the deep-learning-based intent classifier in Samsung Bixby©'s assistant. Rewrote our research modeling and training code for a 50% reduction in training time.
- Helped with on-device deployment of part of Samsung Bixby©. Extracting performance with C++ code was some sweet, sweet satisfaction. Do we have a spare 10ms in our inference budget? Yes, sir, we do!
⚡ Optimization adventures ⛰️
- With some trickery and a little additional memory (a tradeoff), I was able to rewrite a Python loop into pure torch code and, voilà, a 100x speedup! (Blog post coming soon!)
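A sketch of the kind of rewrite described above. The actual loop isn't shown in this post, so this is a hypothetical stand-in: a per-row Python loop replaced by one batched tensor expression, trading a little extra memory (a full boolean mask) for the speedup.

```python
import torch

def row_threshold_sums_loop(x: torch.Tensor, thresholds: torch.Tensor) -> torch.Tensor:
    """Slow version: a Python loop over rows (hypothetical example)."""
    out = torch.empty(x.shape[0])
    for i in range(x.shape[0]):
        row = x[i]
        out[i] = row[row > thresholds[i]].sum()
    return out

def row_threshold_sums_vectorized(x: torch.Tensor, thresholds: torch.Tensor) -> torch.Tensor:
    """Pure-torch version: build a (rows, cols) mask, zero out the rest, sum per row."""
    mask = x > thresholds.unsqueeze(1)  # the extra memory: one bool per element
    return (x * mask).sum(dim=1)
```

The vectorized version materializes the whole mask at once instead of slicing row by row, which is exactly the memory-for-speed tradeoff mentioned above.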
- While running experiments for the Samsung Bixby© training model, I set up the entire training pipeline in Jenkins so that each experiment gets its own artifacts and deploying an experiment becomes a one-click operation. Being able to browse all previous experiments, and even pick a run and benchmark its model, was a productivity boost like no other.
- Automated some data filling at less-than-100% accuracy (something > nothing) in a process that was otherwise purely manual. The mental burden and time for the task dropped from 1 hour to under 5 minutes on average, mostly because correcting a few wrong entries is far easier than filling everything in by hand. No dopamine rush beats people thanking you personally for making their lives easier.
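The "something > nothing" prefill idea can be sketched in a few lines. This is a hypothetical illustration (the real task and its data aren't described here): guess each field by fuzzy-matching against known values, and let a human correct the misses.

```python
import difflib

# Hypothetical list of valid values the field must come from.
KNOWN_TEAMS = ["On-Device AI", "Backend Platform", "Speech Research"]

def prefill_team(raw: str) -> str:
    """Return the closest known team name for a messy input, or the raw
    text unchanged if nothing is close enough (a human fixes those)."""
    lowered = {t.lower(): t for t in KNOWN_TEAMS}
    matches = difflib.get_close_matches(raw.lower(), list(lowered), n=1, cutoff=0.6)
    return lowered[matches[0]] if matches else raw

print(prefill_team("on device ai"))  # → On-Device AI
```

Even when the guess is wrong, the human only edits that one field instead of typing every field from scratch, which is where the hour-to-minutes saving comes from.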
Early on in my career at SRA, my day-to-day included tackling cool system-design problems, implementing and scaling machine learning models (still do), and developing scalable backend services and a frontend that is a pleasure to use and a "lifesaver" (I have Slack messages to back this up).
Before that, I was a research intern here from August 2017 to June 2018, gaining lots of experience implementing machine learning solutions end to end. I developed a distributed-system framework that allows switching Natural Language Processing engines in real time, and had papers published in reputable NLP conferences (ACL 2018 and COLING 2018).