Experience
tl;dr Love to (and will) automate your tedious tasks away. Experienced with on-device AI, big-GPU AI, backend engineering, frontend dev, a little bit of design, and lots of software engineering.
Currently deep in on-device AI: running AI models on phones + making them fast = happiness. More specifically, I’ve been running LLMs 🦙 on mobile 📱. Extremely interested in GPU programming (NVIDIA / Qualcomm).
- ✅ Writing raw OpenCL kernels for ops on phone.
- ✅ Profiling CUDA models to identify bottlenecks at the op level.
- ✅ Hacking llama.cpp to run our custom models.
- ✅ Adding operators to ExecuTorch for mobile NPUs, running LLMs on the NPU.
- ✅ MLC-LLM for mobile GPUs.
Below are some highlights that gave me the most dopamine rush ⚡.
🌍 Product side highlights
- Developed the first version of the training pipeline for the Samsung Bixby© assistant's deep-learning-based intent classifier. Rewrote our research modeling and training code for a 50% speedup in training time.
- Helped with on-device deployment of a part of Samsung Bixby©. Extracting performance with C++ code was some sweet, sweet satisfaction. Do we have a spare 10ms in our inference budget? Yes, sir, we do!
⚡ Optimization adventures ⛰️
- With some trickery and a little additional memory (the tradeoff), I was able to rewrite a Python loop into pure torch code and voilà, a 100x speedup! (Blog post coming soon!)
- While running experiments for the Samsung Bixby© training model, set up the entire training pipeline in Jenkins so that each experiment has its own artifacts and can be deployed with one click. Being able to go through all previous experiments, and even pick a run and benchmark its model, was a productivity boost like none other.
- Automated some data filling with less than 100% accuracy (something > nothing) in a process that was otherwise purely manual. The mental burden and time taken for the task went down from 1 hour to <5 minutes on average, mostly because editing some wrong info is way easier than filling it all in manually. No dopamine rush beats people thanking you personally for making their life easier.
Early on in my career at SRA, my day-to-day included tackling cool system design problems, implementing and scaling machine learning models (still do), and developing scalable back-end services and a front end that is a pleasure to use and a "lifesaver" (Slack messages to back this up).
Also interned from August 2017 to June 2018. As a research intern, I gained lots of experience implementing machine learning solutions end to end. Developed a distributed system framework that allows switching Natural Language Processing engines in real time. Also had papers published at reputable NLP conferences like ACL 2018 and COLING 2018.
Worked as a TA for Data Communications, a grad-level course in the Computer Science department at FSU.
Served as a TA for Concurrent, Parallel and Distributed Programming, a grad-level course in the Computer Science department at FSU.
Developed web applications for the HR department. Also helped maintain the legacy code running the current website. Worked on front-end, back-end and server management for the web applications.
Developed an e-commerce website that sells paintings. Developed back-end as well as front-end for the website. Also integrated support for payment using the PayU gateway.
Publications
Accepted at ICLR 2025
In this paper we propose a novel approach to Large Language Model (LLM) compression that requires no recovery finetuning. One of the highlights of my contribution was implementing a part of the model that traded slightly higher memory usage for a 100x speedup in execution.
Achievements
Toastmasters Secretary at Samsung Speaks Toastmasters club. (2019 - 2020)
Won Second Prize at the “ACM Fall 2015 Programming Contest”.
Developed and published a VR (Virtual Reality) game on the Google Play Store. Look for Shiny Bikes VR on the Play Store.
Completed “Linux From Scratch”.
Reputed Stack Overflow profile.
Projects
Contributed to an open-source C++ wrapper for Hugging Face tokenizers. Also includes instructions for cross-compiling them for Android.
Contributor to the Rust library rusty-celery. Helped develop support for the Redis broker.
Contributing to the awesome Python oauthlib library. This is the library that other Python OAuth frameworks and libraries build upon. Great folks there, give them a ⭐️ for their hard work! Love reading RFCs? Join us!
Developing a generic framework that would allow developers to incorporate voice control into their apps on PC. Also includes a Firefox extension to control the browser. The actual voice engine is a private repo as of now.
This project ports an existing implementation of a character RNN to a variant that can run in a distributed environment (multiple machines, multiple GPUs). The aim was to demonstrate how to port existing TensorFlow implementations to a distributed mode.
Browser extension to browse Reddit with a nice coat of material design. Live on Mozilla Addons Store. Coming soon on Chrome Store (Under review).
Converts images into ASCII art. Builds a model from a font file, then uses it to create ASCII art for any input image.
Developed a VR version of the classic Tron game using Unity and published the Android app on the Google Play Store. Check it out on the Play Store.
Scrapes icons from Flaticon.com based on a list of search queries.
Developed an Android app that suggests a restaurant based on your location. The entire back-end was developed in Laravel. Built as part of a hackathon (HackFSU'16).
Developed a peer-to-peer file sharing program in C++. Created a keyword-based algorithm for finding any file on the network without prior knowledge of its existence or location. Uses a combination of processes and threads.
Developed a basic MVC framework from scratch for PHP-based web applications.
Published a Laravel package on Packagist, called “asp/commenter”, that lets you add comment functionality to any page with just 6 lines of code: 3 in the background and 3 for frontend display. The package takes care of creating and managing topic threads, comments and replies. Get it from here.