Currently working as Staff Machine Learning Engineer at Samsung Research America (August 2018 to present).
tl;dr I love to (and will) automate your tedious tasks away. Experienced with on-device AI, big-GPU AI, backend engineering, frontend dev, a little bit of design, and lots of software engineering.
Currently deep in on-device AI. Running AI models on phones + making them fast = happiness. More specifically, I've been drilling into running LLMs 🦙 on mobile 📱. Extremely interested in GPU programming (NVIDIA / Qualcomm).
- ✅ Writing raw OpenCL kernels for on-device ops.
- ✅ Profiling CUDA models to identify bottlenecks at the op level.
- ✅ Hacking llama.cpp to run our custom models.
- ✅ Adding operators to ExecuTorch for mobile NPUs, running LLMs on the NPU.
- ✅ Running LLMs on mobile GPUs with MLC-LLM.
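For a taste of the first item: a minimal element-wise OpenCL kernel of the kind one hand-writes for mobile GPUs. This is a generic, hypothetical sketch (not one of our production kernels); the host-side setup (context, command queue, buffers, enqueue) is omitted.

```c
// Hypothetical element-wise kernel: fused multiply-add over a flat buffer.
// One work-item per element; the bounds check guards against a global
// work size padded up to a multiple of the work-group size.
__kernel void fma_elementwise(__global const float *a,
                              __global const float *b,
                              __global float *out,
                              const float scale,
                              const int n) {
    int i = get_global_id(0);
    if (i < n) {
        out[i] = fma(scale, a[i], b[i]);  // out = scale * a + b
    }
}
```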
Below are some highlights that gave me the most dopamine rush ⚡.
🌍 Product side highlights
- Developed the first version of the training pipeline for the deep-learning-based intent classifier in Samsung Bixby©'s assistant. Rewrote our research modeling and training code for a 50% reduction in training time.
- Helped with on-device deployment of part of Samsung Bixby©. Extracting performance with C++ code was some sweet, sweet satisfaction. Do we have a spare 10ms in our inference budget? Yes, sir, we do!
⚡ Optimization adventures ⛰️
- With some trickery and a little additional memory (a tradeoff), I was able to rewrite a Python loop into pure torch code and, voilà, a 100x speedup! (Blog post coming soon!)
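A sketch of the kind of rewrite described above. The actual loop isn't shown in this post, so this is a hypothetical stand-in: a per-row Python loop replaced by one batched tensor expression, trading a little extra memory (a full boolean mask) for the speedup.

```python
import torch

def row_threshold_sums_loop(x: torch.Tensor, thresholds: torch.Tensor) -> torch.Tensor:
    """Slow version: a Python loop over rows (hypothetical example)."""
    out = torch.empty(x.shape[0])
    for i in range(x.shape[0]):
        row = x[i]
        out[i] = row[row > thresholds[i]].sum()
    return out

def row_threshold_sums_vectorized(x: torch.Tensor, thresholds: torch.Tensor) -> torch.Tensor:
    """Pure-torch version: build a (rows, cols) mask, zero out the rest, sum per row."""
    mask = x > thresholds.unsqueeze(1)  # the extra memory: one bool per element
    return (x * mask).sum(dim=1)
```

The vectorized version materializes the whole mask at once instead of slicing row by row, which is exactly the memory-for-speed tradeoff mentioned above.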
- While running experiments for the Samsung Bixby© training model, I set up the entire training pipeline in Jenkins so that each experiment gets its own artifacts and deploying an experiment becomes a one-click operation. Being able to browse all previous experiments, and even pick a run and benchmark its model, was a productivity boost like no other.
- Automated some data filling at less-than-100% accuracy (something > nothing) in a process that was otherwise purely manual. The mental burden and time for the task dropped from 1 hour to under 5 minutes on average, mostly because correcting a few wrong entries is far easier than filling everything in by hand. No dopamine rush beats people thanking you personally for making their lives easier.
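The "something > nothing" prefill idea can be sketched in a few lines. This is a hypothetical illustration (the real task and its data aren't described here): guess each field by fuzzy-matching against known values, and let a human correct the misses.

```python
import difflib

# Hypothetical list of valid values the field must come from.
KNOWN_TEAMS = ["On-Device AI", "Backend Platform", "Speech Research"]

def prefill_team(raw: str) -> str:
    """Return the closest known team name for a messy input, or the raw
    text unchanged if nothing is close enough (a human fixes those)."""
    lowered = {t.lower(): t for t in KNOWN_TEAMS}
    matches = difflib.get_close_matches(raw.lower(), list(lowered), n=1, cutoff=0.6)
    return lowered[matches[0]] if matches else raw

print(prefill_team("on device ai"))  # → On-Device AI
```

Even when the guess is wrong, the human only edits that one field instead of typing every field from scratch, which is where the hour-to-minutes saving comes from.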
Early on in my career at SRA, my day-to-day included tackling cool system-design problems, implementing and scaling machine learning models (still do), and developing scalable backend services and a frontend that is a pleasure to use and a "lifesaver" (I have Slack messages to back this up).
Before that, I was a research intern here from August 2017 to June 2018, gaining lots of experience implementing machine learning solutions end to end. I developed a distributed-system framework that allows switching Natural Language Processing engines in real time, and had papers published in reputable NLP conferences (ACL 2018 and COLING 2018).