Hello, I'm

Abhinav Garg

LLVM Compiler Engineer | Systems Software Specialist

10+ years building GPU compilers and systems software at AMD, NVIDIA, Qualcomm & Infineon

10+ Years Exp.
4 Companies
9.3 M.E. CGPA

About Me

Results-driven Systems Software Engineer with 10+ years of experience spanning GPU compiler development, massively parallel programming, and embedded systems. Currently at AMD on the ROCm compiler team, delivering LLVM optimizations, new GPU architecture feature support, and a novel register allocator fuzzing system.

Previously at NVIDIA, drove development of a next-generation GPU assembler and compiler optimizations. Proven ability to deliver production-quality software across semiconductor industry leaders including AMD, NVIDIA, Qualcomm, and Infineon.

Holds an M.E. in Embedded Systems from BITS Pilani (9.3 CGPA).

Technical Skills

Programming

C · C++ · Python · CUDA · HIP · Assembly

Compilers & Toolchains

LLVM · CUDA Compiler · ROCm/HIP · Assemblers

Parallel Programming

CUDA · HIP · ROCm · Warp Optimization · Occupancy Tuning

Embedded Systems

RTOS · Bootloaders · Device Drivers · JTAG

Dev Tools & Debugging

GitHub · Vim · rocprof · rocgdb · Nsight · Lauterbach

Assembly Languages

SASS · PTX · AMD GCN · 8051

Work Experience

Senior Compiler Engineer

AMD - ROCm Compiler Team

Present

Key contributor to AMD's ROCm LLVM compiler pipeline, focused on performance optimization and feature enablement for next-generation GPU architectures.

  • Develop and land performance optimizations across the LLVM compiler pipeline targeting AMD GPU architectures
  • Currently enabling Global Instruction Selection (GlobalISel) support, including a new register bank selection pass driven by uniformity analysis
  • Delivered True16 data type support and new wait counter semantics for LDS load/store operations
  • Architected a new fuzzing system for the register allocator, uncovering latent bugs that evaded traditional testing
  • Triage compiler bugs surfaced by Triton and PyTorch workloads
  • Drive functional parity between ROCm/HIP and CUDA for seamless AI/ML workload portability
  • Lead onboarding and mentorship for new engineers on the LLVM pipeline and GPU optimization
  • Delivered a 4-hour technical talk on GPU programming for the AMD University Relations program

CUDA Compiler Developer

NVIDIA - Pune, India

Dec 2019 - Dec 2023

Core contributor to NVIDIA's CUDA compiler team, driving development of a new-generation GPU assembler and delivering performance optimizations.

  • Architected and implemented a new GPU assembler from scratch, enabling support for a new ISA
  • Identified and implemented optimizations in the PTX compiler pipeline targeting register allocation
  • Designed key assembler features: live range generation, call-graph construction, and new parsing support
  • Built API support for the Nsight tooling team to extract binary metadata
  • Managed continuous Machine Description integrations from GPU architecture teams
  • Championed test-driven development, significantly improving assembler robustness
  • Authored SRD and SDD documents for all major features

System Software Engineer

Qualcomm - Hyderabad, India

May 2019 - Dec 2019

Member of the Primary Bootloader (PBL) development team for Qualcomm mobile chipsets.

  • Ported and validated PBL firmware across multiple Qualcomm chipset platforms
  • Used Lauterbach JTAG debugging to diagnose low-level boot failures at the hardware-software boundary
  • Implemented boot media interfaces (SPI, I2C, eMMC, SD card)
  • Developed a Windows-native simulation of the PBL code to decouple development from hardware dependencies
  • Participated in bring-up activities for a major new Qualcomm chipset

Senior Application Engineer

Infineon Technologies - Bangalore, India

Jan 2018 - May 2019

Technical bridge between customers and engineering teams, specializing in USB controller firmware for embedded imaging applications.

  • Designed firmware solutions for USB 3.0 and USB 2.0 device controllers
  • Delivered end-to-end USB Video Class (UVC) implementation for 720p/1080p video streaming
  • Authored two Knowledge Base Articles on MIPI CSI parameters, both incorporated into the official CX3 product datasheet; one was translated into Japanese
  • Performed hardware-level prototyping using logic analyzers, protocol analyzers, and JTAG tooling

Key Projects

Formal Verification

Formal Verification of AMD GPU Backend using Alive2

Leveraged Alive2, an SMT-based translation validation tool, to perform Machine IR (MIR) equivalence checks in the AMD GPU backend. Detects miscompilations by formally checking that post-optimization MIR preserves the semantics of the pre-optimization MIR, bringing mathematical correctness guarantees beyond traditional testing.

LLVM · SMT Solver · Alive2 · AMD GPU

Compiler

LLVM Analysis & Optimization Pass

Completed an intensive 15-day compiler development training program led by IIT/IISc faculty. Implemented analysis and optimization passes in LLVM to build a deep understanding of the framework's IR and pass infrastructure.

LLVM · Optimization · IR

Hardware

32-Bit Pipelined MIPS Processor on FPGA

Designed a synthesizable 4-stage pipelined RISC processor in Verilog HDL and deployed it on a Spartan-6 FPGA. Implemented full data hazard detection with stall insertion and forwarding logic.

Verilog · FPGA · MIPS · Pipeline

Kernel

True Random Number Generator - Linux Kernel Module

Built a loadable kernel module that harvests CPU frequency jitter as an entropy source for true random number generation, exposing the generator to user space via IOCTL system calls.

Linux Kernel · C · IOCTL

Education

2018

M.E. - Embedded Systems

BITS Pilani

CGPA: 9.3 / 10

2015

B.Tech - Electronics & Communication Engineering

HBTU Kanpur

Achievements

2022

Recognized by the NVIDIA Nsight tools team for exceptional API support delivery

2018

Winner - Apogee 2018 (BITS Pilani Annual Technical Festival)

2017

Winner - Cadence India Design Contest 2017

2012

Winner - Digital Circuit Designing Contest, AVISHKAR 2K12, MNNIT Allahabad

2011

Ranked in the top 0.4% in the Uttar Pradesh State Entrance Examination

Let's Connect

Open to consulting opportunities, technical collaborations, and interesting compiler challenges.