Hello, I'm
LLVM Compiler Engineer | Systems Software Specialist
10+ years building GPU compilers and systems software at AMD, NVIDIA, Qualcomm & Infineon
Results-driven Systems Software Engineer with 10+ years of experience spanning GPU compiler development, massively parallel programming, and embedded systems. Currently at AMD on the ROCm compiler team, delivering LLVM optimizations, new GPU architecture feature support, and a novel register allocator fuzzing system.
Previously at NVIDIA, drove development of a next-generation GPU assembler and delivered compiler optimizations. Proven ability to ship production-quality software at semiconductor industry leaders including AMD, NVIDIA, Qualcomm, and Infineon.
Holds an M.E. in Embedded Systems from BITS Pilani (9.3 CGPA).
AMD - ROCm Compiler Team
Key contributor to AMD's ROCm LLVM compiler pipeline, focused on performance optimization and feature enablement for next-generation GPU architectures.
NVIDIA - Pune, India
Core contributor to NVIDIA's CUDA compiler team, driving development of a next-generation GPU assembler and delivering performance optimizations.
Qualcomm - Hyderabad, India
Member of the Primary Bootloader (PBL) development team for Qualcomm mobile chipsets.
Infineon Technologies - Bangalore, India
Technical bridge between customers and engineering teams, specializing in USB controller firmware for embedded imaging applications.
Leveraged Alive2, an SMT-based translation validation tool, to perform Machine IR (MIR) equivalence checks in the AMD GPU backend. The checks detect miscompilations by proving semantic equivalence between pre- and post-optimization MIR, providing formal correctness guarantees beyond traditional testing.
Completed an intensive 15-day compiler development training led by IIT/IISc faculty. Implemented analysis and optimization passes in LLVM to build a deep understanding of the framework's IR and pass infrastructure.
Designed a synthesizable 4-stage pipelined RISC processor in Verilog HDL and deployed it on a Spartan-6 FPGA. Implemented full data hazard detection with stall insertion and forwarding logic.
Built a loadable kernel module that harvests CPU frequency jitter as an entropy source for true random number generation, and exposed the generated values to user space via ioctl system calls.
BITS Pilani
CGPA: 9.3 / 10
HBTU Kanpur
Appreciation from the NVIDIA Nsight tools team for exceptional API support delivery
Winner - Apogee 2018 (BITS Pilani Annual Technical Festival)
Winner - Cadence India Design Contest 2017
Winner - Digital Circuit Designing Contest, AVISHKAR 2K12, MNNIT Allahabad
Ranked in the top 0.4% in the Uttar Pradesh State Entrance Examination
Open to consulting opportunities, technical collaborations, and interesting compiler challenges.