FULL_TIME
10+
Principal Software Engineer
11/22/2025
Lead the bring-up and functional validation of LLMs on custom AI accelerators and GPUs. Collaborate with teams across Azure ML, DeepSpeed, and Maia hardware programs to deliver production-grade AI infrastructure.
Working Hours
40 hours/week
Company Size
10,001+ employees
Language
English
Visa Sponsorship
No
About The Company
Every company has a mission. What's ours? To empower every person and every organization to achieve more. We believe technology can and should be a force for good and that meaningful innovation contributes to a brighter world in the future and today. Our culture doesn’t just encourage curiosity; it embraces it. Each day we make progress together by showing up as our authentic selves. We show up with a learn-it-all mentality. We show up cheering on others, knowing their success doesn't diminish our own. We show up every day open to learning our own biases, changing our behavior, and inviting in differences. Because impact matters.
Microsoft operates in 190 countries and is made up of approximately 228,000 passionate employees worldwide.
About the Role
Model Bring-Up & Characterization
Lead the bring-up and functional validation of LLMs on custom AI accelerators and GPUs. Develop and maintain detailed performance characterizations across compute, memory, and interconnect domains. Instrument and profile end-to-end training and inference workloads to identify scaling inefficiencies and performance gaps.
Hardware/Software/Model Co-Design
Partner with silicon and system architects, compiler/runtime engineers, and model researchers to define co-design strategies that maximize efficiency and utilization. Drive studies and experiments across quantization formats, tensor parallelism, activation checkpointing, memory layouts, and communication topologies.
Performance Optimization
Analyze kernel- and system-level traces to identify limiting factors in compute, memory, and interconnect. Propose and implement optimizations in scheduling, fusion, and data movement to improve throughput and power efficiency. Guide runtime and compiler improvements informed by workload analysis.
Cross-Functional Leadership
Collaborate with teams across Azure ML, DeepSpeed, and Maia hardware programs to deliver production-grade AI infrastructure. Present architectural findings and recommendations to senior engineering leadership. Mentor and technically guide engineers working in performance, compiler, and system bring-up domains.
Qualifications
Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience. 10+ years of experience in AI systems, hardware/software co-design, or performance engineering. Deep understanding of AI accelerator and GPU architectures, including compute pipelines, memory hierarchies, and interconnects. Proficiency with PyTorch, CUDA, Triton, or similar frameworks for performance tuning and kernel development.
Proven track record of cross-disciplinary collaboration between hardware, software, and ML model teams. Experience profiling and optimizing large-scale distributed AI workloads. Familiarity with DeepSpeed, Megatron-LM, SGLang, or vLLM training and inference pipelines. Deep understanding of transformer-based model architectures and scaling behaviors. Hands-on experience with AI performance modeling, benchmarking, or workload simulation. Demonstrated technical leadership and communication skills in highly collaborative environments.
Key Skills
Software Engineering, AI Systems, Performance Engineering, Hardware/Software Co-Design, Performance Optimization, Cross-Functional Collaboration, C, C++, Python, PyTorch, CUDA, Triton, Distributed AI Workloads, Transformer-Based Models, Benchmarking, Workload Simulation
Categories
Technology, Engineering, Software, Data & Analytics, Consulting