Full-time
5-10
Principal Software Engineer - Azure AI Inferencing
11/23/2025
Lead the design and implementation of core inference infrastructure for serving frontier AI models in production. Scale the platform to support the growing inferencing demand and maintain high availability.
Working Hours
40 hours/week
Company Size
10,001+ employees
Language
English
Visa Sponsorship
No
About The Company
Every company has a mission. What's ours? To empower every person and every organization to achieve more. We believe technology can and should be a force for good and that meaningful innovation contributes to a brighter world in the future and today. Our culture doesn’t just encourage curiosity; it embraces it. Each day we make progress together by showing up as our authentic selves. We show up with a learn-it-all mentality. We show up cheering on others, knowing their success doesn't diminish our own. We show up every day open to learning our own biases, changing our behavior, and inviting in differences. Because impact matters.
Microsoft operates in 190 countries and is made up of approximately 228,000 passionate employees worldwide.
About the Role
Responsibilities:
Lead the design and implementation of core inference infrastructure for serving frontier AI models in production.
Identify and drive improvements to the end-to-end inference performance and efficiency of OpenAI and other state-of-the-art LLMs.
Lead the design and implementation of efficient load scheduling and balancing strategies that leverage key insights into, and features of, the model and workload.
Scale the platform to support growing inferencing demand and maintain high availability.
Deliver the critical capabilities required to serve the latest Gen AI models, such as GPT-5, Realtime audio, and Sora, and enable fast time to market for them.
Collaborate with internal and external partners.
Mentor engineers on distributed inference best practices.
Qualifications:
Bachelor's degree in Computer Science or a related technical field AND 6+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, or Golang, OR equivalent experience.
Requirements include, but are not limited to, specialized security screenings.
4+ years of practical experience working on high-scale, reliable online systems.
Technical background and foundation in software engineering principles, distributed computing, and architecture.
Experience with real-time online services that require low latency and high throughput.
Experience working with L7 network proxies and gateways.
Knowledge of network architecture and concepts (HTTP and TCP protocols, authentication, sessions, etc.).
Knowledge of and experience with OSS, Docker, Kubernetes, C++, Golang, or equivalent programming languages.
Cross-team collaboration skills and the desire to work in a team of researchers and developers.
Ability to lead projects independently.
Key Skills
C, C++, C#, Java, Golang, Distributed Computing, Software Engineering, Real-Time Services, Low Latency, High Throughput, Network Architecture, HTTP Protocols, TCP Protocols, Docker, Kubernetes, Cross-Team Collaboration
Categories
Technology, Engineering, Software