Full-time
5-10
Senior Software Engineer
11/27/2025
Design and develop networking solutions for large-scale AI training infrastructure. Debug and resolve complex networking issues while enhancing scalability and reliability of systems.
Working Hours
40 hours/week
Company Size
10,001+ employees
Language
English
Visa Sponsorship
No
About The Company
Every company has a mission. What's ours? To empower every person and every organization to achieve more. We believe technology can and should be a force for good and that meaningful innovation contributes to a brighter world in the future and today. Our culture doesn’t just encourage curiosity; it embraces it. Each day we make progress together by showing up as our authentic selves. We show up with a learn-it-all mentality. We show up cheering on others, knowing their success doesn't diminish our own. We show up every day open to learning our own biases, changing our behavior, and inviting in differences. Because impact matters.
Microsoft operates in 190 countries and is made up of approximately 228,000 passionate employees worldwide.
About the Role
Design, develop, and optimize networking solutions tailored for large-scale AI training infrastructure.
Architect and implement high-performance, low-latency, low-jitter communication frameworks for distributed systems.
Benchmark, analyze, and enhance the scalability and reliability of networking systems to handle petabyte-scale data transfer.
Debug and resolve complex networking issues in large-scale, high-performance environments.
Drive the identification of dependencies and the development of design documents for a product, application, service, or platform.
Create, implement, optimize, debug, refactor, and reuse code to improve performance, maintainability, effectiveness, and return on investment (ROI).
Act as a Designated Responsible Individual (DRI), guiding other engineers by developing and following the playbook; work on call to monitor systems, products, and services for degradation, downtime, or interruptions; alert stakeholders about status; and initiate actions to restore service for both simple and complex problems when appropriate.
Proactively seek new knowledge and adapt to new AI trends, technical solutions, and patterns that improve the availability, reliability, efficiency, observability, and performance of products, while driving consistency in monitoring and operations at scale.
Required Qualifications
Bachelor's Degree in Computer Science or related technical field AND 4+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
1+ years of Networking OR High Performance Computing experience.
Preferred Qualifications
Bachelor's Degree in Computer Science or related technical field AND 8+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR Master's Degree in Computer Science or related technical field AND 6+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
Familiarity with Machine Learning, AI Infrastructure, Operating Systems fundamentals, and virtualization technologies.
1+ years of experience with Distributed Systems.
1+ years of experience with High Performance Computing / Machine Learning middleware.
Key Skills
Networking, AI Infrastructure, High Performance Computing, Distributed Systems, Machine Learning, C, C++, C#, Java, JavaScript, Python, Debugging, Scalability, Reliability, Performance Optimization, Observability
Categories
Technology, Engineering, Software, Data & Analytics