Staff SRE Engineer

11/26/2025

The Staff SRE Engineer will shape the reliability, observability, and operational excellence of the platform infrastructure. This role involves designing and maintaining AWS infrastructure, establishing SLIs and SLOs, and driving cost optimization initiatives.

Working Hours

40 hours/week

Company Size

1,001-5,000 employees

Language

English

Visa Sponsorship

No

About The Company
Realtor.com® is the trusted resource for home buyers, sellers and dreamers, offering the most comprehensive source of for-sale properties among competing national sites, along with the information, tools and professional expertise to help people move confidently through every step of their home journey. It pioneered the world of digital real estate 20 years ago, and today helps make all things home simple, efficient and enjoyable. Realtor.com® is operated by News Corp [NASDAQ: NWS, NWSA] [ASX: NWS, NWSLV] subsidiary Move, Inc. under a perpetual license from the National Association of REALTORS®. For more information, visit realtor.com®.
About the Role
<div class="content-intro"><p>Recognized as the No. 1 site trusted by real estate professionals, Realtor.com® has been at the forefront of online real estate for over 25 years, connecting buyers, sellers, and renters with trusted insights and expert guidance to find their perfect home. Through its robust suite of tools, Realtor.com® makes a significant impact not only on the real estate industry at large but also on consumers navigating the biggest purchase of their lives, by providing a user experience that is easy to use, easy to understand, and, most of all, easy to make decisions with.</p>
<p>Join us on our mission to empower more people to find their way home by breaking barriers to entry, making the right connections, and building confidence through expert guidance.</p></div>
<p>About the Role</p>
<p style="text-align: left;">We are seeking a Staff Site Reliability Engineer to join our newly formed Operations Excellence organization, reporting to the Director, Operations Excellence. This foundational role will shape the reliability, observability, and operational excellence of our platform infrastructure serving millions of users. As a Staff SRE, you will be a technical leader and mentor who establishes best practices, drives architectural decisions, and enables our 600+ engineers to deliver exceptional customer experiences.</p>
<p>You will work on critical platform systems including EKS infrastructure, Skyway (CI/CD), Frontdoor (Tyk API Gateway), Pantheon (Apollo GraphQL Federation), and our observability stack, while establishing chaos engineering practices and driving cost optimization initiatives with measurable ROI.</p>
<p><strong>What You'll Do</strong></p>
<p><strong>Platform Reliability &amp; Infrastructure</strong></p>
<ul>
<li>Design and maintain highly available AWS infrastructure including EKS clusters, Fargate (ECS), and multi-region architectures</li>
<li>Own reliability of critical services: Skyway (CI/CD), Frontdoor (Tyk), Pantheon (Apollo GraphQL), and supporting infrastructure</li>
<li>Establish SLIs, SLOs, and error budgets for Tier 1/2/3 systems; lead architectural reviews for reliability and cost-efficiency</li>
<li>Drive adoption of reliability patterns including circuit breakers, graceful degradation, and automated failover</li>
</ul>
<p><strong>Observability &amp; Cost Optimization</strong></p>
<ul>
<li>Build comprehensive observability using NewRelic for APM, distributed tracing, metrics, and logging for rapid troubleshooting</li>
<li>Create actionable dashboards and alerts that reduce MTTD and MTTR; establish observability standards across teams</li>
<li>Analyze infrastructure spend and implement FinOps practices including rightsizing, reserved capacity, and resource lifecycle management</li>
<li>Drive cost-conscious architecture decisions and optimize CI/CD spend (CircleCI, Argo CD optimization)</li>
</ul>
<p><strong>Chaos Engineering &amp; Incident Response</strong></p>
<ul>
<li>Design chaos engineering experiments to identify system weaknesses; build frameworks for safe production testing</li>
<li>Lead game day exercises and disaster recovery simulations; create runbooks and automation for resilience</li>
<li>Participate in on-call rotation for critical systems; lead post-incident reviews and drive systemic improvements</li>
<li>Mentor engineers on incident response, communication, and escalation; contribute to System Health Scorecard</li>
</ul>
<p><strong>Technical Leadership</strong></p>
<ul>
<li>Serve as a technical leader and mentor for the growing Operations Excellence team; establish SRE principles and culture</li>
<li>Partner with Platform Engineering, Quality Engineering, and product teams on reliability initiatives</li>
<li>Support security initiatives including AWS Secrets Manager migration and compliance requirements (SOC 2, PCI, GDPR)</li>
<li>Contribute to Developer Experience metrics and platform adoption goals</li>
</ul>
<p><strong>What You'll Bring</strong></p>
<p><strong>Experience &amp; Expertise</strong></p>
<ul>
<li>8+ years in Site Reliability Engineering, DevOps, or Infrastructure Engineering with a proven track record of improving system reliability</li>
<li>Bachelor’s degree or equivalent experience</li>
<li>5+ years of hands-on experience with AWS (EKS, EC2, RDS, S3, CloudWatch, IAM) and Kubernetes, including multi-cluster management</li>
<li>Strong programming skills (Python, Go, or Java) with infrastructure automation and Infrastructure as Code experience (Terraform, CloudFormation)</li>
<li>Production experience with observability tools (NewRelic, Datadog, Prometheus, Grafana, Splunk) and distributed systems architecture</li>
<li>Experience with CI/CD platforms and GitOps workflows (CircleCI, Argo CD, Jenkins); on-call rotation and high-severity incident response</li>
<li>Preferred: Chaos engineering tools, API Gateway technologies (Tyk/Kong), GraphQL federation (Apollo), cost optimization initiatives with measurable ROI, FinOps principles</li>
</ul>
<p><strong>Technical Skills</strong></p>
<ul>
<li>Cloud &amp; Infrastructure: AWS (EKS, Fargate, Lambda, VPC, Route53, CloudFront), Kubernetes, Docker, Istio Service Mesh</li>
<li>CI/CD &amp; GitOps: Argo CD, CircleCI, Jenkins, GitHub Actions</li>
<li>Observability: NewRelic - APM, distributed tracing, metrics &amp; logging; Splunk - logging</li>
<li>IaC &amp; Automation: Terraform, CloudFormation, Helm, Kustomize, Python/Go/Bash</li>
<li>Platform Services: Tyk Gateway, Apollo GraphQL, AWS Secrets Manager, Vault</li>
<li>Incident Management: OpsGenie, PagerDuty, ServiceNow</li>
</ul>
<p><strong>Leadership Qualities</strong></p>
<ul>
<li>Excellent communication with the ability to explain complex technical concepts to diverse audiences</li>
<li>Proven mentorship and collaboration skills across engineering, product, and business teams</li>
<li>Self-motivated and autonomous with a systems-thinking mindset focused on long-term sustainability</li>
<li>Data-driven decision making with a customer-centric approach and empathy for developer experience</li>
</ul>
<div class="content-conclusion"><p>Do the best work of your life at Realtor.com®</p>
<p>Here, you’ll partner with a diverse team of experts as you use leading-edge tech to empower everyone to meet a crucial goal: finding their way home. And you’ll find your way home too. At Realtor.com®, you’ll bring your full self to work as you innovate with speed, serve our consumers, and champion your teammates. In return, we’ll provide you with a warm, welcoming, and inclusive culture; intellectual challenges; and the development opportunities you need to grow.</p>
<p>Diversity is important to us; therefore, Realtor.com® is an Equal Opportunity Employer regardless of age, color, national origin, race, religion, creed, gender, sex, sexual orientation, gender identity and/or expression, marital status, status as a disabled veteran and/or veteran of the Vietnam Era or any other characteristic protected by federal, state or local law. In addition, Realtor.com® will provide reasonable accommodations for otherwise qualified disabled individuals.</p></div>
Key Skills
Site Reliability Engineering, DevOps, Infrastructure Engineering, AWS, Kubernetes, Python, Go, Java, Terraform, CloudFormation, NewRelic, CI/CD, Chaos Engineering, Incident Response, Observability, Cost Optimization
Categories
Technology, Engineering, Data & Analytics, Software, Management & Leadership