Senior DevOps Engineer (remote) - AWS Cloud Hosting Platform

12/1/2025

The Senior DevOps Engineer will own production reliability on AWS, focusing on availability, latency, and incident response. They will also architect scalable infrastructure and improve CI/CD pipelines while ensuring strong observability and security practices.

Working Hours

40 hours/week

Company Size

51-200 employees

Language

English

Visa Sponsorship

About The Company

Platinumlist is the leading entertainment discovery and monetisation platform, and the premier ticketing platform in the Middle East, providing seamless booking solutions for events, attractions and experiences in the UAE, Saudi Arabia, Bahrain, Qatar, Oman, Kuwait and beyond. We empower organisers with advanced ticketing technology, marketing tools, and data insights to maximise reach and revenue. From live entertainment to travel experiences, Platinumlist connects customers with top events while ensuring a smooth and secure booking process. Our platform simplifies ticket sales, access control, and audience engagement.

About the Role

About Us: Platinumlist.net, a pioneering leader in the online event guide and ticketing solutions industry, has been revolutionizing the event landscape in the Gulf region since 2009. As the largest ticketing provider in the GCC, we proudly serve an extensive array of events across the United Arab Emirates, Saudi Arabia, Oman, Bahrain, Qatar, and Kuwait from our Dubai-based headquarters.

About the Role: We’re looking for a Senior DevOps / SRE Engineer to own and evolve our AWS infrastructure with a strong focus on reliability, scalability, performance under peak load, and safe delivery of new AWS capabilities. You’ll partner with engineering teams to ensure our platform stays fast and resilient during traffic spikes while continuously improving automation, observability, security, and cost efficiency.

Key Responsibilities:

Own production reliability on AWS: availability, latency, throughput, capacity, and incident response.
Architect and operate scalable infrastructure (multi-AZ as a baseline; a DR strategy with regular testing).
Build and maintain Infrastructure as Code (Terraform / CloudFormation / CDK) and Git-based workflows.
Improve CI/CD pipelines and deployment strategies (blue/green, canary, progressive delivery).
Implement strong observability: metrics, logs, traces, alerting, dashboards; define SLO/SLI and reduce noise.
Own database operations on AWS (Aurora/RDS MySQL): backups/restores (including restore drills), read replicas, performance troubleshooting, and capacity planning.
Improve caching and traffic handling (CDN, Redis/ElastiCache, queues) to sustain peak demand.
Harden security posture: IAM least privilege, secrets management, patching, WAF, audit trails.
Drive adoption of relevant AWS managed services (where it increases reliability and reduces ops burden).
Drive cloud cost efficiency (FinOps): cost visibility, tagging, budgets/alerts, rightsizing, and smart usage of AWS pricing models without compromising reliability.
Lead post-incident reviews (RCA, corrective actions, prevention) and ensure improvements are implemented and verified.

Requirements:

10+ years of experience in a similar role.
Strong hands-on AWS experience in production (typical stack: VPC, IAM, EC2, ALB/NLB, Auto Scaling, S3, CloudFront, Route 53, CloudWatch/CloudTrail, WAF; plus Aurora/RDS).
Proven experience designing/operating high-load web systems with strict uptime requirements.
IaC and automation mindset (Terraform/CloudFormation/CDK, plus scripting in Bash/Python).
Production MySQL on AWS (Aurora/RDS): backups & restores (including restore drills), read replicas, monitoring, and performance troubleshooting.
Ability to troubleshoot production web stacks (Nginx + PHP-FPM) and identify bottlenecks across app ↔ DB ↔ infrastructure.
Containers and deployment automation (ECS/EKS, Docker; understanding of scaling and rollout patterns).
Solid Linux + networking fundamentals (DNS, TLS, routing, LB, troubleshooting).
Observability practices and incident management experience.
Must be reachable for critical production incidents; occasional after-hours support may be required (critical-only).

Nice-to-have:

PHP ecosystem familiarity (PHP-FPM/Nginx, Composer; Laravel/Symfony is a plus).
MySQL internals/performance tuning and advanced replication/proxying (e.g., ProxySQL).
Serverless & event-driven AWS (Lambda, SQS/SNS, EventBridge, Step Functions).
Security & compliance frameworks; chaos testing/load testing.

What We Offer:

Competitive salary.
Remote-friendly work setup.
A chance to make a real impact in a fast-growing market.
Space to grow, experiment, and push boundaries.

Key Skills

AWS, Infrastructure As Code, Terraform, CloudFormation, CI/CD, Observability, Database Operations, MySQL, Caching, Security, Cost Efficiency, Incident Management, Containers, Linux, Networking, PHP