Senior Linux & Infrastructure IT Engineer

2/7/2026

Operate and scale hybrid AWS and on-prem Linux compute infrastructure for chip design and verification workloads. Own day-to-day reliability, performance tuning, capacity planning, and incident response.

Working Hours

40 hours/week

Company Size

51-200 employees

Language

English

Visa Sponsorship

About the Company

AI is transforming industries at an unprecedented scale, but today's data center infrastructure wasn't built to keep up. As AI workloads grow more complex and data volumes double every two years, connectivity, not compute, has become the bottleneck. Retym is solving this challenge by delivering next-generation coherent DSP solutions that provide high-performance, low-latency connectivity for AI infrastructure and data center interconnects.

We are a semiconductor company driven by innovation, bringing together a world-class team of chip designers, optical networking experts, and leading investors to rethink how data moves in the AI era.

Our purpose-built DSP technology delivers:
Scalable, high-bandwidth interconnects for AI-driven data centers
Power-efficient, high-performance networking that removes bottlenecks
A coherent DSP option that gives module makers more control and builds toward a more open, vibrant ecosystem

With hyperscalers deploying AI across multiple locations and AI infrastructure requirements evolving rapidly, Retym is building the connectivity backbone for the future of AI. The future of AI isn't just about compute; it's about how we move data. Together, we are building a novel semiconductor technology that will transform the data center and telecommunications industries.

About the Role

We are a fast-growing semiconductor startup building next-generation silicon. Our design and verification pipelines rely on large-scale Linux compute infrastructure spanning AWS and on-prem environments.

We are seeking a senior, hands-on Cloud & Infrastructure IT Engineer to own the reliability, performance, and automation of our mission-critical EDA platforms. You will work directly with chip design teams to ensure our compute environments are fast, stable, secure, and ready to scale.

What You’ll Do

Operate and scale hybrid AWS + on-prem Linux compute infrastructure for chip design and verification workloads.
Own day-to-day reliability, performance tuning, capacity planning, and incident response.
Build and maintain AWS environments using Terraform and Ansible.
Automate provisioning of VPCs, IAM, EC2, FSx, EBS, S3, VPNs, and security controls.
Tune Linux systems for CPU-, memory-, and I/O-intensive EDA workloads.
Operate and optimize grid / job scheduling platforms such as Slurm, LSF, or Grid Engine.
Design and manage high-throughput storage solutions for simulation pipelines.
Develop automation and self-service tooling using Python and Bash.
Implement observability and alerting using Prometheus and Grafana.
Participate in on-call rotation and lead root-cause analysis for production incidents.

Required Qualifications

AWS: VPC, EC2, IAM, FSx, EBS, S3, VPN, security controls
Infrastructure as Code: Terraform, Ansible
Linux / HPC: Kernel, filesystem, and network performance tuning
Schedulers: Slurm / LSF / Grid Engine
Automation: Python, Bash
Observability: Prometheus, Grafana
CI/CD: GitHub Actions / GitLab CI

Requirements

7+ years of hands-on experience operating large-scale Linux infrastructure.
Strong experience managing AWS production environments.
Advanced proficiency with Terraform, Ansible, Python, and Bash.
Deep understanding of networking, storage, and Linux internals.
Comfortable owning business-critical systems in a fast-moving startup.
Experience supporting semiconductor / EDA / HPC workloads.

Preferred

Exposure to Azure or GCP.
Experience with cloud cost optimization / FinOps.

Key Skills

AWS, Terraform, Ansible, Linux, HPC, Slurm, LSF, Grid Engine, Python, Bash, Prometheus, Grafana, GitHub Actions, GitLab CI, Networking, Storage