Senior ML Engineer - Sovereign AI Engineering
4/19/2026
You will architect and operate the ML platform, including distributed training pipelines, model serving, and experiment tracking. You are responsible for ensuring the reliability, scalability, and observability of production-grade intelligence across cloud and air-gapped environments.
Working Hours
40 hours/week
Company Size
51-200 employees
Language
English
Visa Sponsorship
No
At Dream, we redefine the vision of cyber defense by combining AI and human expertise to create products that protect nations and critical infrastructure. This is more than a job; it’s a Dream job. Dream is where we tackle real-world challenges, redefine AI and security, and make the digital world safer. Let’s build something extraordinary together.
Dream's AI cybersecurity platform applies a novel, multi-layered approach to the evolving security challenges across the entire infrastructure of the most critical and sensitive networks. Built as part of a broader sovereign AI platform, our technology is designed to operate in on-premise, private cloud, and air-gapped environments, enabling nations to maintain full control over their data, infrastructure, and AI capabilities. Central to Dream's proprietary Cyber Language Models are innovative technologies that provide contextual intelligence for the future of cybersecurity.
At Dream, our talented team, driven by passion, expertise, and innovative minds, inspires us daily. We are not just dreamers; we are dream-makers.
The Dream Job
It starts with you - an engineer driven to build the ML platform that turns research into reliable, production-grade intelligence. You care about reproducibility, low-friction experimentation, and infrastructure that earns the trust of the scientists and researchers who depend on it daily. You'll architect and ship Dream's ML platform - training pipelines, model serving, feature stores, experiment tracking, and compute orchestration - turning models into production capabilities across cloud and on-prem, including air-gapped deployments. A significant part of the platform supports large language models, with unique challenges across training, evaluation, and inference in mission-critical environments.
If you want to make a meaningful impact, join Dream's mission and build the ML platform that drives Sovereign AI products - this role is for you.
The Dream-Maker Responsibilities
- Build and operate ML training infrastructure - distributed training pipelines, compute scheduling, and reproducible experiment workflows that data scientists rely on daily.
- Own model serving and inference systems - packaging, deployment, autoscaling, A/B testing, canary rollouts, and latency/cost optimization for production models.
- Run feature stores, model registries, and dataset versioning - enabling self-serve feature engineering, model lineage, and reproducible experiments across teams.
- Build experiment tracking and evaluation infrastructure - automated evals, comparison dashboards, drift detection, and monitoring that give teams visibility into model behavior and performance.
- Build and maintain production pipelines for training, fine-tuning workflows, and serving domain models - owning reliability, reproducibility, and scale.
- Build and maintain the monitoring and observability layer - model performance tracking, data and prediction drift detection, data quality validation, and alerting.
- Improve performance and cost across the ML stack - training throughput, inference latency, batch vs. real-time tradeoffs, and compute cost management.
- Ship shared tooling - libraries, templates, CI/CD for models, IaC, and runbooks - while collaborating across Data Platform, AI, Data Science, Engineering, and DevOps. Own architecture, documentation, and operations end-to-end.
The Dream Skill Set
- 5+ years in software engineering, with 2+ years focused on ML infrastructure, MLOps, or data-intensive systems
- Engineering craft - Strong Python, distributed systems design, testing, secure coding, API design, CI/CD discipline, and production ownership.
- ML platform & serving - Model serving frameworks (e.g., Triton, TorchServe, vLLM, Ray Serve); model packaging, deployment pipelines, and inference optimization
- Training infrastructure - Distributed training pipelines (e.g., PyTorch, JAX); experiment orchestration and reproducibility
- ML lifecycle tooling - Feature stores, model registries, experiment tracking (e.g., MLflow, Weights & Biases); dataset versioning and lineage
- Data pipelines - Building training and inference data pipelines; familiarity with tools like Spark, Airflow/Dagster, and streaming ingestion
- Comfortable with AI coding tools like Cursor, Claude Code, or Copilot
Nice to Have:
- Experience operating in constrained environments - on-premise, private cloud, or air-gapped deployments
- Hands-on experience with simulation environments, synthetic data generation, or reinforcement learning workflows
- Platform & infra - Kubernetes, AWS, Terraform or similar IaC, CI/CD, observability, incident response
- Hands-on data science or applied ML experience
Never Stop Dreaming...
If you think this role doesn't fully match your skills but you're eager to grow and break glass ceilings, we’d love to hear from you!