Senior ML Engineer - Sovereign AI Engineering
4/19/2026
You will architect and operate the ML platform, including distributed training pipelines, model serving, and experiment tracking. You are responsible for ensuring the reliability, scalability, and observability of production-grade intelligence across cloud and air-gapped environments.
Working Hours
40 hours/week
Company Size
51-200 employees
Language
English
Visa Sponsorship
No
At Dream, we redefine the vision of cyber defense by combining AI and human expertise to create products that protect nations and critical infrastructure. This is more than a job; it’s a Dream job. Dream is where we tackle real-world challenges, redefine AI and security, and make the digital world safer. Let’s build something extraordinary together.
Dream's AI cybersecurity platform applies a novel, multi-layered approach to the evolving security challenges across the entire infrastructure of the most critical and sensitive networks. Built as part of a broader sovereign AI platform, our technology is designed to operate in on-premise, private cloud, and air-gapped environments, enabling nations to maintain full control over their data, infrastructure, and AI capabilities. Central to Dream's proprietary Cyber Language Models are innovative technologies that provide contextual intelligence for the future of cybersecurity.
At Dream, our talented team, driven by passion, expertise, and innovative minds, inspires us daily. We are not just dreamers; we are dream-makers.
The Dream Job
It starts with you - an engineer driven to build the ML platform that turns research into reliable, production-grade intelligence. You care about reproducibility, low-friction experimentation, and infrastructure that earns the trust of the scientists and researchers who depend on it daily. You'll architect and ship Dream's ML platform - training pipelines, model serving, feature stores, experiment tracking, and compute orchestration - turning models into production capabilities across cloud and on-prem, including air-gapped deployments. A significant part of the platform supports large language models, with unique challenges across training, evaluation, and inference in mission-critical environments.
If you want to make a meaningful impact, join Dream's mission and build the ML platform that drives Sovereign AI products - this role is for you.
The Dream-Maker Responsibilities
- Build and operate ML training infrastructure - distributed training pipelines, compute scheduling, and reproducible experiment workflows that data scientists rely on daily.
- Own model serving and inference systems - packaging, deployment, autoscaling, A/B testing, canary rollouts, and latency/cost optimization for production models.
- Run feature stores, model registries, and dataset versioning - enabling self-serve feature engineering, model lineage, and reproducible experiments across teams.
- Build experiment tracking and evaluation infrastructure - automated evals, comparison dashboards, drift detection, and monitoring that give teams visibility into model behavior and performance.
- Build and maintain production pipelines for training, fine-tuning workflows, and serving domain models - owning reliability, reproducibility, and scale.
- Build and maintain the monitoring and observability layer - model performance tracking, data and prediction drift detection, data quality validation, and alerting.
- Improve performance and cost across the ML stack - training throughput, inference latency, batch vs. real-time tradeoffs, and compute cost management.
- Ship shared tooling - libraries, templates, CI/CD for models, IaC, and runbooks - while collaborating across Data Platform, AI, Data Science, Engineering, and DevOps. Own architecture, documentation, and operations end-to-end.
The Dream Skill Set
- 5+ years in software engineering, with 2+ years focused on ML infrastructure, MLOps, or data-intensive systems
- Engineering craft - Strong Python, distributed systems design, testing, secure coding, API design, CI/CD discipline, and production ownership.
- ML platform & serving - Model serving frameworks (e.g., Triton, TorchServe, vLLM, Ray Serve); model packaging, deployment pipelines, and inference optimization
- Training infrastructure - Distributed training pipelines (e.g., PyTorch, JAX); experiment orchestration and reproducibility
- ML lifecycle tooling - Feature stores, model registries, experiment tracking (e.g., MLflow, Weights & Biases); dataset versioning and lineage
- Data pipelines - Building training and inference data pipelines; familiarity with tools like Spark, Airflow/Dagster, and streaming ingestion
- Comfortable with AI coding tools like Cursor, Claude Code, or Copilot
Nice to Have:
- Experience operating in constrained environments - on-premise, private cloud, or air-gapped deployments
- Hands-on experience with simulation environments, synthetic data generation, or reinforcement learning workflows
- Platform & infra - Kubernetes, AWS, Terraform or similar IaC, CI/CD, observability, incident response
- Hands-on data science or applied ML experience
Never Stop Dreaming...
If you think this role doesn't fully match your skills but you're eager to grow and break glass ceilings, we’d love to hear from you!