2-5

DevOps & AI Infrastructure Engineer

1/18/2026

The role involves maintaining and optimizing the organization's GPU farm while providing computing resources for various AI projects. Responsibilities include onboarding projects, building Docker images, managing GPU operations, optimizing resource usage, and automating CI/CD pipelines.

Working Hours

40 hours/week

Company Size

501-1,000 employees

Language

Hebrew

Visa Sponsorship

About The Company

Commit is a global tech services company with offices in Israel, the US, Canada, the UK, and Europe. Founded in 2005, the company has over 700 multi-disciplinary innovation experts who serve a broad range of companies, from small startups to large enterprises, across multiple business sectors. Commit specializes in advanced technologies and applications, with dedicated practices in Cloud, GenAI, Software, IoT, Big Data, Cyber, Collaboration, data center migration projects, and more. It offers end-to-end technology solutions, developing custom software and IoT platforms for clients building their next-generation products. Commit’s engineering resources and proprietary Flexible R&D methodology help turn clients’ technology visions into high-quality products while reducing costs and improving time-to-market.

About the Role

Commit is looking for a DevOps & AI Infrastructure Engineer for a challenging and fascinating role at the forefront of technology in northern Israel.

The role involves maintaining and optimizing the organization's GPU farm while making its computing resources accessible to the organization's various AI projects.

Join us for work with real meaning and broad impact.

Responsibilities:

Project onboarding: guiding development and Data Science teams through the process of starting work on the farm, from defining requirements to full execution.
Building Docker images: creating and maintaining complex images tailored for GPU workloads (using NVIDIA Docker, CUDA, and the like) that conform to organizational standards.
Ongoing management and operations: managing and operating the GPU farm on an OpenShift environment, including performance monitoring, resource allocation, and resolving complex issues.
Resource optimization: implementing and managing scheduling and queue-management solutions (such as Run:ai) to maximize utilization of the expensive GPU cards.
Automation and CI/CD: building pipelines for rapid distribution of models and work environments.

Requirements

Minimum requirements:

Proven experience with OpenShift: in-depth command of cluster management, deployment, and storage/networking management in an OpenShift environment (or Kubernetes at a very high level).

Docker expertise: hands-on experience writing complex Dockerfiles, managing multi-stage builds, and optimizing image sizes.

Familiarity with the Linux world: full command of Linux operating systems (RHEL/Ubuntu) at the kernel and driver level (with emphasis on NVIDIA drivers).

Automation experience: working with CI/CD tools (such as Jenkins, GitLab CI, or ArgoCD) and configuration management tools (such as Ansible).

Significant advantages:

Run:ai: prior experience working with the Run:ai system for GPU management and allocation is a very big advantage.

AI/MLOps background: familiarity with libraries and frameworks such as PyTorch, TensorFlow, and Kubeflow.

Monitoring: experience working with Prometheus and Grafana, with emphasis on monitoring GPU metrics (NVML).

Python: ability to write scripts for automation and tool integration.

Key Skills

OpenShift, Docker, Linux, Automation, CI/CD, Run:ai, AI/MLOps, Monitoring, Python