Role Overview
Scale’s rapidly growing International Public Sector team is focused on using AI to address critical challenges facing the public sector around the world. Our core work consists of:
- Creating custom AI applications that will impact millions of citizens
- Generating high-quality training data for national LLMs
- Providing upskilling and advisory services to extend the impact of AI
As a Production AI Ops Lead, you will design and manage the production lifecycle of full-stack AI applications. You will also support end-to-end system reliability, real-time inference observability, sovereign data orchestration, high-security software integration, and the resilient cloud infrastructure our international government partners require.
At Scale, we’re not just building AI solutions; we’re enabling the public sector to transform its operations and better serve citizens through cutting-edge technology. If you’re ready to shape the future of AI in the public sector and be a founding member of our team, we’d love to hear from you.
You will:
- Own the production outcome: Take full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies.
- Ensure full-stack integrity: Oversee the end-to-end health of the platform, ensuring seamless integration between the AI core and all full-stack components, from APIs to UI, to maintain a responsive, production-ready environment.
- Scale the feedback loop: Build automated systems to monitor model performance and data drift across geographically dispersed environments, ensuring each deployment meets its reliability targets.
- Navigate global compliance: Manage the technical lifecycle within diverse regulatory frameworks.
- Incident command: Lead the response to production issues in mission-critical environments, ensuring rapid resolution and building guardrails to prevent recurrence.
- Bridge the gap: Translate deep technical performance metrics into clear insights for senior international government officials.
- Drive product evolution: Partner with our Engineering and ML teams to ensure lessons learned in the field directly shape the technical architecture and design decisions for future use cases.
Ideally, you have:
- Experience: 6+ years in a high-impact technical role (SRE, FDE, or MLOps), including experience in the public sector.
- Global perspective: Familiarity with international government security standards and the complexities of deploying sovereign AI.
- System architecture proficiency: Proven experience maintaining production-grade applications, with a deep understanding of the full request lifecycle, from the frontend and API layers through the backend to the AI core.
- Modern AI stack expertise: Strong coding skills and proficiency with modern AI infrastructure, including Kubernetes, vector databases, agentic development, and LLM observability tools.
- Ownership: You treat every production deployment as your own. You move quickly to solve hard problems before the customer ever sees them.
- Reliability: You understand that in the public sector, a model failure can put public safety or privacy at risk.
- Customer communication: The ability to explain to a high-ranking official why the performance of the system has degraded and how we are fixing it.
About Us:
At Scale, our mission is to develop reliable AI systems for the world's most important decisions. Our products provide the high-quality data and full-stack technologies that power the world's leading models, and help enterprises and governments build, deploy, and oversee AI applications that deliver real impact. We work closely with industry leaders like Meta, Cisco, DLA Piper, Mayo Clinic, Time Inc., the Government of Qatar, and U.S. government agencies including the Army and Air Force. We are expanding our team to accelerate the development of AI applications.
