AI Engineer – Applied LLMs, Workflows & Evals
Build the brains of Delvo's procurement intelligence — reliable LLM workflows and agent systems with strong evaluation at their core.
About Delvo
Delvo is the agentic intelligence layer for strategic procurement — helping enterprise teams turn $50 trillion in global spend into a strategic advantage. Our AI agents combine supplier data, price benchmarks, and risk signals with human judgment to deliver 10x faster preparation and measurable savings.
You'll design the systems that make this reliable in production — building workflows, guardrails, and evaluation loops that improve continuously. Not demos. Real enterprise environments.
What you'll build
- LLM workflows for retrieval, tool use, structured outputs, and multi-step reasoning.
- Agent orchestration with tools, control flows, retries, safety checks, and graceful degradation.
- Evaluation & monitoring infrastructure — golden datasets, online metrics, tracing, regression detection.
- Enterprise integrations with ERPs and data sources, with strong observability throughout.
- Cost & latency optimisation via caching, streaming, batching, and intelligent model routing.
- AI that works for real users, shipped in close collaboration with design and forward-deployed engineers.
Your qualifications
- TypeScript & Python across backend and product-adjacent work (Next.js, workers, APIs).
- Hands-on LLM experience: RAG, function calling, tool use, structured output parsing, guardrails, streaming.
- Modern AI stack: Vercel AI SDK, OpenAI/Azure, embeddings, vector stores, Langfuse or similar.
- Evals & quality mindset: define tasks, golden datasets, and success criteria; prevent regressions systematically.
- Bonus: data pipelines, ERP integrations, procurement domain knowledge.
What you'll get
- Competitive salary + meaningful equity — real ownership in a company you'll shape.
- AI-first tooling — unlimited tokens, best-in-class tools, investment in your setup.
- Deep technical growth in agentic systems, reliability engineering, and production AI.
- Direct founder access — small team, visible outcomes, no layers.
- CIC Berlin, Kreuzberg — office in the heart of Berlin's startup ecosystem.
- Path to AI/ML leadership as we scale from founding team to engineering org.
Ready to build reliable AI systems?
You like measurable progress: tracing, evals, and clean abstractions. You ship, learn, and improve systems week over week.
This is a founding role — you'll define how AI reliability works at Delvo.
Apply for this role
Share your details and we'll reach out personally. Every application is reviewed by our founding team.
Questions about this role? hello@delvo.ai