An experiment in reliable local LLM-driven agents via deterministic orchestration
Getting locally-run LLMs to do useful work reliably is hard. Models are fickle: outputs vary, reasoning drifts, long agentic loops accumulate errors. Most frameworks respond by making the LLM responsible for more — tool selection, planning, retry logic. Ostinato goes the other way.
The idea: LLM calls are leaf nodes in an otherwise conventional async program. A heterarchy of goals — written as normal procedural code — controls flow, state, and sequencing. The model is called only to do what models are genuinely good at: managing language, conducting dialogue, and making fuzzy categorizations. Everything else is deterministic.
A typical node poses a structured question to the model and dispatches on the result. The program decides what to ask and what to do with the answer; the model decides how to phrase things and what the answer actually is:
def get_answer_to_question(thread, question, valid_answers, instructions):
    while true:
        reply = agent(build_prompt(thread, question, valid_answers, instructions))
        if reply.action == 'answer':
            return reply.text                        -- program resumes with the answer
        else if reply.action == 'confer':
            send_message_to_thread(thread, reply.text)
            wait_for_new_messages_on_thread(thread)  -- suspends to disk
        else if reply.action == 'cancel':
            return none
The model decides what to say; the program decides what happens next. (And, thanks to Largo, if the user goes silent for three days, the task simply waits on disk — no timeout, no lost state, no restart needed.)
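For contrast with the Largo version, here is a minimal Python sketch of the same leaf-node pattern. The stubbed agent and the send/wait_for_reply hooks are illustrative assumptions, not Ostinato's actual API:

```python
def agent(prompt):
    # Stub standing in for the real model call; a production version would
    # ask a local LLM to reply as a structured object with an 'action'
    # field, e.g. {"action": "answer", "text": "..."}.
    return {"action": "answer", "text": "around noon"}

def get_answer_to_question(thread, question, send, wait_for_reply):
    """Leaf-node LLM call: the program owns the loop and the dispatch;
    the model only fills in 'action' and 'text'."""
    while True:
        reply = agent(question)
        if reply["action"] == "answer":
            return reply["text"]         # program resumes with the answer
        elif reply["action"] == "confer":
            send(thread, reply["text"])  # relay the model's follow-up question
            wait_for_reply(thread)       # block until the user responds
        elif reply["action"] == "cancel":
            return None
```

Because the loop, the dispatch, and the retry policy live in ordinary code, each branch can be unit-tested with a stubbed agent.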
Pure agent loops can fail because they ask the model to maintain coherent long-range intent across many steps, recover gracefully from its own errors, and manage state that isn't in its context window. These are things models are bad at. Deterministic programs are good at all of them.
The tradeoff is that you have to think carefully about what you're asking the model to decide — but that discipline tends to produce cleaner applications anyway. The goal stack is explicit, auditable, and restartable. Individual LLM calls are small, focused, and easy to test.
Because the goal stack is a real program stack, it can be inspected and summarized for the model at any call site. Each function's doc string — the {| … |} block above its definition, like "Track what the user eats and drinks." in the example below — serves as its goal summary. format_goal_stack walks the call stack collecting these, so the model always knows where it sits in the overall task without that context having to live in a long conversation history:
def format_goal_stack():
    stack = []
    frame = __caller__
    while frame?:
        if (blurb = frame.__goal__.summary)?:
            append(stack, blurb)
        frame = frame.__caller__
    return joinlines(reversed(stack))
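A rough Python analogue, using the inspect module and first docstring lines in place of Largo's __caller__ chain and {| … |} goal blocks. This is a sketch under simplifying assumptions (it only resolves module-level functions):

```python
import inspect

def format_goal_stack():
    # Walk the live call stack, collecting the first docstring line of
    # each enclosing function as its goal summary.
    summaries = []
    for info in inspect.stack()[1:]:            # skip this frame itself
        func = info.frame.f_globals.get(info.function)
        doc = inspect.getdoc(func) if callable(func) else None
        if doc:
            summaries.append(doc.splitlines()[0])
    return "\n".join(reversed(summaries))       # outermost goal first

def track_diet():
    """Track what the user eats and drinks."""
    return log_meal()

def log_meal():
    """Log one meal."""
    return format_goal_stack()
```

Calling track_diet() here yields the two goal lines, outermost first, with no conversation history involved.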
Diet logging illustrates the pattern end-to-end. The skill gathers what the user ate in a single natural message, then resolves any ambiguous meal times through dialogue, and finally persists structured log entries — all in straightforward procedural code:
from channels import send_message_to_thread, User
from core_skills import format_goal_stack, get_answer_to_question
from llm import agent

type DietLogEntry:
    val when       -- ISO time string
    val when_text  -- How the user described the time (e.g., "lunch")
    val items      -- List of items eaten

add_property(User, 'one', 'diet_log')
def diet_log(user):
    if not user.diet_log?:
        user.diet_log = []
    return user.diet_log

format_diet_log_entry = <e: "[{e.when}] ({e.when_text}) {sjoin(e.items, ', ')}">
format_diet_log = <l: joinlines([format_diet_log_entry(e) for e in l])>

{|
 | Track what the user eats and drinks.
 |}
def track_diet(thread, user):
    ate_what = get_answer_to_question(thread,
        "What did you eat and (roughly) when?",
        instructions="Be sure to list _everything_ the user mentioned eating in this thread, even if at different times, and be sure to include the respective times!")
    items = agent($"Re-format "{ate_what}" as a json object mapping each time to a list of items consumed then. Return action='itemize', items=<the mapping>"$).items
    if not items?:
        send_message_to_thread(thread, "Sorry, having trouble today. Let's start over...")
        return
    for when:item_list in items:
        when_iso = get_answer_to_question(thread,
            "When, roughly, was {quote(when)}?",
            instructions="- Don't make guesses for when, e.g., 'dinner' is--just ask
                - Conversely, don't pester! Once you have an approximate time of day, that's good enough.
                - Assume the user's timezone is as shown in the chat log timestamps.
                - Convert their answer to ISO format in the same timezone before returning.")
        append(diet_log(user), DietLogEntry(when=when_iso, when_text=when, items=item_list))
    send_message_to_thread(thread, "Your diet log to date: {format_diet_log(user.diet_log)}")
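For readers who don't know Largo, the record type and the two formatting helpers translate almost line-for-line to Python. The field types here are my assumptions:

```python
from dataclasses import dataclass

@dataclass
class DietLogEntry:
    when: str          # ISO time string
    when_text: str     # how the user described the time, e.g. "lunch"
    items: list[str]   # items eaten

def format_diet_log_entry(e):
    # "[<iso time>] (<user's phrasing>) item, item, ..."
    return f"[{e.when}] ({e.when_text}) {', '.join(e.items)}"

def format_diet_log(log):
    return "\n".join(format_diet_log_entry(e) for e in log)
```

The structured log is what makes later steps (per-ingredient breakdowns, querying by date) ordinary data processing rather than more LLM work.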
The code asks the model two kinds of questions: a broad one to extract everything the user mentioned eating, and a narrow one per meal to pin down the time. The dialogue loop in get_answer_to_question handles the back-and-forth; track_diet just awaits answers and logs entries. Here is an actual Telegram session from that code, running now (local LLM, dolphin24b):
Note that for the hot chocolate, the agent guessed "afternoon" (and 2:36pm) from the message context when it should have asked, or at least anchored the time closer to lunch. Hopefully kinks like that can be worked out with prompt and process engineering; finding that out is part of what this experiment is for.
On the plus side, in the second exchange "cookie just now" produced a complete log entry with no questions at all, as we would hope: the LLM resolved the time from "just now" and the message timestamp, so the code proceeded straight to logging.
Note, too, how easy it is here to add double-checking and similar diligence wherever needed, or in low-level routines that get reused: the process is fully controllable.
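As a sketch of what such diligence can look like (the names and retry policy here are mine, not Ostinato's): wrap any fuzzy leaf call in a deterministic verify-and-retry loop, where the checker can be plain validation code or a second, narrower LLM call.

```python
def with_double_check(ask, check, retries=2):
    # Deterministic retry around a fuzzy call: ask() produces a candidate
    # answer; check(answer) accepts or rejects it. The program, not the
    # model, decides how many attempts to allow and what failure means.
    for _ in range(retries + 1):
        answer = ask()
        if check(answer):
            return answer
    return None  # caller decides how to recover, e.g. confer with the user

def is_iso_time(s):
    # Example checker: validate an LLM-produced timestamp in plain code.
    from datetime import datetime
    try:
        datetime.fromisoformat(s)
        return True
    except (ValueError, TypeError):
        return False
```

Because the check lives in a reusable low-level routine, every call site gets the same rigor for free.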
TODO: Next up will be maintaining a recipe/ingredient file, and matching that to logged items on the fly, so that the diet log can be automatically broken down by ingredient and approximate quantities. Mentioning unknown items will trigger a sub-goal to get the recipe or ingredients for those, and so on. (The hope is that once the patterns for this sort of thing are established, any particular rigorous process of this kind can be set up in just a few lines of code.)
These are the current working sources, syntax-highlighted by vim:
get_answer_to_question, goal stack formatting

skills.largo
    Top-level skill registry — maps goals to handler functions

diet_tracker.largo
    Diet logging skill — the worked example above
Tarball of all source files here.
Ostinato is implemented in Largo, a scripting language whose runtime is a persistent transactional object database.
The current implementation targets locally-run models via Ollama. LLM calls are serialized; multiple concurrent user dialogues are handled as separate Largo tasks sharing a single model.
Discuss on Telegram.