An experiment in reliable local LLM-driven agents via deterministic orchestration
Getting locally-run LLMs to do useful work reliably is hard. Models are fickle: outputs vary, reasoning drifts, long agentic loops accumulate errors. Most frameworks respond by making the LLM responsible for more — tool selection, planning, retry logic. Ostinato goes the other way.
The idea: LLM calls are leaf nodes in an otherwise conventional async program. A heterarchy of goals — written as normal procedural code — controls flow, state, and sequencing. The model is called only to do what models are genuinely good at: managing language, conducting dialogue, and making fuzzy categorizations. Everything else is deterministic.
A typical node poses a structured question to the model and dispatches on the result. The program decides what to ask and what to do with the answer; the model decides how to phrase things and what the answer actually is:
def get_answer_to_question(thread, question, valid_answers, instructions):
    while true:
        reply = agent(build_prompt(thread, question, valid_answers, instructions))
        if reply.action == 'answer':
            return reply.text                        -- program resumes with the answer
        else if reply.action == 'confer':
            send_message_to_thread(thread, reply.text)
            wait_for_new_messages_on_thread(thread)  -- suspends to disk
        else if reply.action == 'cancel':
            return none
The model decides what to say; the program decides what happens next. (And, thanks to Largo, if the user goes silent for three days, the task simply waits on disk — no timeout, no lost state, no restart needed.)
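For contrast with the Largo version, here is a minimal Python sketch of the same leaf-node pattern. The stubbed agent and the send/wait_for_reply hooks are illustrative assumptions, not Ostinato's actual API:

```python
def agent(prompt):
    # Stub standing in for the real model call; a production version would
    # ask a local LLM to reply as a structured object with an 'action'
    # field, e.g. {"action": "answer", "text": "..."}.
    return {"action": "answer", "text": "around noon"}

def get_answer_to_question(thread, question, send, wait_for_reply):
    """Leaf-node LLM call: the program owns the loop and the dispatch;
    the model only fills in 'action' and 'text'."""
    while True:
        reply = agent(question)
        if reply["action"] == "answer":
            return reply["text"]         # program resumes with the answer
        elif reply["action"] == "confer":
            send(thread, reply["text"])  # relay the model's follow-up question
            wait_for_reply(thread)       # block until the user responds
        elif reply["action"] == "cancel":
            return None
```

Because the loop, the dispatch, and the retry policy live in ordinary code, each branch can be unit-tested with a stubbed agent.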
Pure agent loops can fail because they ask the model to maintain coherent long-range intent across many steps, recover gracefully from its own errors, and manage state that isn't in its context window. These are things models are bad at. Deterministic programs are good at all of them.
The tradeoff is that you have to think carefully about what you're asking the model to decide — but that discipline tends to produce cleaner applications anyway. The goal stack is explicit, auditable, and restartable. Individual LLM calls are small, focused, and easy to test.
Because the goal stack is a real program stack, it can be inspected and summarized for the model at any call site. Each function's doc string — the {| … |} block above its definition, like "Track what the user eats and drinks." in the example below — serves as its goal summary. format_goal_stack walks the call stack collecting these, so the model always knows where it sits in the overall task without that context having to live in a long conversation history:
def format_goal_stack():
    stack = []
    frame = __caller__
    while frame?:
        if (blurb = frame.__goal__.summary)?:
            append(stack, blurb)
        frame = frame.__caller__
    return joinlines(reversed(stack))
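A rough Python analogue, using the inspect module and first docstring lines in place of Largo's __caller__ chain and {| … |} goal blocks. This is a sketch under simplifying assumptions (it only resolves module-level functions):

```python
import inspect

def format_goal_stack():
    # Walk the live call stack, collecting the first docstring line of
    # each enclosing function as its goal summary.
    summaries = []
    for info in inspect.stack()[1:]:            # skip this frame itself
        func = info.frame.f_globals.get(info.function)
        doc = inspect.getdoc(func) if callable(func) else None
        if doc:
            summaries.append(doc.splitlines()[0])
    return "\n".join(reversed(summaries))       # outermost goal first

def track_diet():
    """Track what the user eats and drinks."""
    return log_meal()

def log_meal():
    """Log one meal."""
    return format_goal_stack()
```

Calling track_diet() here yields the two goal lines, outermost first, with no conversation history involved.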
Diet logging illustrates the pattern end-to-end. The skill gathers what the user ate in a single natural message, then resolves any ambiguous meal times through dialogue, and finally persists structured log entries — all in straightforward procedural code:
from channels import send_message_to_thread, User
from core_skills import format_goal_stack, get_answer_to_question
from llm import agent

type DietLogEntry:
    val when       -- ISO time string
    val when_text  -- How the user described the time (e.g., "lunch")
    val items      -- List of items eaten

add_property(User, 'one', 'diet_log')
def diet_log(user):
    if not user.diet_log?:
        user.diet_log = []
    return user.diet_log

format_diet_log_entry = <e: "[{e.when}] ({e.when_text}) {sjoin(e.items, ', ')}">
format_diet_log = <l: joinlines([format_diet_log_entry(e) for e in l])>

{|
 | Track what the user eats and drinks.
 |}
def track_diet(thread, user):
    ate_what = get_answer_to_question(thread,
        "What did you eat and (roughly) when?",
        instructions="Be sure to list _everything_ the user mentioned eating in this thread, even if at different times, and be sure to include the respective times!")
    items = agent($"Re-format "{ate_what}" as a json object mapping each time to a list of items consumed then. Return action='itemize', items=<the mapping>"$).items
    if not items?:
        send_message_to_thread(thread, "Sorry, having trouble today. Let's start over...")
        return
    for when:item_list in items:
        when_iso = get_answer_to_question(thread,
            "When, roughly, was {quote(when)}?",
            instructions="- Don't make guesses for when, e.g., 'dinner' is--just ask
                - Conversely, don't pester! Once you have an approximate time of day, that's good enough.
                - Assume the user's timezone is as shown in the chat log timestamps.
                - Convert their answer to ISO format in the same timezone before returning.")
        append(diet_log(user), DietLogEntry(when=when_iso, when_text=when, items=item_list))
    send_message_to_thread(thread, "Your diet log to date: {format_diet_log(user.diet_log)}")
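For readers who don't know Largo, the record type and the two formatting helpers translate almost line-for-line to Python. The field types here are my assumptions:

```python
from dataclasses import dataclass

@dataclass
class DietLogEntry:
    when: str          # ISO time string
    when_text: str     # how the user described the time, e.g. "lunch"
    items: list[str]   # items eaten

def format_diet_log_entry(e):
    # "[<iso time>] (<user's phrasing>) item, item, ..."
    return f"[{e.when}] ({e.when_text}) {', '.join(e.items)}"

def format_diet_log(log):
    return "\n".join(format_diet_log_entry(e) for e in log)
```

The structured log is what makes later steps (per-ingredient breakdowns, querying by date) ordinary data processing rather than more LLM work.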
The code asks the model two kinds of questions: a broad one to extract everything the user mentioned eating, and a narrow one per meal to pin down the time. The dialogue loop in get_answer_to_question handles the back-and-forth; track_diet just awaits answers and logs entries. Here is an actual Telegram session from that code, running now (local LLM, dolphin24b):
Note that for the hot chocolate, the agent guessed "afternoon" (and 2:36pm) from the message context when it should have asked, or at least anchored the time closer to lunch. Hopefully kinks like that can be worked out with prompt and process engineering; finding that out is part of what this experiment is for.
On the plus side, in the second exchange "cookie just now" produced a complete log entry with no questions at all, as we would hope: the LLM resolved the time from "just now" and the message timestamp, so the code proceeded straight to logging.
Note, too, how easy it is here to add double-checking and similar diligence wherever needed, or in low-level routines that get reused: the process is fully controllable.
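As a sketch of what such diligence can look like (the names and retry policy here are mine, not Ostinato's): wrap any fuzzy leaf call in a deterministic verify-and-retry loop, where the checker can be plain validation code or a second, narrower LLM call.

```python
def with_double_check(ask, check, retries=2):
    # Deterministic retry around a fuzzy call: ask() produces a candidate
    # answer; check(answer) accepts or rejects it. The program, not the
    # model, decides how many attempts to allow and what failure means.
    for _ in range(retries + 1):
        answer = ask()
        if check(answer):
            return answer
    return None  # caller decides how to recover, e.g. confer with the user

def is_iso_time(s):
    # Example checker: validate an LLM-produced timestamp in plain code.
    from datetime import datetime
    try:
        datetime.fromisoformat(s)
        return True
    except (ValueError, TypeError):
        return False
```

Because the check lives in a reusable low-level routine, every call site gets the same rigor for free.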
TODO: Next up will be maintaining a recipe/ingredient file, and matching that to logged items on the fly, so that the diet log can be automatically broken down by ingredient and approximate quantities. Mentioning unknown items will trigger a sub-goal to get the recipe or ingredients for those, and so on. (The hope is that once the patterns for this sort of thing are established, any particular rigorous process of this kind can be set up in just a few lines of code.)
These are the current working sources, syntax-highlighted by vim:
get_answer_to_question, goal stack formatting

skills.largo
    Top-level skill registry — maps goals to handler functions

diet_tracker.largo
    Diet logging skill — the worked example above
Tarball of all source files here.
Ostinato is implemented in Largo, a scripting language whose runtime is a persistent transactional object database.
The current implementation targets locally-run models via Ollama. LLM calls are serialized; multiple concurrent user dialogues are handled as separate Largo tasks sharing a single model.
Discuss on Telegram.