Getting parseable output back from an LLM has two halves: shaping the request, then parsing the response. Even with Ollama's format: parameter set, models leak conversational preamble like "Sure! Here is the JSON:" and invent fields, so parse-and-repair is mandatory.
Input
Message types
Ollama's chat endpoint takes a messages array with four roles: system, user, assistant and tool. The following message pattern works well for us:
- one system message with instructions
- optional alternating user and assistant messages for the chat history
- a final user message with the optional retrieved RAG context and the actual question or task
Putting the actual task last primes the model to focus on the action, not the surrounding context. As models are trained on conversational patterns, we avoid consecutive user messages for the RAG context and user question.
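A minimal sketch of that layout, assuming an Ollama-style messages array (the method and variable names are illustrative, not part of any API):
def build_messages(chat_history:, rag_context:, user_query:)
  [
    # Instructions first, optional history in the middle,
    # RAG context and question bundled into the final user message.
    { role: 'system', content: 'Answer the question using only the provided context.' },
    *chat_history, # alternating user/assistant messages, possibly empty
    # See "Choosing an input format" below for ways to bundle context and question.
    { role: 'user', content: "#{rag_context}\n\n#{user_query}" },
  ]
end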
Choosing an input format
The final user message has to bundle the retrieved RAG context with the actual question. How you encode that bundle matters: it sets how cleanly the model can separate background from the task. A few options:
Plain text
The easiest format. Raw text works for simple prompts without nested structure:
Here is the relevant context to help answer my next question:
[Insert RAG Context Here]
Based on the context above, please answer this question: [Insert Final Question Here]
An easy iteration on this approach is to use Markdown: because LLMs are heavily trained on it, headers give the model stronger semantic boundaries.
# RAG Context
[Insert RAG Context Here]
# Question
[Insert Final Question Here]
However, neither of those solutions offers a way to fence user content off from your own instructions. Switch to JSON or XML once that starts to bite.
JSON
Battle-tested and familiar to every model, if a bit verbose. Wrap the retrieved context and the user's question into a single JSON object; serializing with JSON.generate escapes unbalanced quotes or braces in the dynamic content:
documents = search_results.map.with_index(1) do |result, index|
  {
    index:,
    source: result.file_name,
    matches: result.matching_snippets,
  }
end
user_message = JSON.generate({
  instructions: 'Use the documents as context to answer the question.',
  documents:,
  question: user_query,
})
XML
Each part of the structure is named by its tag, giving the model stronger semantic anchors than JSON's keys. That extra clarity is part of why XML sometimes works better than JSON, especially with smaller models. It can also be more compact for nested data. CDATA is used to escape dynamic content, NO_DECLARATION to save some tokens:
builder = Nokogiri::XML::Builder.new(encoding: 'UTF-8') do |xml|
  xml.prompt do
    xml.instructions 'Use the documents as context to answer the question.'
    xml.documents do
      search_results.each.with_index(1) do |result, index|
        xml.document(index:) do
          xml.source result.file_name
          Array(result.matching_snippets).each do |snippet|
            xml.match { xml.cdata snippet }
          end
        end
      end
    end
    xml.question { xml.cdata user_query }
  end
end

user_message = builder.doc.root.to_xml(
  save_with: Nokogiri::XML::Node::SaveOptions::NO_DECLARATION,
)
TOON
TOON is another compact format designed to save tokens. It is worth a look on the input side when you run into long generation times due to inherently large context sizes. Do not ask the model to produce TOON, though: LLMs are not trained on it, and unfamiliar output formats degrade quality more than the token savings buy you.
Input Recommendations
- Match the input format to the output format. Do not make the model translate between two structures in one shot.
- Always escape user content that could break structure.
- Try stripping whitespace from the input. Output tokens dominate latency since they are generated sequentially, so compacting the response shape pays off most, but compact input still saves context budget. Note however that sometimes whitespace carries semantics (like indentation), so be ready to revert this change if you notice a quality regression.
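For example, retrieved snippets can be compacted before they are interpolated into the prompt. A small sketch (plain Ruby; skip it for content where indentation is meaningful):
# Collapse runs of whitespace in each snippet to save input tokens.
matches = result.matching_snippets.map { |snippet| snippet.gsub(/\s+/, ' ').strip }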
Output
Choosing an output format
Both Ollama's and OpenAI's APIs only officially support JSON as structured output. Start with JSON; build an XML response parsing harness only if the JSON results are not good enough.
JSON
Ollama's API takes a format: parameter. Two options:
- set it to 'json' for loose JSON mode. The model will probably respond with JSON, but it might not always follow your instructed schema.
- pass a JSON schema instead. This constrains generation token-by-token and is the recommended path for production use.
See https://docs.ollama.com/capabilities/structured-outputs#generating-structured-json-with-a-schema.
schema = {
  type: 'object',
  required: %w[reasoning text],
  additionalProperties: false,
  properties: {
    reasoning: { type: 'string' },
    text: { type: 'string' },
  },
}
Ollama.chat(
  model: 'qwen3.5:35b',
  format: schema,
  messages: [
    { role: 'system', content: 'Respond with JSON: { "reasoning": "...", "text": "..." }' },
    { role: 'user', content: prompt },
  ],
)
The JSON Schema's description is not passed to the model! Ollama uses the schema for grammar-constrained decoding, but the property description is currently silently ignored - the model never sees it. If a description matters, also describe the field in your prompt prose or via a worked example.
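For the schema above, that could look like the following system prompt (the field descriptions are illustrative):
system_prompt = <<~PROMPT
  Respond with JSON: { "reasoning": "...", "text": "..." }
  - "reasoning": one or two sentences explaining how the documents support the answer
  - "text": the final answer as plain text, no markdown
PROMPT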
As of April 2026, Ollama has an open issue about JSON schema adherence in the Qwen 3.5 and 3.6 series. If you use them, you might be better off inlining the schema definition in the prompt instead of passing it via format:.
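A sketch of that workaround, reusing the schema hash from above and dropping format: entirely. Without format: there is no token-level constraint, so the sanitize and validate steps below become even more important:
Ollama.chat(
  model: 'qwen3.5:35b',
  messages: [
    # Embed the schema in the prompt instead of passing it via format:.
    { role: 'system', content: "Respond with a single JSON object matching this JSON Schema:\n#{JSON.pretty_generate(schema)}" },
    { role: 'user', content: prompt },
  ],
)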
XML
Ollama's format: parameter only supports JSON, so request XML via prose in the prompt and parse with Nokogiri afterwards. The sanitize and validate steps below apply the same way; only the parser changes.
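A minimal parsing sketch, assuming you instructed the model to answer with a response element containing reasoning and text children (those element names are an assumption for this example, not part of Ollama's API):
def parse_xml_content(content)
  # Strip markdown fences, just like in the JSON helper below.
  clean = content.strip.sub(/\A```(?:xml)?\s*/i, '').sub(/\s*```\z/, '')
  doc = Nokogiri::XML(clean) { |config| config.strict }
  {
    'reasoning' => doc.at_xpath('/response/reasoning')&.text,
    'text' => doc.at_xpath('/response/text')&.text,
  }
rescue Nokogiri::XML::SyntaxError => error
  raise "LLM response was not valid XML: #{error.message}"
end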
Define a few valid examples
Two or three example responses anchor the desired output shape. Place them in a dedicated "examples" block before the user's query, and keep them abstract: concrete-looking values pull the model's attention toward surface details of the example rather than the structural shape you want. The nanonets few-shot guide has more on this topic.
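A sketch of such a block, kept deliberately abstract (the placeholder wording is illustrative):
# Abstract example responses, placed before the final user message.
examples = <<~EXAMPLES
  Examples of valid responses (shape only, not real content):
  {"reasoning": "<one sentence citing the relevant document>", "text": "<the answer>"}
  {"reasoning": "<why the documents do not answer the question>", "text": "<a brief refusal>"}
EXAMPLES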
Sanitize and parse the response
Parse-and-repair is necessary even when format: is set. You may use this simple helper:
def parse_json_content(content)
  return {} if content.blank?

  clean = content.strip
  clean = clean.sub(/\A```(?:json)?\s*/i, '').sub(/\s*```\z/, '')
  JSON.parse(clean, decimal_class: BigDecimal)
end
Real responses may contain ```json fences, a leading "Sure!" line, or a trailing comment. Strip the fence and consider retrying one to three times if the response is still not valid JSON (see the retry sketch further down).
Validate the schema
As we already define a JSON Schema for the LLM, we can re-use it for response validation with the json-schema gem. Pass the same schema from above and the parsed response:
require 'json-schema'
parsed = parse_json_content(response.message.content)
errors = JSON::Validator.fully_validate(schema, parsed)
raise "LLM response did not match schema: #{errors.join(', ')}" if errors.any?
As with parse failures, you can feed the validation errors back to the LLM and retry.
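A sketch of that retry loop, reusing parse_json_content and the schema from above and assuming the Ollama client wrapper used in the earlier snippets (the attempt limit and wording are arbitrary):
def chat_with_repair(messages, schema:, attempts: 3)
  attempts.times do
    response = Ollama.chat(model: 'qwen3.5:35b', format: schema, messages:)
    content = response.message.content

    begin
      parsed = parse_json_content(content)
      errors = JSON::Validator.fully_validate(schema, parsed)
      return parsed if errors.empty?

      feedback = "Your response did not match the required schema: #{errors.join(', ')}"
    rescue JSON::ParserError => error
      feedback = "Your response was not valid JSON: #{error.message}"
    end

    # Feed the failure back to the model and let it try again.
    messages += [
      { role: 'assistant', content: content },
      { role: 'user', content: "#{feedback}. Please respond again with a single valid JSON object." },
    ]
  end

  raise 'LLM did not return a usable response after several attempts'
end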
Output Recommendations
- Pass a JSON schema to format:, not just 'json'. Token-by-token constraints are much stronger than loose mode.
- Repeat the "respond with JSON" or "respond with XML" instruction in prompt prose. format: alone is unreliable on smaller models.
- Sanitize before parsing. Strip markdown fences and prose preambles even when format: is set.
- Validate parsed responses against your schema.
- Fail loudly if anything is wrong.