OllamaClient.generateStream

Streaming text generation. Invokes onChunk once for every response token the model emits.

Each call to onChunk receives one NDJSON chunk. The chunk contains a "response" string token and a boolean "done". The final chunk has "done": true and carries usage/timing metadata.
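A typical stream therefore looks like the following NDJSON lines (field set abridged; the exact metadata fields on the final chunk follow the Ollama server's `/api/generate` response format):

```json
{"model":"llama3.1:8b","response":"Hello","done":false}
{"model":"llama3.1:8b","response":",","done":false}
{"model":"llama3.1:8b","response":"","done":true,"total_duration":1234567890,"prompt_eval_count":12,"eval_count":42}
```

Intermediate chunks carry one token in "response"; only the last chunk ("done": true) carries usage and timing data.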

class OllamaClient
@safe void generateStream(
    string model,
    string prompt,
    StreamCallback onChunk,
    string system = null,
    string[] images = null,
    JSONValue format = JSONValue.init,
    string keepAlive = null,
    OllamaOptions opts = OllamaOptions.init
)

Parameters

model string

Model name (e.g. "llama3.1:8b").

prompt string

Input prompt.

onChunk StreamCallback

Callback invoked per chunk; must be @safe.

system string

Optional system prompt.

images string[]

Optional base64-encoded images (multimodal).

format JSONValue

Structured output: JSONValue("json") or JSON Schema.
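Both forms of the format argument can be built with std.json. A minimal sketch (the schema shape here is illustrative, not taken from this library):

```d
import std.json;

// Plain JSON mode: ask the model to emit valid JSON.
auto jsonMode = JSONValue("json");

// Or constrain output with a JSON Schema (hypothetical example schema):
auto schema = parseJSON(`{
    "type": "object",
    "properties": { "answer": { "type": "string" } },
    "required": ["answer"]
}`);
```

Passing JSONValue.init (the default) leaves the output unconstrained.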

keepAlive string

How long to keep the model loaded.

opts OllamaOptions

Typed generation options.
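A minimal usage sketch, assuming StreamCallback is a @safe delegate that receives each parsed NDJSON chunk as a JSONValue, and that OllamaClient has a default constructor (both are assumptions, not confirmed by this page):

```d
import std.json;
import std.stdio;

void main()
{
    auto client = new OllamaClient();  // assumed default constructor

    client.generateStream(
        "llama3.1:8b",
        "Why is the sky blue?",
        (JSONValue chunk) @safe {
            // Print tokens as they arrive; the final chunk has "done": true
            // and an empty "response", so there is nothing left to print.
            if (!chunk["done"].boolean)
                write(chunk["response"].str);
        },
        "You are a concise assistant."  // optional system prompt
    );
}
```

Because onChunk must be @safe, any state it captures has to be accessed through @safe operations as well.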

Meta