Reliable and efficient
Your app does not stall at one million users. More devices means more distributed compute at the edge.
Run your AI workflows locally on Apple silicon devices running iOS, macOS, tvOS, visionOS, or watchOS.
Start with the Swift SDK, the Dart SDK, or the Onde CLI if you want to test model downloads and GGUF output before wiring inference into app code.
```rust
use onde::inference::{ChatEngine, GgufModelConfig};

let engine = ChatEngine::new();
engine.load_gguf_model(
    GgufModelConfig::platform_default(),
    Some("You are a helpful assistant.".into()),
    None,
).await?;

let result = engine.send_message("Hello!").await?;
println!("{}", result.text);
// completed in 85ms — 100% on device
```

Benchmark
| Inference Layer | Latency | Server Cost | Privacy |
|---|---|---|---|
| Cloud API | 1,200 ms+ | $$$$ | Leaves device |
| Onde on-device | 85 ms | $0 | Stays on device |
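A wall-clock number like the 85 ms above can be checked with a simple timer around the call site. A minimal sketch using `std::time::Instant`, with a placeholder function standing in for the `engine.send_message` call (the real call is async and returns a result struct):

```rust
use std::time::Instant;

// Placeholder standing in for an on-device inference call such as
// engine.send_message("Hello!"); swap in the real call in app code.
fn run_inference(prompt: &str) -> String {
    format!("echo: {prompt}")
}

fn main() {
    let start = Instant::now();
    let reply = run_inference("Hello!");
    let elapsed_ms = start.elapsed().as_millis();
    println!("{reply}");
    println!("completed in {elapsed_ms}ms");
}
```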
First-class citizen
Two lines to load a model. One line to run it. The SDK gets out of your way so you can ship.
Hand-tuned for the Apple Neural Engine and Apple silicon. Onde speaks closer to the metal than generic wrappers.
In production
Onde ships in real apps from Splitfire AB. If you want to see the network side of the system, open Onde Inference Pulse.
One More Thing
Apple, App Store, iOS, and macOS are trademarks of Apple Inc.