Every quarter we have a version of the same conversation with someone planning a field-deployable radio kit. They want to add an LLM-driven feature — voice transcription, semantic search across a local document corpus, structured-data extraction from forms — and they want to use a cloud API because the cloud API is good and the on-device alternatives feel like more work. Every time, we walk them through the latency budget, and every time, the conversation ends with the same answer.
The constraint that nobody wants to hear is this: a field radio kit operating where it's likely to operate has a connectivity story that does not include a reliable, low-latency uplink to a hyperscaler. Iridium short-burst data has a best-case round trip of roughly 47 seconds for a payload of any useful size. Geostationary satellite runs about 700 ms RTT, and that's before you account for the ten percent of the time the link drops mid-packet. LTE at the edge of coverage is closer to 4 seconds RTT and unreliable. None of these is compatible with a UI target measured in hundreds of milliseconds.
So you can't put the model in the cloud. The model has to live on the device. That sounds like the end of the conversation — except that nobody wants it to be, because cloud models are good and on-device models, until quite recently, were not.
If your link's RTT exceeds your UX latency budget, the model lives on the device. There is no third option.
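The rule above can be made concrete with a few lines of arithmetic. This is a sketch, not a deployment check: the link RTT figures are the ones quoted earlier, and the 300 ms UX budget is an illustrative assumption standing in for "a UI target measured in hundreds of milliseconds."

```python
# Sketch of the latency-budget rule: if the link's RTT alone exceeds
# the UX budget, the model must run on the device. RTT figures are
# from the article; the 300 ms budget is an assumed example target.

UX_BUDGET_MS = 300  # assumed interactive-UI target (hundreds of ms)

LINK_RTT_MS = {
    "iridium_sbd": 47_000,  # ~47 s best-case round trip
    "geo_sat": 700,         # ~700 ms RTT, before mid-packet drops
    "edge_lte": 4_000,      # ~4 s RTT at the edge of coverage
    "on_device": 0,         # no network leg at all
}

def must_run_on_device(link: str, budget_ms: int = UX_BUDGET_MS) -> bool:
    """True when the link's round-trip time alone blows the UX budget."""
    return LINK_RTT_MS[link] > budget_ms

for link in LINK_RTT_MS:
    verdict = "on-device only" if must_run_on_device(link) else "cloud feasible"
    print(f"{link:12s} -> {verdict}")
```

Note that the check ignores model inference time entirely: every link in the table except the local one fails on propagation alone, which is the whole argument.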
The good news is that the on-device situation has changed dramatically in the last two years. A modern phone-class SoC can run an 8B-parameter quantised model at 15-20 tokens per second. Even a modest single-board-computer SoC like the Rockchip RK3588 can run a 7B model at usable speeds with comparable accuracy. The quantised weights are 3-4 GB, which fits comfortably on the storage of any radio kit you'd actually deploy. Cold-load time is under a second with memory-mapped weights. None of this was true three years ago.
The thing that still surprises people is the operational simplicity. There is no API key to rotate, no quota to monitor, no rate-limit to design around, no cloud bill that grows with usage. The model is a binary that ships with the firmware. The radio kit is fully functional in airplane mode, in the desert, in a basement. That is a meaningful operational difference that is worth a lot more than the marginal accuracy gap to a frontier cloud model.
Three takeaways:
If your link RTT exceeds your UX latency budget, on-device is the only honest answer. Don't pretend otherwise — your users will feel it.
The on-device gap has closed faster than most people realise. Models from 18 months ago were toys; models from 6 months ago are production-viable for most radio-kit tasks.
Operational simplicity is worth real accuracy. No keys, no quotas, no bills, no rate limits — that's not a small thing for a deployed radio kit.
Subscribe at edgesignal.example — see also our deep-dive on quantising for Snapdragon-class hardware.