Last year we ran a pilot IoT deployment that crossed two borders, aggregating sensor data from agricultural monitoring stations spread across parts of three countries. The project was technically straightforward: ruggedised sensors, solar power, LTE where coverage existed, satellite where it didn't. The architecture worked fine in the lab. In the field, it collapsed within two weeks because of an assumption we had not articulated clearly enough: we assumed TCP could survive the conditions the devices would actually encounter.

TCP's assumption that both endpoints are simultaneously reachable seems so fundamental that it barely registers as an assumption. It only becomes visible when you are operating in environments where connectivity is intermittent, asymmetric, or politically complicated. Cross-border links that transit through a country experiencing routing instability, satellite links with 700 ms round-trip times and frequent loss events, and LTE cells that disappear for six hours when the generator runs out of fuel — all of these break TCP in ways that are hard to detect and even harder to debug remotely.

The specific failure mode that killed our original architecture was not dropped connections per se; those recovered. It was TCP's congestion control interacting badly with long-delay satellite links. A single lost packet on the 700 ms RTT link would collapse the congestion window, and because the window grows back only one round trip at a time, a connection that was technically alive would crawl along at something like 80 bytes per second for minutes before recovering.

A "live" TCP connection collapsing to 80 bytes per second on a 700 ms link looks healthy in your monitoring. It just doesn't move data.

We rebuilt the data transport layer around the Bundle Protocol (RFC 9171) with a custom store-and-forward gateway at each border crossing point. Each sensor buffers readings locally. When connectivity is available to the nearest gateway, it forwards a bundle; the gateway holds the bundle until it can forward it to the next hop. End-to-end latency went from "theoretically low, practically unreliable" to "consistently 4-12 hours, never lost." For agricultural monitoring, 12-hour latency is fine.

[Figure: Bundle Protocol store-and-forward across two borders. sensor → DTN GW-1 (LTE, roughly hourly) → DTN GW-2 (satellite, 700 ms RTT) → cloud (fibre, stable). End-to-end: 4–12 h, never lost.]
Each hop holds bundles in custody until the next link is reachable. Latency rises, but data integrity is absolute.
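To make the forwarding behaviour concrete, here is a minimal sketch of the loop each node runs, under the assumption that bundles live as files on disk until the next hop confirms receipt. It is not our production daemon and not the RFC 9171 wire format; BUNDLE_DIR, DELIVERED_DIR, and forward_bundle are hypothetical names for this illustration.

```python
# Minimal store-and-forward loop: a sketch of the idea, not our production DTN daemon
# and not the RFC 9171 bundle format. BUNDLE_DIR, DELIVERED_DIR, and forward_bundle()
# are hypothetical names used only for this illustration.

import shutil
import time
from pathlib import Path

BUNDLE_DIR = Path("/var/spool/dtn/outbound")      # bundles persist here until delivered
DELIVERED_DIR = Path("/var/spool/dtn/delivered")  # stands in for the next hop
RETRY_INTERVAL_S = 300                            # sleep this long when the link is down

def enqueue_reading(reading: bytes) -> Path:
    """Sensor side: write each reading to durable storage before any transmission attempt."""
    BUNDLE_DIR.mkdir(parents=True, exist_ok=True)
    path = BUNDLE_DIR / f"{time.time_ns()}.bundle"
    path.write_bytes(reading)
    return path

def forward_bundle(path: Path) -> bool:
    """Stand-in for the real LTE/satellite transfer: copy to a local directory so the
    sketch runs end to end. Return True only when delivery is confirmed."""
    DELIVERED_DIR.mkdir(parents=True, exist_ok=True)
    shutil.copy2(path, DELIVERED_DIR / path.name)
    return True

def forwarding_loop() -> None:
    """Gateway side: keep every bundle until the next hop has confirmed it has a copy."""
    while True:
        for bundle in sorted(BUNDLE_DIR.glob("*.bundle")):  # oldest first
            try:
                if forward_bundle(bundle):
                    bundle.unlink()   # delete only after the next hop holds the bundle
            except OSError:
                break                 # link is down; keep everything and retry later
        time.sleep(RETRY_INTERVAL_S)
```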

The gateway hardware was the part that surprised us most. We had assumed we would need something substantial: a small server, reliable power, climate control. In practice, a Raspberry Pi 5 with a 2 TB USB SSD ran the DTN daemon comfortably for the traffic volumes we were seeing (roughly 800 MB per day per border point), including checksumming and bundle custody transfer. Power draw at idle was around 4 W, which a modest solar panel could sustain even through cloudy periods. The main operational burden was not the hardware but the key management for the bundle authentication and integrity checks.
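The integrity check itself is conceptually simple even though operating it is not. As a simplified stand-in for real bundle security blocks (a deployment would use BPSec rather than anything hand-rolled), each bundle can carry an HMAC computed with a per-link key that the next hop verifies before accepting custody. The key and payload below are made up for illustration.

```python
# Simplified bundle integrity check: an HMAC-SHA256 tag over the payload with a shared
# per-link key. This is a stand-in for real BPSec integrity blocks, not the actual wire
# format; the key and payload here are illustrative only.

import hmac
import hashlib

def seal(payload: bytes, key: bytes) -> bytes:
    """Append a tag so the next hop can verify the bundle before taking custody."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag

def verify(sealed: bytes, key: bytes) -> bytes:
    """Check the tag and return the payload, or raise if the bundle was corrupted or forged."""
    payload, tag = sealed[:-32], sealed[-32:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bundle failed integrity check")
    return payload

# Example: the sending gateway seals, the receiving gateway verifies.
key = b"per-link key, rotated out of band"   # distributing and rotating these was the real work
sealed = seal(b'{"station": "A7", "soil_moisture": 0.31}', key)
assert verify(sealed, key) == b'{"station": "A7", "soil_moisture": 0.31}'
```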

Three takeaways for anyone considering a similar architecture:

If your link can disappear for hours, TCP is the wrong abstraction. The Bundle Protocol with store-and-forward gives you actual delivery guarantees in environments where connection-oriented protocols would silently fail.

Hardware can be modest. A Raspberry Pi-class device is enough for surprisingly high traffic volumes if you're not also asking it to do heavy compute.

Key management is the hard part. You will spend more time on bundle authentication operations than on the protocol itself.

Subscribe at edgesignal.example — see also our follow-up on why we abandoned WireGuard for the same use case.