There are two common misconceptions in engineering West Coast – East Coast data centers:
- that packets travel at the speed of light
- that database transactions must use 2-phase commit, and that two masters therefore cannot be located very far apart because of #1.
Most engineers stop at those rules of thumb, but what happens when we look at the actual numbers with ecommerce/advertising applications in mind?
Figure 1: Cross-USA Internet Packet Latency (One-Way)
Transmission Method | End-to-End Speed (km/s) | Time SF-NY | Note |
---|---|---|---|
Light in vacuum | 299,792 | 16 ms | Similar speed in air |
Microwave in air | 235,000 | 20 ms | Repeaters every 48 km (actually built in the 1950s in both the USA and Canada! HFT applications currently use 15+ microwave routes from Chicago to New York.) |
“Electricity” in wire | 235,000 | 20 – 30 ms | Depends on how the wire is constructed and which electromagnetic property carries the information. As a rule of thumb, a signal propagates along a typical electronics wire at about c/2, while an electromagnetic wave in an air-gap line approaches 0.99c. |
Light in silica fiber (theoretical) | 204,081 | 22 ms | Index of refraction is 1.45 |
Oceanic cable for comparison | 156,666 | 30 ms | Including amplification and switching |
Google routing in silica fiber | 150,000 | 31 ms | Extrapolated from The Dalles to Ashburn (4,350 km) at 29 ms |
AT&T routing in silica fiber | 150,000 | 31 ms | |
Public Internet packets in silica fiber | 137,000 | 35 – 36 ms | The public Internet already uses MPLS |
From Figure 1 above, we see that light could travel from SF to NY in 16 ms, yet the public Internet averages 35 ms, roughly 2.2x the vacuum lower bound.
Ever watched the expression on somebody’s face when you tell them, “Packets don’t travel at the speed of light”? It’s priceless.
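To make the 2.2x figure concrete, here is a tiny back-of-the-envelope check; the ~4,800 km path length is an assumption implied by the table’s 16 ms vacuum figure, not a surveyed route.

```python
# The 2.2x figure, spelled out.
C_VACUUM_KM_S = 299_792            # speed of light in vacuum, km/s
ROUTE_KM = 4_800                   # assumed SF-NY path length
MEASURED_ONE_WAY_MS = 35           # typical public-Internet one-way latency

floor_ms = ROUTE_KM / C_VACUUM_KM_S * 1000   # physical lower bound, ~16 ms
print(f"lower bound {floor_ms:.0f} ms, measured {MEASURED_ONE_WAY_MS} ms, "
      f"ratio {MEASURED_ONE_WAY_MS / floor_ms:.1f}x")
```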
Now that we’ve described the latency limits numerically, there are some very interesting things to investigate:
- A serious enterprise could construct microwave towers across the USA again for latency-sensitive traffic with 20 ms latency in good weather, with fiber backup. (“It’s better to be fast 99% of the time than slow 99.999% of the time” – mckay-brothers.com)
- If SF – NY is too ambitious (after all, SF is earthquake-prone), “pinch” the westernmost and easternmost locations closer together by using a central location. (See below.)

Figure 2: Instead of SF-NY (~31 ms today) Data Center locations, “pinch” the network topology of the master database pair to LAS or SLC and ATL or ORD (~20 ms today). (Map of USA Population Centers According to Major Airport Traffic)

Figure 3: Another interesting topology, using near speed-of-light microwave links from the easternmost master database in Chicago to NY (8.5 ms). Instead of spending a few billion dollars on a nationwide microwave chain, one of the 15+ existing providers can be leveraged for the 1,300 km Chicago – NY span carrying low-bandwidth transaction traffic.
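A minimal sketch of the pinching arithmetic, using the ~150,000 km/s effective fiber speed from Figure 1; the route distances below are rough estimates of my own (great-circle plus some slack), not carrier route miles.

```python
# Rough one-way fiber latencies for the topologies in Figures 2 and 3.
EFFECTIVE_FIBER_KM_S = 150_000     # well-routed private fiber, from Figure 1

ROUTES_KM = {
    "SF - NY (unpinched)":  4_700,
    "SLC - ATL (pinched)":  2_700,
    "LAS - ORD (pinched)":  2_500,
    "ORD - NY (Figure 3)":  1_300,
}

for route, km in ROUTES_KM.items():
    print(f"{route:22s} ~{km / EFFECTIVE_FIBER_KM_S * 1000:4.1f} ms one-way")
```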
Wide-Area Multi-Master Database Transactions
So how does that help us with multi-master database latency?
- for 2-phase/sync commit, 31-35 ms for a medium to high volume of OLTP transactions isn’t workable, especially over the Public Internet. But 17-20 ms of reliable latency is fundamentally different (10 ms is the same as public Internet latency from San Jose to Las Vegas!). An optimized ecommerce store application would work with a reliable latency near 20 ms. (Confirmed with Percona Consulting. See the commit-budget sketch after this list.)
- if that’s not workable, think beyond 2-phase commit. Lamport/vector clock algorithms have been available since 1988 and have been implemented in Voldemort, with support coming in Redis (soon you will be able to delegate database session handling, etc. to Redis if you need cross-DC availability). Cassandra uses last-write-wins and is DC-aware. Or use NTP/GPS time synchronization the way Google Spanner does. (A minimal vector-clock sketch follows this list.)
- #1 can be modified by “pinching” the location of the database masters. Instead of thinking SF and NY, locate the masters in Las Vegas or SLC and Atlanta or ORD with read-slaves in SJC and Ashburn as required.
- Google and AT&T have virtually unlimited CONUS fiber, meaning effectively unlimited bandwidth and a known, reliable latency of around 31 ms. A new algorithm could be built around those constraints. Think git, but for database transactions.
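To see why the first bullet treats 31-35 ms and 17-20 ms so differently, here is a rough commit budget. The round-trip counts are assumptions, not a specific database’s protocol: one round trip roughly models a semi-synchronous ack, two models a full prepare + commit exchange.

```python
def wan_wait_ms(one_way_ms, round_trips):
    """Time a single commit spends waiting on the wide-area link."""
    return round_trips * 2 * one_way_ms

PAIRS = [("public Internet, SF-NY", 35),
         ("private fiber, SF-NY", 31),
         ("pinched masters, SLC-ATL", 19)]

for label, one_way in PAIRS:
    print(f"{label:26s} {wan_wait_ms(one_way, 1):4d} ms (1 RTT)  "
          f"{wan_wait_ms(one_way, 2):4d} ms (2 RTTs) per commit")
```

A checkout page that issues three synchronous writes in sequence waits three times that, which is why the round-trips-per-page question in the wiki exercise below matters.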
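And a minimal vector-clock sketch for the second bullet; this is illustrative only, not the Voldemort, Redis, or Cassandra implementation.

```python
# Each replica keeps a counter per node. Comparing two clocks tells us whether
# one write causally follows the other or the writes conflict.

def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced (a local write)."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def merge(a, b):
    """Element-wise max: the clock to store after reconciling two versions."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

def compare(a, b):
    """Return 'a<b', 'a>b', 'equal', or 'concurrent' (a genuine write conflict)."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "a<b"
    if b_le_a:
        return "a>b"
    return "concurrent"   # neither happened-before the other; the app must resolve

# Example: the same row written on the SLC and ATL masters without coordination.
w1 = increment({}, "slc")      # {'slc': 1}
w2 = increment({}, "atl")      # {'atl': 1}
print(compare(w1, w2))         # 'concurrent' -> conflict; merge or pick a winner
print(merge(w1, w2))           # {'slc': 1, 'atl': 1}
```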
What Does a Reliable Network Mean?
Reliable for wide-area multi-master database transactions means:
- almost always partition-free – 5x9s or more during the most active shopping times (Google is emphasizing partition-free design in its networks, as it’s far easier than reducing latency and more predictable overall; see the downtime arithmetic below)
- zero packet loss
- maintenance windows known in advance
- good enough for your DBA Team to say “Yes, we can support this.”
At this time, that requires a private network, either yours or a cloud provider.
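For a sense of scale, the 5x9s requirement above works out to only a few minutes of partition per year (simple arithmetic, not a measurement):

```python
# Downtime/partition budget = (1 - availability) * period.
MINUTES_PER_YEAR = 365.25 * 24 * 60

for label, availability in [("three 9s (99.9%)",  0.999),
                            ("four 9s (99.99%)",  0.9999),
                            ("five 9s (99.999%)", 0.99999)]:
    budget = (1 - availability) * MINUTES_PER_YEAR
    print(f"{label:18s} ~{budget:7.2f} minutes of partition per year")
```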
What is the Low-Hanging Fruit?
From lowest-cost to highest-cost for making database transactions WAN-safe:
- wiki exercise – document how your business applications work:
  - how internal and external SLAs are defined
  - how they connect to the database (which options are used, whether connections are persistent, and how many round trips result)
  - how many database round trips are needed per page
  - how sessions and session failover work
  - what percentage of operations are writes vs. reads
  - whether transactions are as thin as possible, using row self-updates and removing read-before-write cases, aka race conditions (see the sketch after this list)
  - how it all should really work.
- data reduction/archiving (just active OLTP rows, please)
- use transaction group commit (see the MySQL example after this list)
- pinching the westernmost and easternmost locations closer together, i.e. putting one master in a central location (see Figures 2 and 3 above)
- algorithms like vector clocks, or newer/better
- reducing latency on existing routes (MPLS, direct optic routes)
- building a new private CONUS/Gulf of Mexico fiber route.
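A sketch of the “row self-update” item from the wiki exercise above: pushing the arithmetic into the UPDATE removes the read-before-write window and saves a round trip. The table, columns, and placeholder style are made up for illustration; `conn` is any DB-API connection.

```python
def reserve_racy(conn, sku, qty):
    """Read-modify-write: two round trips and a race window between them."""
    cur = conn.cursor()
    cur.execute("SELECT stock FROM inventory WHERE sku = %s", (sku,))
    (stock,) = cur.fetchone()
    if stock < qty:
        return False
    # A concurrent transaction may have changed `stock` by now.
    cur.execute("UPDATE inventory SET stock = %s WHERE sku = %s",
                (stock - qty, sku))
    return True

def reserve_thin(conn, sku, qty):
    """Row self-update: one round trip, the database enforces the invariant."""
    cur = conn.cursor()
    cur.execute("UPDATE inventory SET stock = stock - %s "
                "WHERE sku = %s AND stock >= %s",
                (qty, sku, qty))
    return cur.rowcount == 1   # 0 rows touched => not enough stock
```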
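And for the group-commit item, one concrete knob: on MySQL 5.7+ the binary-log group commit delay can be tuned so that many transactions share one flush. The variable names below are MySQL-specific and the values are illustrative starting points; other databases expose group commit differently.

```python
GROUP_COMMIT_SETTINGS = [
    # wait up to 1 ms for other commits to join the group...
    "SET GLOBAL binlog_group_commit_sync_delay = 1000",          # microseconds
    # ...but stop waiting once 20 transactions are queued
    "SET GLOBAL binlog_group_commit_sync_no_delay_count = 20",
]

def enable_group_commit(conn):
    cur = conn.cursor()
    for stmt in GROUP_COMMIT_SETTINGS:
        cur.execute(stmt)
```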
In my experience, most organizations never even get to step #1 above.
Fortunately, there is a half-measure: multi-AZ on AWS uses different data centers in the same region with only 1-2 ms of latency between them. James Hamilton of AWS calls using small data centers in the same region “limiting the blast radius.”
The Speed of Light – Depends on the Medium
The speed of light in a vacuum is 299,792,458 meters per second, or 186,282 miles per second. In any other medium, though, it’s generally a lot slower. In normal optical fibers (silica glass), light travels a full 31% slower.
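As a quick check of that 31% figure, with a refractive index of about 1.45:

```latex
v = \frac{c}{n} \approx \frac{299{,}792\ \text{km/s}}{1.45} \approx 206{,}750\ \text{km/s},
\qquad \frac{v}{c} = \frac{1}{1.45} \approx 0.69
```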
Exercises for the Reader
- Fill in the wiki outline above.
- What regions does my cloud provider support?
- What is the lowest inter-master latency that can be provisioned?
- How many TPS does my database handle that are directly ecommerce-related (not DW or logging)?
Please leave a comment (no registration required) if you have any experience implementing similar topologies, or have suggestions or corrections.
Resources
How Google Does It
cloudplatform.googleblog.com: With Multi-Region support in Cloud Spanner, have your cake and eat it too
Microwave WAN Transmission
The secret world of microwave networks
The Abandoned Microwave Towers That Once Linked the US
Trans Canada Microwave
mckay-brothers.com: Microwave Bandwidth at Extreme Low Latency
Fiber Optic
Calculating Optical Fiber Latency
$1.5 billion: The cost of cutting London-Tokyo latency by 60ms
Researchers create fiber network that operates at 99.7% speed of light, smashes speed and latency records (fiber optic waveguide)
Public Internet Latency Measurements
stackoverflow.com: How much network latency is “typical” for east – west coast USA?
AWS Inter-Region Latency
Research and Other News
netflix: Active-Active for Multi-Regional Resiliency
Network latency – how low can you go?
Wikipedia: Multiprotocol Label Switching (MPLS)
Latency: The New Web Performance Bottleneck
developer.apple: Networking Concepts
hpbn.co: Primer on Latency and Bandwidth
Network performance: Links between latency, throughput and packet loss
Turning the Optical Fiber Network into a Giant Earthquake Sensor