Tuesday, April 30, 2024

Slashing Latency: How Uber's Cloud Proxy Transformed India's User Experience

In the fast-paced world of globally scaled technology, every millisecond counts. I joined Uber in April of 2015, and if I'm being truly honest, I'm not sure whether this story happened at the end of 2015 or in 2016. It all sort of blurs together when you're working at a startup like Uber. For the sake of this story, I'm going to say it was late 2015. We were already a global company and were constantly under the immense pressure of scale.

One day, while working in the office at 555 Market Street in downtown San Francisco, I overheard a conversation that piqued my interest. If I'm being honest, it sort of pissed me off. A team was discussing the high latency issues faced by our users in India. The latency was so severe that the first fetch alone took over 900 milliseconds, and those delays stacked on each other, making the experience miserable. In hindsight, that anger may have said something about my emotional headspace at the time, but that's for another story. I was angry for our users in India. I was angry they had to wait SECONDS just for the page to load. On top of that, most of India at the time was on 3G networks, often with remote or rural connectivity, and what their experience must have been like makes my mind go numb.

Think about that. When an app takes longer than 250 ms to fetch its initial data, most of us will do one of two things.

  1. Close the app and re-open it, hoping it's a transient issue.
  2. Close the app and never return.
Back at Uber HQ (or what was HQ at the time), I continued to listen to a group of engineers talking about how they wanted to approach this problem. They had been discussing using Squid, Nginx, or even a CDN to cache data at the edge of our data centers... They wanted to cache assets in our data centers... IN THE UNITED STATES! This was simply unacceptable. Anyone who has experienced any kind of scale can tell you this wouldn't even make a dent in actual latency: if I had to guess, it would save tens of milliseconds, if that. As I continued to listen, it came up that they had been working on this problem for three weeks! Three whole weeks we had known about the issue, and they were still at the drawing board in terms of a solution.

Now, I had worked at Nest Labs prior to Uber, and while it wasn't at all the same size in terms of scale, we did face problems with devices connecting from around the world. I knew the main culprit as soon as I heard the engineer outline the problem: distance. Packets spend a long time crossing the ocean, and we're ultimately limited by the speed of light when it comes to global latency.

My mind was blown. I was motivated by their lack of progress and confident enough in my Go skills that in 15 minutes, while two engineers and a senior manager brainstormed a solution, I had written a proof of concept tool that would eventually become known as Cloudley, a companion to the service mesh tool known as Muttley. 

I don't have the original code, but it was small, something like 120 lines, and the real driving factor was Go's ReverseProxy. Using Go's ReverseProxy, I was able to establish and cache a connection to the Uber front end, persisting that secure connection and eliminating the need to pay the 900+ millisecond TLS handshake cost each time a new request was made.

With this code snippet built and tested, the goal was now simple: reduce latency and improve the user experience in India. Seven days later, we launched the product, pushing it as close to the end user as possible in a POP (point of presence), and the results were astounding. Latency dropped from 900+ milliseconds to a mere 400 milliseconds. The graph was used for various brown bags and all-hands meetings for a good 8 months, which was decades in Uber time.


The Science Behind the Speed

To understand the significance of this improvement, let's dive into the math. Our primary ingress at the time was located in California, with a secondary data center in Virginia. The distance from California to India is approximately 13,100 kilometers. Considering the speed of light (around 300,000 km/s) and the fact that India was primarily using 3G networks at the time (which reduces the effective propagation speed to about 133,333 km/s), we can calculate the round-trip time (RTT) as follows:

RTT = (13,142 km / 133,333 km/s) * 2 ≈ 197ms

Now, let's factor in connection setup. Between TCP's three-way handshake and the TLS negotiation, establishing a secure connection costs roughly three round trips across the ocean. This translates to a p50 (median) latency of roughly 600 milliseconds just to establish the connection. Add in the final round trip to actually fetch data, and you get a total latency of around 800 milliseconds.

[Image credit: Halub3, CC BY-SA 4.0 <https://creativecommons.org/licenses/by-sa/4.0>, via Wikimedia Commons]



The Power of Go and Edge Computing

Our cloud proxy leveraged Go's reverse proxy functionality to establish a secure TLS connection to Uber's frontend. By deploying the proxy onto cloud providers right at the edge in India, we effectively eliminated the need for multiple round trips across the ocean. This simple yet powerful solution slashed nearly 600 milliseconds from each new request, resulting in a dramatically improved user experience.

While I've pivoted my career towards AI Risk and Security, I'm thankful for this (and many other) experiences I had at Uber. This was just one of many examples of the amazing ingenuity we developed, and it will always be a fond memory for me.


I'm curious whether you have ever experienced insanely high latency or come up against unreasonable roadblocks that you pushed through to accomplish amazing engineering feats. Let me know on Twitter!


I want to thank Twitter user @gillarohith for suggesting I turn this thread into a blog post. Thank you!
