
CMPT 815 Term Project by Atridad Lahiji — Performance Analysis of Multi-Region Web Application Distribution Strategies

Background

In this project, I investigate the performance characteristics of running web applications in a distributed manner. The core issue is that physical distance introduces network latency, which in this context is the time taken to transmit data to or receive data from a web server. That web server, in many cases, lives in North America, and more specifically in the USA. This poses a problem for users on the other side of the world. For instance, the experience for a user in New Zealand will be significantly worse than for a user in Canada when accessing services hosted in common locations such as US-WEST-1 on AWS. As a more concrete example, it can take nearly a second to load my personal website from western Canada, since it is hosted in Germany. This inequality of the web is what I set out to investigate. Globally distributed systems are nothing new in practice; the goal here was to thoroughly test the performance characteristics of such a system once all of its moving pieces are scaled out.

Methodology

Before proceeding, we need a model of a "traditional" web application. The architecture I propose here is common and quite simple. There will be three layers (a minimal Go sketch of the read path through them follows the list):

  1. A web server to process requests
  2. A database to store persistent data
  3. An in-memory cache
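
To make the model concrete, here is a minimal Go sketch of the read path through these three layers using a cache-aside pattern. The `Cache` interface, table name, and handler are illustrative stand-ins rather than the project's actual code.

```go
package app

import (
	"context"
	"database/sql"
	"net/http"
	"time"
)

// Cache is a minimal view of the in-memory cache layer (e.g. Redis).
type Cache interface {
	Get(ctx context.Context, key string) (string, error)
	Set(ctx context.Context, key, value string, ttl time.Duration) error
}

// handleGetItem reads through the cache first and falls back to the
// database on a miss, writing the result back into the cache afterwards.
func handleGetItem(cache Cache, db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()
		id := r.URL.Query().Get("id")

		// 1. Try the cache first.
		if val, err := cache.Get(ctx, "item:"+id); err == nil {
			w.Write([]byte(val))
			return
		}

		// 2. Cache miss: fall back to the database.
		var body string
		err := db.QueryRowContext(ctx, "SELECT body FROM items WHERE id = ?", id).Scan(&body)
		if err != nil {
			http.Error(w, "not found", http.StatusNotFound)
			return
		}

		// 3. Populate the cache so later reads stay local.
		_ = cache.Set(ctx, "item:"+id, body, time.Minute)
		w.Write([]byte(body))
	}
}
```

When every tier sits in one region, all three steps are local. Once the cache or database lives in another region, each step becomes a cross-region round trip, which is exactly what the tests below measure.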

In this model I chose Go for the web server implementation, since it has HTTP server primitives baked into its standard library and is quite performant. The hosting is where things get interesting: I needed a way to scale the same app to different regions. Fly.io ended up being the choice, since their Firecracker microVM app platform can scale to 35 different regions. I chose three regions to get a good spread around the world:

  1. ord: Chicago
  2. fra: Frankfurt
  3. sin: Singapore

The rest of the components came easily. For the database and cache I picked Turso (SQLite) and Upstash (Redis), as they both distribute on the same infrastructure as Fly.io. The trade-off is that while the scaling is easy to orchestrate on my end, I am subject to the way they implemented scaling on their end. This is something we will explore in the results.
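
For context, connecting to these services from Go looks roughly like the sketch below. This assumes Turso's libsql driver for database/sql and the go-redis client; the import paths, hostnames, and credentials are illustrative, and the actual project may wire things differently.

```go
package app

import (
	"crypto/tls"
	"database/sql"

	"github.com/redis/go-redis/v9"
	_ "github.com/tursodatabase/libsql-client-go/libsql" // registers the "libsql" driver
)

// newClients wires up the two managed services. The URLs and credentials
// are placeholders; a real deployment would read them from the environment.
func newClients() (*sql.DB, *redis.Client, error) {
	// Turso exposes SQLite over the network via the libsql driver.
	db, err := sql.Open("libsql", "libsql://example-db.turso.io?authToken=TOKEN")
	if err != nil {
		return nil, nil, err
	}

	// Upstash Redis speaks the standard Redis protocol over TLS.
	rdb := redis.NewClient(&redis.Options{
		Addr:      "example.upstash.io:6379",
		Password:  "PASSWORD",
		TLSConfig: &tls.Config{},
	})
	return db, rdb, nil
}
```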

The strategy I chose was one of incremental distribution. Instead of scaling everything at once, I ran four tests:

  1. A control with a single region for all services in Chicago
  2. Scaling the app server to all regions while keeping the cache and DB in Chicago
  3. Scaling the app server and cache to all regions, while keeping the DB in Chicago
  4. Scaling all components to all regions

These tests were done with a load testing tool that runs a pattern of 1 POST followed by 5 GET requests. The tool, called Loadr, is one I built, and it can be found here. Loadr targeted 50 requests per second and ran for up to 10,000 requests per test. All tests were run from microVMs in the same three regions, acting as "clients".
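
Loadr's internals are not reproduced here, but a minimal sketch of the request pattern it drives might look like the following. The endpoint, payload, and drain behaviour are placeholders; the real tool also records the per-request latencies that feed the percentile figures below.

```go
package main

import (
	"log"
	"net/http"
	"strings"
	"time"
)

func main() {
	const (
		target  = "https://app.example.com/items" // placeholder endpoint
		rate    = 50                              // target requests per second
		maxReqs = 10000                           // requests per test run
	)

	ticker := time.NewTicker(time.Second / rate)
	defer ticker.Stop()

	for n := 0; n < maxReqs; n++ {
		<-ticker.C
		isPost := n%6 == 0 // 1 POST followed by 5 GETs

		go func(n int, post bool) {
			start := time.Now()
			var resp *http.Response
			var err error
			if post {
				resp, err = http.Post(target, "application/json", strings.NewReader(`{"body":"hello"}`))
			} else {
				resp, err = http.Get(target)
			}
			if err != nil {
				log.Printf("request %d failed: %v", n, err)
				return
			}
			resp.Body.Close()
			log.Printf("request %d took %s", n, time.Since(start))
		}(n, isPost)
	}

	time.Sleep(5 * time.Second) // crude drain for in-flight requests
}
```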

Results

There are more detailed images in this repository, but the final results from the client's perspective are summarized in the figures below: the 50th, 90th, and 99th percentile latencies, along with the minimum, maximum, and average latencies.

  1. Client — 50th Percentile Latencies
  2. Client — 90th Percentile Latencies
  3. Client — 99th Percentile Latencies
  4. Client — Minimum Latencies
  5. Client — Maximum Latencies
  6. Client — Average Latencies

There are a few interesting things here. One: Singapore performed poorly in the second test (scaled app server with a centralized DB and cache), which is expected, since any request that requires dynamic data incurs a large round-trip time between Singapore and Chicago. This is significantly worse on a cache miss, since you then end up with three round trips between Singapore and Chicago per request. Another item of note: the single-region test did not have the lowest latency numbers. That honour goes to the fully scaled tests. The less intuitive part is that while scaling every tier of the application produced the best minimum latencies, the best average latencies belong to the single-region tests. Looking into the detailed measurements, the lowest values tend to come from GET requests. This is because both Turso (DB) and Upstash (cache) forward all writes to the primary region in Chicago, while reads are served from the closest replica. Using a primary write location is a common strategy to maintain data consistency at the cost of write performance.
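
To put rough numbers on why the miss path hurts in the second test, here is a back-of-the-envelope calculation. The round-trip time and the three-step breakdown are assumed, illustrative figures, not measurements from these tests.

```go
package main

import "fmt"

func main() {
	// Illustrative only: assume a round-trip time on the order of 215 ms
	// between Singapore (sin) and Chicago (ord) and negligible local latency.
	const rttSinOrd = 215.0 // milliseconds

	cacheHit := 1 * rttSinOrd  // app (sin) -> cache (ord) and back
	cacheMiss := 3 * rttSinOrd // e.g. cache lookup + DB query + cache write-back

	fmt.Printf("added latency on a cache hit:  ~%.0f ms\n", cacheHit)
	fmt.Printf("added latency on a cache miss: ~%.0f ms\n", cacheMiss)
}
```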

The results mirror my assumptions going into the experiment: for highly dynamic applications, single-region deployments will be more effective on average than multi-region deployments. One strategy often employed to get the best of both worlds is to deliver all static content on the page through content delivery networks (CDNs), while dynamic content is served from the single origin server. Another solution is to scale all three components without replication, giving users multiple independent regions to choose from. This only works if there is no requirement for collaboration across regions.

Limitations

There are a number of limitations I do not account for in my research. In no particular order:

  1. Anycast DNS routing overhead
  2. Shared CPU noisy neighbour interference
  3. One client sending requests at a time
  4. Limitations of Turso and Upstash scaling (Primary vs Replica node behaviour)
  5. Used only a single workload due to time constraints: Request -> Cache Miss -> DB Access -> Response, or Request -> Cache Hit -> Response

The shared CPU issue could meaningfully impact performance. I carefully monitored my instance and did not notice anything out of the ordinary during testing, but noting this as a risk is still valuable. More importantly, there were limits to how extensive my tests could be due to time constraints. Notably, the use of a single workload and of third-party scaling services means the results are not representative of all possible workloads and architectures for international scaling. All of these limitations provide important context for interpreting the results.

Conclusion

While the results imply that it is almost always better to deploy to a single region, it is essential to take the context of the results into account. The way I chose to scale is representative of startups using off-the-shelf tools to scale without concerning themselves with the complexities of distributed infrastructure. In this context, I stand by the results. Other solutions include using CDNs to speed up the delivery of static content, which is often enough to make a web application feel significantly faster. More aggressive caching schemes, local-first writes, and fully detached regions can all be valid strategies depending on your needs. All of this ignores the cost of running this sort of infrastructure, which is non-trivial for high-throughput applications. Ultimately, this research showed that globally distributed computing is a complex field. Choosing distributed infrastructure is not a guaranteed win, and it requires careful consideration of your common workloads, your user distribution, and the overall requirements of your application.