Atridad Lahiji 2024-12-13 01:44:00 +00:00
commit 015c81aa3e

# CMPT 815 Term Project by Atridad Lahiji - Performance Analysis of Multi-Region Web Application Distribution Strategies
## Background
In this project, I investigate the performance characteristics of running web applications in a distributed manner. The core issue is that physical distance in networking introduces latency, which in this context is the time taken to transmit or receive data from a web server. That web server, in many cases, lives in North America, and more specifically in the USA. This poses a problem for users on the other side of the world: the experience for a user in New Zealand will be significantly worse than for a user in Canada when accessing services hosted in common locations such as AWS's US-WEST-1. As a concrete example, it can take nearly a second to load my [personal website](https://atri.dad) from western Canada, since it is hosted in Germany. This inequality of the web is what I set out to investigate. Globally distributed systems are not new in practice; the goal here was to thoroughly test the performance characteristics of such a system as each of its moving pieces is scaled out.
## Methodology
Before proceeding, we need a model of a "traditional" web application. The architecture I propose here is common and quite simple. There will be three layers:
1. A web server to process requests
2. A Database to store persistent data
3. An in-memory cache
In this model, I chose Go for the web server implementation, since it has HTTP server primitives baked into its standard library and is quite performant. The hosting made things interesting: I needed a way to scale the same app to different regions. Fly.io ended up being the choice, since its Firecracker microVM app platform can scale to 35 different regions. I chose three regions to get a good spread around the world:
1. ord: Chicago
2. fra: Frankfurt
3. sin: Singapore
The rest of the components came easily. For the database and cache I picked Turso (SQLite) and Upstash Redis, as both distribute on the same infrastructure as Fly.io. The trade-off is that while the scaling is easy to orchestrate on my end, I am subject to the way these providers implemented scaling on their end. This is something we will explore in the results.
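To illustrate the read path this three-tier layout implies, here is a sketch of a cache-aside lookup. Plain in-memory maps stand in for Upstash Redis and Turso, and the `Store` type and its method names are my own illustration, not code from the project:

```go
package main

import (
	"errors"
	"sync"
)

var ErrNotFound = errors.New("not found")

// Store sketches the app's read path: check the cache first, fall back to
// the database, then populate the cache. In the real deployment the cache
// is Upstash Redis and the database is Turso (SQLite).
type Store struct {
	mu    sync.Mutex
	cache map[string]string
	db    map[string]string
}

func NewStore(db map[string]string) *Store {
	return &Store{cache: make(map[string]string), db: db}
}

// Get returns the value for key and reports whether it was a cache hit.
func (s *Store) Get(key string) (val string, hit bool, err error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if v, ok := s.cache[key]; ok {
		return v, true, nil // cache hit: no round trip to the database region
	}
	v, ok := s.db[key]
	if !ok {
		return "", false, ErrNotFound
	}
	s.cache[key] = v // warm the cache for subsequent reads
	return v, false, nil
}
```

The point of the experiments below is what happens to that fallback path when the cache is nearby but the database is an ocean away, and vice versa.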
The strategy I chose was one of incremental distribution. Instead of scaling everything at once, I ran four tests:
1. A control with a single region for all services in Chicago
2. Scaling the app server to all regions while keeping the cache and DB in Chicago
3. Scaling the app server and cache to all regions, while keeping the DB in Chicago
4. Scaling all components to all regions
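For reference, the four configurations can be written out as a small matrix showing which tiers are replicated to all three regions versus pinned to Chicago; the scenario names below are mine, not labels from the testing tool:

```go
package main

// scenario records which tiers run in all three regions (ord, fra, sin)
// versus only in Chicago (ord) for each of the four tests.
type scenario struct {
	name            string
	appAllRegions   bool
	cacheAllRegions bool
	dbAllRegions    bool
}

var scenarios = []scenario{
	{"control: everything in ord", false, false, false},
	{"app distributed", true, false, false},
	{"app + cache distributed", true, true, false},
	{"fully distributed", true, true, true},
}
```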
These tests were run with a load testing tool that issues a pattern of 1 POST followed by 5 GET requests. The tool, called Loadr, is one I built for this project; it can be found [here](https://git.atri.dad/atridad/loadr). Loadr targeted 50 requests per second and ran for up to 10,000 requests per test. All tests were run from microVMs acting as "clients" in the same three regions.
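The results below are reported as latency percentiles. Loadr's exact estimator is not documented here, so treat the following nearest-rank computation as an illustrative definition of what a P50, P90, or P99 figure means:

```go
package main

import (
	"math"
	"sort"
)

// percentile computes the p-th percentile (0 < p <= 100) of a latency
// sample using the nearest-rank method: sort the samples, then take the
// value at rank ceil(p/100 * n).
func percentile(latenciesMs []float64, p float64) float64 {
	sorted := append([]float64(nil), latenciesMs...) // copy; leave input untouched
	sort.Float64s(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted)))) // 1-based rank
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}
```

Reporting P50 alongside P90 and P99 matters for this experiment: distribution strategies that barely move the median can still dramatically change the tail that far-away users experience.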
There are more detailed graphs in this repository, but here are the final results for the P50, P90, P99, minimum, maximum, and average latencies from the client's perspective:
![Client - 50th Percentile Latencies](https://admin.s3.atri.dad/api/v1/buckets/personal/objects/download?preview=true&prefix=cmpt815perf%2Fresults%2Fclient-50-percent.png&version_id=null)
![Client - 90th Percentile Latencies](https://admin.s3.atri.dad/api/v1/buckets/personal/objects/download?preview=true&prefix=cmpt815perf%2Fresults%2Fclient-90-percent.png&version_id=null)
![Client - 99th Percentile Latencies](https://admin.s3.atri.dad/api/v1/buckets/personal/objects/download?preview=true&prefix=cmpt815perf%2Fresults%2Fclient-99-percent.png&version_id=null)
![Client - Minimum Latencies](https://admin.s3.atri.dad/api/v1/buckets/personal/objects/download?preview=true&prefix=cmpt815perf%2Fresults%2Fclient-min.png&version_id=null)
![Client - Maximum Latencies](https://admin.s3.atri.dad/api/v1/buckets/personal/objects/download?preview=true&prefix=cmpt815perf%2Fresults%2Fclient-max.png&version_id=null)
![Client - Average Latencies](https://admin.s3.atri.dad/api/v1/buckets/personal/objects/download?preview=true&prefix=cmpt815perf%2Fresults%2Fclient-avg.png&version_id=null)
There are a few interesting things here.