scaling a web application from first principles

inspired by the production engineering hackathon i attended a few months ago, i decided to take a look at one of my old group projects and see if there were any improvements i could make to the application.

after deploying the application, the first thing i noticed was that the graphs would take a couple of seconds to load. to investigate this properly, i planned a load test using k6, a load testing tool from grafana labs that can simulate virtual users.

before running the tests, i want to improve the observability of the spring boot api by adding structured json logging so that errors, request timing, and status codes are easier to inspect.

load testing plan

my current load test plan is to see if the application can handle up to 40 concurrent virtual users, with the number of users ramping up gradually and then dropping back to zero.

30 seconds: grow from 0 to 5 users
1 minute: grow from 5 to 10 users
1 minute: grow from 10 to 20 users
1 minute: grow from 20 to 40 users
30 seconds: ramp back down to 0 users
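as a rough sketch, the ramp above can be written as a list of `{ duration, target }` entries, which is the shape k6 reads from `options.stages`. the helpers `toSeconds`, `totalSeconds`, and `plannedUsersAt` are my own illustrative names, not part of k6, and the linear ramp inside each stage is an assumption about how the user count moves between targets.

```typescript
// Ramp profile from the plan, in the { duration, target } shape used by k6's options.stages.
type Stage = { duration: string; target: number };

const stages: Stage[] = [
  { duration: "30s", target: 5 },
  { duration: "1m", target: 10 },
  { duration: "1m", target: 20 },
  { duration: "1m", target: 40 },
  { duration: "30s", target: 0 },
];

// Hypothetical helper: converts the simple "30s" / "1m" strings above into seconds.
// It only handles a whole number followed by "s" or "m".
function toSeconds(duration: string): number {
  const match = /^(\d+)(s|m)$/.exec(duration);
  if (match === null) throw new Error(`unsupported duration: ${duration}`);
  const value = Number(match[1]);
  return match[2] === "m" ? value * 60 : value;
}

// Total length of the test run in seconds.
function totalSeconds(plan: Stage[]): number {
  return plan.reduce((sum, stage) => sum + toSeconds(stage.duration), 0);
}

// Planned number of virtual users at `t` seconds,
// assuming a linear ramp inside each stage.
function plannedUsersAt(plan: Stage[], t: number): number {
  let start = 0; // time at which the current stage begins
  let previous = 0; // user count at the start of the current stage
  for (const stage of plan) {
    const length = toSeconds(stage.duration);
    if (t <= start + length) {
      const progress = (t - start) / length;
      return previous + (stage.target - previous) * progress;
    }
    start += length;
    previous = stage.target;
  }
  return 0; // past the end of the plan
}
```

with these stages the whole run lasts 240 seconds (4 minutes), and the peak of 40 users is reached at the 3 minute 30 second mark, right before the ramp down begins.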

each virtual user will do the following:

pick an endpoint to hit
send an HTTP request
check if the response was successful
wait between 0.5s and 2.5s to simulate a person looking at a graph
repeat
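the steps above can be sketched as a single iteration function. this is a minimal sketch, not the actual test script: `BASE_URL` and the paths in `ENDPOINTS` are hypothetical placeholders, since the real endpoints are not listed here, and i treat any 2xx status as "successful". the HTTP call, sleep, and random source are passed in as `deps` so the logic can run outside k6. in a k6 script those would be `http.get` from `k6/http`, `sleep` from `k6`, and `Math.random`.

```typescript
// Everything the iteration needs from the outside world, passed in so the
// same logic can run under a load testing tool or in a plain unit test.
type Deps = {
  get: (url: string) => { status: number }; // e.g. k6's http.get
  sleep: (seconds: number) => void; // e.g. k6's sleep
  random: () => number; // returns a value in [0, 1), e.g. Math.random
};

// Hypothetical values: placeholders for the real base URL and graph endpoints.
const BASE_URL = "http://localhost:8080";
const ENDPOINTS = ["/api/graphs/a", "/api/graphs/b", "/api/graphs/c"];

// One pass through the virtual user's loop. Returns whether the request succeeded.
function runIteration(deps: Deps): boolean {
  // 1. pick an endpoint to hit
  const index = Math.floor(deps.random() * ENDPOINTS.length);
  // 2. send an HTTP request
  const response = deps.get(BASE_URL + ENDPOINTS[index]);
  // 3. check if the response was successful (assumed here to mean any 2xx status)
  const ok = response.status >= 200 && response.status < 300;
  // 4. wait between 0.5s and 2.5s to simulate a person looking at a graph
  deps.sleep(0.5 + deps.random() * 2);
  // 5. repeat: the load testing tool calls this function again for the same user
  return ok;
}
```

the "repeat" step does not need a loop of its own, because k6 calls the script's default function over and over for each virtual user until the test ends.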

metrics

since i am currently focused on improving the user experience of the application, latency is the most important metric i want to record.

p95: the latency that 95% of requests are at or below
p99: the latency that 99% of requests are at or below, which captures the tail latency experienced by the slowest 1% of requests
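to make the two percentiles concrete, here is a small sketch that computes them from a list of latency samples using the nearest-rank method. k6 reports percentiles for request duration on its own, so this is only for illustration, and k6's calculation may differ slightly (for example by interpolating between samples). the `percentile` function and the sample data are my own.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b); // ascending, without mutating the input
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank into the sorted list
  return sorted[rank - 1];
}

// Made-up data: 97 fast requests at 100 ms and 3 slow ones at 3000 ms.
const latenciesMs: number[] = [
  ...Array(97).fill(100),
  ...Array(3).fill(3000),
];

const p95 = percentile(latenciesMs, 95); // 100
const p99 = percentile(latenciesMs, 99); // 3000
```

in this made-up data set p95 still looks healthy at 100 ms, while p99 jumps to 3000 ms, which is why i record both.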

i will also count responses by HTTP status code during the test to get a more detailed view of how requests are succeeding or failing.
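the tally itself is simple. a minimal sketch, assuming the status of every response has been collected into an array (the `countByStatus` name and the sample statuses are made up for illustration):

```typescript
// Count how many responses came back with each HTTP status code.
function countByStatus(statuses: number[]): Map<number, number> {
  const counts = new Map<number, number>();
  for (const status of statuses) {
    counts.set(status, (counts.get(status) ?? 0) + 1);
  }
  return counts;
}

// Made-up example: mostly successes, with one server error and one timeout-style 503.
const statusCounts = countByStatus([200, 200, 500, 200, 503]);
```

a breakdown like this separates application errors (500) from overload symptoms (503), which a single pass/fail rate would hide.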