Let's Talk About Webservers

@pmackay @maikel @RohanM

I’m wondering whether Unicorn is the right webserver for us. Reading up on it (here), it is notoriously vulnerable to slow clients: “while downloading the request off the socket, Unicorn workers cannot accept any new connections, and that worker becomes unavailable.” This perhaps fits with something I noticed while looking into the Order cycle page hanging: one of the gems fires off a request to an external API, and the whole page hangs waiting for that API, which often didn’t respond.
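
If that is the failure mode, a timeout guard around the external call would at least stop the page hanging indefinitely. A minimal sketch only; the endpoint is a placeholder, and the real call lives inside the gem:

```ruby
require "net/http"

# Placeholder for the external API the gem calls during page render.
uri = URI("https://api.example.com/status")

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.open_timeout = 2 # seconds to wait when opening the connection
http.read_timeout = 5 # seconds to wait for the response

begin
  response = http.request(Net::HTTP::Get.new(uri))
rescue Net::OpenTimeout, Net::ReadTimeout
  response = nil # give up and render the page without the API data
end
```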

Perhaps in the UK we just need more workers, as our servers are pretty overspecced for our load; we should be able to handle more than it seems we can.

Not sure how much work is involved in updating the webserver. Probably a lot.
But given that in the UK context I can really see us having a high number of slow clients, we might want to look into this. It looks like a better solution for us might be Puma, which we can scale up to clustered mode when we grow to a size that warrants it…

Hi! That’s an interesting point.

But the quote about Unicorn being blocked by slow clients applies to a Unicorn-only setup. Since we run nginx in front of Unicorn, nginx should handle the slow clients for us. Same post:

You can use nginx in a custom setup to buffer requests to Unicorn, eliminating the slow-client issue.

So I guess that is what’s happening. Did you compare your queue time and application response time on New Relic? That should answer the question of whether there is a bottleneck in queue time.

The above statement is confirmed in the conclusion of that post:

Choose a multi-process web server with slow client protection and smart routing/pooling. Currently, your only choices are Puma (in clustered mode), Unicorn with an nginx frontend, or Phusion Passenger 5.

Puma would be an alternative solution, but our current setup should do the job as well. The remaining question is just whether our setup is really doing the right thing. @RohanM, do you know?

Thanks for the tip @maikel. Will take a look. Very interesting to get my head around the webserver side a little more.

I generally feel way out of my depth on this stuff, so it would be great if @pmackay could perhaps share his thoughts too?

Quick test with siege, accessing the Stroudco shop page:
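
(Each run was presumably invoked along the lines of `siege -c <users> -r <repeats> <shop URL>`, with the counts given below.)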

2 repeats, 2 concurrent users:

Transactions:                  48 hits
Availability:              100.00 %
Elapsed time:               10.80 secs
Data transferred:            9.02 MB
Response time:                0.42 secs
Transaction rate:            4.44 trans/sec
Throughput:                0.84 MB/sec
Concurrency:                1.85
Successful transactions:          48
Failed transactions:               0
Longest transaction:            2.66
Shortest transaction:            0.09

2 repeats, 10 concurrent users:

Transactions:                 240 hits
Availability:              100.00 %
Elapsed time:               26.58 secs
Data transferred:           45.11 MB
Response time:                0.92 secs
Transaction rate:            9.03 trans/sec
Throughput:                1.70 MB/sec
Concurrency:                8.34
Successful transactions:         240
Failed transactions:               0
Longest transaction:           10.04
Shortest transaction:            0.09

10 repeats, 10 concurrent users:

Transactions:                1200 hits
Availability:              100.00 %
Elapsed time:              123.83 secs
Data transferred:          225.56 MB
Response time:                0.98 secs
Transaction rate:            9.69 trans/sec
Throughput:                1.82 MB/sec
Concurrency:                9.48
Successful transactions:        1200
Failed transactions:               0
Longest transaction:            8.98
Shortest transaction:            0.08

For me this suggests individual requests can take ~7s longer under heavier concurrency (the longest transaction went from 2.66s with 2 users to ~10s with 10). But that level of load is likely to be rare.

This is just a command-line download, though. When rendering the page, Chrome shows:

(screenshot: Chrome network timings for the shop page)

So there is a big chunk of time spent rendering and loading all the product pictures!

It would be useful to try this with Puma of course.

3 years later…

From what I read, and with all the problems we are having with long requests (generating reports, for example), I think we should switch to a multithreaded webserver: Puma…
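
To make that concrete, a clustered Puma config could look roughly like this; a sketch only, with worker and thread counts that are illustrative rather than tuned for our servers:

```ruby
# config/puma.rb -- sketch; worker/thread counts are illustrative
workers 2           # forked worker processes: this is "clustered mode"
threads 1, 16       # min/max threads per worker, to absorb IO waits
preload_app!        # load the app once in the master, then fork

on_worker_boot do
  # Each forked worker needs its own database connections.
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
end
```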

Thoughts? @sauloperez @Matt-Yorkley @kristinalim @maikel

Something I said three years ago eventually turned out to be relevant :rofl::joy::laughing:

In my experience, reports are slow because they need to compute a lot. Our CPU is very busy, so additional threads don’t give us more capacity. Multi-threading is useful if we have IO waits, for example if our SQL were efficient and the only slowness were reading the data from disk.

Threads can also save memory, which could make our servers cheaper. But we should first make sure that our app is stable with many Unicorn workers before we introduce many threads. Last time we went up to 7 workers, we got nasty errors on checkout.

In summary, I agree that Puma is the better solution, but I see other work as more important, and some of that work is a prerequisite before we can actually use Puma’s potential. Using preload_app with Unicorn and increasing our worker count will take us a long way.
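
For reference, that interim step would look something like this in config/unicorn.rb; the worker count is illustrative, and the fork hooks are the standard pattern once preload_app is on:

```ruby
# config/unicorn.rb -- sketch; the worker count is illustrative
worker_processes 7
preload_app true   # load the app in the master so workers share memory

before_fork do |_server, _worker|
  # The master's DB connection must not be shared across forks.
  ActiveRecord::Base.connection.disconnect! if defined?(ActiveRecord::Base)
end

after_fork do |_server, _worker|
  # Each worker opens its own connection after the fork.
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
end
```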

Thanks for the feedback, Maikel.

I agree, we need to fix our multithreading issue: https://github.com/openfoodfoundation/openfoodnetwork/pull/4136
But I don’t think we should treat multithreading itself as a cause of nasty issues.

“Multi-threading is useful if we have IO waits”, and that includes DB queries. If a worker is waiting on SQL, no other requests can be processed in the meantime, correct? That is exactly the problem threads solve.
I think switching to Puma will let fast requests be served much more promptly while slow requests are still being processed.
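
A toy illustration of that point, independent of any webserver; `sleep` stands in for a slow query:

```ruby
# Toy demo: IO waits overlap across threads.
def slow_query
  sleep 1 # stands in for a slow SQL query the worker would block on
end

start = Time.now
2.times { slow_query }
puts format("sequential: %.1fs", Time.now - start)  # ~2.0s

start = Time.now
Array.new(2) { Thread.new { slow_query } }.each(&:join)
puts format("threaded:   %.1fs", Time.now - start)  # ~1.0s, waits overlap
```

Ruby’s GVL doesn’t prevent this: threads waiting on IO (or sleep) release it, so the waits overlap even though only one thread runs Ruby code at a time.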