Big user Infrastructure Estimate

RohanM · January 13, 2017, 12:37am

Estimating the infrastructure requirements for big user in France

Calculating the anticipated load

For a pilot city of 200,000 people, we estimate 556 clients per day.
For two pilot cities of that size, 1,112 clients per day.
For the whole of France, 194,444 clients per day.

A model food hub in Melbourne serves 11,647 page views to 1,543 users in one day (7.5 views per user).
In the peak hour of this day, they serve 1,611 page views to 201 users (8 views per user).

Within a day, they serve 1,611 / 11,647 = 14% of page views within the peak hour.
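As a quick check of the ratios above, here is a minimal Python sketch. The variable names are illustrative only; the four input figures are the Melbourne numbers quoted in this post. The peak-hour share is the peak-hour page views divided by the full-day page views.

```python
# Figures quoted above for the model food hub in Melbourne.
day_views, day_users = 11_647, 1_543
peak_views, peak_users = 1_611, 201

views_per_user_day = day_views / day_users      # about 7.5
views_per_user_peak = peak_views / peak_users   # about 8
peak_hour_share = peak_views / day_views        # about 0.14

print(f"{views_per_user_day:.1f} views per user over the day")
print(f"{views_per_user_peak:.1f} views per user in the peak hour")
print(f"{peak_hour_share:.0%} of daily page views fall in the peak hour")
```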

Two pilot cities (1,112 users per day)

We expect to serve 1,112 * 8 = 8,896 page views per day.
14% in peak hour = 1,245 page views, or 21 requests per minute (RPM)

Whole of France (194,444 users per day)

We expect to serve 194,444 * 8 = 1,555,552 page views
14% in peak hour = 217,777 page views, or 3,630 RPM
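The two projections above follow the same three steps, which can be sketched as a small function. This is only an illustration: the function and parameter names are made up here, and it keeps the post's simplification that one page view counts as one request.

```python
def peak_rpm(users_per_day, views_per_user=8, peak_share=0.14):
    """Estimate requests per minute during the busiest hour of the day."""
    views_per_day = users_per_day * views_per_user
    peak_hour_views = views_per_day * peak_share
    return peak_hour_views / 60  # 60 minutes in the peak hour

print(round(peak_rpm(1_112)))    # two pilot cities
print(round(peak_rpm(194_444)))  # whole of France
```

Changing `views_per_user` or `peak_share` shows how sensitive the result is to the Melbourne ratios.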

Calculating the OFN server capability

By average response time

OFN Australia shows an average response time of approx 700ms. A crucial part of the work would be lowering this to provide a more responsive site to customers as well as to improve scalability. However, let’s use the current figure to generate a pessimistic estimate.

We have three web server workers, so the server can serve three concurrent requests.
60 / 0.7 x 3 = 257 RPM
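The same calculation as a short sketch, with hypothetical names. It assumes each worker handles one request at a time and is never idle, which makes it an upper bound rather than a measurement.

```python
def capacity_rpm(avg_response_seconds, workers):
    """Theoretical requests per minute if every worker is always busy."""
    requests_per_worker_per_minute = 60 / avg_response_seconds
    return requests_per_worker_per_minute * workers

print(round(capacity_rpm(0.7, 3)))  # current average response time, three workers
```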

By load test

Load testing the Australian production server remotely shows a real-world response rate of 186 RPM.

To take our estimates forward, I’ll use the slower of these estimation methods (186 RPM).
However, I’d expect that with performance optimisation work, we could greatly reduce the average response time.

Calculating the server requirements

All figures calculated with servers running at 75% load at peak to provide some headroom.

Costing

Prices taken from Amazon Web Services, excluding data charges:
Load balancer: US$0.0294 / hour, $21.46 / month
c4.xlarge server (web, db): US$0.237 / hour, $173.01 / month

Two pilot cities

Our peak of 21 RPM is well within the server capability of 186 RPM (11% of total capacity).
In theory we could run this pilot off a single server.
However, I recommend running two web servers and two database servers. That provides us with redundancy in case one server fails and prepares our infrastructure to scale in the future.

Load balancer: $21.46 / month
4 servers (2 x web, 2 x DB): 4 x $173.01 = $692.04 / month
Total: $713.50
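For anyone wanting to rerun the costing with other prices, a minimal sketch follows. The 730 hours per month figure is an assumption inferred from the hourly and monthly prices quoted above (8,760 hours per year / 12); the names are illustrative.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12

def monthly_cost(hourly_rate):
    """Convert an hourly price into a monthly price, rounded to cents."""
    return round(hourly_rate * HOURS_PER_MONTH, 2)

load_balancer = monthly_cost(0.0294)
server = monthly_cost(0.237)

# Pilot setup: 1 load balancer + 2 web servers + 2 database servers.
pilot_total = round(load_balancer + 4 * server, 2)

print(f"Load balancer: {load_balancer:.2f} / month")
print(f"One server:    {server:.2f} / month")
print(f"Pilot total:   {pilot_total:.2f} / month")
```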

Whole of France

Our peak of 3,630 RPM could be served by 26 web servers (running at 75% capacity).
As a starting point, I propose the following architecture:

1 load balancer
26 web servers
2 database servers (the second as a hot standby)
1 delayed job worker
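The web server count comes from dividing the peak load by the usable capacity of one server at 75% load. A small sketch, with illustrative names and the figures from this post:

```python
import math

def web_servers_needed(peak_rpm, server_rpm=186, target_load=0.75):
    """Fractional number of web servers needed to stay at the target load."""
    usable_rpm = server_rpm * target_load
    return peak_rpm / usable_rpm

raw = web_servers_needed(3_630)
print(f"{raw:.2f} servers")         # fractional requirement
print(round(raw), math.ceil(raw))   # rounded vs. strictly rounded up
```

The raw figure is marginally above 26, so 26 servers would run a fraction over the 75% target at peak; rounding up strictly would give 27.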

Load balancer: $21.46 / month
29 servers: 29 x $173.01 = $5,017.29 / month
Total: $5,038.75 / month

Notes

As noted above, we can get much greater performance out of one server (perhaps 2x) by working on optimising the application.

In the above estimate, I haven’t addressed database scaling. It’s likely that we’ll need far fewer web servers and far more resources dedicated to the database. However, this estimate serves as a starting point to examine the infrastructure cost of the full rollout of the project.

The cost has been based on AWS hosting in Australian Dollars. Prices differ between geographic regions, and the hosting provider of your choice may charge more. Since this is a project supporting local infrastructure and sustainability, a French hosting company using renewable energy might be a consideration.

pmackay · January 13, 2017, 8:33am

Really interesting detailed breakdown @RohanM! Just wondering, do you use a CDN in Aus? Could that be a factor to consider in these calculations? Or (given the national nature of OFN installs) do you think a CDN doesn’t offer much beyond what the web server(s) do?

Kirsten · January 13, 2017, 9:21am

ping @MyriamBoure to pass on

MyriamBoure · January 16, 2017, 10:28am

Ping @gnollet and @sylvain, also @pierredelacroix and @nickwhite. Careful, those prices are in AUD.
To clarify, the client will pay for the infrastructure he needs for his super ambitious project, and in that way also contribute to the local commons, as this infrastructure will be open for other hubs to use. It will be managed by Open Food France, so we need to find tech partners to help us with that.

Feel free to comment if you have questions or disagree with the above estimation made by Rohan. I would like to send that first estimate to our potential big user by the end of the week.

@sylvain feel free to invite any other interested participants to this discussion. In your view, what is the best French/European hosting company using renewable energy?
I found https://www.greenshift.eu/en/index.html
It seems OVH is also working in that direction https://www.ovh.com/us/about-us/green-it.xml
I’m just wary of hosting companies that pay “compensation”; it would be great to find a hosting company really working with carbon-neutral server farms…

We also need to add some system administration costs to those costs.
I would also like to hear your proposals / ideas, @gnollet and @sylvain.
Any idea how much it costs to have a system administrator available in case of any problem on the server, and able to do the upgrades quickly when needed, etc.? It’s not a full-time job, more occasional, but we need a fast response if there is a serious problem, to ensure this “zero service interruption and zero data loss”. Of course the multiple-server strategy will help, but what do you think in terms of the system administrator role and cost?
Ping @RohanM on that point.

I know HappyDev might propose something to us, but it’s also important that we compare different options to understand what is at stake and make a conscious choice, if that project moves forward (fingers crossed!)

gnollet · January 16, 2017, 9:37pm

Hi all

Thanks for all this information.

With this type of architecture, what are the availability objectives? What are the RPO and RTO?

Are we able to build an elastic architecture that follows the load of the website by provisioning servers on demand?

We should include cost for staging servers.

Regarding the design of the infrastructure, where is the bottleneck when the load increases? Is it the database, the front-end web application or the back office?
Memcached is installed; is this for session storage? Are we able to replace it with a high-availability solution, like Redis or another redundant solution?

Are we able to use in-memory cache storage instead of disk storage? Currently the cache storage is here:
current/tmp/cache
If we have more than one web server, we must share this storage, which has a performance cost. If this storage is in memory and centralized, performance will improve.
In any case, 26 web servers to respond to 200,000 visits per day looks like a lot.
In your estimate, should we include load generated by bots (like Google, Bing, …) and by visitors who don’t order?

gnollet · January 17, 2017, 8:20am

Hi,

I have some other questions:
Is there any impact on load if the website is hosting 100 / 1,000 / 10,000 producers?
Is there any impact on load if the website is hosting 100 / 1,000 / 10,000 hubs?

Could we think about an architecture with, on one side, the front end for customers, where we could cache as much information as possible, such as hub and producer information, and on the other side, the back end of the website for hub and producer management? The two parts would be synchronized through the database and shared file storage.
If we are able to split the website into two parts, we should be able to scale out better and handle the load. And if the back end loads the servers too heavily, that should not impact the front end.

MyriamBoure · January 17, 2017, 8:58pm

Just a clarification (important): that simulation was made with the first figures I sent, but just afterwards I sent new figures from the potential user, which are much more reasonable… The reasoning still holds; only the figures need adapting.

City A year 1 full :

#orders per week : 340
#producers : 40
#buyers: 1 240

City B year 1 full :

#orders per week : 180
#producers : 40
#buyers: 680

France in 2025 :

#orders per week : 85 300
#producers : 4 300
#buyers: 233 300
#cities: 130
#preparation halls: 50

MyriamBoure · January 17, 2017, 9:07pm

ping @sylvain on our discussion on redundancy

MyriamBoure · January 17, 2017, 10:10pm

@RohanM it seems @gnollet has pointed out some ways to improve the performance of the application.
Gilles, we know we need to optimize the application, but we will need to start from where we are now, so this estimate is based on the current state of the code. As Rohan said, work on optimization will need to be done in parallel, and by the time we need to scale, the performance of the code base will probably be much higher.

@RohanM Gilles asks an important question about the RPO (how much data the client accepts losing) and RTO (the maximum time the service can be unavailable). In your estimate, what is your assumption on that?

Also, on the point of the staging server, we will need to be able to test new features in real conditions before pushing them to production. Is it possible to pay for some new servers “on demand” for performance testing? That way we would have one staging server to test functionality and, before pushing to production, we would deploy to a temporary “production-like” instance on which we would run scripts to ensure there is no performance degradation. There might be some costs associated with developing those scripts and running the tests.

@RohanM feel free to answer other questions from @gnollet that could help our common understanding

Also, with the latest estimate we received from the potential user, which I added just above (I sent it the same day, but apparently your estimate is based on the first one I made myself, which was faaaar too enthusiastic, though I hope they will be overloaded with demand ;-)), it seems the scaling needs will be much more reasonable. I’ll compile all the info for the client to have a general overview and understanding of the possibilities.

So if I follow the same reasoning with the last version of the figures I get:

Two pilot cities (87 orders per day)
We expect to serve 87 * 8 = 696 page views per day.
14% in peak hour = 100 page views, or 1.6 requests per minute (RPM) which is no issue, but we can still propose the redundancy configuration with 2 + 2 to minimize RPO and RTO (if I understand well)
Whole of France (14217 orders per day)
We expect to serve 14217 * 8 = 113,736 page views
14% in peak hour = 15,923 page views, or 265 RPM so probably with 6 servers it should be enough…
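The revised figures can be rerun with the same reasoning, as in the sketch below. Two things here are assumptions rather than statements from the thread: the function names are illustrative, and the six trading days per week is inferred because it is the divisor that reproduces 87 and 14,217 orders per day from the weekly figures. Like the post, it treats each order as one visitor making 8 page views, so bots and visitors who browse without ordering are not counted.

```python
def orders_per_day(orders_per_week, trading_days=6):
    """Daily orders; trading_days=6 is an inferred assumption."""
    return round(orders_per_week / trading_days)

def peak_rpm_from_orders(orders_day, views_per_order=8, peak_share=0.14):
    """Peak-hour requests per minute, one order = one visitor."""
    return orders_day * views_per_order * peak_share / 60

pilot = orders_per_day(340 + 180)   # City A + City B, year 1
france = orders_per_day(85_300)     # France in 2025

print(pilot, round(peak_rpm_from_orders(pilot), 1))
print(france, round(peak_rpm_from_orders(france)))
```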

maikel · January 19, 2017, 5:30am

Yes, we use AWS S3. I don’t have the cost here, but it is not much. Maybe less than 30 AUD per month?

maikel · January 19, 2017, 5:37am

For the staging server, you can probably add 4 servers if you want to test real conditions. But all this is really just a rough estimate.

gnollet · January 24, 2017, 8:00am

Hi,
I have drawn what I would propose for large customers and to allow for traffic growth:
Visio-OFN_evolution.pdf (19.4 KB)

pmackay · January 24, 2017, 11:05am

S3 is not the same as a CDN, the AWS CDN is CloudFront. But in the OFN case they do pretty similar jobs and given OFN installs are per country, I suspect a CDN is not hugely useful, but wondered if you had ever considered it.

oeoeaio · January 24, 2017, 10:33pm

I have been looking into CloudFront over the last couple of days for another project. I suspect that in most cases the improvement in speed for most user-generated images (what we use S3 for) will not be huge, given that (as you say) OFN instances are per country, and are likely to be using S3 buckets located close by anyway. I guess the biggest impact would be seen somewhere like the US: a large country with a distributed population and lots of different edge location options that CloudFront can utilise.

In Australia, I think Amazon only have edge locations in Melbourne and Sydney. Our S3 bucket is hosted in Sydney, so there may be a slight bump for Melbourne-based users but I wouldn’t anticipate it being dramatic. Definitely worth experimenting with though.

It may also be worth caching static assets (js, css, some images) using CloudFront, to remove the job of serving these from our servers; that impact may well be significant.

MyriamBoure · January 30, 2017, 10:29pm

Any comment @maikel or @oeoeaio or @RohanM on @gnollet’s representation of the infrastructure? This is Chinese to me, I don’t know what those 3 pages represent. Can you explain a bit, @gnollet?

MyriamBoure · February 3, 2017, 3:13pm

I’m not sure I understand, but are those drawings of the OFN architecture relevant? @gnollet made them, and for my own understanding I’m wondering how they look to you, @RohanM @maikel @oeoeaio or @pmackay.
Maybe that can help me to understand.

Option 1:

Option 2:

Option 3:

MyriamBoure · February 3, 2017, 4:11pm

@maikel when you talked about 4 servers here, was that for the 29-server configuration? Or the more realistic one:
"Whole of France (14217 orders per day)
We expect to serve 14217 * 8 = 113,736 page views
14% in peak hour = 15,923 page views, or 265 RPM so 4 servers + delayed job worker"
Just to make sure. Thanks