State of performance

Last week I was tasked with finding out whether the Orders & Fulfillments report had become faster with the latest changes. It was probably too late in the day for my brain: I got carried away and ended up checking the impact of all the performance-related changes we made in 2019, as outlined in Global Gathering 2019 - Day 2 - Performance action plan :see_no_evil:. There we were aiming for pages to load in 3-5s.

I found some insights that are worth sharing with you all. This is just an evaluation of where performance stands. By no means do I intend to decide what our priorities should be; I am just pointing out facts to fuel future discussions.

I set up a dashboard for the occasion with graphs of what I explain below and I saved some queries as well. Feel free to check them out (if you have a Datadog account).

Public pages

There is a tangible improvement over the crazy situation we reached last summer, but the picture is mixed: we reached our goals with /map and /groups, while /shops and /producers are doing much worse.

Most of the UK's userbase experiences response times of around 11s-14s for /shops, and very occasionally things skyrocket beyond 25s.

The situation is much worse for /producers, where almost no one in the UK gets the page in less than 20s, with sporadic outliers that reach 45s.

This must be impacting conversions in some way and it is worth tracking from Matomo IMO. Fortunately, @Rachel already started working on this.

All this makes sense and clearly shows why pagination was a good decision. Paginated resources are predictable; non-paginated ones aren't.

Source: https://app.datadoghq.com/dashboard/k9e-9cg-iur/performance-priorities?from_ts=1580480646248&live=true&tile_size=m&to_ts=1581085446248&fullscreen_widget=4901779299553959&fullscreen_section=overview

Reports

We did a very good job in reports too, considering that reports are always naturally a bit slow as they involve processing chunks of data, but there's no surprise here. It is not rare for users to get Orders & fulfillment and Orders & distributors after 1min, and sometimes 2.

Good news about the infamous Orders & fulfillments though. 75% of UK users get their report in 10s or less! :tada: More details in issue #4131.

Source: https://app.datadoghq.com/dashboard/k9e-9cg-iur/performance-priorities?from_ts=1580480646248&live=true&tile_size=m&to_ts=1581085446248&fullscreen_widget=6247972829137890&fullscreen_section=overview

Source: https://app.datadoghq.com/apm/resource/rails/rack.request/615e231901d77aef?end=1581086928055&env=production&index=apm-search&paused=false&start=1578494928055

Checkout

The checkout page performs reasonably well. Most of the time people get it in 6s or less, but every week there are still outliers of up to 14s.

The process of placing the order, the checkout confirmation, performs a little worse than the checkout page rendering, but it's slightly more predictable, as its outliers rarely reach 11s.

These spikes are somewhat critical for a marketplace if you ask me.

Admin: enterprises, products and OCs

Managers must be having a good experience listing enterprises, products and order cycles from /admin: 90% of their requests are answered in 2s or less! There are some totally nuts outliers, though, which are surely related to superadmin usage.
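For anyone who wants to reproduce a figure like the 90% one above outside Datadog, here is a minimal sketch of how a percentile and a "share of requests within N seconds" can be read from raw response times. The sample numbers are made up for illustration (they are not real OFN data), the function names are hypothetical, and the nearest-rank percentile used here is an assumption: Datadog may aggregate percentiles differently.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def share_within(values, threshold):
    """Fraction of samples answered in `threshold` seconds or less."""
    if not values:
        raise ValueError("no samples")
    return sum(1 for v in values if v <= threshold) / len(values)


# Made-up response times in seconds (illustrative only).
samples = [0.4, 0.6, 0.8, 0.9, 1.1, 1.3, 1.5, 1.8, 1.9, 38.0]

p90 = percentile(samples, 90)        # what 90% of requests stay under
within_2s = share_within(samples, 2.0)
worst = max(samples)                 # the outlier the percentile hides

print(f"p90={p90}s, within 2s={within_2s:.0%}, worst={worst}s")
```

The point of the sketch is that a healthy 90th percentile and a terrible worst case can coexist in the same data, which is why the outliers deserve a separate look.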

I haven't dug much deeper into product creation and update operations, sorry. I didn't look into the new order cycle flow either, but I can investigate that if you think it is worth it. Could use your help on that @luisramos0.

What I did find out is that OC cloning is exceptionally slow sometimes (I haven't checked how often that happens, but I could), and I think it is a very common thing to do in OFN. It might be worth tracking it to see its impact on business metrics.

Source: https://app.datadoghq.com/logs/analytics?agg_m=%40response_time&agg_q=%40http.url_details.path&agg_t=avg&cols=core_host%2Ccore_service%2Clog_response_time&from_ts=1578495656485&index=main&live=true&messageDisplay=inline&query=%40http.url_details.path%3A\%2Fadmin\%2Forder_cycles\%2F*\%2F*+-host%3Aapp.katuma.org&saved_view=77663&step=auto&stream_sort=desc&to_ts=1581087656485&top_n=10&viz=toplist

Please ask

And my report ends here. I want to keep things short. Please share your ideas and let me know if there is something that needs to be clarified or that you want me to dig deeper into. Don't hesitate to question my conclusions either. I could be wrong.

Nice report, thanks a lot Pau!

I think this work needs to be done with specific metrics collected in very specific ways. We only need to define them once and they will guide us. I think it's a really good moment to do this work!

  1. Tool - shall we use Datadog or Matomo for the specific metrics? Matomo is probably better.
  2. List of pages - we can have a simple list of pages, can be the list you use in this post
  3. The metrics - for example we can use a single metric for each page: "average value between UK and AUS of the 90pc avg load time", month on month

This would give us a table like this:
Page          Avg Time Jun2019   Avg Time Dec2019   Avg Time Jun2020   Avg Time Dec2020
/shops        23.5s              25s                20s                15s
/producers    33.5s              45s                23s                11s
/checkout     8s                 7s                 5s                 3s
/report O/F   25s                20s                20s                15s

After we have this we can measure, track and improve.
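To make the metric from point 3 concrete, here is a rough sketch of one way to compute it: take the 90th percentile load time per region for each page and month, then average the UK and AUS values. Everything in it is an assumption for illustration: the record layout, the function names, the nearest-rank percentile and the sample numbers are all made up, and the real data would have to be exported from whichever tool we pick.

```python
import math
from collections import defaultdict


def p90(values):
    """Nearest-rank 90th percentile of a non-empty list of load times."""
    ordered = sorted(values)
    rank = max(1, math.ceil(90 * len(ordered) / 100))
    return ordered[rank - 1]


def monthly_metric(records, regions=("UK", "AUS")):
    """records: iterable of (page, month, region, load_time_seconds).

    Returns {(page, month): average of the per-region p90 load times},
    using only the regions listed in `regions`.
    """
    grouped = defaultdict(list)
    for page, month, region, seconds in records:
        grouped[(page, month, region)].append(seconds)

    result = {}
    page_months = sorted({(page, month) for page, month, _ in grouped})
    for page, month in page_months:
        per_region = [
            p90(grouped[(page, month, region)])
            for region in regions
            if (page, month, region) in grouped
        ]
        if per_region:  # skip page/months with no data for these regions
            result[(page, month)] = sum(per_region) / len(per_region)
    return result


# Made-up load times in seconds (illustrative only).
records = [
    ("/shops", "2019-12", "UK", 10.0),
    ("/shops", "2019-12", "UK", 12.0),
    ("/shops", "2019-12", "UK", 30.0),
    ("/shops", "2019-12", "AUS", 8.0),
    ("/shops", "2019-12", "AUS", 20.0),
]

for (page, month), value in monthly_metric(records).items():
    print(f"{page} {month}: {value:.1f}s")
```

Each (page, month) pair in the result corresponds to one cell of a table like the one above.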

I think this is crucial for product improvement (we can decide to invest in improving performance) but also for communications (we know, and can tell, exactly what the state of affairs is).

I am only describing what I have seen working in some companies.