Toward a roadmap for our API

Rachel · December 14, 2020, 8:57am

@lin_d_hop this is the best written discourse post one can read on a Monday morning. Hats off

In order to weigh in on this strategic decision, I need to understand a bit more about the use cases we want our API to address.

Can you and maybe @Kirsten describe some? I sensed in the other discussion that this was a burning topic for your instances, yet I’m not sure dogfooding will answer your most pressing cases. But maybe I’m wrong?

Maybe it is good to have those use cases in mind before joining the meeting? Or we can talk about them during it, either way is fine with me.

Matt-Yorkley · December 14, 2020, 11:00am

There’s some hot stuff on the API wishlist:

to maintain orders within an accounting system
to plan distribution routes
to create shoppers
to place orders
to extract data for packing lists
to extract shopper data for CRM tools and newsletters

Let’s imagine we spend a year or more rebuilding the admin area as a separate React SPA. I think at the end of it we would have a huge pile of React code, but potentially we would barely have begun to cover anything from this list.

We might have also thrown away half of our test suite (~4300 tests) and re-implemented it as JavaScript tests (which traditionally we haven’t been good at doing). There’s a lot of complexity and cost there.

We probably would have done a very small amount of work polishing our existing endpoints and adding a few extra API controllers over the course of that year or so. Maybe the equivalent of 1-2 weeks of dev time. We would not have touched the checkout at all and would have to prioritise that separately.

On the other hand, some of the bits on the list are basically papercut size (eg: “find Enterprise addresses” and “find order cycle details”) and we could literally do them now, without waiting a year or more. The relative scale and complexity is absolutely tiny.

Some of these wishlist items (eg: “extract data for packing lists” and “extract shopper data for CRM tools and newsletters”) are related to making all reports available on the API, which I think we are planning on doing anyway, right? It would be much easier to deliver that if we’re not rewriting the whole admin area at the same time. We could start doing it after Tax Reports and it should be fairly straightforward.

Rachel · December 14, 2020, 11:34am

To be honest, I’m a bit worried about using “making all reports available on the API” as the single solution for packing, tax, or any of our other report problems.
I understand that sentence as something that develops only the API, but does not build any front-end to consume the data. Thus, the only way to use the data is to know how to develop scripts that make calls to the API, a skill that not all instances have.
Maybe this is something to move to the “Hub can easily generate accurate packing slips” thread (#51 by Rachel) or the improved reporting discussion…

Matt-Yorkley · December 14, 2020, 11:56am

It would make reports available on the API, available in the app (as they are now), and available for download in a range of spreadsheet formats (more than we offer now), plus additional UX like hide or show selected columns before exporting to a spreadsheet (to make printed copies nicer). It would also be much easier to add new reports that have all of the above features by default. I already wrote it ~6 months ago, it works really well.

luisramos0 · December 14, 2020, 11:45pm

Matt, I think you are saying it will be ok to go with “Strategic Direction the Third”, which includes Reactive Rails, REST API and DFC on the side. Right? You are saying you don’t think this is too much to maintain.
I think that’s a valid option. Maybe not too complicated.

My position is based on the long-term strategy for OFN: in 5/10 years, do we want to be a nice little ecommerce app with some reports on top and a few integrations? We don’t need to make the API the center of our ecosystem for that.
Or do we want to listen to our own mission of creating a new food system, an ecosystem of interconnected apps? In that case we need the API to be our main product and start building things on top of it, like our own backoffice and then of course the frontoffice (we need to have a checkout API, a react project to rebuild the frontoffice, future, would get you there) and then many other things like discovery apps, mobile apps for farmers to do things (farmOS?), integrations with logistics systems, distributing data across instances (connecting data from different instances), etc.
So, I think dogfooding is relevant but it’s more than that here, it’s really about the strategic product decision to make API front and center of OFN.

There’s also the closely related monolith discussion, imo keeping the monolith is ok, long term planning to keep the monolith is not. And breaking the FE from the BE helps to then separate the BE in smaller parts as you grow in complexity.

Matt-Yorkley · December 15, 2020, 10:50am

I think keeping and improving our existing JSON API as something that’s generic and widely usable, and keeping DFC on the side would make the most sense, yes. We have 19 API controllers currently, and they’re looking pretty reasonable. We have 6 DFC controllers that return data for things like products, enterprises, “offers” and “persons”, with a very specific format (JSON-LD) and with a very specific ontology mapping that is not generic at all and has quite a few quirks.

So for example; if we started to look at adjustments/fees endpoints for accounting integrations, would that fit nicely into the DFC ontology? What kind of complexity would there be in making them fit the DFC model? I think creating those endpoints with vanilla JSON that maps to our own ontology would make the most sense in that case.

Matt-Yorkley · December 15, 2020, 11:01am

Seeing how most of our objects are Spree objects, we can get some pointers from Spree’s list of API controllers, which includes checkout. The issues in the repo are a great resource as well, as they include discussions around any problems that were encountered, with valuable insights.

Matt-Yorkley · December 15, 2020, 12:33pm

So in terms of what a roadmap might look like, I’d picture it like this:

Phase 1

1. Tidy up and prep

I had a look through our controllers and they’re looking pretty good. One thing that jumps out in terms of cleanup is extracting the pagination rendering logic so it’s defined in a single place where it can easily be re-used in any controllers we add later.
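As a rough illustration of what “defined in a single place” could mean, here is a minimal, framework-free sketch. The module name `PaginationData` and the metadata fields (`results`, `pages`, `page`, `per_page`) are assumptions made for this example, not the app’s actual implementation.

```ruby
# Hypothetical helper: builds pagination metadata in one place so that any
# index action can reuse it. Names and fields are illustrative only.
module PaginationData
  DEFAULT_PER_PAGE = 15

  # records: any Array; page: 1-based page number; per_page: page size.
  # Returns the records for the requested page plus a metadata hash.
  def self.paginate(records, page: 1, per_page: DEFAULT_PER_PAGE)
    page = [page.to_i, 1].max
    per_page = [per_page.to_i, 1].max
    total = records.size

    {
      # Array#slice returns nil when the offset is past the end of the array.
      records: records.slice((page - 1) * per_page, per_page) || [],
      pagination: {
        results: total,
        pages: (total.to_f / per_page).ceil,
        page: page,
        per_page: per_page
      }
    }
  end
end
```

Any controller added later would then call the same helper rather than re-implementing the page arithmetic.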

Update: the pagination cleanup got done over the holidays, partly by volunteer contributions

I’d add the “standardising the JSON structure” issue in here, and probably investigating switching from AMS to fast_jsonapi.

2. Ensuring we have a clear standard with clear examples / guidelines

Plenty of work has been done on this before (thanks Luis! the wiki page is here for reference), but we could make it a bit more concrete and add some clearer examples so we don’t leave any room for ambiguity. If we’re:

sticking to REST
using ransack on index actions in a generic way
showing pagination output in a standardised way
authorizing actions with cancan in a standardised way
following the other guidelines laid out in the wiki

…then adding new endpoints or controllers by following the pre-defined template is basically a paint-by-numbers exercise.
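To make the “template” idea a little more concrete, here is a framework-free sketch of the shape such an index action might follow: authorize, filter, paginate, render. In the real app those steps would be handled by cancan, ransack and the shared pagination logic; every class and method name below (`TemplateIndexAction`, `SimpleAbility`, `NotAuthorized`) is a hypothetical stand-in and does not reproduce those libraries’ APIs.

```ruby
# Sketch of a standard index action: authorize -> filter -> paginate -> render.
# All names here are hypothetical stand-ins for illustration only.
class NotAuthorized < StandardError; end

class TemplateIndexAction
  # ability: any object responding to can?(action, resource_name)
  def initialize(resource_name, records, ability)
    @resource_name = resource_name
    @records = records
    @ability = ability
  end

  def call(filters: {}, page: 1, per_page: 2)
    raise NotAuthorized unless @ability.can?(:index, @resource_name)

    # Generic equality filtering, standing in for a ransack-style search.
    filtered = @records.select do |record|
      filters.all? { |field, value| record[field] == value }
    end

    offset = (page - 1) * per_page
    {
      records: filtered.slice(offset, per_page) || [],
      pagination: { results: filtered.size, page: page, per_page: per_page }
    }
  end
end

# Hypothetical ability object that allows only the listed [action, resource] pairs.
SimpleAbility = Struct.new(:allowed) do
  def can?(action, resource_name)
    allowed.include?([action, resource_name])
  end
end
```

The point of the sketch is that only the resource name and the record source change between endpoints; the steps themselves stay identical.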

3. Look at our existing API controllers and add some missing actions

We can go through them and it’s pretty obvious which bits are missing. The ShopsController needs an index endpoint, the CustomersController needs create and possibly delete endpoints, etc. These bits of work could be written up as issues and added to an API column in Zenhub, and they’d probably be simple enough that we could use the good-first-issue tag.

4. Flesh out some of our missing controllers

If we look first at Spree’s controllers we can see some really obvious bits we are missing and would definitely need. A ZonesController for example. We might not even need all of the actions there (do we want to allow deleting Zones via the API to begin with? Zones don’t really change and should only really be editable by superadmins anyway?). So maybe just a ZonesController with index and show actions. Zone objects are incredibly simple and basically public-readable. Returning them as JSON is not difficult.
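For a sense of scale, a read-only zones endpoint with just index and show can be sketched in a few lines of plain Ruby. This is not Rails controller code: the `Zone` fields, the `ZonesEndpoint` class and the paths in the comments are assumptions for illustration, not the real Spree schema or our routes.

```ruby
require "json"

# Hypothetical read-only zones endpoint offering only index and show.
# Zone fields are illustrative, not the real Spree schema.
Zone = Struct.new(:id, :name, :description)

class ZonesEndpoint
  def initialize(zones)
    @zones = zones
  end

  # Would correspond to something like GET /api/zones (assumed path).
  def index
    [200, JSON.generate(@zones.map { |zone| serialize(zone) })]
  end

  # Would correspond to something like GET /api/zones/:id (assumed path).
  def show(id)
    zone = @zones.find { |candidate| candidate.id == id }
    return [404, JSON.generate({ error: "Zone not found" })] unless zone

    [200, JSON.generate(serialize(zone))]
  end

  private

  def serialize(zone)
    { id: zone.id, name: zone.name, description: zone.description }
  end
end
```

Leaving out create, update and delete keeps the surface small, which matches the idea that zones should only be editable by superadmins.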

We can list all of these obvious missing bits and create issues for them, and again with examples like the ZonesController it’s potentially papercut-size and could probably even be a good-first-issue.

Phase 2

If we went through the above steps we’d have a stack of say 15-20 very small, self-contained and clearly-scoped issues that we could start picking off and delivering pretty quickly. This would move us towards the point where pretty much all the basic CRUD operations that can be done in the admin UI can also be done in the API, and wouldn’t be much effort.

There are bigger projects we could also start to plan and gather use-cases for in the meantime, which might require a bit more brain power, such as:

creating API endpoints for the checkout (possibly based on Spree)
making reports data available through API endpoints
seeing what kind of requirements would be needed for more complex use-cases like accounting integrations or distribution planning apps and looking at how to meet those requirements

If we gather the details and clarify the requirements we can tackle these bigger projects one by one.

In terms of the bits in Phase 1 though, there’s no reason we couldn’t be delivering sizable chunks of it immediately, it’s really simple and it’s not blocked by anything.

Matt-Yorkley · December 31, 2020, 1:42pm

Standardising the structure of our JSON responses

One thing that would be good to clear up is the consistency of the data structures we return. There’s a widely used convention of putting all primary data within a top-level field called data:, but we don’t use it. That convention also means you can nicely put any metadata or secondary data alongside the primary records. We have some inconsistency in this respect.

When rendering multiple objects in a response, we sometimes render the array as the response itself, and sometimes render the array inside a top-level field with the name of the object (which is maybe not a great idea, and it means the field containing the data is different for every endpoint).

So for example, the output of the Api::CustomersController#index action (here) currently looks like this:

[
  <customer 1>,
  <customer 2>,
  <customer 3>
]

Whereas the Api::OrdersController#index action (here) currently looks like this:

{
  orders: [     # Resources in a top-level field (good) with a changeable name (bad).
    <order 1>,
    <order 2>,
    <order 3>
  ],
  pagination: <pagination data>  # Secondary data can be added (good)
}
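The cost of these two shapes shows up on the client side. As a hedged sketch (the helper `extract_records` and its return format are invented for this example), a client that wants to treat both endpoints uniformly has to branch on the shape and know each endpoint’s field name in advance:

```ruby
# Sketch of the extra work a client does today: the same "list" concept
# arrives in two shapes. resource_key is the per-endpoint field name
# (e.g. "orders"), which the client must already know.
def extract_records(parsed_body, resource_key)
  case parsed_body
  when Array
    # Bare array, as in the customers example: no room for pagination data.
    { records: parsed_body, pagination: nil }
  when Hash
    # Named top-level field, as in the orders example.
    { records: parsed_body.fetch(resource_key), pagination: parsed_body["pagination"] }
  else
    raise ArgumentError, "unexpected response shape: #{parsed_body.class}"
  end
end
```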

So the convention would be to return primary records in a data field, and use that consistently for both single and multiple records, eg:

{
  data: <single object or array of objects>,
  pagination: <pagination data (for example)>
}

This means secondary metadata can be added in any response. It could be pagination data, or it could be other things. For example, if a record has linked records such as adjustments on a line item, it’s common to return some links to other API endpoints where those records can be accessed.

Also, successful responses should contain the data field and error responses should contain an errors field, but a response should never contain both. I think we already do this errors bit fairly consistently.
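A minimal sketch of how that rule could be enforced in one place, assuming a hypothetical `Envelope` module (the module and its method names are invented for this example):

```ruby
# Hypothetical helpers for the proposed envelope: success responses carry
# `data` (plus optional secondary fields such as pagination or links), error
# responses carry `errors`, and a response never carries both.
module Envelope
  RESERVED_KEYS = %i[data errors].freeze

  def self.success(data, **secondary)
    clashes = secondary.keys & RESERVED_KEYS
    raise ArgumentError, "reserved keys used: #{clashes.inspect}" unless clashes.empty?

    { data: data }.merge(secondary)
  end

  def self.error(*messages)
    raise ArgumentError, "at least one error message is required" if messages.empty?

    { errors: messages }
  end
end
```

The same `success` helper covers single records and collections, for example `Envelope.success(order)` or `Envelope.success(orders, pagination: pagination_data)`, so clients always look in the same field.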

How does that sound?

lin_d_hop · January 8, 2021, 5:16pm

Hey folks,

I’d like to make a proposal for working through not just the technical side of this decision (which is obviously important) but the crucial product and business implications of this decision.

Step One: Understand our desired API Use Cases

This step has been started by the Product Circle and we’d like to invite the wider community to input if you have any thoughts.

To contribute go to this slide deck and copy the template then fill it in with your API desires.
We’ll then need to prioritise them and probably consider what timeline horizons we are working to for the different endpoints.

Step Two: Understand the Business Implications

This step will involve understanding our different potential API approaches and costing at least our first horizon use cases under each scenario. As I understand it we have three potential technical solutions that we are comparing:

Stick to the JSON API and use it ourselves between FE and BE. DFC consumes the JSON API.
Use a framework like Reactive Rails and move to the DFC API as our primary API
Use a framework like Reactive Rails and stick to the JSON API, though we won’t consume it between FE and BE. DFC will consume the JSON API.

Somehow we need to come up with a way to cost:

Rewriting the FE under each scenario
Delivering the prioritised API endpoints under each scenario
Things we do and don’t get for free under each scenario
Any implications to scaling under each scenario
Additional overheads to recruiting and team structure under each scenario.

Obviously we’re not going to come up with accurate figures but if we can ballpark financial implications it will give us some really useful insights into these decisions.

Step Three: Decision Time

With this extra information we should be able to make a well informed decision.

lin_d_hop · January 15, 2021, 10:54am

API Strategy Meeting

8pm GMT on Thursday Jan 21st.

Thanks to everyone that has signed up to this meeting. The topic we are discussing is quite complex and very deeply technical so I am going to write some notes of preparation here. In this meeting we simply won’t have time to explain everything from first principles to everyone attending. I will propose at the end of this post ideas for getting basic tech understanding up to speed in advance.

What this meeting is not…

This meeting will not be a place in which we prioritise which API endpoints or functions we will work on in the coming months. We will not be looking at the different things we want from our API and putting them into the work plan. While there will be some space in which we will choose important endpoints, the goal of this exercise is to gain a snapshot of the cost of implementation of 4 different API strategies. It is not about choosing endpoints that we will implement.

This meeting will not be a recap of all API conversations that have happened so far. While we’ll need to do a lot of discussing on this highly technical topic I beg everyone attending to take the time to read this post and, in fact, this whole thread. Also take a look at this thread for a little more context.

This meeting will not be about making decisions. Period. We are not choosing which API strategy. We are not choosing which front end framework. We are simply gathering information about the Ease and Value of the four different approaches across 3 high level questions and 6 endpoint examples. Ease is a metric we can use to guess the cost of implementation. Value is a metric we can use to estimate benefit to users and opportunities unlocked.

This is not a place to campaign for your preferred options. By far the most value will come from this meeting if anyone with favourites puts aside their babies and tries to think openly and expansively about the implications of each of the four strategies.

The Four API Strategies

In this meeting we will explore four different potential strategies for our API with the goal of estimating the ease (ie cost of implementation) and value (ie to users/potential users) across a range of questions.

The last post says three, but I have added a fourth for completeness. The four potential API Strategies are:

1. Frontend consumes our API for the data. Continue using and prioritise extending the existing JSON API. The DFC consumes the JSON API.

This approach is most similar to what we are doing now, however we will become more disciplined if we make this a clear strategy. Being more disciplined will likely mean we use a frontend framework like React meaning that the front end of our application becomes completely separate from the backend of our application. Our API will therefore offer everything that the FE of the application requires. When there are other requirements from users then we will extend our API to offer them. Under this scenario it would be wise for the DFC to sit on top of the JSON API, such that data consistency is easier to maintain.

Pros

As this approach is most similar to what we are already doing we have a BIG headstart here.
The little bit of functionality that has already been built consuming our API external to the app can continue to be used
We will be dogfooding our API meaning that it will be heavily used, tried and tested.

Cons

The DFC is a side product rather than our primary API
We have quite a bit of legwork to update our API to enable our FE rewrite.

2. Frontend consumes our API for the data. The DFC becomes our primary API and all OFN functions are served over the DFC

Before any technical people read this and judge, I am including this option for completeness after it was raised as a question in a meeting. Under this scenario we migrate away from our existing JSON API and use the DFC to serve all OFN requests and responses. We will therefore offer everything that the FE of the application requires by converting to the DFC data structure and then back again to the OFN FE.

Pros

The DFC is our primary API
We will be dogfooding our API meaning that it will be heavily used, tried and tested.

Cons

The OFN data models are very specific to OFN so converting to the DFC structure and then back to OFN for every call between FE and BE might be prohibitively onerous (financially and computationally)

3. Use a full stack framework and don’t consume our API between FE & BE. Keep, extend and streamline the JSON API. DFC consumes JSON API.

Under this scenario we keep our existing JSON API as our primary API, though we will not be using it ourselves. This will mean that many of the existing endpoints will be unused as we stop consuming them between FE and BE, and we’ll soon be streamlining and refactoring the API to suit our needs. The DFC will (at least in the short term) consume the OFN API.

Pros

We continue to use our existing API which has a lot of functionality already
The little bit of functionality that has already been built consuming our API external to the app can continue to be used

Cons

We won’t be dogfooding our own API, so there is a higher chance of unused endpoints failing to be maintained.
We’ll be maintaining two APIs, meaning bugs, maintenance etc on both API products

4. Use a full stack framework and don’t consume our API between FE & BE. Rewrite our API to follow the DFC standard.

Under this scenario we will no longer use our API between the FE and BE of the app and our strategy will be to abandon the existing JSON API. We will focus API development on creating our prioritised endpoints to the DFC standard.

Pros

Our API will be fully DFC compliant. We’ll only be maintaining one API product
Developing based on our own needs means we’ll continue to dogfood in a way - externally to the app but through the scripts and tools we build from our API in meeting the needs of our users.

Cons

Any existing API usage will be phased out and replaced by the new API

The Meeting Structure

We’ll be asking the following questions:
For each API scenario, we’ll rank ease and value (using an ease/value matrix) in terms of:

Rewriting our frontend
Growing the delivery/dev team
Scaling the application
Implementing six selected endpoints

We’ll be doing this in Miro and you can see the facilitation board (WIP) here.

Some final notes:

I want to be clear that I do not have a preference at this stage. For transparency, there are two options that I see as less viable and two options that I see as more viable, but I am genuinely interested in seeing how each of the options plays out against each other. I have volunteered to facilitate this process. If there is anyone that feels uncomfortable with me facilitating please do say and we can explore if someone else might be better.

In this meeting I want us to limit our engagement with the specific FE frameworks we might use. I understand these topics are inextricably linked. However I intend to keep us away from debating FE frameworks and focus on the API. We have one question in the process in which we look at the ease/value of rewriting our frontend under each API strategy. This should cover most of the thinking that is specific to FE frameworks.

As we won’t have much time to go into the technical detail in this meeting I invite folks to DM me on Slack if you would like a chat about the technical detail in advance. If there are a few people interested I will aim to organise a prep session with those folks and a dev. So please get in touch if this is you.

Finally, if you want to share other pros or cons at this stage for the 4 methods, go for it. But please try to remain impartial. It might not always be easy - but it won’t help our process by just coming out and saying all the reasons one option is bad/good. I’m really trying to lay the groundwork for an unbiased, unemotional meeting because this is an absurdly emotional topic. So please please please I would so very much appreciate if you could join me in this quest to be unemotional

Sorry for the long post. Thanks for reading

luisramos0 · January 18, 2021, 11:58pm

Matt, to your data structure comment: yes
I think we can follow json-api or something similar: https://jsonapi.org/

Matt-Yorkley · January 20, 2021, 9:18pm

Yeah, that’s what I was thinking of. Maybe we don’t need to use the whole specification, but we can at least take some sensible tips from it in terms of standardizing the output a bit more.

We’re in a good position at the moment in that we don’t currently have many clients, so we could take advantage of that to make some of these smaller structural improvements now without causing any big problems.

But that leads to the important question: who is actually using the API currently, and which parts are they using?

Matt-Yorkley · January 20, 2021, 9:30pm

In terms of the other choices, I don’t really understand the “DFC consumes the JSON API” option…? What would that actually look like?

luisramos0 · January 20, 2021, 9:43pm

DFC is also JSON (JSON-LD), so for clarity I’d call our current API the REST API.

There are a few options for that dfc-rest connection. I think the most important thing is to avoid having both apis going for the models/DB, that will be painful and error prone. We should try to have some form of abstraction layer to avoid this… at service level, at controller level or at api level.
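One way to picture the service-level variant is a single query object that is the only code touching the data store, with each API formatting its output. This is only a sketch under stated assumptions: `ProductQuery`, `RestFormat` and `LinkedDataFormat` are invented names, and the JSON-LD keys and context URL are placeholders rather than the real DFC ontology.

```ruby
# Sketch of a service-level abstraction: only ProductQuery reads the store,
# and both API formats are built from its output.
class ProductQuery
  def initialize(store)
    @store = store # stand-in for the models/DB
  end

  def all
    @store.map { |row| { id: row[:id], name: row[:name] } }
  end
end

# Plain JSON shape for the REST API.
module RestFormat
  def self.render(products)
    { data: products }
  end
end

# JSON-LD-like shape. Keys and context are placeholders, NOT the DFC ontology.
module LinkedDataFormat
  def self.render(products)
    {
      "@context" => "https://example.org/hypothetical-context",
      "@graph" => products.map { |product|
        { "@id" => "/products/#{product[:id]}", "@type" => "ExampleProduct", "name" => product[:name] }
      }
    }
  end
end
```

Because both formats start from the same `ProductQuery#all` result, a change to how products are loaded only needs to happen once.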

Matt-Yorkley · January 20, 2021, 9:51pm

Ok, that makes sense

Matt-Yorkley · January 20, 2021, 9:54pm

Another thought that comes to mind: I think it’s common practice to pre-emptively put APIs under a /api/v1 namespace (in terms of routes and directories). I’m wondering if it’s worth doing that now, or if we can just deal with it later if we encounter a situation where we need to make breaking changes?
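To illustrate what the version prefix buys, here is a toy route table in plain Ruby (not Rails routing). The paths, the `VersionedRoutes` module and the example payloads are all hypothetical: the idea is only that a v2 handler can change its output shape while v1 clients keep getting what they had.

```ruby
# Hypothetical route table: a breaking change goes into v2 while the v1
# handler keeps its original output for existing clients.
module VersionedRoutes
  ROUTES = {
    "/api/v1/zones" => -> { { zones: ["UK"] } }, # original shape
    "/api/v2/zones" => -> { { data: ["UK"] } }   # changed shape, new version
  }.freeze

  # Returns [status, body] for a path, or a 404 when no handler is registered.
  def self.dispatch(path)
    handler = ROUTES[path]
    return [404, { errors: ["No route matches #{path}"] }] unless handler

    [200, handler.call]
  end
end
```

Adding the namespace before any breaking change exists costs little, whereas adding it later means existing clients have to change their URLs.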

lin_d_hop · January 21, 2021, 11:03am

The current API is not heavily used externally.
Orders/[id] endpoint is used by a couple of UK enterprises to integrate with an accounting package via Zapier. I believe FR and AU have also implemented the Zapier integration but I am not sure if it is used.
@Rachel @Kirsten @sean may all wish to comment?

Other than this, any use of the OFN API has been done outside of the knowledge of the core team and we have made no commitments to officially support the API. Hence it’s a great time to get clear on our strategy.

I agree that adding versioning to the API would be a good strategic step to take as we prepare to officially support API endpoints.

Rachel · January 21, 2021, 11:18am

No API use in FR.

100% aligned.

Matt-Yorkley · January 21, 2021, 1:26pm

REST vs DFC

Thinking about the proposed options where we totally scrap all of our current API endpoints and rewrite everything in the DFC ontology and JSON-LD: as a thought experiment I imagined the task of adding an Api::ZoneController to the API so that clients can get some basic info on tax zones for an instance. I decided to look through the DFC documentation to try to figure out what that would look like in the DFC format. It was pretty brutal; I gave up reading the docs after a while.

So… if I was going to sit down and write that simple controller in a vanilla JSON/REST format like we do now, I think it’d take about 15 minutes. If I had to write that same controller conforming to the DFC ontology and the JSON-LD format, I think it might take me something like 5-10 hours, most of which would be reading dense technical documentation and scratching my head. In terms of developer velocity for the short-term expansion and long-term maintenance of the API, I think it would be like comparing snails to sports cars.