Subscriptions and customers

I’m working on a bug that causes subscriptions to fail. The problem is a conflict between guest customer records and registered user customer records.

  • Perrine orders from a shop as a guest. A customer record is created.
  • The shop manager creates subscription with the current customer record. This record is a guest record.
  • 4 subscription orders are successful. Everything works.
  • Perrine signs up for a user account and confirms their email address. The customer record is not updated and is still a guest record.
  • When Perrine tries to buy more as guest, the email address is already in use. Perrine has to log in to order from the shop.
  • The subscription tries to create the fifth order. As usual it tries to create a guest order. But ordering as guest is not allowed if there is a registered user with the same email address. The order fails and remains in the cart state.

This is a design bug and therefore hard to solve. We discussed several solutions, but I’m still not sure what the best technical solution is. The best we came up with so far: When a guest creates a user account, all customer records with the same email address are associated to that new user. That would change several things:

  • All anonymously created guest orders with the same email address are associated to the new user account.
  • All subscriptions created for that email address are associated to the new user account.

That sounds okay. There are privacy concerns, because we don’t really know who created the guest orders. So we may need to put some effort into hiding these guest orders from the user’s history (but that might already be covered). There is also some concern about email addresses changing their owners, but we can ignore that for now. Dealing with that is a whole new feature that would affect more parts of the OFN. There is also the idea to get rid of guest checkouts which would make all this obsolete. But I see that as a separate discussion as well.

I’m writing this up now, because I see part of the problem in the underlying data structure and I don’t want to make it worse. I’m wondering if there is an opportunity here to improve it. These are the basic relationships for this scenario:

There is some redundant data:

  • An order belongs to a shop directly and through a customer record.
  • A subscription belongs to a shop directly and through a customer record.
  • A user, a customer and an order all have an email address saved.

I think, it’s okay that an order has an email and a user id. It also has all the users data like their address, because it acts as an artefact that doesn’t change once it’s finalised (in theory). It documents the email address and user id at the time. But the distributor_id is redundant since we have customer records.

Customer records have a email address and a user id, because they can be a guest customer or a registered customer. They are polymorph. Guest customers don’t have a user id and registered customers have a redundant email address that could be deleted. They inherit this polymorphism from orders.

Subscriptions use this polymorphism. They don’t care if they are associated to a guest or a registered customer. But the association to a customer actually makes their own shop_id obsolete.

The reason for this data model is historic. The customer model is fairly new. I believe it was created for several reasons:

  1. It enables tags to be used on a user-enterprise relationship. For example certain customers can be labeled VIP and this label can be used to make a certain payment method available (e.g. bank transfer for trusted users).
  2. It simplifies the subscriptions model.
  3. It enables permissions to be set on the user-enterprise relationship (e.g. allow the shop to charge the user’s credit card).

When introducing the customer record, Rob was wondering if that makes some other database fields obsolete, but we didn’t want to change the database too much. We first wanted to see how the customer model evolves. So now is the time to review and see if we can improve it.

Possible solutions

  • The simplest way to fix the current problem is to change customer records during sign-up. All customer records with the same email address are associated to the new user. What could possibly go wrong? Could it happen that there is a customer record with the same email address, but already associated to a different user? Maybe if users get soft-deleted and people can sign up with the same email again. We should ignore those records.
  • We could also get rid of the user_id in the customer record and link to the user by the email address. That would remove some redundancy and chances of inconsistent data. We also don’t need to manually associate new accounts, they would be found automatically. What is the performance impact of using a string column as search index instead of an integer?

Sidetrack: string vs integer

In this case varchar(255) vs integer. Strings take up more space and the theoretical search space is bigger as well. But we have an index for id and email of users on disk already. And the number of users (the actual search space) is the same for both indexes. I rand a quick test query on Aus production:

select id from spree_users where id = 4378;
> 0.639 ms, 0.667 ms, 0.617 ms, 0.493 ms

select id from spree_users where email = 'maikel@openfoodnet...';
> 0.429 ms, 0.658 ms, 0.521 ms, 0.632 ms

I don’t think that is something to worry about.

Additional optimisations

  • We can remove the distributor_id from the order model, because we have a customer_id now. This further reduces the risk of inconsistent data. It also removes one of our customisations of a Spree model. It could make some database queries less efficient though. I’m not sure how often we use enterprise.orders which would then go via customers.
  • We can remove the shop_id on subscriptions with the same rational.

Discussion

  • Do you have better ideas how to solve this problem?
  • Which of the proposed solutions would you prefer and why?
  • Where do you see risks? Which parts should we investigate first or test really carefully?

Thank you.

Hey Maikel, great write up!

Here are some thoughts:

I think the customer model is the problem.
As you say, customer is an association between enterprise and user, I add, not directly related to the order. We should have a simple model, easy to understand without exceptions. For that we need to remove customer from order and email from customer and manage all info through the existing user model. The complexity comes from the duplication created in customer (the connection to the order and the email are in the user model already). In this case, an order is always placed by a user (guest or not). This is the spree model that we should use. Do you think this is possible to do?

This would be the simple model:

We can remove the distributor_id from the order model

I think this one is important and useful, orders management has to be based on this one.

  • We can remove the shop_id on subscriptions with the same rational.

Same as above, I think you will need this for management in the backoffice.

We could also get rid of the user_id in the customer record and link to the user by the email address.

I think this would make everything more complicated. Sounds like a hack.

I like your thinking about removing the customer_id from order. We can still find the customer by enterprise_id and user_id or email. There is really no need for that column.

I’m not sure about removing email from the customer though. We loose all guest customer records. That means we can’t use tags or permissions on guest customers which is probably okay. We also can’t have subscriptions without user accounts which breaks a few processes at the moment. More fundamentally, a guest who orders from a shop is a customer, right? We wouldn’t be able to model this any more. The customer overview in the admin panel would need additional logic to find all the non-persisted guest customers. And reporting becomes more complicated again.

I’m not sure if I understand that. The Spree::Order model has a user_id which can be nil. Are you suggesting we create user records for guests? Spree is not doing that. Spree is just using the email column of the order in that case. Users come from spree_auth_devise and are not in the core.

certainly interesting to this topic: https://github.com/openfoodfoundation/openfoodnetwork/pull/3247#issuecomment-451198714