I’m working on a bug that causes subscriptions to fail. The problem is a conflict between guest customer records and registered user customer records.
- Perrine orders from a shop as a guest. A customer record is created.
- The shop manager creates subscription with the current customer record. This record is a guest record.
- 4 subscription orders are successful. Everything works.
- Perrine signs up for a user account and confirms their email address. The customer record is not updated and is still a guest record.
- When Perrine tries to buy more as guest, the email address is already in use. Perrine has to log in to order from the shop.
- The subscription tries to create the fifth order. As usual it tries to create a guest order. But ordering as guest is not allowed if there is a registered user with the same email address. The order fails and remains in the cart state.
This is a design bug and therefore hard to solve. We discussed several solutions, but I’m still not sure what the best technical solution is. The best we came up with so far: When a guest creates a user account, all customer records with the same email address are associated to that new user. That would change several things:
- All anonymously created guest orders with the same email address are associated to the new user account.
- All subscriptions created for that email address are associated to the new user account.
That sounds okay. There are privacy concerns, because we don’t really know who created the guest orders. So we may need to put some effort into hiding these guest orders from the user’s history (but that might already be covered). There is also some concern about email addresses changing their owners, but we can ignore that for now. Dealing with that is a whole new feature that would affect more parts of the OFN. There is also the idea to get rid of guest checkouts which would make all this obsolete. But I see that as a separate discussion as well.
I’m writing this up now, because I see part of the problem in the underlying data structure and I don’t want to make it worse. I’m wondering if there is an opportunity here to improve it. These are the basic relationships for this scenario:
There is some redundant data:
- An order belongs to a shop directly and through a customer record.
- A subscription belongs to a shop directly and through a customer record.
- A user, a customer and an order all have an email address saved.
I think, it’s okay that an order has an email and a user id. It also has all the users data like their address, because it acts as an artefact that doesn’t change once it’s finalised (in theory). It documents the email address and user id at the time. But the distributor_id
is redundant since we have customer records.
Customer records have a email address and a user id, because they can be a guest customer or a registered customer. They are polymorph. Guest customers don’t have a user id and registered customers have a redundant email address that could be deleted. They inherit this polymorphism from orders.
Subscriptions use this polymorphism. They don’t care if they are associated to a guest or a registered customer. But the association to a customer actually makes their own shop_id
obsolete.
The reason for this data model is historic. The customer model is fairly new. I believe it was created for several reasons:
- It enables tags to be used on a user-enterprise relationship. For example certain customers can be labeled VIP and this label can be used to make a certain payment method available (e.g. bank transfer for trusted users).
- It simplifies the subscriptions model.
- It enables permissions to be set on the user-enterprise relationship (e.g. allow the shop to charge the user’s credit card).
When introducing the customer record, Rob was wondering if that makes some other database fields obsolete, but we didn’t want to change the database too much. We first wanted to see how the customer model evolves. So now is the time to review and see if we can improve it.
Possible solutions
- The simplest way to fix the current problem is to change customer records during sign-up. All customer records with the same email address are associated to the new user. What could possibly go wrong? Could it happen that there is a customer record with the same email address, but already associated to a different user? Maybe if users get soft-deleted and people can sign up with the same email again. We should ignore those records.
- We could also get rid of the
user_id
in the customer record and link to the user by the email address. That would remove some redundancy and chances of inconsistent data. We also don’t need to manually associate new accounts, they would be found automatically. What is the performance impact of using a string column as search index instead of an integer?
Sidetrack: string vs integer
In this case varchar(255)
vs integer
. Strings take up more space and the theoretical search space is bigger as well. But we have an index for id
and email
of users on disk already. And the number of users (the actual search space) is the same for both indexes. I rand a quick test query on Aus production:
select id from spree_users where id = 4378;
> 0.639 ms, 0.667 ms, 0.617 ms, 0.493 ms
select id from spree_users where email = 'maikel@openfoodnet...';
> 0.429 ms, 0.658 ms, 0.521 ms, 0.632 ms
I don’t think that is something to worry about.
Additional optimisations
- We can remove the
distributor_id
from the order model, because we have acustomer_id
now. This further reduces the risk of inconsistent data. It also removes one of our customisations of a Spree model. It could make some database queries less efficient though. I’m not sure how often we useenterprise.orders
which would then go via customers.
- We can remove the
shop_id
on subscriptions with the same rational.
Discussion
- Do you have better ideas how to solve this problem?
- Which of the proposed solutions would you prefer and why?
- Where do you see risks? Which parts should we investigate first or test really carefully?
Thank you.