I'm trying to figure out how companies that use nosql database solve this general nosql race condition issue:
Lucky example: User and Product. Product has quantity of 1 and there are 2 users. When the first user tries to buy this product, system first checks whether quantity is > 0 and it is indeed > 0, proceeds to create a Transaction object and decrement quantity of product. The second user tries to buy the product, system rejects as quantity isn't > 0.
Unlucky: Both users try to buy the product simultaneously. For both, system confirmed quantity is > 0 and so created a Transaction object for both users, hence destroying the company image next day...
How to generally deal with this common scenario?
From similar cases i found on the net, one suggested solution is to use request queue, and process the request one by one. However, if all transactions are queued, and you're running business like Amazon (millions of transactions every now and then), how do we expect users to know whether or not their purchase succeeded shortly after they clicked that purchase now button?
One of the ways to solve this problem is to allow both users to order products simultaneously.
Then there are two possible situations:
One of the users doesn't finish a transaction (refuses to pay, closes a browser window etc). Then another one will have the requested amount of a product.
Both users finished their transactions. Then you will get a random user your product and say sorry to another one giving away a coupon with $10 to him/her.
The second situation should happen extremely rare. So you won't blow out all your money on coupons and your users will be happy whatever the outcome. But you still need to monitor the 2nd situation in order to react and make changes to your system if it happens more often than you expected.
Related
I am currently building a booking tool (django and postgresql) where people book spaces in a location. The location has limited capacity and therefore might run out of space while a user is trying to book a space. I imagine that this could happen when another user books a place slightly before the current user is booking, e.g. if only 1 space is left, the other user might book too and the database cannot handle it.
So my question is would you strongly advise using select_for_update or is there something else that might help me overcome this issue?
Yes, that's a correct use of select_for_update. You would be blocking a specific location row (apply a filter before calling select_for_update). That means that 2 different locations can be booked concurrently, but if there are 2 bookings for the same location happening at exactly the same second they would be called.
This creates a critical section and you can sure that it won't overlap with a critical section of another request. In within the critical section, you will have to validate that the selected time slot is free - without that validation select_for_update would have no effect.
I could imagine another approach based on unique constraints, it's not universal but might be easier to implement. Let's imagine that you are booking a resource for a specific day. You could have a "unique together" combination for the resource_id and date. A subsequent save would raise an IntegrityError and you could catch it and inform the user that the resource was just booked for the selected date.
I'm building an app where two users can connect with each other and I need to store that connection (e.g. a friendship) in a DynamoDB table. Basically, the connection table has have two fields:
userIdA (hash key)
userIdB (sort key)
I was thinking to add an index on userIdB to query on both fields. Should I store a connection with one record (ALICE, BOB) or two records (ALICE, BOB; BOB, ALICE)? The first option needs one write operation and less space, but I have to query twice to get all all connections of an user. The second option needs two write operations and more space, but I only have to query once for the userId.
The user tablehas details like name and email:
userId (hash key)
name (sort key)
email
In my app, I want to show all connections of a certain user with user details in a listview. That means I have two options:
Store the user details of the connected users also in the connection table, e.g. add two name fields to that table. This is fast, but if the user name changes (name and email are retrieved from Facebook), the details are invalid and I need to update all entries.
Query the user details of each userId with a Batch Get request to read multiple items. This may be slower, but I always have up to date user details and don't need to store them in the connection table.
So what is the better solution, or are there any other advantages/disadvantages that I may have overlooked?
EDIT
After some google research regarding friendship tables with NoSQL databases, I found the following two links:
How does Facebook maintain a list of friends for each user? Does it maintain a separate table for each user?
NoSQL Design Patterns for Relational Data
The first link suggests to store the connection (or friendship) in a two way direction with two records, because it makes it easier and faster to query:
Connections:
1 userIdA userIdB
2 userIdB userIdA
The second link suggests to save a subset of duplicated data (“summary”) into the tables to read it faster with just one query. That would be mean to save the user details also into the connection table and to save the userIds into an attribute of the user table:
Connections:
# userIdA userIdB userDetails status
1 123 456 { userId: 456, name: "Bob" } connected
2 456 123 { userId: 123, name: "Alice" } connected
Users:
# userId name connections
1 123 Alice { 456 }
2 456 Bob { 123 }
This database model makes it pretty easy to query connections, but seems to be difficult to update if some user details may change. Also, I'm not sure if I need the userIds within the user table again because I can easily query on a userId.
What do you think about that database model?
In general, nosql databases are often combined with a couple of assumptions:
Eventual consistency is acceptable. That is, it's often acceptable in application design if during an update some of the intermediate answers aren't right. That is, it might be fine if for a few seconds while alice is becoming Bob's friend, It's OK if "Is Alice Bob's friend" returns true while "is Bob Alice's friend" returns false
Performance is important. If you're using nosql it's generally because performance matters to you. It's also almost certainly because you care about the performance of operations that happen most commonly. (It's possible that you have a problem where the performance of some uncommon operation is so bad that you can't do it; nosql is not generally the answer in that situation)
You're willing to make uncommon operations slower to improve the performance of common operations.
So, how does that apply to your question. First, it suggests that ultimately the answer depends on performance. That is, no matter what people say here, the right answer depends on what you observe in practice. You can try multiple options and see what results you get.
With regard to the specific options you enumerated.
Assuming that performance is enough of a concern that nosql is a reasonable solution for your application, it's almost certainly query rather than update performance you care about. You probably will be happy if you make updates slower and more expensive so that queries can be faster. That's kind of the whole point.
You can likely handle updates out of band--that is eventually consistency likely works for you. You could submit update operations to a SQS queue rather than handling them during your page load. So if someone clicks a confirm friend button, you could queue a request to actually update your database. It is OK even if that involves rebuilding their user row, rebuilding the friend rows, and even updating some counts about how many friends they have.
It probably does make sense to store a friend row in each direction so you only need one query.
It probably does make sense to store the user information like Name and picture that you typically display in a friend list duplicated in the friendship rows. Note that whenever the name or picture changes you'll need to go update all those rows.
It's less clear that storing the friends in the user table makes sense. That could get big. Also, it could be tricky to guarantee eventual consistency. Consider what happens if you are processing updates to two users' friendships at the same time. It's very important that you not end up with inconsistency once all the dust has settled.
Whenever you have non-normalized data such as duplicating rows in each direction, or copying user info into friendship tables, you want some way to revalidate and fix your data. You want to write code that in the background can go scan your system for inconsistencies caused by bugs or crashed activities and fix them.
I suggest you have the following fields in the table:
userId (hash key)
name (sort key)
email
connections (Comma separated or an array of userId assuming you have multiple connections for a user)
This structure can ensure consistency across your data.
Imagine a little webshop which uses its own currency.
In the database, there is a table for items, which can be bought. The count of each item is limited. This limit is stored in a collumn in this table as integer.
Also, there is a table for the user's cash accounts, whereas for each account the current balance is saved.
If, for example, two users conduct their purchases at the same time and only one item is available, it could be possible that both users pay but only one receives the item due to racing conditions.
How can such racing conditions be resolved, without relying on entity framework throwing exceptions on saving?
How can I ensure the count of available items and the account balance of the buyer is correctly updated?
This isn't really a problem specific to Entity Framework, it's applicable to just about any shop scenario. It comes down to a question of policy - the only way to ensure that two customers do not purchase the same item is to allow a temporary lock to be placed on that item when they add the item to the cart, or begin the checkout process, similar to how concert tickets or flights are sold. This lock would expire if the purchase is not completed within a set amount of time, and the item would be released back for other customers to purchase.
In an e-commerce setting, this is not as suitable, since people may add an item to their cart and not check out, or spend extra time choosing additional items. This may lead to the scenario where you have items for sale but they can't be bought because they're in someone's cart who's not planning on checking out. Instead, duplicate orders are allowed, but payments are typically only pre-authorised and then completed at the time of shipping or order confirmation, so even if the second customer enters all their details and presses Buy, their card wouldn't be charged since the item wouldn't be shippable.
You can implement checks at different stages during the Checkout process to ensure the items in the cart are still available, or at the simplest level, leave it for the final "Pay Now" button on the last page. Ultimately though, this just reduces the potential for the race condition, rather than eliminating it.
What are strategies to deal with seemingly common scenario of a limited inventory and an order form.
If there is one item left, and two people attempt to purchase at the same time. How do you deal with whoever submits payment last?
When a user adds a limited-supply item to their shopping cart, put a hold on the item for a small window of time - say, 15 minutes. It's theirs if they pay within the window, otherwise the hold is removed and the item is returned to the pool. (For the duration of the hold, the item considered "not available" to other users.)
AFAIK, it's pretty standard technique - I've seen Gilt do this, for instance.
My model:
class Order(models.Model):
property_a = models.CharField()
property_b = models.CharField()
property_c = models.CharField()
Many users will access a given record in a short time frame via admin change page, so I am having concurrency issues:
User 1 and 2 open the change page at the same time. Assume all values are blank when they load the page. User 1 sets property_a to "a", and property_b to "b", then saves. A second later if user 2 changes property b and c then saves, it will quietly overwrite all the values from user 1. in this case, property_a will go back to being blank and b and c will be whatever user 2 put in.
I need recommendations on how to handle this. If I have to have a version field in the model, how do i pass it to the admin, where do I do the check so I can elegantly notify the user their changes can't be saved because another user has modified the record? Is there a more seamless way than just returning an error to the user?
The standard solution is to prevent your users from sharing a single record. It's not at all clear why so many users are messing with the exact same Order instance.
Consider that Order is probably a composite object and you've put too much into a single model. That's the first -- and best -- solution.
If (for inexplicable reasons) you won't decompose this, then you have to create a two-part update transaction.
Requery the data. Compare with the original query as done for this user's session.
If the data doesn't match the original query, then someone else changed it. The user's changes are invalidated, rolled back, wiped out, and the user sees a new query.
If the data does match, you can try to commit the change.
The above algorithm has a race condition, which is usually resolved via low-level SQL. Note that it invalidates a user's work, making it maximally irritating.
That's why your first choice is to decompose your models to eliminate the concurrency.
my model has a miscellaneous notes field
This is a bad design. (a) Concurrency is ruined by collisions on this field. (b) There's no log or history of comments.
Item (b) means that a badly-behaved user can maliciously corrupt this data. If you keep notes and comments as a log, you can -- in principle -- limit users to changing only their own comments.
[In most databases with "miscellaneous notes" the field has become a costly, hard-to-maintain liability full of important but impossible-to-parse data. Miscellaneous notes is where users invent their own processes outside the application software. ]
"miscellaneous notes" must be treated like a log, with an unlimited number of notes -- date-stamped -- identified by user -- appended to the Order.
If you simply partition the design to put notes in a separate table, you solve your concurrency issues.