Sometimes, wheels have to be a little squeaky before we can learn how to properly oil them. Once, a race condition was discovered within our Rails codebase that required a deep dive into concepts of concurrency which I hadn’t yet encountered in my career.
What’s a race condition?
A race condition occurs when two different processes attempt to update the same value simultaneously, and are unaware that the other process is doing it. Usually you would encounter this when multiple Ruby processes have connections to the same database, but you can reproduce the concept on the command line.
An example would be two transactions that both add $5 to the same account balance:
account = Account.find(1) # First instance pulls balance of $10
account = Account.find(1) # Second instance pulls balance of $10
account.balance += 5 # First instance Increments balance by $5
account.balance += 5 # Second instance Increments balance by $5
account.save # First instance writes back $15 as new balance
account.save # Second instance also writes back $15
This clearly saves back a balance of $15, where it should now be $20.
Aren’t there other ways of solving the issue without storing the balance?
The original designers of the system chose to sum the balance every time a new transaction occurs. For instance, within your transaction table, you could have a callback that triggers a balance recalculation on the account every time a transaction is saved:
class Account < ActiveRecord::Base
self.balance = transactions.sum(:credits) - transactions.sum(:debits)
class Transaction < ActiveRecord::Base
This will work happily on your development machine while you’re prototyping, but as the number of transactions in your database increases in the production system, this will take increasingly longer to process. ActiveRecord will begin locking records during the update, and any action that creates transactions (including placement of orders) will be affected. Better to go with storing a balance in an accounts table, and updating it as needed.
Concurrency and Deadlocks
The best way to ensure that no two processes are able to update the balance simultaneously is to lock the record that’s being updated until the process has successfully saved it. This can be done in PostgreSQL, the DB of our choice, by using SELECT FOR UPDATE. In order for such a select to be effective, it must occur within a transaction, which in the example below is bookended by BEGIN and COMMIT.
SELECT * FROM ACCOUNTS WHERE id = 1 FOR UPDATE;
UPDATE ACCOUNTS SET balance = 15 WHERE id = 1;
While the block is being executed, other sessions can read the balance but no other session connected to the database can perform its own SELECT FOR UPDATE or perform an UPDATE on that record. They have to wait in a queue until this transaction is finished.
The equivalent of doing this in Ruby with ActiveRecord is to wrap the entire operation in a transaction (which will ensure that everything you do in that block falls between BEGIN and COMMIT), and lock the record:
account = Account.find(1, lock: true)
account.balance += 5
This appeared to work well when we used a multi-threading test gem, but we discovered another issue completely by accident. In some instances, while we were updating balances on the account, other processes were adding new transactions. This created a deadlock – and the reason was not immediately apparent, generating this cryptic error:
SELECT 1 FROM ONLY
"public"."accounts" x WHERE "app_id" = $1 FOR SHARE OF x
ERROR: deadlock detected
Process 20480 waits for ShareLock on transaction 74278; blocked by process 20481.
Process 20481 waits for ShareLock on transaction 74275; blocked by process 20480.
What’s going on here? After Googling this, it turns out that referential integrity checks at the database level also perform their own SELECT FOR SHARE, which would also happily wait for a SELECT FOR UPDATE to finish under normal circumstances. But since we were testing with hundreds of new transactions doing INSERTs and UPDATEs from different sessions, they end up crossing paths and waiting for each other to finish. This makes sense – the application can’t have other processes changing the record you’re checking for integrity. But it seems if you want to get around this intractable problem, you need to make sure that the field you’re storing is not going to be sitting on a table that other tables reference. It was decided that we should move the balance field into its own table that would exist only to reference the account and not be referenced by anything.
How did you simulate the multi-threading environment?
Oh! That was pretty cool. Our sysop pointed me to a sweet gem called “parallel”. It allows you to do cool stuff like the following, which sends an array of numbers to be printed in four separate processes:
Parallel.each([1,2,3,4], :in_processes => 4) do |e|
See more here: