Here’s What Can Go Wrong with Database Replication

Replication is the process to copy data from one server to another. It’s generally done so as to prevent data loss in case a server fails or to distribute the incoming load to multiple servers which have same data.
Replicating the data might seem like a trivial operation but it comes with its own challenges. In this article, we are gonna look the challenges in the ubiquitous Master-slave architecture in ACID databases.
Reading your own writes
Let’s look the the following scenario.
- You Insert a value in the database.
- You read the updated value a few milliseconds later.
- You update the same value and rewrite it.

What can go wrong in this scenario?
Generally slaves are at some lag with respect to masters. This lag can range from a few records to a few hundreds. Whenever you issue a write to master, it’ll replicate it to all the slaves.
In this case since slave is at lag, if you read the previous inserted value, either you won’t find anything or you’ll find a previous one. This is disastrous. e.g.
- You added an item to the cart on some e-commerce website and then proceeded to checkout.
- The cart doesn’t show your item (because it’s not updated in slave yet) so you proceeded to add it again.
- You go back to the cart and suddenly you have two items (this time the slave caught up and replicated the most recent writes as well).
This results in a bad customer experience.
What can you do?
Databases may provide Read-after-Write consistency which ensures that you always receive the latest updated data or don’t receive anything. MySQL doesn’t provide this guarantee so you need to take care on the application end. Some of the ways are-
- Always read from the master. This approach is simplest but won’t scale in case the reads are a lot.
- Read from the master only for some duration after last write (let’s say a minute) and then switch to slaves.
- Make sure the timestamp of the read is greater than the timestamp of last write. In case it’s not, you can retry again.
Back to the future
When one of the slaves is at a lot more lag than others, it may seems like that your writes are getting lost.
Let’s take E-commerce example again.
- You added some item to the cart.
- You check the cart and find the item is there. Then you go on and browser other items.
- While checking out, you find that the item is now missing. You refresh the page and only to find the item appear back mysteriously.
Why would that happen?

This is because the first time you checked the cart, the query went to the slave which was caught up with the master. However, the second time it went to the slave at lag which doesn’t have any information.
Databases can provide a Monotonic reads guarantee which ensures that you don’t ever see a read going back in time. This can be done by routing a user’s . read request always to the same slave. However, this won’t scale in cases when some users have a lot more reads than others.
Partitioned databases comes with even more challenging scenarios. You can wound up listening to events from one partition before other partition even though the latter may have been updated early on.
Dealing with all of these problems is not at all trivial and many ACID databases till this date don’t provide any of these guarantees. As a developer, it becomes your job to consider these issue and handle them properly in your application.
If you want to explore more about replication, you can refer to the links below:
Connect with me on LinkedIn or Twitter or drop a mail to kharekartik@gmail.com