Speaking “MongoDB Replica Set”

Replica sets are cool. They add data redundancy in a mostly transparent way, requiring only a little extra developer knowledge. That knowledge is important, though, and we regularly help our customers troubleshoot applications running into preventable problems.

Replica sets in one tweet or less

Replica sets are master/slave clusters that automatically “promote” a slave if the master DB becomes inaccessible.

Or read a longer description.

What a developer needs to understand

MongoDB handles the intricacies of syncing data, managing which member is primary, and recovering secondaries for you. It doesn’t, however, make your client code any smarter about which member to talk to (well, unless you’re running mongos, but that’s less common). Applications themselves should know about as many replica set members as possible and be able to switch between them on the fly.

The most basic requirement for a replica set is that all writes must go to the primary set member. Secondaries can handle reads, but will complain if they encounter a write command. Most drivers will figure out which member to write to, though… assuming they can connect to any member. This brings us to our first rule of speaking replica set:

Rule #1: Always provide your driver with as many replica set members as possible

Many applications connect to only a single host on startup. There’s no good reason for this, and it hurts: if that one host isn’t available at connection time, the application will behave as if the entire replica set is down. Driver options differ slightly, but every driver has a way to connect to multiple replica set members. If you tell your driver everything you can about your setup, it will usually do the right thing when connecting.
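Here is a minimal sketch of what "as many members as possible" looks like in practice, using the standard MongoDB connection string format, which accepts a comma-separated seed list and a replicaSet option. The hostnames, the set name "rs0", and the build_uri helper are all made up for illustration and are not part of any driver.

```python
from urllib.parse import urlencode


def build_uri(hosts, replica_set, database=""):
    """Build a mongodb:// URI that lists every known replica set member."""
    if not hosts:
        raise ValueError("list at least one replica set member")
    seed_list = ",".join(hosts)
    options = urlencode({"replicaSet": replica_set})
    return f"mongodb://{seed_list}/{database}?{options}"


# Hypothetical hosts and set name; substitute your own.
uri = build_uri(
    ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
    replica_set="rs0",
)
print(uri)
```

With a seed list like this, the driver only needs one of the listed hosts to answer in order to discover the rest of the set.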

Rule #2: Handle connection failure exceptions

Secondaries are most commonly promoted when the primary becomes completely inaccessible. This can happen for a number of reasons, anything from a routine mongod restart to a helicopter “landing” in the middle of your datacenter. Drivers will generally raise a failed connection exception when they send an op to a missing replica set member, then leave the rest up to you. You can choose to either retry the operation (a good idea for critical writes) or just give up and wish your users better luck next time (a good idea if you don’t like your users).
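The retry option can be sketched roughly like this. ConnectionFailure here is a stand-in class, not a real driver import (PyMongo, for example, raises pymongo.errors.AutoReconnect in this situation), and with_retries, the attempt count, and the delay are assumptions chosen for illustration.

```python
import time


class ConnectionFailure(Exception):
    """Stand-in for the driver's failed-connection exception."""


def with_retries(operation, attempts=3, delay_seconds=0.5):
    """Call operation(); retry on ConnectionFailure, re-raising after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionFailure:
            if attempt == attempts:
                raise  # out of retries: let the caller decide what to do
            # Pause so the set has a chance to elect a new primary.
            time.sleep(delay_seconds)
```

A caller would wrap a critical write as with_retries(lambda: save_order(order)), where save_order is whatever function performs the write. One caveat: a write whose connection dropped may still have reached the server, so blind retries are safest for operations that can be applied twice without harm.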

Rule #3: Be ready for “not master” exceptions

Drivers won’t always raise connection exceptions when a primary “steps down” to secondary. We usually see this happen when we demote the primary member of a replica set for maintenance purposes and an application keeps on sending writes along. MongoDB will send errors back indicating that it’s “not master”, but applications may not know what’s going on.

This is a little trickier to deal with, unfortunately. If an application is writing in safe mode, drivers will usually raise an exception with “not master” in the message string. If an application isn’t using safe mode and never checks the getLastError command, it will happily keep on sending data into oblivion. In a complete coincidence, rule #4 takes care of this problem:
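As a rough sketch, an application writing in safe mode could look for "not master" in the error message and try again once it has found the new primary. OperationFailure is a stand-in exception class here, and is_not_master, write_with_stepdown_handling, and the reconnect hook are hypothetical names, not driver APIs.

```python
class OperationFailure(Exception):
    """Stand-in for the exception a driver raises when the server reports an error."""


def is_not_master(error):
    """True if the error message says the member is no longer primary."""
    return "not master" in str(error).lower()


def write_with_stepdown_handling(write, reconnect):
    """Run write(); if the member stepped down, reconnect() and try once more."""
    try:
        return write()
    except OperationFailure as error:
        if not is_not_master(error):
            raise  # some other server-side failure; don't mask it
        reconnect()  # hypothetical hook: rediscover the current primary
        return write()
```

Matching on the message string is crude, but it mirrors what the server actually sends back. Drivers that expose a dedicated exception type for this case are a better thing to catch when one is available.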

Rule #4: Practice safe writes (unless you can afford to lose them)

Simply using safe mode will help you catch failing updates, but there’s a bit more safety to be had in a replica set setup.

In single-server MongoDB setups, safe mode ensures that the write “succeeded”, either in memory, on disk, or to the journal (depending on the options). Replica sets get to use another flag that tells the command to hang out and wait until the write has been replicated to other replica set members. If your data is important, you probably want to verify that writes are committed to a majority of replica set members (the “majority” flag).
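For the curious, the flags in question are fields on the getLastError command. Most drivers wrap this up as a safe-mode or write-concern option, so treat the following as a sketch of the command document rather than something you would normally send by hand. The get_last_error_command helper and the five-second timeout are illustrative assumptions.

```python
def get_last_error_command(w="majority", wtimeout_ms=5000, journal=False):
    """Build a getLastError command document that waits for replication."""
    # The command name must be the first key; Python dicts keep insertion order.
    command = {"getlasterror": 1, "w": w, "wtimeout": wtimeout_ms}
    if journal:
        command["j"] = True  # also wait for the write to reach the journal
    return command


print(get_last_error_command())
print(get_last_error_command(w=2, journal=True))
```

Setting a timeout matters: without one, a write waiting on a majority can block indefinitely if enough members are unreachable.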

There’s no flag for “chisel this into granite and store it in a converted underground missile silo for future generations to unearth after the inevitable llama uprising”, unfortunately. 10gen does take feature requests.

Rule #5: Go ahead and read from secondaries

You may as well spread queries across your secondaries. They’re basically just sitting there with nothing to do but keep up with writes. You could even be extra clever and have a multi-datacenter replica set (we’ll set it up!) for disaster recovery purposes, host your app in multiple places, and read from the closest replica set member.
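"Read from the closest member" boils down to picking the host with the lowest round-trip time. Drivers that support this kind of routing typically make the choice for you, so the snippet below only illustrates the idea. The hostnames and latency figures are invented, and nearest_member is a hypothetical helper.

```python
def nearest_member(latencies_ms):
    """Return the host with the lowest measured round-trip time."""
    if not latencies_ms:
        raise ValueError("no replica set members to choose from")
    return min(latencies_ms, key=latencies_ms.get)


# Invented measurements from an app server in a hypothetical "east" datacenter.
measured = {
    "db-east-1.example.com:27017": 1.8,
    "db-east-2.example.com:27017": 2.4,
    "db-west-1.example.com:27017": 71.0,
}
print(nearest_member(measured))
```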

As an aside, secondary replica set members are a great way to get around the one-at-a-time map/reduce limitation in MongoDB. We have a number of customers who run “map/reduce secondaries” that will never take over as master, but run map/reduce jobs all day.

Don’t waste your replica set

It’s sad to have a nice, shiny replica set setup and an application that doesn’t understand how the data store works. In the worst cases, you’ll lose data despite a fully functional database. On normal days, your app may benefit from spreading reads around to otherwise bored servers. Make sure you teach your applications to speak “replica set”.

This post was written by Kurt Mackey.