Monday, 9 September 2013

MongoDB Architectures

When you talk about database clustering, the first things you look for are an architecture and a failover solution. Many services offer some kind of arbiter to manage this, and its main purpose is to decide which instance has to be the master.
Take Redis, for example: since version 2.4 there is a Sentinel component that monitors HA instances. MySQL also needs an external script to monitor the master and promote a slave when it fails. So when you think about MongoDB clustering, it is easy to assume that an arbiter is always needed to get automatic failover promotion. This is a HUGE error...

Let's talk first about MongoDB replication. You can set up two kinds of replication:
- Replica Sets: Standard master-slave architecture.
- Sharded clusters: A data partitioning solution with Replica Sets.
MongoDB provides an automatic solution for failover promotion: all members know each other, and when a cluster starts (or the primary is lost) an election is held to decide who becomes the primary. The whole procedure is well documented. This is good, but it is also a bit of a headache: a primary needs the votes of a majority of the members, so if you want to be sure some instance wins, you will have to deploy an odd number of voting instances.
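The majority rule behind these elections is easy to check with a few lines of Python. This is only a sketch of the arithmetic (the function names are mine, not part of any MongoDB API), but it shows why an even number of members buys you nothing:

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: strictly more than half."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """Members that can fail while a majority can still be reached."""
    return voting_members - majority(voting_members)


for n in range(1, 6):
    print(f"{n} voting members: majority={majority(n)}, "
          f"tolerates {fault_tolerance(n)} failure(s)")
```

Two members tolerate no failures at all, and four members tolerate exactly as many failures as three.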

So let's look at some possible architectures and analyze them.

Consider this minimal HA architecture, a primary and a single secondary:


Obviously, you would need to build an external component to decide when to promote: if the primary fails, the secondary cannot hold an election by itself, because it has no one to vote with. It stays a secondary, writes stop, and your system goes down.

To avoid this, we can add a new secondary to act as an arbiter:



Now you have a real HA solution without configuring anything extra. But... if you choose this solution, you are probably not the one paying. ARE YOU SERIOUS?? Paying for a full machine that only pings the other two instances... Please, be a little more careful with expenses.
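MongoDB has a dedicated arbiter member type that votes in elections but stores no data, so it can live on a small box. As a rough sketch, this is how the replica set configuration document could be built in Python. The `_id`, `members`, `host` and `arbiterOnly` fields are the ones MongoDB uses; the helper name, the set name `rs0` and the hostnames are made up for the example:

```python
import json


def build_replset_config(name, data_hosts, arbiter_hosts=()):
    """Build a replica set config document (hypothetical helper)."""
    members = []
    for host in data_hosts:
        members.append({"_id": len(members), "host": host})
    for host in arbiter_hosts:
        # arbiterOnly members vote in elections but hold no data
        members.append({"_id": len(members), "host": host, "arbiterOnly": True})
    return {"_id": name, "members": members}


# Hypothetical hosts: two data-bearing members plus one cheap arbiter
config = build_replset_config(
    "rs0",
    data_hosts=["db1.example.net:27017", "db2.example.net:27017"],
    arbiter_hosts=["small-box.example.net:27017"],
)
print(json.dumps(config, indent=2))
```

A document of this shape is what you would hand to `rs.initiate()` in the mongo shell.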



OK, so you need HA, you care about costs and want to minimize instances, but your DB has a heavy read load and needs some more secondary instances... so you decide to add a new instance:



Wait, we saw that an odd number of members is needed, and now we have four. Will it work? Well, as you can see, it seems it will work, but the arbiter is not very useful: with four voting members the majority is three, so the set still survives only one failure, exactly as it did with three. So, being a little HA paranoid, why settle for the fault tolerance of a 3-server setup when you could have better? Try this:



From here, add as many secondaries as you want and play with your arbiters to keep an odd number of voting members.
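One thing to keep in mind: extra secondaries only help with reads if your application is allowed to read from them, because drivers read from the primary by default. A minimal sketch of a connection string that allows it, using the standard `replicaSet` and `readPreference` URI options (the helper name and the hostnames are assumptions for the example):

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a MongoDB connection string for a replica set (hypothetical helper)."""
    options = urlencode({"replicaSet": replica_set,
                         "readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/?{options}"


# List only data-bearing members; arbiters have nothing to read
uri = replica_set_uri(
    ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"],
    "rs0",
)
print(uri)
```

Remember that reads from a secondary can be slightly behind the primary, so use them where a little staleness is acceptable.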

Nice, now you know how to scale a read-heavy cluster, but what if we need better write performance? Some will say... MIGRATE TO CASSANDRA!! Really? You are probably reading this because you did your research, compared some NoSQL solutions, and chose MongoDB, so... find a way with MongoDB!!

If you need write performance, write to multiple instances. We mentioned sharding earlier (be sure that you REALLY need it). Look at this architecture:



As you can see, we are using the same number of servers, but complexity has grown a lot. We have new components; let me explain them:
- mongos: The router to the DB; your app has to access the cluster through them.
- CSx: Config Servers, mongod instances that only hold metadata about the cluster.

So, as you can see, you can write to all servers, but of course each one will hold different data. Each document can only be written to one server and read from two of them, but if you do a good job with your shard keys, you will probably multiply your write performance.
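To get a feeling for how a shard key spreads writes, here is a toy simulation in Python. It is NOT how MongoDB routes documents internally (mongos routes by ranges of the shard key stored on the config servers); the modulo-of-a-hash rule, the shard names and the `user-N` keys are stand-ins chosen only to illustrate the idea that each document lands on exactly one shard:

```python
import hashlib
from collections import Counter

SHARDS = ["shard0", "shard1"]  # hypothetical shard names


def shard_for(shard_key_value: str, shards=SHARDS) -> str:
    """Pick one shard for a key by hashing it (illustrative stand-in only)."""
    digest = hashlib.sha256(shard_key_value.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# Simulate 10,000 inserts keyed by a well-distributed user id
writes = Counter(shard_for(f"user-{i}") for i in range(10_000))
print(writes)
```

With a key that takes many different values, the writes split roughly evenly between the shards. With a key that is the same for most documents, they would all pile up on a single shard and you would gain nothing.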

If you choose a sharding solution, read some tutorials, work with your development team, and be sure to partition your data smartly.
