At the MongoDB 2.6 Index Intersection

Although MongoDB 2.6 is still some way off, some of the major changes which will make the next release are now getting to see the light of day. One important change, index intersection, has been discussed for some time as it addresses one of the oddities of MongoDB – currently it can only use one index at a time when handling a query. Now MongoDB’s CTO has confirmed in a discussion that MongoDB 2.6 will, by default, be able to use more than one index when planning a query. The next logical question is “how many indexes?”, and the code that has landed in the development tree for MongoDB version 2.5.5 answers it: two indexes, though with the rider that this limit may increase over time. To understand what this means in practice, though, we need to look at MongoDB 2.4’s use of indexes.

Let’s start with an example of a simple document:

{ firstname: "Theresa", lastname: "Prohaska", city: "East Haylie" }

There are three fields – four if you include the “_id” field which will be generated when this document is saved – in our document. Now, without indexes, any query will need to scan every document stored. If we index the lastname field, then issue a query including lastname, its index will be used to thin down the field of documents to be scanned. If we index the firstname too, and issue a query including both the lastname and the firstname, the query planner doesn’t use both indexes; instead, the first time it sees the query it tests which field’s index runs fastest, and uses that index from then on. It’s not optimal, especially as indexes cost memory.
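As a rough sketch of that 2.4 behaviour – the collection name and values here are just our illustrative example – the setup would look something like this:

// Index each field separately (2.4-era ensureIndex syntax)
db.collection.ensureIndex( { lastname:1 } )
db.collection.ensureIndex( { firstname:1 } )

// A query on both fields: the 2.4 planner tries the candidate plans once,
// picks the single index that wins and reuses it, rather than combining them
db.collection.find( { firstname:"Theresa", lastname:"Prohaska" } )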

The traditional MongoDB workaround for that problem is compound indexes. These are indexes made up of more than one field. You can think of them as a declaration that you intend to search on a particular combination of fields often enough that it’s worth spending the memory and resources to maintain an index on that combination. And there are more catches to compound indexes. For example, the order in which the fields are defined in the compound index determines how useful the index will be in supporting queries that use those fields, so if we compound indexed our example like so…

db.collection.ensureIndex( { firstname:1, lastname:1 } );

The index would be useful for a query including firstname and lastname, or firstname only, but would be no help for a query on lastname only; you’d need to index lastname separately too. This is a simple example; in practice you may have many combinations of fields you’d want faster queries for. You could create an index for each combination, but each index would cost memory and write performance.
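To make that prefix rule concrete – these queries are illustrative sketches – the compound index above would serve the first two queries below but not the third:

// Served by the { firstname:1, lastname:1 } compound index
db.collection.find( { firstname:"Theresa", lastname:"Prohaska" } )
db.collection.find( { firstname:"Theresa" } )

// Not helped by it: lastname alone isn't a prefix of the compound index
db.collection.find( { lastname:"Prohaska" } )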

The catches don’t stop there. Indexes help with sorting, but a compound index can only boost sorting if the sort order requested by the query is the same as, or exactly the reverse of, the order of the index. Our example compound index would help out with both fields sorted ascending:

db.collection.find().sort( { firstname:1, lastname:1 } )

and both fields sorted descending:

db.collection.find().sort( { firstname:-1, lastname:-1 } )

but be unable to offer any assistance for one field ascending and one descending like:

db.collection.find().sort( { firstname:1, lastname:-1 } )

So a compound index can be very useful but it’s a very specific performance optimization with a cost. If only there was a more generic solution…

Enter Index Intersection

The index intersection technique has been around for many years in the world of traditional relational databases; by using independent indexes together, it gives the query planner more flexibility to find an optimal way to answer queries and minimize scanning of the stored data. What MongoDB 2.6 will have is the ability to use two indexes when planning a query. If we assume our example collection now just has an index on firstname and another on lastname:

db.collection.ensureIndex( { firstname:1 } )
db.collection.ensureIndex( { lastname:1  } )

When a query is made, the query planner can now use both independent indexes when working out how to resolve the query, by looking for where the indexes cross over, or intersect. This in turn cuts down further on how many documents have to be scanned to fulfill the rest of the query’s specification. For collections which have a compound index with two fields, as long as the fields involved have their own indexes, it should be possible to drop the compound index and let the query planner handle it. Fewer indexes also mean less write contention, as there’s less to be updated. But if the fields aren’t already indexed, then you’ll need to consider whether you want to create two indexes to replace that one index.
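As a sketch of how that could play out – again using our illustrative collection and values – a query on both fields could now be answered by intersecting the two single-field indexes rather than settling on just one of them:

// With intersection, both single-field indexes can narrow the documents
// to scan down to those matching both firstname AND lastname
db.collection.find( { firstname:"Theresa", lastname:"Prohaska" } )

// Appending .explain() to the query is one way to check which plan
// the planner actually chose
// db.collection.find( { firstname:"Theresa", lastname:"Prohaska" } ).explain()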

If you rely on indexing to boost sort performance, there’s a catch there too: as of MongoDB 2.5.5, sorting isn’t able to use intersected indexes in the way the query planner can when matching documents. So disposing of a compound index may well slow down sorting, and you will also need to check where the compound index comes into play with sorting. Where there are many fields in a compound index, the planner changes will have a limited effect unless the compound index’s fields and another indexed field are referenced in a query; a compound index appears to be treated by the planner as just another index.
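For example – a hypothetical query against just our two single-field indexes – a filter on lastname with a sort on firstname can’t lean on intersection for the sort stage, so a compound index would still earn its keep there:

// As of 2.5.5 the sort stage can't combine the two single-field indexes,
// so without a compound index such as { lastname:1, firstname:1 } this
// is likely to end up sorting the matched documents in memory
db.collection.find( { lastname:"Prohaska" } ).sort( { firstname:1 } )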

That initial limit of two indexes is also going to make decisions about when, or if, you should change indexes a bit trickier. If more indexes were available to the query planner, these decisions would be simpler, but MongoDB 2.6 is going to be limited to two. With that in mind, MongoDB users have an opportunity, before 2.6 is eventually released, to look at how they are indexing their collections – especially the ones with lots of indexes – and see whether they can win back performance or memory by letting the new query planner optimize with index intersection. Until a later MongoDB gains the ability to intersect more than two indexes and expands intersection support to sorting, it is going to be a delicate performance balancing act for users.

MongoHQ customers who have any questions about how 2.6’s index intersections could be harnessed should contact us.

Written by Dj Walker-Morgan

Content Curator at MongoHQ, Dj has been both a developer and writer since Apples came in ][ flavors and Commodores had Pets.

  • Luiz Felipe Mendes

    Great Post!
    This is one of the most important features that I was waiting for, BUT it seems that it will probably not solve my problems =(
    It is completely necessary for me to use the sort, almost all of my queries use the sort.
    I will have to test some things, but I’m using a lot of compound indexes to query on at least 3 fields and sort by another, so I do not think that this feature will help like I was hoping for.

    Do you know if the index intersection will work with text indexes?

  • Nevi_me

    I’ve followed the ticket on query intersection, and when the 2 index solution was committed, I honestly felt a bit indifferent about it. I understand that there was probably time pressure, and that one does not just type away at their machine and get things running, but I still feel like more could have been done to address query intersections.

    We go to great lengths to denormalise our data so we can use the document model, but then very important things such as intersection are still lacking. For example, I have a document model which has about 15 sets of arrays, most are nested, and some are location fields. Mongo has good ways of expressing queries to find that data, but, to start with, the fact that a common query on fields that include a location index will *always* (from my experience) use the location index regardless of other indices (yes, I’m aware of geosearch) feels backwards.

    Using 2 indices addresses such issues for people who use Mongo for relatively simple analytics, but for people who are using it for more complex applications, this simplicity often still leads to scans taking place.

    One can’t just have massive compound indices to solve the problem, as there are still limitations – especially on multiple array indices – like Cartesian explosions that make it impossible to have better performance gains.

    PostgreSQL with their JSON support could end up being a safe-haven for people jumping ship from Mongo (all those “Mongo was wrong for us” people, and other genuinely discontent and limited users).