Monday, September 15, 2008

Small, frequently updated tables

New in 0.2.1 and 0.18.0 is a finer-grained configuration of major compactions. I won't go into the details of what a compaction is today; for that, I'll refer you to the architecture page of the wiki.

So, suppose you have a small table (10 rows) which is frequently updated (100 times / minute) and has a MAX_VERSION of 1. From a developer's point of view, you expect this table to take up only a few MB in HDFS, but upon inspection you will see that this is not the case. What happens is that when you add new cells, the old ones are kept but marked as deleted, and they are only cleared by a major compaction, which by default happens once a day!

One thing you can do is set hbase.hregion.majorcompaction to a smaller value, but this affects your whole cluster and is not recommended. With the introduction of HBASE-871, you can now set this value for each family. In Java, the code looks like:

family.setValue("hbase.hregion.majorcompaction", "1800000");

Here family is an HColumnDescriptor. The value is in milliseconds, like the cluster-wide setting (whose default of 86400000 is one day), so 1800000 will get you major compactions every 30 minutes (more or less). That's it!
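To get a feel for the numbers, here is a rough sketch that uses only the JDK, no HBase classes. It assumes the period is expressed in milliseconds, as the cluster-wide default of 86400000 (one day) suggests. The class and method names are made up for illustration, and the estimate simply multiplies the update rate by the period, ignoring minor compactions and flush timing.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of HBase: back-of-the-envelope numbers
// for choosing a per-family major compaction period.
public class CompactionPeriod {

    // Converts a period in minutes to the string form expected by
    // family.setValue("hbase.hregion.majorcompaction", ...),
    // assuming the property is read as milliseconds.
    static String periodMillis(long minutes) {
        return Long.toString(TimeUnit.MINUTES.toMillis(minutes));
    }

    // Rough count of cell writes that pile up between two major
    // compactions, given a steady update rate.
    static long updatesBetweenCompactions(long updatesPerMinute, long periodMillis) {
        return updatesPerMinute * TimeUnit.MILLISECONDS.toMinutes(periodMillis);
    }

    public static void main(String[] args) {
        long daily = 86400000L;                               // default: one day
        long halfHour = Long.parseLong(periodMillis(30));     // 30 minutes

        System.out.println("30 minutes as a property value: " + periodMillis(30));
        System.out.println("Writes kept with a daily compaction: "
                + updatesBetweenCompactions(100, daily));
        System.out.println("Writes kept with a 30 minute compaction: "
                + updatesBetweenCompactions(100, halfHour));
    }
}
```

With the table from this post (100 updates a minute), the daily default lets about 144,000 writes accumulate before they are cleaned up, against about 3,000 with a 30 minute period.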
