Tuesday, October 14, 2008
Project news
While we are working on HBase 0.19.0, here's what's happening:
- Andrew Purtell of Trend Micro became the fifth committer. Trend Micro recently began using HBase in a production environment, as you can see on the PoweredBy page of the wiki.
- Michael Stack and Jim Kellerman, the two committers from Powerset (which was bought by Microsoft), are once again able to contribute code. Expect a lot of improvements in the next release.
- The 4th HBase User Group meeting will take place at the Powerset office on October 21; please sign up if you are coming. As of now, all active committers should be there (including me, by phone).
Labels: hbase, meeting, Microsoft, Powerset, Trend Micro
Thursday, September 18, 2008
Speeding up HBase, part 1
UPDATE: This work on scanners is now committed to trunk. Enjoy the new speed!
Stability and reliability were the focus of the previous HBase versions; now it is time for performance! In this first post on the subject, I will describe the work done on the scanners.
As described on the Performance Evaluation page of the wiki, our current numbers compared to Bigtable are not that great (though a lot better than they were). One big hotspot in HBase is remote procedure calls (RPC), and some profiling we did showed that their overhead is sometimes even bigger than that of HDFS. So, our strategy for HBase 0.19.0 will be to batch as many rows as we can in a single RPC in order to diminish its impact. This will require different implementations for different operations.
Implementation and tests
Scanners in HBase already provide the fastest way to retrieve data, but compared to a direct scan of an HDFS MapFile, it is clear that something is wrong (3,737 rows per second instead of a staggering 58,054). HBASE-877 is about fixing that hotspot. The implementation is simple: since we know that we want many rows, why not get a bunch of them each time we communicate with a region server and then serve directly from a client-side cache? In other words, fetch x rows, serve from cache x times, repeat until the rows are exhausted. That works for big jobs, but what if a user wants to scan only 2-3 rows (or even 1)? Can we do something so that it doesn't fetch 30 rows? If we make that number configurable, it will also work fine for small scans.
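To make the idea concrete, here is a rough, hypothetical sketch of such client-side batching. This is not the actual HBase scanner code: the Row type and fetchRowsFromRegionServer() are placeholders standing in for the real classes and RPC.

import java.io.IOException;
import java.util.Arrays;
import java.util.LinkedList;

// Hypothetical illustration of the batching idea, not the real HBase scanner.
// Row and fetchRowsFromRegionServer() stand in for the actual types and RPC.
public class BatchingScannerSketch {
    private final LinkedList<Row> cache = new LinkedList<Row>();
    private final int caching;  // rows fetched per RPC, e.g. 30

    public BatchingScannerSketch(int caching) {
        this.caching = caching;
    }

    public Row next() throws IOException {
        if (cache.isEmpty()) {
            // One RPC brings back up to 'caching' rows instead of a single row.
            Row[] batch = fetchRowsFromRegionServer(caching);
            if (batch == null || batch.length == 0) {
                return null;  // scanner exhausted
            }
            cache.addAll(Arrays.asList(batch));
        }
        // Every other call is served from the local cache, no RPC needed.
        return cache.removeFirst();
    }

    // Placeholder for the real region server call.
    private Row[] fetchRowsFromRegionServer(int howMany) throws IOException {
        return new Row[0];
    }

    // Placeholder row type.
    public static class Row {}
}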
My first tests with my patch confirm that batching rows is efficient. The following chart shows the Performance Evaluation test run 30 times, going from batching 1 row up to batching 30 rows:
As we can see, the test took around 108 seconds to scan all 1 million+ rows on my machine (2.66 GHz, 2 GB of memory, SATA disk). That's 9,692 rows per second, better than the numbers on the Performance Evaluation wiki page, but my machine is also faster. At 30 rows per fetch, however, it scanned 30,028 rows per second! That's a lot better for a very simple fix. I think this could be further optimized by fetching the rows in a second thread, with some logic to make sure we don't load the full table right away... but that's for a future version of HBase.
Configuration
There are two ways to configure how many rows are fetched. First, hbase-default.xml defines a default value for hbase.client.scanner.caching, which you can override in hbase-site.xml like any other value. The other way is to use HTable.setScannerCaching to set an instance-specific value. For example, if you have to scan thousands of rows and, right after that, only 2 rows before closing the scanner (all on the same table), you can pass 30 to setScannerCaching the first time and 2 the second time. That's a bit aggressive, but you can do it if you want.
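To make that concrete, here is a minimal sketch of the per-instance approach. The table name is made up, the surrounding client calls are assumed from the 0.18/0.19-era Java API, and the scanner usage itself is only sketched in comments; setScannerCaching and hbase.client.scanner.caching are the pieces described above.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

// Sketch only: "mytable" is a made-up name and the constructor signatures
// are assumed; only setScannerCaching comes from the text above.
public class ScannerCachingExample {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");

        // Big scan: fetch 30 rows per RPC.
        table.setScannerCaching(30);
        // ... open a scanner, iterate over thousands of rows, close it ...

        // Small scan right after, on the same table: only 2 rows are needed,
        // so don't pay for a 30-row fetch.
        table.setScannerCaching(2);
        // ... open a second scanner, read the couple of rows, close it ...
    }
}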
Something you must verify before setting the caching to a high number is the size of your rows. While testing the patch, I used 100 MB rows to get many splits and kept the caching at 20. Bad idea: my machine began swapping like hell!
Conclusion
HBase 0.19.0 is all about speed, and the first optimization was done on the scanners. By fetching more rows per call, the cost of doing RPCs is greatly reduced. By keeping the number configurable, it also fits all use cases. In my opinion, those who use HBase as a source for MapReduce jobs will see a big improvement, since such jobs do full scans of tables (as long as the caching is configured at around 30, or less for very big rows).
If you want to use this sweet improvement right now, download HBase 0.18.0 and apply my patch from HBASE-887, or wait until it gets committed.
Happy scanning!
Monday, September 15, 2008
Small, frequently updated tables
New in 0.2.1 and 0.18.0 is finer-grained configuration of major compactions. I won't go into the details of what a compaction is today; for that, see the architecture page of the wiki.
So, suppose you have a small table (10 rows) which is frequently updated (100 times per minute) and has MAX_VERSIONS set to 1. From a developer's point of view, you would expect this table to take only a few MB in HDFS, but upon inspection you will see that this is not the case. What happens is that when you add new cells, the old ones are kept (marked as deleted) and are only cleared by a major compaction, which happens once a day! One thing you can do is set hbase.hregion.majorcompaction to a smaller value, but that affects your whole cluster and is not recommended. With the introduction of HBASE-871, you can now set this value for each family. In Java, the code looks like:
family.setValue("hbase.hregion.majorcompaction", "1800");
Here, family is an HColumnDescriptor, and the value 1800 will get you major compactions roughly every 30 minutes. That's it!
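For a fuller picture, here is a minimal sketch of how that per-family setting might be applied when creating a table. The table name, family name, and surrounding admin calls are illustrative assumptions based on the 0.2.x-era Java client; only setValue and the property name come from the snippet above.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Sketch only: "counters" and "stats:" are made-up names and the admin
// calls are assumed from the 0.2.x-era client API.
public class FrequentCompactionTable {
    public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();

        HColumnDescriptor family = new HColumnDescriptor("stats:");
        // Per-family override from HBASE-871: major-compact this family
        // roughly every 1800 seconds (30 minutes) instead of once a day.
        family.setValue("hbase.hregion.majorcompaction", "1800");

        HTableDescriptor table = new HTableDescriptor("counters");
        table.addFamily(family);

        new HBaseAdmin(conf).createTable(table);
    }
}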
Wednesday, September 3, 2008
My Personal Notes on 0.2.1
0.2.1 is a bug fix release with some improvements. We mainly fixed issues that were not discovered during the 0.2.0 release candidate phase.
BUG FIXES
- Many serious issues regarding MAX_VERSIONS and deletes were fixed, to the point that using 0.2.0 is no longer recommended.
- Another very serious issue that was fixed: using any character below "," in the ASCII table in your row keys could result in unreachable rows (see HBASE-832 for examples). This alone is a strong argument to upgrade.
- The default scanner timeout went from 30 to 60 seconds. This prevents UnknownScannerException during heavy tasks (mainly MapReduce).
- Writing to a region while scanning it could result in a temporary deadlock if a split was required. For example, a MapReduce job with no Reduce phase that modified each row it mapped and committed the change in the Map would eventually fail.
- The row counter that existed in HQL is now available in the new shell. Expect it to be slow, since it scans the whole table.
- Committing a BatchUpdate with no row will now complain.
- Various other issues were resolved: log messages were fixed, code was cleaned up, and the deleteFamily API was fixed.
IMPROVEMENTS
- Be aware that 0.2.1 is based on Hadoop 0.17.2.1, which was released as 0.17.2.
- When dropping a table in the shell, you will now get a nice message if the table was not disabled.
- The pauses were reduced, so creating a new table, for example, is faster.
- All methods that take a Text and were not already deprecated now are. Be aware that these methods will be removed in 0.18, so this is your last warning!
- Various speed improvements regarding compactions and Jenkins Hash (used to encode a region’s name).
INCOMPATIBLE CHANGES
- The Thrift IDL was updated to match the new API. Be sure to have a look!
Friday, August 29, 2008
Current Project Status
OK, so a lot has been going on lately; here's where we are today.
The acquisition of Powerset by Microsoft was feared by many members of the community (since PSet sponsors HBase with two nearly full-time devs, and we all know MS's love for OSS), but it turned out to be not that bad. For example, MS became an Apache Platinum sponsor, which seems to imply that they are willing to change their perspective. Also, Jim and Stack (the two devs employed by PSet) have not been able to commit code since August (though they can still support, test, describe how to fix bugs, make releases, etc.), which IS bad, but this is only to give the MS lawyers time to sort out a way to make sure that the company's IP does not get into the HBase code. From my point of view, this is really motivating, knowing that 1) they *might* want to keep HBase and 2) they *might* put money into it (instead of burning some VC's money). Finally, I became a committer, and some contributors helped out, so the patches kept coming.
Regarding HBase version numbering, there is a big change: the current major version is 0.2 and the next one will be 0.18. The goal is to make it clear to users which version of Hadoop they should use with each version of HBase. We will also try to follow a schedule for our releases. Here is what Jim said on the subject:
I do not want to get HBase as far behind the curve as we were with 0.2.0. Until that release, there was no version of HBase that worked with hadoop-0.17.x.
Since Hadoop 0.18.0 was released, HBase is now in a feature freeze and as soon as we fix the remaining bugs a release candidate will be posted. I will try to put up some release notes before it comes out.
What I'd like to do in the future is feature freeze HBase when a new Hadoop release comes out, apply only bug fixes and release a new HBase as soon as possible after Hadoop releases.
Any features not included in a release will be included in the next one. In this way, there will always be a version of HBase that works with each released version of Hadoop shortly after the Hadoop release.
Labels: hbase, Microsoft, Powerset, project status
The Unofficial HBase Blog
Hi everyone,
For those who don't know me, I'm one of the HBase committers and also a student in Software Engineering at École de technologie supérieure in Montreal, Quebec. A lot of things have happened in recent months (Microsoft buying Powerset, for starters), so I thought it would be nice to have a blog on HBase. Subjects I will likely cover:
- Project status
- Release notes (from my point of view)
- Introductions to HBase
- Frequent problems newcomers have with HBase and how to solve them
- Personal notes on HBase
J-D