More innovation – ZFS Deduplication

When asked about Sun Microsystems, one word will always spring to the top of my mind: innovation.

There is such fantastic DNA in this company, one that looks to push boundaries and make things better. OK, we often do not get the message across well, but the effort and dedication shown by employees always make me proud.

To emphasise this point again, there is great news, as announced by Jeff Bonwick earlier this week: "ZFS now has built-in deduplication".

Deduplication is a process to remove duplicate copies of data, whether it’s files, blocks or bytes.

It’s probably easier to explain with an example. Suppose you have a database of company addresses. The location ‘London’ will exist for quite a few customers, so instead of storing this entry 100 times, there will be one entry, and the other 99 will be references to the original. This saves space, and it can save lookup time too, as the referenced entry is likely to be in cache already.
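To make the idea concrete, here is a rough Python sketch of the same principle applied to blocks of data. It is a toy in-memory model and not how ZFS implements dedup. The class name DedupStore, its methods and the 6-byte block size are all made up for illustration. Each block is identified by its SHA-256 checksum, stored once, and every further copy only adds a reference.

```python
import hashlib


class DedupStore:
    """Toy block store (hypothetical) that keeps one copy of each distinct block."""

    def __init__(self, block_size=6):
        self.block_size = block_size  # illustrative size, real blocks are far larger
        self.blocks = {}      # checksum -> block bytes, stored once
        self.refcounts = {}   # checksum -> number of references to that block

    def write(self, data):
        """Split data into blocks and return the checksums that reference them."""
        refs = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            key = hashlib.sha256(block).hexdigest()
            if key not in self.blocks:
                self.blocks[key] = block  # first copy: actually store it
            self.refcounts[key] = self.refcounts.get(key, 0) + 1
            refs.append(key)
        return refs

    def read(self, refs):
        """Rebuild the original data from its list of block references."""
        return b"".join(self.blocks[key] for key in refs)

    def ratio(self):
        """Blocks referenced divided by blocks actually stored."""
        if not self.blocks:
            return 1.0
        return sum(self.refcounts.values()) / len(self.blocks)


store = DedupStore(block_size=6)
refs = store.write(b"London" * 100)   # 100 identical 6-byte blocks
print(len(refs), "references,", len(store.blocks), "block stored")
```

In this sketch, the 100 copies of ‘London’ end up as a single stored block with 100 references to it, just like the address example above.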

How easy is it to set up?

Assuming you have a storage pool named ‘tank’ and you want to use dedup, just type this:

zfs set dedup=on tank

There is more to it, so read Jeff’s blog for the whole story.

I’m guessing this should appear shortly in the OpenSolaris /dev builds, which will feed into the next OpenSolaris release (2010.03) and possibly into a later Solaris 10 update. Once it’s released, I’ll try to run some tests to see the savings I get.

This should also feed into the FreeBSD project. Such a shame that Apple has dumped its ZFS project for OS X.
