Understanding Bitcoin's LevelDB Format (and messing around with bytes)

I’ve contributed to a small project that I found useful while digging into the structure of Bitcoin’s LevelDB databases: leveldbctl, a CLI tool that makes it rather simple to read a LevelDB database. It was missing a critical feature for my use case, though: handling hexadecimal keys and values, which Bitcoin uses heavily. Until now it could only read and write strings.

It is now possible to use it to retrieve values from the block index keys of Bitcoin (and probably of many other cryptocurrencies), e.g.:

$ leveldbctl --dbdir=samples/testindex -xk g 6635070000|xxd -p
7fbebbc82287cf9112a3d431a3db7f84ecc1dc4684ece882200a

That said, for the explorers who landed on this post wondering how I came up with that key value, here’s what I learned.
From the “official” documentation we know that the key format is <char> followed by a parameter. In the previous example, we retrieved the value of the key f (file) for block file number 1845. Let me explain:

f == 0x66, you can convert it using echo -n f|xxd -p
1845 == 0x0735, for instance using printf "%x\n" 1845

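But one thing that’s not clearly explained anywhere is that the “value” part of the key must be little endian. Big endian to little endian conversion can be done in the shell by reversing the hex string character by character (rev) and then swapping each pair of characters back into order (dd conv=swab):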

$ echo 0735|rev|dd conv=swab
3507
0+1 records in
0+1 records out
4 bytes copied, 9.137e-05 s, 43.8 kB/s

3507 is the little endian representation of 1845. Now about the trailing 0’s: the documentation says this value should be a 4-byte number, and 3507 is only 2 bytes long. Since the least significant byte comes first in little endian, the padding zeros go at the end, so our final number is 35070000.
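The whole f-key recipe (prefix byte, then the file number as a 4-byte little endian value) can also be sketched as a small POSIX shell function. file_key is a hypothetical helper name, not something leveldbctl provides, and the sketch assumes the file number fits in 32 bits.

```shell
# Sketch only: file_key is a hypothetical helper, not part of leveldbctl.
# Builds the hex key of an 'f' (block file) record from a decimal file number.
file_key() {
    n=$1
    key=66    # 0x66 is the ASCII code of 'f'
    i=0
    # Append the four bytes of n, least significant byte first (little endian)
    while [ "$i" -lt 4 ]; do
        key=$key$(printf '%02x' $(( (n >> (8 * i)) & 255 )))
        i=$((i + 1))
    done
    printf '%s\n' "$key"
}

file_key 1845    # prints 6635070000
```

The result can be passed as is to the leveldbctl -xk g command shown at the top of this post.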

The same goes for the b-type key. It takes a block hash as its key “parameter”. Let’s take the following block hash, 0000000000000000000566072a6b442341a543e884c37a76cd16bc2a74dc58ec (details):

$ echo 0000000000000000000566072a6b442341a543e884c37a76cd16bc2a74dc58ec|rev|dd conv=swab
ec58dc742abc16cd767ac384e843a54123446b2a076605000000000000000000
0+1 records in
0+1 records out

Considering that b is 0x62 in hexadecimal:

$ leveldbctl --dbdir=samples/index -xk g 62ec58dc742abc16cd767ac384e843a54123446b2a076605000000000000000000|xxd -p
8acc14a5b53c801d93798f168ae0be7b80aebc1800e0ff2f5d3068b71352
bf8d1a3a44fdfc41ab5b638b3bd4a7f70a000000000000000000c8a78990
778a18cd2bb110f9f2b061d29d7abc620f430c44a20adc35fe65670b64fb
b55e397a1117d1f973300a
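For those who would rather not depend on rev and dd (and their noisy status lines), here is a minimal sketch of the same b-key construction in plain POSIX shell. hex_reverse and block_key are hypothetical helper names introduced for illustration only.

```shell
# Sketch only: hex_reverse and block_key are hypothetical helpers.
# Reverses the byte order of a hex string (two hex digits per byte).
hex_reverse() {
    in=$1
    out=
    # Refuse odd-length input: it cannot be a whole number of bytes
    [ $(( ${#in} % 2 )) -eq 0 ] || return 1
    while [ -n "$in" ]; do
        rest=${in#??}            # everything after the first byte
        out=${in%"$rest"}$out    # prepend the first byte to the result
        in=$rest
    done
    printf '%s\n' "$out"
}

# Builds the hex key of a 'b' (block index) record from a block hash
# written the usual way, i.e. as shown by block explorers.
block_key() {
    # 0x62 is the ASCII code of 'b'
    printf '62%s\n' "$(hex_reverse "$1")"
}

block_key 0000000000000000000566072a6b442341a543e884c37a76cd16bc2a74dc58ec
# prints 62ec58dc742abc16cd767ac384e843a54123446b2a076605000000000000000000
```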

About the chainstate database: according to the “documentation” (which is mostly copypasta from a Stack Overflow answer), it seems to me the key prefix is wrong. It says:

‘c’ + 32-byte transaction hash

In reality there are few records with a c prefix, and moreover, the trailing part of those is not 32 bytes long. Instead, a capital C prefix delivers, for example:

$ leveldbctl --dbdir=chainstate k|grep -a '^C'|head -1|xxd -p
430000017981b8aec4a6bba617b32343cf1909670f62ef57cc308139d9d16ef0cb010a

Strip the first byte (0x43 is C) from that result and take the 32 bytes that follow, i.e. 0000017981b8aec4a6bba617b32343cf1909670f62ef57cc308139d9d16ef0cb
Apply the same treatment as with the previous keys:

$ echo 0000017981b8aec4a6bba617b32343cf1909670f62ef57cc308139d9d16ef0cb|rev|dd conv=swab
cbf06ed1d9398130cc57ef620f670919cf4323b317a6bba6c4aeb88179010000
0+1 records in
0+1 records out

And here you have a valid TXID!
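The extraction above (drop the prefix byte, keep 32 bytes, reverse them) can be sketched as one POSIX shell function. txid_from_utxo_key and hex_reverse are hypothetical names, and the sketch simply ignores whatever follows the 32 bytes in the key.

```shell
# Sketch only: both helpers are hypothetical, not part of leveldbctl.
# Reverses the byte order of a hex string (two hex digits per byte).
hex_reverse() {
    in=$1
    out=
    [ $(( ${#in} % 2 )) -eq 0 ] || return 1
    while [ -n "$in" ]; do
        rest=${in#??}            # everything after the first byte
        out=${in%"$rest"}$out    # prepend the first byte to the result
        in=$rest
    done
    printf '%s\n' "$out"
}

# Extracts the TXID from the hex dump of a chainstate key starting with 'C'.
txid_from_utxo_key() {
    # The key must start with 0x43 ('C') ...
    case $1 in
        43*) ;;
        *) return 1 ;;
    esac
    # ... and hold at least 1 prefix byte + 32 hash bytes (66 hex digits)
    [ "${#1}" -ge 66 ] || return 1
    # Hex digits 3 to 66 are the 32 bytes that follow the prefix
    raw=$(printf '%s\n' "$1" | cut -c3-66)
    hex_reverse "$raw"
}

txid_from_utxo_key 430000017981b8aec4a6bba617b32343cf1909670f62ef57cc308139d9d16ef0cb010a
# prints cbf06ed1d9398130cc57ef620f670919cf4323b317a6bba6c4aeb88179010000
```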

I hope these explanations made the LevelDB format as clear as I would have liked it to be when I started this journey ;)