Understanding Bitcoin’s LevelDB Format (and messing around with bytes)

I’ve contributed to a small project I found useful while digging into the structure of Bitcoin’s LevelDB databases: leveldbctl, a CLI tool that makes it fairly simple to read a LevelDB database. It was missing one critical feature for my use case, though: handling of hexadecimal keys and values, which Bitcoin uses heavily. Until now it could only read and write strings.

It is now possible to use it to retrieve values from the block index keys of Bitcoin (and probably many other cryptocurrencies), e.g.:

$ leveldbctl --dbdir=samples/testindex -xk g 6635070000|xxd -p

That said, for the explorers who landed on this post wondering how I came up with that key value, here’s what I learned.
From the “official” documentation we know that the key format is <char> followed by a parameter. In the previous example, we retrieved the value of the key f (file) for block file number 1845. Let me explain:

f == 0x66, you can convert it using echo -n f|xxd -p
1845 == 0x0735, for instance using printf "%04x\n" 1845 (a plain %x would print 735, without the leading zero)

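But one thing that’s not clearly explained anywhere is that the “value” part of the key must be little endian. Big endian to little endian conversion can be done in the shell with the combination below: rev reverses the characters, then dd conv=swab swaps every pair of adjacent bytes, which puts the two digits of each byte back in order: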

$ echo 0735|rev|dd conv=swab
0+1 records in
0+1 records out
4 bytes copied, 9.137e-05 s, 43.8 kB/s

3507 is the little endian representation of 0735, i.e. 1845. As for the trailing zeros: the documentation says this value should be a 4-byte number, and 3507 is only 2 bytes long, so our final number is 35070000.
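The whole derivation of an f key can be wrapped in a couple of shell functions. This is only a minimal sketch in plain POSIX sh: hex_swap_endian and file_key are names I made up for illustration (they are not part of leveldbctl), and the 4-byte little endian layout is the one described above.

```shell
# Reverse the byte order of a hex string (two hex digits per byte).
# hex_swap_endian is a hypothetical helper name, not part of leveldbctl.
hex_swap_endian() {
    hex=$1
    out=
    # Pad odd-length input with a leading zero so it splits into whole bytes.
    [ $(( ${#hex} % 2 )) -eq 1 ] && hex=0$hex
    while [ -n "$hex" ]; do
        rest=${hex%??}            # everything except the last byte
        out=$out${hex#"$rest"}    # append that last byte to the output
        hex=$rest
    done
    printf '%s\n' "$out"
}

# Build the hex key for a block file: 'f' (0x66) followed by the file
# number as a 4-byte little endian value. Pass the number in decimal.
file_key() {
    printf '66%s\n' "$(hex_swap_endian "$(printf '%08x' "$1")")"
}

file_key 1845    # prints 6635070000
```

The result is the same key that was passed to leveldbctl -xk g at the top of this post.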

The same goes for the b-type key. It takes a block hash as its “parameter”. Let’s take the following block hash, 0000000000000000000566072a6b442341a543e884c37a76cd16bc2a74dc58ec:

$ echo 0000000000000000000566072a6b442341a543e884c37a76cd16bc2a74dc58ec|rev|dd conv=swab
0+1 records in
0+1 records out

Considering that b is 0x62 in hexadecimal:

$ leveldbctl --dbdir=samples/index -xk g 62ec58dc742abc16cd767ac384e843a54123446b2a076605000000000000000000|xxd -p
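The same key can be assembled without rev and dd. Again a rough sketch in POSIX sh, where hex_swap_endian and block_key are hypothetical helper names of mine, assuming the hash is given as it is usually displayed (64 hex digits):

```shell
# Reverse the byte order of a hex string (two hex digits per byte).
# Hypothetical helper, not part of leveldbctl.
hex_swap_endian() {
    hex=$1
    out=
    # Pad odd-length input with a leading zero so it splits into whole bytes.
    [ $(( ${#hex} % 2 )) -eq 1 ] && hex=0$hex
    while [ -n "$hex" ]; do
        rest=${hex%??}            # everything except the last byte
        out=$out${hex#"$rest"}    # append that last byte to the output
        hex=$rest
    done
    printf '%s\n' "$out"
}

# Build the hex key for a block index record: 'b' (0x62) followed by
# the block hash in reversed byte order.
block_key() {
    printf '62%s\n' "$(hex_swap_endian "$1")"
}

block_key 0000000000000000000566072a6b442341a543e884c37a76cd16bc2a74dc58ec
# prints 62ec58dc742abc16cd767ac384e843a54123446b2a076605000000000000000000
```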

As for the chainstate database, according to the “documentation” (which is mostly copied from a Stack Overflow answer), it seems to me that the key prefix is wrong. It says:

‘c’ + 32-byte transaction hash

In reality there are few records with a c prefix, and moreover the trailing part of those is not 32 bytes long. A capital C prefix, on the other hand, delivers, for example:

$ leveldbctl --dbdir=chainstate k|grep -a '^C'|head -1|xxd -p

Strip the first character (0x43 is C) from that result and take the next 32 bytes, in this case 0000017981b8aec4a6bba617b32343cf1909670f62ef57cc308139d9d16ef0cb
Apply the same treatment as with the previous keys:

$ echo 0000017981b8aec4a6bba617b32343cf1909670f62ef57cc308139d9d16ef0cb|rev|dd conv=swab
0+1 records in
0+1 records out

And here you have a valid TXID!
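Those two steps (strip the prefix and keep 32 bytes, then reverse the byte order) can be sketched as a small POSIX sh function. The names hex_swap_endian and txid_from_utxo_key are hypothetical, and the input is assumed to be the hex dump of a C-prefixed key as printed by xxd -p; whatever follows the first 33 bytes is simply ignored here.

```shell
# Reverse the byte order of a hex string (two hex digits per byte).
# Hypothetical helper, not part of leveldbctl.
hex_swap_endian() {
    hex=$1
    out=
    # Pad odd-length input with a leading zero so it splits into whole bytes.
    [ $(( ${#hex} % 2 )) -eq 1 ] && hex=0$hex
    while [ -n "$hex" ]; do
        rest=${hex%??}            # everything except the last byte
        out=$out${hex#"$rest"}    # append that last byte to the output
        hex=$rest
    done
    printf '%s\n' "$out"
}

# Extract the TXID from the hex dump of a chainstate key:
# drop the prefix byte (2 hex digits), keep the next 32 bytes
# (64 hex digits), then reverse the byte order.
txid_from_utxo_key() {
    raw=$(printf '%.64s' "${1#??}")
    hex_swap_endian "$raw"
}

txid_from_utxo_key 430000017981b8aec4a6bba617b32343cf1909670f62ef57cc308139d9d16ef0cb
```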

I hope these explanations made the LevelDB format as clear as I would have liked it to be when I started this journey ;)