Sunday, 18 October 2015

Blockchain proof of existence, part 1

Despite the blockchain originally being designed as a ledger for a payment system, it is turning into much more. TØ, the Overstock subsidiary, has developed a way to represent financial assets as colored coins and recently reached a milestone when a hedge fund borrowed 10 million USD worth of stocks using their system. Others are using the fact that data written to the blockchain is all but impossible to modify once it's written. Small amounts of data can be written and, unless the blockchain at some point ceases to exist, it can always be read back exactly as it was written (there are potential attacks that can rewrite history, but they get harder fast as the data we care about gets older. For my use-case is, which is really long-term, I don't think these attacks are of any concern).

My current project is taking advantage of this fact, in combination with the fact that every block has a timestamp that is guaranteed to be within certain bounds and, like the data in the block, can't be changed once written. Thus, data included in a block can not only be read back correctly later, it can also be accurately (within a couple of hours) dated. In other words, if you write a piece of information to the blockchain, you can later prove to anyone that you knew that info at that time in the past.

Of course, once the information is written, anyone can read it. If that's not what you want, you're better of writing a cryptographic checksum of the information to the blockchain, keeping the raw data secret. When the time comes when you need to prove that the data existed at that point in the past, you share the original data, the method used to generate the checksum and the block where the checksum can be found, thereby proving that the data in question existed when that block was finalized (with the usual caveat that the algorithm used to generate the checksum must still be considered strong).

Bitcoin transactions

The blocks in the blockchain are made up by bitcoin transactions of various kinds. Most are transferring ownership of bitcoins between two or more parties. With the exception of the special coinbase transaction, every transaction has one or more inputs. A transaction doesn't need to have an output, but even transactions made with the purpose to put data in the block chain probably should have at least one. The reason for this is that the nature of the Bitcoin protocol is that every transactions spends all its inputs in full and any funds not going to an output can be collected by the miner of the block; unless you have an input containing the exact fee amount, you'll need a change output in the transaction.


Example bitcoin transactions
Even though payments are made to familiar looking bitcoin addresses (eg. 1QAsw4fsfq3MJdgiEGod3MxY9BY2EnTWEU), the inputs to a transaction don't technically come from an address, they come from previous transactions. Likewise, an address does not technically have any bitcoins in it, it has transaction outputs that are spendable with the private key corresponding to the address. Each output contains a script field that describes how the output can be spent, eg:
OP_DUP OP_HASH160 <pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG
Bitcoin's scripting language can be used to program countless different behaviours, but most nodes will only forward a small subset of standard transactions. The script above is an example of a pay-to-pubkey-hash output, which is the most common type.

Alternative outputs and OP_RETURN  

There has been a lot discussion and controversy around how the Bitcoin protocol and the underlying blockchain should be used. The community is basically divided into two camps; those who want the protocol to be expanded and generalized to facilitate as many different use-cases as possible, likening it to the TCP and IP protocols that underpins the whole internet; and those who want to keep the protocol simple and reserve the blockchain for the Bitcoin currency it was originally intended for.

The Bitcoin Core implementation has tried to resolve the controversy by introducing a short list of standard transactions, including one intended for non-currency operations: the provably unspendable OP_RETURN output. This script allows for up to 40 bytes of arbitrary payload, opening up for alternative uses of the protocol without bloating the blockchain too much. The fact that the output is provably unspendable allows full nodes to prune it from their list of spendable outputs (which is usually kept in memory). At just 40 bytes, we won't be able to store much data in these outputs, but it's the perfect size to store a cryptographic checksum.

Wallets

When first starting out on this project, I imagined writing a small program able to send simple transactions with a little bit of payload would be almost trivial, especially considering that I had already implemented the whole network layer (of course, this whole exercise could have been completely trivial if I wasn't set on writing everything from scratch). As it turns out, forming valid Bitcoin transactions is a little more involved and doing it right requires the following:
  1. A list of keys pairs, or better, a way to generate them on demand. It is considered bad practice to reuse keys so, at a minimum, a new change key is needed for every transaction.
  2. Knowledge of transaction outputs spendable by those keys.
  3. Being able to form valid and signed transactions, taking care to send the change to an address you're in control of.
Modern wallets tend to rely on hierarchical deterministic key derivation for the keys (BIP-0032, BIP-0043BIP-0044), thus doing away with the hassle of keeping track of and backing up large amounts of unrelated keys. It's pretty straight forward to implement, and provided with a single seed it gives you an endless supply of fresh key pairs.

Part 2 turned out to be quite a bit of work, and essentially amounts to scanning through the whole blockchain and thus the entire history of Bitcoin. Conveniently the Bitcoin protocol provides something called Simplified Payment Verification that makes this achievable. I won't cover the details of this part of the protocol right now, but rather dedicate a separate blog post to that. The summary is that SPV makes it possible to cryptographically verify that the transactions you care about are recorded in the blockchain, while only downloading and verifying a tiny fraction of it.

First transaction

Once you have some spendable transaction outputs and addresses to send change to. Using the Hierarchical Deterministic Keys library I generated a BIP-0044 style Bitcoin account with a couple of external addresses
  • Chain m/44'/0'/0'/0/0:
    • Public key (compressed format):
      032d9850b19296fe1077f28a3c64d957f5242367dc6d42595b7ca7ca1aaed3dd1d
    • Double hashed public key (first SHA256, then RIPEMD-160):
      66fb2649c17e60c4cd16a05c903a161a24aa11a6
    • Base58 encoded public key hash (this is what's usually thought of as the address):
      1APWnkAgU5iaiJtz7Ga7i3pGA127oQnnTG
  • Chain m/44'/0'/0'/0/1:
    • Public key (compressed format): 02abab53af3804c090db8255cc4a6f2c8d3783a8d873db93d689135e39db358df0
    • Double hashed public key (first SHA256, then RIPEMD-160):
      fe2927160e030613c119fbc68bc9ab0576f59118
    • Base58 encoded public key hash (address):
      1QAsw4fsfq3MJdgiEGod3MxY9BY2EnTWEU
And a change address (this address is not in any way different the other two from a protocol perspective, it's just a BIP-0044 method of accounting) 
  • Chain m/44'/0'/0'/1/0:
    • Public key (compressed format): 02e4531becbd0b5f86b611fabe85a63f00cf8a2a1f9ff324f0afb084e192d4da56
    • Double hashed public key (first SHA256, then RIPEMD-160):
      5ee70c467c5b2c35d2d08ef0d6d202b9c917b261
    • Base58 encoded public key hash (address):
      19eoJXTS4gWk1ZAuaQji73JnBNZBhgkpGJ
I then funded my new wallet with 2 mBTC (milli-bitcoins -- about 50 US cents at current rates) by sending them to the first external address. This transaction (dfea6c60cdd3630965d8fc4d9a3090d6dc2b6a802492f7ef02d8b55ac4669bca) has two output, the first of which has
  • Value: 200000 (satoshis)
  • PkScript: OP_DUP OP_HASH160 66fb2649c17e60c4cd16a05c903a161a24aa11a6 OP_EQUALVERIFY OP_CHECKSIG
The script is a standard pay-to-pubkey-hash script, and the double hash of the first public key that was just generated (if you inspect the transaction on blockchain.info, you'll see the m/44'/0'/0'/0/0 key show up both in the base58 address format, and as a the hex encode hash). That key corresponds to the following private key that was generated as part of the same key pair, and as such, controlled by my wallet:
  • private key (compressed Wallet Export Format):
    80107c9829b61c25851f5d5aa4e1e642eef64d1d995a1df1d46f632360a6dc0cb001 
 The new wallet is funded and ready to send a transaction of its own. 

This transaction will have two outputs, one sending 0.5 mBTC to fe2927160e030613c119fbc68bc9ab0576f59118, and one sending 1.4 mBTC to the change address 5ee70c467c5b2c35d2d08ef0d6d202b9c917b261. The remaining 0.1 mBTC of the input is not sent anywhere and is a fee that will be collected by whoever mines a block with our transaction in it (this is the incentive we give miners to actually include our transaction in a block).

With outputs added to the transaction we ready to include the input and prove that we own it:
  • Previous output hash:
    dfea6c60cdd3630965d8fc4d9a3090d6dc2b6a802492f7ef02d8b55ac4669bca
  • Previous output index: 0
  • Signature: 30450220324de09f8f7c8908d4a4b99c13aa28d1a5b3b680a196e6e6d39255e2b4d126c802210089fed6635be470c33f717db1a44f89150ebad18ab511fcf176c959cef7a9c94201 
    032d9850b19296fe1077f28a3c64d957f5242367dc6d42595b7ca7ca1aaed3dd1d
The signature is the proof that this is an input we control as well as an insurance that our transaction is broadcast and recorded unmodified. The second part can be recognized as the (compressed, unhashed) public key of the address the previous transaction was sent to. The first part is the result of signing our transaction (a double SHA256 hash of a serialized version of it) with the the corresponding private key, using an Elliptic Curve Digital Signature Algorithm (ECDSA). The signature proves that we're in control of the relevant private key and, as the whole transaction, including the outputs, is signed, it prevents tampering with our transaction after it's sent.

And that's it. After broadcasting the transaction to a few connected nodes, it was quickly forwarded to the rest of the network, and can now be seen in places like blockchain.info. A few minutes later it was included in block 378726 mined by AntPool. Alongside the 735 other transactions in that block it's now forever recorded in the blockchain.

Next step and source code

The next step will be to send a transaction with a payload and have this data recorded in the blockchain.

If interested in the source code written so far, it can be found here.

No comments:

Post a Comment