
Wednesday, 30 September 2015

Go synchronization primitives

Go comes with two categories of synchronization primitives: traditional locking mechanisms like mutexes, and channels. When implementing bitcoin-network I decided, as an experiment, to rely solely on channels and not use any of the traditional synchronization primitives. This was somewhat elegant in some situations, but really painful in others. One of the better cases was a straightforward module that already relied heavily on channels: the message dispatcher (essentially a demux). This module reads messages from a single input channel and, based on their type, forwards each one to the channel registered for that type's subscriber. The (simplified) main loop looks something like this:
for {
	// Receive the next message; ok turns false once d.input is closed.
	msg, ok := <-d.input
	if !ok {
		break
	}
	// Forward the message to the subscriber registered for its type, if any.
	ch, found := d.subscribers[msg.Type]
	if found {
		ch <- msg
	}
}

With d being the dispatcher object:
type Dispatcher struct {
	subscribers map[string]chan<- Message
	input       <-chan Message
}
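
For completeness, here is a minimal sketch of how such a dispatcher might be set up before its main loop starts, assuming a Message type with a Type string field; NewDispatcher and Subscribe are hypothetical helpers for illustration, not code from bitcoin-network:

// NewDispatcher creates a dispatcher reading from the given input channel
// (hypothetical helper).
func NewDispatcher(input <-chan Message) *Dispatcher {
	return &Dispatcher{
		subscribers: make(map[string]chan<- Message),
		input:       input,
	}
}

// Subscribe registers a channel to receive all messages of one type. This is
// only safe before the main loop starts, as the map itself is unsynchronized.
func (d *Dispatcher) Subscribe(msgType string, ch chan<- Message) {
	d.subscribers[msgType] = ch
}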
As long as all the subscribers are added to the map before the main loop is started, this module doesn't seem to need any form of synchronization. The issues start surfacing when we look at the shut-down process. As it stands above, the dispatcher module can be shut down cleanly by closing the input channel. However, we get no feedback on this shut-down, making it impossible to know how to handle the outgoing channels after the input has been closed. Closing a channel from the reader side risks a panic if the writer is still active, while shutting down the reader without closing the channel risks blocking the writer, causing us to leak memory and goroutines.

Unless we let the dispatcher take ownership of the outgoing channels (i.e. close them when shutting down), we can only solve this by adding a "shut down" method that blocks (or otherwise signals completion). As allowing reuse of channels greatly simplifies the downstream code, I chose to go with the second option. A more flexible approach than a single shut-down method is an Unsubscribe() method that removes one type from the subscriber map. Since a map is not thread safe, and we're not using any locks, the removal needs to happen in the main loop. We can achieve this by introducing a new structure and a couple of new internal channels:
type unsubscription struct {
	key  string
	done chan<- bool
}

// Unsubscribe asks the main loop to remove one subscription. It sends a
// request on d.unsub (a new chan unsubscription field on the Dispatcher)
// and blocks on the done channel until the removal has happened.
func (d *Dispatcher) Unsubscribe(key string) {
	done := make(chan bool)
	d.unsub <- unsubscription{
		key:  key,
		done: done,
	}
	<-done
}

New main loop:

loop:
	for {
		select {
		case s := <-d.unsub:
			delete(d.subscribers, s.key)
			s.done <- true
		case msg, ok := <-d.input:
			if !ok {
				// A bare break would only exit the select statement,
				// so a labeled break is needed to leave the loop.
				break loop
			}
			ch, found := d.subscribers[msg.Type]
			if found {
				ch <- msg
			}
		}
	}

A call to Unsubscribe() will now block until the operation has completed, and since all the work is done in the main loop, we don't have to worry about races. After unsubscribing all the subscribed keywords, it is safe to close the subscription channels or simply shut down the readers on those channels.

This technique works, but we can already tell that more complex modules are going to need a lot of plumbing to get it right (look at the main loops here or here for examples). Even in the relatively simple dispatcher module we can get simpler and more efficient code by using a sync.Mutex to protect the subscribers map. Obviously, this was all a somewhat contrived example, and the lessons don't transfer to channels in general, which certainly have their place alongside the traditional locking mechanisms (https://github.com/golang/go/wiki/MutexOrChannel provides some advice on when to use each). It does, however, show that the traditional locks still have a place in Go and that channel shut-down can get complicated quickly.
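
For comparison, here is a minimal sketch of the mutex-based alternative alluded to above, with a sync.Mutex guarding the subscribers map so Unsubscribe can mutate it directly; this illustrates the general approach, and is not code from bitcoin-network:

import "sync"

type LockingDispatcher struct {
	mu          sync.Mutex
	subscribers map[string]chan<- Message
	input       <-chan Message
}

// Unsubscribe can now run on any goroutine; no round-trip to the main loop.
func (d *LockingDispatcher) Unsubscribe(key string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	delete(d.subscribers, key)
}

// The main loop only needs the lock around the map access; ranging over the
// input channel terminates naturally when the channel is closed.
func (d *LockingDispatcher) run() {
	for msg := range d.input {
		d.mu.Lock()
		ch, found := d.subscribers[msg.Type]
		d.mu.Unlock()
		if found {
			ch <- msg
		}
	}
}

The internal unsubscription channel, the done signalling and the select statement all disappear, which is the simplification the paragraph above is pointing at.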

Thursday, 17 September 2015

Bitcoin peer to peer network

Bitcoin is based on a peer to peer network, where each peer, or node, verifies, stores and forwards the transactions taking place on the network. Some nodes, the miners, collect valid transactions into blocks and publish these. Each block references the previous block, thus creating a chain of them, the Blockchain. Although creating blocks is hard work and requires specialized hardware, anyone can subscribe to these blocks, then inspect and verify them.

By connecting to a node on the network, we can receive a stream of real-time transactions as well as the blocks being published as miners create them. For robustness and security, we should probably connect to several independent nodes, thus reducing the chance that transactions are being censored by malicious nodes, or dropped by a faulty one. Ensuring that we're well connected is far from trivial, so in this first pass we'll just connect to a handful of nodes and hope that they're not all colluding against us. Unresponsive or otherwise broken connections will be dropped and new ones created to replace them.

Implementation

I've written a simple implementation of this network layer based on the protocol documentation (refer to this for details on the various messages mentioned below, or terms that are unfamiliar). I chose Go as the implementation language, and as an experiment the code relies solely on channels for synchronization, which is elegant at times, but far from perfect. I'll write a separate blog post about interesting problems arising from these choices.

The code consists of two layers: the connection and the network. The former focuses on setting up and maintaining a single connection, including sending and responding to pings. The latter acquires the addresses of as many peers as possible, scores their quality by various metrics, and creates connections to a number of the best-scoring ones. New peers' addresses are found by performing DNS look-ups, by asking already connected peers with a getaddr message, or by loading previously known peers from local storage. Usually this quickly results in a pretty good overview of the network, with hundreds of nodes to choose between for connections.
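
As an illustration of the DNS look-up part, here is a minimal sketch of resolving one of the well-known Bitcoin DNS seeds into candidate peer addresses; the seed host name is just one example, and the library's actual seed list and bootstrap code may differ:

package main

import (
	"fmt"
	"net"
)

func main() {
	// Each address returned by the seed is a candidate peer, assumed to
	// be listening on the default mainnet port 8333.
	addrs, err := net.LookupHost("seed.bitcoin.sipa.be")
	if err != nil {
		fmt.Println("seed lookup failed:", err)
		return
	}
	for _, a := range addrs {
		fmt.Println(net.JoinHostPort(a, "8333"))
	}
}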

Users of the library get a channel to send messages on (either addressed messages or broadcasts) and the ability to subscribe to received messages by type. They don't have to worry about managing the connections and won't even be aware of the low-level messages, with the exception of received version messages, which can be subscribed to by users who need to know about new peers being connected to (e.g. because they want to address messages to them). The example application included with the library listens for version messages and sends a mempool message to each new peer, which causes one or more inv messages, containing the transactions in that peer's mempool, to be sent our way. This is a good starting point for writing software that interacts with the Bitcoin network.
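
To make the shape of that API concrete, here is a hedged sketch of the version/mempool round described above; the Message and Network types, the Subscribe method and the field names are hypothetical stand-ins, not the library's actual API:

// Hypothetical types standing in for the library's actual API.
type Message struct {
	Type     string
	Endpoint string // the peer the message came from, or is addressed to
}

type Network struct {
	Send chan<- Message          // outgoing; an empty Endpoint means broadcast
	subs map[string]chan Message // received messages, demuxed by type
}

// Subscribe returns a channel carrying received messages of one type.
func (n *Network) Subscribe(msgType string) <-chan Message {
	ch := make(chan Message)
	n.subs[msgType] = ch
	return ch
}

// watchPeers mirrors the example application: greet every new peer's
// version message with a mempool request, triggering inv replies.
func watchPeers(n *Network) {
	for v := range n.Subscribe("version") {
		n.Send <- Message{Type: "mempool", Endpoint: v.Endpoint}
	}
}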

Inspecting current connectivity

The example application contains a simple web server that can be used to see current connections as well as the application's view of the network. The page /peers on localhost:8080 lists all known peers, with their connection status, address, time of last activity and a quality score.

Example peer list as seen on http://localhost:8080/peers:
(connected) 75.189.201.141:8333   2015-09-17 13:10:48.073832944 -0500 CDT Q(1000)
(connected) 73.162.143.196:8333   2015-09-17 13:10:42.964351941 -0500 CDT Q(1000)
(connected) 79.98.137.63:8333   2015-09-17 13:10:47.438899944 -0500 CDT Q(1000)
(spare)     52.11.33.24:8333   2015-09-17 13:10:32.825600001 -0500 CDT Q(800)
(pending)   50.147.76.86:8333   2015-09-17 13:10:32.825595045 -0500 CDT Q(800)
(spare)     69.143.149.38:8333   2015-09-17 13:10:32.825507889 -0500 CDT Q(800)
(pending)   24.52.35.44:8333   2015-09-17 13:10:32.695354838 -0500 CDT Q(800)
(spare)     54.153.97.109:8333   2015-09-17 13:10:32.825616717 -0500 CDT Q(800)
(spare)     25.96.107.191:8333   2015-09-17 11:08:03 -0500 CDT Q(678)
[...]
(spare)     71.93.135.167:8333   2015-09-17 05:02:53 -0500 CDT Q(313)
(spare)     155.133.19.109:8333   2015-09-17 05:00:28 -0500 CDT Q(310)
(failed)    50.177.196.160:8333, failed at 2015-09-17 13:10:43.100961354 -0500 CDT Q(300)
(failed)    173.69.49.106:8333, failed at 2015-09-17 13:10:42.699932334 -0500 CDT Q(300)
(failed)    198.38.93.227:8333, failed at 2015-09-17 13:10:42.700074666 -0500 CDT Q(300)
(spare)     68.238.62.169:8333   2015-09-17 04:49:01 -0500 CDT Q(299)
(spare)     72.241.129.31:8333   2015-09-15 16:09:59 -0500 CDT Q(0)
(spare)     84.104.142.12:8333   2015-09-09 23:23:42 -0500 CDT Q(0)
(spare)     87.220.72.124:8333   2015-08-20 08:24:59 -0500 CDT Q(0)

In the example above there are 3 established connections, with 2 more pending. 3 nodes have had recent failures and are ranked lower because of this, while the rest are spares, ranked by the time of the last known activity from them (as seen by us, or as communicated by a peer in an addr message).
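
As a closing illustration, here is a sketch of a scoring function consistent with the Q(...) values in the listing (connected peers at a flat 1000, failed ones at a flat 300, and pending/spare peers starting at 800 and losing roughly a point per minute of inactivity); this is a reconstruction from the numbers above, not necessarily the library's actual formula:

import "time"

// peerScore reconstructs the quality scores seen in the peer listing:
// connected peers score highest, recently failed ones get a flat penalty
// score, and the rest decay as their last known activity ages.
func peerScore(status string, lastActive, now time.Time) int {
	switch status {
	case "connected":
		return 1000
	case "failed":
		return 300
	default: // "pending" and "spare"
		score := 800 - int(now.Sub(lastActive)/time.Minute)
		if score < 0 {
			score = 0
		}
		return score
	}
}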