Inspecting Tezos decentralization: 200+ public nodes, 1000+ in total

When people argue about Tezos decentralization, they usually put roll distribution first: “look, the top 5 entities own more than half of the stake”. More advanced critics also highlight attacks on the voting mechanism: how many entities it takes to block or force a proposal (a number that actually changes over time).

However, it’s not that straightforward: in a Proof-of-Stake network there are not just rewards but also Value at Risk. At the end of the day, it’s the risk/reward ratio that matters for economic incentives, and even that holds only if we assume all agents are rational!

Ideally, for each attack vector (and, strictly speaking, every proposal introduces a new vector) one should estimate reward/VaR, considering all risks for each attacker class (there is more than one profile).
We leave that for a separate study. In this article, let us focus on another aspect of decentralization, namely the P2P layer.

Collecting peers and connections

In order to conduct a comprehensive analysis, we needed a high-quality data set.
Basically, we could just set max_connections in the node config to a relatively large value and use the /network/points RPC endpoint. However, as we found out, its output is rather polluted with nodes that have a different chain_id or are not operating.
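As a rough illustration of the "not operating" part of that cleanup, here is a minimal sketch that keeps only the points reported as running. It assumes the response is a JSON list of [address, info] pairs where info carries a state object with an event_kind field. The sample addresses and peer id are made up, and the sketch does not try to detect which chain a point belongs to.

```python
import json

def running_points(points_json: str) -> list[str]:
    """Return the addresses of points whose state is reported as running.

    Assumed layout: [[address, {"state": {"event_kind": ...}, ...}], ...]
    """
    result = []
    for address, info in json.loads(points_json):
        state = info.get("state", {})
        if state.get("event_kind") == "running":
            result.append(address)
    return result

# Made-up sample using documentation-only IP ranges.
sample = json.dumps([
    ["203.0.113.5:9732", {"trusted": False,
                          "state": {"event_kind": "running",
                                    "p2p_peer_id": "idExamplePeer1"}}],
    ["198.51.100.7:9732", {"trusted": False,
                           "state": {"event_kind": "disconnected"}}],
])
print(running_points(sample))
```

A filter like this only removes points the local node is not currently connected to, so it cannot replace the handshake-based check described in the next section.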

Moreover, we also wanted to try to build the network graph so we needed not only vertices (nodes) but also edges (connections). We didn’t get to do it precisely in the end, but we learned a lot about how P2P works in Tezos.

Tezos Handshaker

Anyways, we went deeper and wrote a simple P2P scanner that connects to the bootstrap nodes and queries their known peers, then tries to connect to those peers and query their connections, and so on. It worked great; however, we faced several limitations:

  • Obviously, we couldn’t query known peers from nodes that are not exposed to the internet (hidden nodes). That’s fine, since we are mostly interested in public nodes;
  • Some nodes were probably rejecting our connections because they had reached their maximum connection count, or for other reasons. As a workaround we repeat the scan several times; however, that does not give us a 100% guarantee that we’re not missing something;
  • The main problem is related to the way nodes respond to the request: they return no more than 50 results, of which 30 are the best (active connections sorted by the time of establishment), and the remaining 20 are random (either active or not).
P2P LIKE A PRO
If you are interested in how P2P layer works in Tezos, check out the SimpleStaking blog.
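The scanning loop described above can be sketched roughly as follows. This is not the actual scanner: query_peers is a hypothetical callback standing in for the handshake and peer request, and max_idle_rounds is an assumed stop parameter. Rescanning every known address on each round is what compensates for rejected connections and for the random part of each response.

```python
from typing import Callable, Iterable, Optional

def crawl(bootstrap: Iterable[str],
          query_peers: Callable[[str], Optional[list[str]]],
          max_idle_rounds: int = 3) -> tuple[set[str], set[tuple[str, str]]]:
    """Rescan all known addresses until nothing new appears for a while.

    query_peers(addr) is hypothetical: it returns the peers reported by addr
    (at most 50 per the article) or None if the connection failed.
    """
    known = set(bootstrap)
    edges: set[tuple[str, str]] = set()
    idle = 0
    while idle < max_idle_rounds:
        before = len(known)
        for addr in sorted(known):       # sorted() takes a snapshot of the set
            peers = query_peers(addr)
            if peers is None:            # hidden, full, or offline node
                continue
            for peer in peers:
                edges.add((addr, peer))  # addr claims to know peer
                known.add(peer)
        # Count consecutive rounds without growth; reset on any new address.
        idle = idle + 1 if len(known) == before else 0
    return known, edges

# Toy network: "b" never answers, so it behaves like a hidden node.
fake = {"boot": ["a", "b"], "a": ["boot", "c"], "b": None, "c": ["a"]}
nodes, links = crawl(["boot"], lambda addr: fake.get(addr))
print(sorted(nodes), len(links))
```

In the toy run the unreachable node still ends up in the node set because a public node reported it, which mirrors how hidden nodes show up in the real dataset.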

Another problem is determining whether a node belongs to a particular network, in our case mainnet. We can confidently classify public nodes, as they return a version string during the handshake; however, we cannot be 100% sure about hidden nodes. All we can say is that if a particular hidden node is known by several public mainnet nodes, it is likely to be a mainnet node as well.

We are not sure why carthagenet/zeronet/other nodes occur in the list of known peers of mainnet nodes. It is probably due to misconfiguration, or to someone running several nodes on the same machine, or to something else.

Goals and objectives

Given the above problems and limitations, we had to decide what we could calculate and how. We have formulated several goals:

  1. Identify all public nodes as they are in essence the “center” of the network and have the greatest importance;
  2. Try to detect active hidden nodes using heuristics;
  3. Make geographical analysis of these two groups;
  4. Draw an approximation of the network topology.

In order to do that we used the following algorithm:

  1. Do iterative peer scanning in order to handle the max-connections issue and enumerate all random points;
  2. Finish the scan when the number of nodes stops growing for a sufficiently long period of time;
  3. Filter out nodes that do not belong to mainnet;
  4. Assign a score to each hidden node, calculated as the number of public nodes that know that particular node;
  5. Filter out hidden nodes that have a score lower than the average.
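Steps 4 and 5 can be sketched as below. The input shape (a mapping from each public mainnet node to the set of peers it reported) and the function name are assumptions made for illustration, and nodes with a score exactly equal to the average are kept.

```python
from statistics import mean

def likely_active_hidden(known_peers: dict[str, set[str]]) -> dict[str, int]:
    """Score hidden nodes by how many public nodes know them.

    known_peers maps each public mainnet node to the peers it reported.
    Returns the hidden nodes whose score is not below the average score.
    """
    public = set(known_peers)
    scores: dict[str, int] = {}
    for peers in known_peers.values():
        for peer in peers - public:      # anything not public counts as hidden
            scores[peer] = scores.get(peer, 0) + 1
    if not scores:
        return {}
    threshold = mean(scores.values())
    return {peer: s for peer, s in scores.items() if s >= threshold}

# Toy data: h1 is known by three public nodes, h2 and h3 by one each.
known = {
    "pub1": {"pub2", "h1", "h2"},
    "pub2": {"pub1", "h1"},
    "pub3": {"h1", "h3"},
}
print(likely_active_hidden(known))
```

Here the average score is 5/3, so only h1 survives the filter.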

Terms and conditions

In this article we will use the terms Public node and Hidden node. In both modes nodes connect to others, but only public ones accept incoming connections.
Bootstrap nodes are the default ones specified in the node config. This is actually a single hostname hiding a load balancer that routes requests to 27 nodes spread across the globe.

DISCLAIMER
In this article:
We analyse only Mainnet nodes;
The scanning method is time-stretched and it’s not possible to make a snapshot at a particular time;
We only rely on the geographical location of the nodes as well as the connections between them;
We recognize that we may not have scanned the entire network or may have included inactive nodes in the dataset.
Thus, it’s important to understand that our results DON’T fully characterize the system.

We will look at the decentralization criteria that determine how well the network can withstand a breakdown or an attack.

Tezos mainnet results

NUMBERS
During the scan we have discovered:
6298 addresses in total
1679 presumably operating nodes
203 public nodes

As you may notice, there are far more nodes in Tezos mainnet than the number of bakers. It is clear why the bakers should be decentralized (in all senses), but what about the other nodes? What are they?

Roughly speaking, while baker nodes ensure the valid state of the blockchain and actually “write” the data, the rest of the network provides decentralized access to that data (i.e. “reading”) and makes sure broadcast “write requests” reach the bakers.
This is just as important as block validation, because what’s the point of a decentralized network if you cannot access it in a decentralized way?

In the next chapters we will analyze all (presumably) running nodes and, separately, the public nodes. Note that while we are pretty confident about the public nodes, there are certainly some deviations when we work with the whole network. Still, we think it can give some interesting insights.

Geographical distribution

This is an intuitive criterion: the more continents, countries, jurisdictions, segments of the global network are covered by Tezos the better.
Connectivity and network topology are also important, especially their dependence on transcontinental communications and tier-1/2 operators, but we will examine that a bit later.

The heat map looks good, and although there are obviously countries with high concentrations of nodes, we will see later that these are mostly cloud provider data centers.

NUMBERS
Tezos nodes are distributed across 56 countries and 193 regions.

Let’s take a look at each of the sub-criteria in detail.

Hosting providers

Before we move on to detailed statistics by country and region, let’s look at the distribution of nodes by hosting providers.

Not surprisingly, we see the prevalence of popular cloud hosts, but if you take into account the country where the hosts are located, the numbers are not that big. For example, the top 3 cloud providers with data centers in the US (AWS, Google, DigitalOcean) host 300 Tezos nodes. The real question is how important those nodes are for the network in general, and although we cannot answer that from the staking perspective, we can analyze the network topology based on our dataset.

Countries

Europe and the U.S. dominate, accounting for about 2/3 of all nodes.

Interactive map

Note the (decimal) logarithmic scale.

Regions

As for the regions of individual countries, we can see that there is a correlation with the location of data centers of the largest hosting providers.

Interactive map

It’s more interesting, we think, to see how Tezos is scattered around the planet. Use the zoom to see the names of settlements.

Tezos network topology

We will only investigate the logical network topology. Unlike the physical topology, we will not consider the physical distance between nodes, latency and speed of packet propagation in the underlying network (Internet).

NOTE
As was pointed out, the numbers can differ in reality, but the topology will likely remain the same.

Using nodes as graph vertices and known-peer connections as edges, we built a network graph and calculated its basic properties.

MAINNET GRAPH
Radius: 2
Diameter: 3
Average path length: 1.9
Center size: 1082
Clustering coefficient: 0.82
Density: 0.008

Here is a simplified interpretation of the results:

  • Radius, diameter, and average path length are small, which is good for network synchronization and fast propagation. It also suggests that presumably every node can reach the network center directly or via a trusted peer, or is part of the center itself;
  • Center size is more than half of all presumably running nodes; supposedly it’s a more robust estimate of the network size than the one we used;
  • Clustering coefficient is high; the network is divided into three clusters that vary in their degree of connectivity. This is most likely a side effect of the way the scan is done, so let’s not give it much importance;
  • Density is low, which indicates that the Tezos graph is sparse.
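For readers who want to reproduce these properties on their own dataset, here is a minimal standard-library sketch run on a toy graph. It assumes an undirected, connected graph stored as a dict of neighbour sets, and it uses the average of local clustering coefficients, which may differ from the definition used for the figures above.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def average_clustering(graph):
    """Mean share of each node's neighbour pairs that are linked to each other."""
    total = 0.0
    for nbs in graph.values():
        k = len(nbs)
        if k < 2:
            continue                     # such nodes contribute zero
        links = sum(1 for a in nbs for b in nbs if a < b and b in graph[a])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)

def graph_metrics(graph):
    """graph: node -> set of neighbours; undirected, connected, 2+ nodes."""
    n = len(graph)
    ecc, total = {}, 0
    for node in graph:
        dist = bfs_distances(graph, node)
        ecc[node] = max(dist.values())   # eccentricity of this node
        total += sum(dist.values())
    radius = min(ecc.values())
    edges = sum(len(nbs) for nbs in graph.values()) // 2
    return {
        "radius": radius,
        "diameter": max(ecc.values()),
        "average_path_length": total / (n * (n - 1)),
        "center_size": sum(1 for e in ecc.values() if e == radius),
        "clustering_coefficient": average_clustering(graph),
        "density": 2 * edges / (n * (n - 1)),
    }

# Toy graph: hub "h" linked to a, b, c, plus one extra edge a-b.
toy = {"h": {"a", "b", "c"}, "a": {"h", "b"}, "b": {"h", "a"}, "c": {"h"}}
metrics = graph_metrics(toy)
print(metrics)
```

The center is the set of nodes whose eccentricity equals the radius, so in the toy graph it consists of the hub alone.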

Public nodes

Let’s take a closer look at the public nodes. We are particularly interested in how they are distributed across hosting providers and countries.

In theory, you can optimize the latency and improve connectivity using this information, e.g. in order to deal with endorsement misses or resolve other network issues.

Top countries and hostings

While the world’s largest cloud providers provide a highly reliable service, diversification will never hurt.

An interesting observation: half of Tezos’ public nodes are running on Amazon, including all the bootstrap nodes.

Bootstrap nodes alternatives

There is a predefined set of peers (set in the default configuration) that a new node initially connects to. These peers are called bootstrap peers, and there are currently 27 of them, hidden behind load balancers. It is logical to assume that they are part of the center, and we mainly care about what proportion of it they make up and how widely they are dispersed geographically.

The question that worries many people is what happens if the bootstrap nodes suddenly stop working?

As the graph shows, nothing terrible.
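One way to test a claim like this on a collected graph is to delete the bootstrap vertices and see how much of the remaining network still hangs together. The sketch below is only an illustration of that idea, not the method behind the graph above; the function name and the toy graphs are assumptions.

```python
from collections import deque

def largest_component_share(graph, removed):
    """Share of surviving nodes that sit in the largest connected component.

    graph: node -> set of neighbours (undirected); removed: nodes to drop.
    """
    alive = set(graph) - set(removed)
    seen = set()
    best = 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        size = 0
        queue = deque([start])
        while queue:                     # BFS restricted to surviving nodes
            node = queue.popleft()
            size += 1
            for nb in graph[node]:
                if nb in alive and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best / len(alive) if alive else 0.0

# Peers that also know each other survive the loss of the bootstrap node...
meshed = {"boot": {"a", "b", "c"}, "a": {"boot", "b"},
          "b": {"boot", "a", "c"}, "c": {"boot", "b"}}
# ...while a pure star falls apart into isolated nodes.
star = {"boot": {"x", "y"}, "x": {"boot"}, "y": {"boot"}}
print(largest_component_share(meshed, ["boot"]))
print(largest_component_share(star, ["boot"]))
```

A share close to 1 after removal means the remaining nodes can still reach each other without the bootstrap peers.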

Further work

Using results of this work we will enrich our products with two features:

Stay tuned!

Originally published at https://baking-bad.org on July 30, 2020.


Inspecting Tezos decentralization: 200+ public nodes, 1000+ in total was originally published in Coinmonks on Medium, where people are continuing the conversation by highlighting and responding to this story.