This document explains the activities performed in Phase One load testing of the first version of the decentralized STORE consensus algorithm called Dynamic Proof of Stake (DyPoS). Version 1 of DyPoS is partially built on top of Tendermint and eventually will be replaced by BlockFin, STORE’s leaderless, fork-tolerant, high-throughput consensus protocol. Version 1 of DyPoS models various ways to improve consensus throughput, which is defined as the rate at which the validator nodes add transactions to the new block and successfully add the new block to the blockchain. These models help identify deficiencies in traditional approaches to consensus, so they can be better addressed in BlockFin. This is the first of a series of tests designed to measure the consensus throughput of DyPoS.
With its Dynamic, Fork-tolerant and Auditable Proof of Stake-based consensus protocol (DyPoS), STORE will secure free transactions, high throughput and a decentralized governance system with built-in checks and balances inspired by the U.S. Constitution.
Also inspired by the supply and demand principles of Uber Surge Pricing (blockchain economics) and Power of Attorney (blockchain scaling), STORE will secure crypto-powered incentives and payments inside of apps.
General Test Summary
What we are trying to learn
In testing the Dynamic Proof of Stake consensus algorithm, we want to see how the software responds to processing multiple transactions of different sizes at different rates (transactions per second) consecutively on a P2P network consisting of four validator nodes. The validator nodes are configured geographically in different regions to mimic a real-world setup.
The table below shows the plans for Tests 2 through 7.
In Test № 1, a total of 24 inquiries were performed over four nodes using six groups of transactions of varying number (5, 100, 1,000, 10,000, 100,000 and 1,000,000) and four transaction sizes (500 bytes, 1 kilobyte, 10 kilobytes and 100 kilobytes).
Test Details and Results
Environment and Tools
To load test the STORE consensus algorithm, a cluster containing four nodes was set up on an Amazon Web Services (AWS) Elastic Compute Cloud (EC2).
Each node is running a r3.xlarge instance with 4 CPUs, 30.5GB memory, and a 80GB SSD drive.
The nodes were physically located in the following regions.
The transactions were generated from an m4.2xlarge instance located in Canada.
Type of Test Performed
Load testing is the process of putting demand on a software system and measuring its response. The load test client used was TM-Bench while the transaction monitoring tool utilized was TM-Monitor.
TM-bench allows for benchmarking the performance with a specified transaction rate and size. For example:
TM-bench –T 5 –r 10000 –c5 127.0.0.1:46657
The load test indicated above simulates five clients (indicated by “c” in code above) sending 10,000 transactions per second with a 5-second timeout. For each test, we used the transaction sizes listed above (500 bytes, 1KB, 10KB, and 100KB.)
The 4-node load test was run using the following procedure:
- In each inquiry, a specific transaction size (say, 500 bytes) and transaction rate (say, 100 transactions/second.) was selected. The transactions are sent from five clients in each test.
- The inquiry is run until the validator nodes are saturated.
- At the end of the inquiry, statistics for block latency, blocks per second, and transactions per second are collected.
- The inquiry is repeated for different combinations of transaction sizes and rates.
- The performance statistics are plotted for average blocks and transactions per second to number of transactions.
Transaction Size: 500 bytes
In the graph above, the green line shows the number of blocks processed per second as measured by the average blocks per second. The red line shows the number of transactions processed per second as measured by average transactions per second. The X-axis shows the input transaction rate originated from each client.
The graph shows results for five different inquiries of 5, 100, 1,000, 10,000, and 100,000 transactions 500 bytes in size. At five transactions, the system is able to process the greatest number of transactions at the greatest speed. This is to be expected, as the system is running the lightest load in terms of size and number of transactions.
At 100 transactions, the system is processing a greater load in terms of the number of transactions. As a result, the number of blocks and transaction time increase. This was expected.
At 1,000 transactions, the number of blocks and transaction time decreases. This is also to be expected. The trend continues through 10,000 transactions, when the computers containing the nodes processing transactions start timing out.
Between 100 and 1,000 transactions per second, the slope is gradual and parallel to blocks processed per second. At 500 bytes, this would be the “sweet spot” where the system is performing optimally. Between 100 and 1,000 transactions, the system was able to process between 11,923,200 and 14,342,400 transactions per day.
Transaction Size: 1 KB
The graph above shows 5, 100, 1,000, 10,000, 100,000 and 1,000,000 transactions of 1 kilobyte in size. Performance between 100 and 1,000 transactions per second dropped considerably. This shows that the optimal load in this configuration is much closer to 100 transactions, but the system was able to support upwards of 1,000 transactions, albeit at a decreasing rate.
Like the load test of transactions 500 bytes in size, optimal performance is between 100 and 1,000 transactions. Unlike the previous tests, system performance declines at a much faster rate as the number of transactions processed per second approaches 1,000.
As the system processes more than 1,000 transaction more than 1 kilobyte in size, performance declines at a greater rate until the system starts timing out approaching 10,000 transactions, returning error messages.
Transaction Size: 10 KB
The graph above shows results for 5, 100, 1,000, 10,000, 100,000 and 1,000,000 transactions of 10 kilobytes in size. The first three inquiries running 5, 100, and 1,000 transactions were completed successfully, whereas inquiries running 10,000, and 100,000 transactions ended in failure as the system timed out before the transactions could be completed. The test of 1,000,000 transactions at 10 kilobytes was not run.
It should be noted that in our experience, transactions 10 kilobytes in size or greater are an uncommon, if not rare, event. In this test, the system is sending 100 transactions to the nodes for processing. The system is able to process the transactions received at a rate of 5 transactions per second. This performance can most likely be explained by the node receiving the transactions being saturated rather than validators not being able to keep up with the transactions.
The evidence for the node receiving the transaction becoming saturated can be found in the next inquiry processing transactions 100 kilobytes in size. In this series, the inquiries return error messages almost immediately at higher transaction rates.
Transaction Size: 100 KB
The graph above shows 5, 100, 1,000, 10,000, 100,000 and 1,000,000 transactions of 100 kilobytes in size. Of the six inquiries run, only three — processing 5, 100, and 1,000 transactions — were completed successfully. However, the inquiry processing 1,000 transactions appears to have finished processing just before the system timed out. Since the system did not return an error message, the inquiry is considered a success, but just barely.
The last three inquiries running 10,000, and 100,000 transactions ended in failure when the system timed out before the transactions could be completed. The inquiry of 1,000,000 transactions at 100 kilobytes was not run.
The results of Test №1 show some interesting characteristics that are typical of many blockchain networks. The consensus protocol works great with smaller transaction sizes (say, < 1–2 KB) at a lower rate (< 100–150 txs/sec.) This is to be expected because a small subset of transactions is picked up to be included in the block and each validator needs to sign a smaller chunk of data. Gossip for pre-vote and pre-commit also requires sharing smaller datasets.
As the transaction rate or size increases, there is a drop in the block creation time and transactions per second (txs/sec). Near the breakout point, where the rate and size is just enough for the validators, the total transaction “bytes” processed per second is almost the same.
In this series of tests, the number of validators was held constant at 4. However, in future tests where the variable tested is the number of nodes processing transactions, it would be expected that changing the number of validators would have a similar effect as increasing the number of transactions and the size of transactions. Increasing the number of validators should slow the rate at which transactions are approved. As the number of nodes increases in future load tests, throughput can be expected to drop at a proportional rate.
Test 2 Completed
To build on the results of Test №1, users sent transactions to all four validator nodes to observe behavior when transactions are distributed among nodes. Whereas in Test №1, one node was allocated a large number of transactions and transactions large in size, Test № 2 did the same, but to four nodes rather than one.
Consensus throughput was significantly improved, telling us that validators were under-utilized in the first test and real-world transaction sizes can be processed more easily.
A detailed review of Test № 2 will be published soon.
With active software development ongoing, the following tests are planned for the DyPoS consensus protocol:
In Test № 3, transaction rates and sizes will be distributed between a minimum and a maximum range in order to mimic real world events. Node performance is measured as transactions of different sizes are sent at differing rates.
An extension of Test № 3, Test № 4 will run over a predetermined length of time, such as 48 hours or 1 week. The purpose of Test № 4 is to measure the network performance over an extended period of time focusing on system stability and consistency.
In Test № 5, different scenarios will run with an increasing number of validator nodes ranging from 4, 8, 16, 32, 64, 128, to 256 nodes. This will measure the degree of decentralization, as additional nodes cause the blockchain to become more decentralized. The effect on the block creation time and the number of transactions processed is measured as validator nodes on the network increase.
We expect that the consensus throughput will drop to a point of failure as the number of validator nodes on the network increases. This test enables us to determine the number of validator nodes the system can support prior to failure.
In Test № 6, the validator “agents” are made to participate in consensus rounds to increase performance with a larger number of validators. With agents performing the consensus function, we expect the added consensus throughput to be largely unaffected by the increased number of validators on the system.
In Test № 7, the objective is to determine if transaction data can persist off the blockchain while nodes cryptographically verify transactions included on the block. This approach is designed to minimize the size of the main blockchain by keeping only that data required to validate the entire chain.
Definitions, Acronyms, and Abbreviations
Blockchain: A digital ledger in which transactions made in bitcoin or another cryptocurrency are recorded chronologically and publicly.
Breakout Point: The point or range of points where system performance is at its optimum.
Node: Node refers to a “full” client. A “full” client is a client that owns the block chain and that is sharing blocks and transaction across the network.
Block: Blocks are files where data pertaining to the Bitcoin network are permanently recorded. A block records some or all of the most recent Bitcoin transactions that have not yet entered any prior blocks. Thus a block is like a page of a ledger or record book.
Latency: Blockchains are designed to run on commodity machines that may be thousands of miles apart. Arriving at a common history of transaction order in this kind of asynchronous network is the classic distributed computing problem that distributed systems engineers deal with. The time lag between writing a packet onto the wire to receiving it on the other end could be on the order of milliseconds, seconds, minutes or even days. This is latency.