Guest User

June 3, 2022

Understanding Crypto

Understanding Crypto 1: Daniel Mescheder: What Problem Do Blockchains Actually Solve?

As a software engineer, Daniel designs data systems for a living - until recently at TomTom and now at Amazon. On his blog, Daniel writes about software projects trying to find a more rational and nuanced perspective on technology and the culture around it.

Welcome to the first episode of our limited series focusing on cryptocurrencies and everything you need to know about them. Our first guest is Daniel Mescheder, who joins us to break down some of the basic concepts and engineering of the blockchain, using his expertise as a software engineer as the lens for this discussion. We felt this chat was the best way to launch the series and prepare listeners for the following episodes on the subject, and you can expect to hear Daniel share very helpful insight and explanations of fundamental terms and concepts such as distributed systems, consensus, hashing, digital signatures, and more. We also have time for our guest to weigh in on the subjects of smart contracts and NFTs, both of which are regular points of intrigue for the uninitiated. Importantly, we do hear from Daniel about the limitations of the technology at present, and which types of technological problems he believes the blockchain is well suited to address. So for all this and more, and to start this journey with us into such an important and hot topic, make sure to listen in.

Key Points From This Episode:

The reasons for Daniel's interest and involvement with the blockchain and cryptocurrencies. [0:03:33]
Daniel compares the hype around AI in the 1980s and the current atmosphere for crypto. [0:04:50]
Getting to grips with Daniel's specific perspective on the blockchain and explaining distributed systems. [0:06:34]
How the concept of consensus fits into the subject of distributed systems. [0:11:17]
Looking at Byzantine consensus problems and how these occur on the blockchain. [0:13:51]
Daniel gives an overview of the elements that make the blockchain functional; hashing and digital signatures. [0:19:17]
How Satoshi Nakamoto introduced an economic incentive to comply with the protocol. [0:24:09]
Differentiating between the public and permissioned blockchains, and databases. [0:27:33]
How Bitcoin achieves consensus and some of the downsides of proof of work. [0:33:31]
An assessment of the decentralized status of the Bitcoin and Ethereum blockchains. [0:41:16]
The amount of control that is held by miners in relation to transactions. [0:45:27]
Understanding interactions between the blockchain and other external systems. [0:49:16]
Immutability and the blockchain; what the rules allow and the questions that still need to be answered. [0:52:47]
Basic engineering downsides to the blockchain. [0:54:40]
Vulnerabilities on the blockchain and how these have been exploited by hackers. [0:58:23]
NFTs, DAOs, and smart contracts; weighing how neatly these fit into the current blockchain ecosystem. [0:59:27]
The abundance of rhetoric surrounding discussion about the future and validity of the blockchain. [1:06:09]
Which problems would be well-suited to a solution found within the blockchain? [1:08:10]

Read the Transcript:

So Daniel, why are you interested in crypto and blockchain?

The area that I'm most knowledgeable about and that I spend most of my professional time in is architecting and engineering systems that store and process data in general, but I'm also generally super interested in the dynamics of software projects, why some succeed, others don't and are there any patterns that can be observed? So I think I have a bit of a double interest in blockchain technology here. I have professional responsibility to study new technologies, to be able to identify when a technology matches a potential problem that I come across, but on the other hand, I also think that it's a very interesting space to study because there are some dynamics that can be observed, which are quite interesting to understand how the industry is behaving, especially when you start comparing it to other hypes, like the artificial intelligence hype of the '80s. By the way, this is also one of the reasons why I write about these things, because sometimes I get interesting comments back, like you missed something important in your argument, or my experience was quite different and these are always opportunities to learn.

100% agree with that. That's one of the things we love about creating content as well, that people come back and say-

I can imagine.

Yeah, it's great. I want to follow up on something you said there, you mentioned the artificial intelligence hype of the '80s. I don't know what that is. That sounds interesting. Can you just briefly say what that is and why it's related to this.

A lot of up and coming technologies follow a hype cycle where in the initial phases you have a lot of ambition and a lot of expectations attached to a new technology. So artificial intelligence is something which was very big in the '80s, which basically means you have machines that use data and that use procedures to make decisions by themselves, which previously, only humans would make, and there was a lot of interest in that, this interest really ebbed off in what's known as the AI winter, artificial intelligence winter, and recently we've seen a reemergence of artificial intelligence projects and problems, but in slightly different shape.

Interesting. So are you using blockchain in practice?

I'm not using that technology in any production setting or professional context beyond what is, I'd say reasonable experimentation to get acquainted with the technology and to develop the necessary understanding. Maybe related, it's maybe also important to know that I have no long or short position in any crypto assets. Some people say it, "You really need to dive into the community to get it." I think it also works the other way around, the debate is sometimes a bit more complicated, especially when you just want to discuss the cool technical facts, because you don't quite know whether the speaker has any financial stake in the position that they're representing.

That's good to know. We're talking to a bunch of experts on various aspects of this topic, as I'm sure you understand the cryptocurrencies span so many different disciplines, which is, I think one of the reasons that they're kind of hard to think about and understand. Can you talk a little bit about the perspective that you bring to this topic?

Yeah. I think this is a very important point and a very important question also because the entire debate is complicated by the fact that there are so many different angles to this space. There is the ecological problem of course, there is a big question around monetary policy, about business models and then how far they're distinct from Ponzi schemes. The question about crypto as investments or as assets, fraud and whether it can be prevented or whether it's just inherent in the system. I'm not at all an expert in monetary policy or any of these fields. So I'm also not going to take any side on whether or not a currency without a central bank is a good idea in general, because there are people that spent a lifetime studying these kinds of questions and I haven't.

So I'm really going to take the engineering angle for which kinds of problems could blockchain technology potentially be interesting and where don't I think it's interesting. That doesn't mean that the other angles are not important. Maybe the engineering angle is even the least important, one of all of them, but that just happens to be the part of the discussion where I can contribute.

What is a distributed system?

That's kind of the angle that I take when looking at blockchain technology. Maybe to understand what a distributed system is, let's take a little step back and look more generally at systems that store and process data, and then see how we arrive at the distributed system. As an example, let's say we wanted to build a system that handles financial transactions, because that's kind of the bedrock application for these blockchain systems. In theory, I could propose a new currency, which is called Daniel coin, and it works as follows. I've got an Excel sheet on my laptop. This Excel sheet contains some initial balances. Ben, you have $50, Cameron, you have $80 in that Excel sheet, and then for every transaction that happens, we add a new row to that Excel sheet to say what happened.

To make a transaction you can just send me an email and I will update that Excel sheet, and if you want to ask for your balance, you can do the same thing and I will tell you what your current balance is. If we just rented a shared flat, and we just wanted to keep track of who paid for whose lunch, that would be a system that works for most practical applications. Beyond that, this is obviously a very bad system. There is, of course, the very big question of, can I be trusted with managing this system? But putting that aside, I wanted to look at two different problems with this system, which are resilience and scale.
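To make the spreadsheet analogy concrete, here is a minimal Python sketch of such a single-machine ledger. It assumes that every transaction is appended as rows and that a balance is computed by summing rows; the names `ledger`, `transfer`, and `balance` are hypothetical and not from the conversation.

```python
# Hypothetical single-machine ledger, mirroring the spreadsheet in the example.
# Each row is (kind, account, change in balance).
ledger = [
    ("opening", "Ben", 50),
    ("opening", "Cameron", 80),
]

def balance(ledger, account):
    """A balance is the sum of every row that mentions the account."""
    return sum(delta for _, name, delta in ledger if name == account)

def transfer(ledger, sender, receiver, amount):
    """Append new rows instead of editing existing balances in place."""
    if balance(ledger, sender) < amount:
        raise ValueError("insufficient funds")
    ledger.append(("transfer", sender, -amount))
    ledger.append(("transfer", receiver, amount))

transfer(ledger, "Cameron", "Ben", 30)
print(balance(ledger, "Ben"), balance(ledger, "Cameron"))  # prints: 80 50
```

Everything here lives in one process on one machine, which is exactly why the questions of trust, scale, and resilience arise.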

Looking at scale first, let's say, well, I can update the Excel sheet if I have two transactions per day, but what happens if we have 20,000 transactions per minute or per second, or even more? In that case, I'd be very, very busy, and even if we automated that system so that I don't have to do that manually, and my laptop is doing that, at some point my laptop will reach its limits and that is really the problem of scale. Then the problem of resilience is what happens if my laptop breaks and the Excel sheet is gone? Even if I have a backup and I can restore the Excel sheet, maybe at some point I need to reboot my laptop, and while I'm rebooting, the system is not available and we cannot handle transactions, and for most real systems, this is not acceptable.

Distributed systems are really one part of the typical answers to these two challenges. So you can partition the data, meaning you can send different parts of your data set to different machines. We can have one laptop in Toronto that stores the transactions corresponding to all accounts starting with A, a laptop in Montreal storing the transactions of all the accounts starting with B, et cetera, et cetera. That is somewhat addressing the scaling issue because now we can distribute the workload onto different machines. For the resilience issue, we can duplicate data. We can have one copy of the entire Excel sheet here in Brussels, let's say, and another copy of the same Excel sheet in Toronto, and if one of these laptops is rebooting, you can access the data contained in the other Excel sheet. But now we're in the territory of distributed systems and that comes with a lot of challenges on its own.
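The two ideas above, partitioning for scale and duplication for resilience, can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the machines are in-memory objects, the `Replica` class and the helper functions are hypothetical, and updates reach every copy instantly without a network in between.

```python
class Replica:
    """Hypothetical stand-in for one laptop holding a copy of some accounts."""
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.online = True

def partition_for(account, partitions):
    """Pick a partition from the first letter of the account name."""
    return partitions[ord(account[0].upper()) % len(partitions)]

def write(account, balance, partitions):
    # Every replica of the chosen partition receives the same update.
    # A real system has to cope with lost messages at this step.
    for replica in partition_for(account, partitions):
        replica.data[account] = balance

def read(account, partitions):
    # Any online replica can answer, so one reboot does not cause downtime.
    for replica in partition_for(account, partitions):
        if replica.online:
            return replica.data[account]
    raise RuntimeError("no replica available")

partitions = [
    [Replica("brussels-1"), Replica("toronto-1")],
    [Replica("brussels-2"), Replica("toronto-2")],
]
write("Ben", 50, partitions)
write("Cameron", 80, partitions)
partitions[0][0].online = False  # one laptop is rebooting
print(read("Ben", partitions), read("Cameron", partitions))  # prints: 50 80
```

The sketch skips the hard part, namely keeping the copies in agreement when messages are delayed or lost.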

Continuing on that, can you talk about what consensus is in the context of distributed systems?

So consensus is addressing one of these challenges. Having data spread over multiple locations comes with a lot of potential problems. One of the problems is to keep each other up to date. So we have now one machine in Brussels and one machine in Toronto, these machines send each other messages, and we need to use these messages to make sure that these two Excel sheets contain the same data and that we don't have one sheet that says, Ben, you have $60 and another sheet that says you have $50. What makes this worse is that networks are unreliable, and it can happen that a message that I send from my machine in Brussels to your machine in Toronto contains an update, and that update is not received. I can wait for an acknowledgement from your side to know that you have received the update. That solves it somewhat, but now we have a new problem because that acknowledgement might get lost and I might assume that you haven't received my update, but in fact you did.

So we are getting into all of these difficult questions. The good news is, for this problem, which is called consensus, which is to have multiple actors or multiple machines agree on the same view of the world, we do have some algorithms that solve this problem. Paxos or Raft are names of algorithms that are sometimes used, and these work under certain assumptions. What is maybe important to note is that this has a price. Trying to achieve consensus is quite expensive, and usually when designing a system, I need to ask the question, do I even need consensus? For a financial application maybe the answer is yes, because we really want all the machines to agree on the current balances, but let's say we're designing a system like Twitter, maybe it's not so important that everyone really has the same view on the data.

Can you expand a little bit on what you mean when you say expensive? Because I think expensive in a software engineer language is very different from in a-

Oh yeah, that's true.

Financial language.

Indeed. It is computationally expensive because the different participants in the protocol have to send a lot of messages back and forth over a network and sending messages over a network is one of the most expensive things that you can do in computing, the least performant things let's say that you can do.

What is a Byzantine consensus problem?

So we've looked at consensus in general and distributed systems, and what I said earlier is that the main enemy in traditional consensus is really unreliable networks and unreliable clocks to some degree, but that's more an aside, but here in this setup, I still assume that I control the software that is running on all of these machines. I control the software which is running in Toronto. I control the software which is running in Brussels, and you kind of trust me that this is done correctly. All of these machines follow the protocol. Byzantine consensus says some of the participants in this consensus scheme will try to cheat.

So for example, we could say that instead of having these two machines that I run, each of us is actually hosting our own version of that Excel sheet, but now Cameron, you could try to signal to everyone, actually I have a billion dollars on my account. By doing that you would try to cheat the system and to convince us that that is actually the case. You could come up with ideas to mitigate that issue, for instance, you could go for a majority vote. We all have a shared view on that data. Everyone has a copy of that data and Ben and I could say, "Well, actually, according to my information, you only have 80 bucks." So definitely not a billion. That would be one potential idea of how to solve this Byzantine consensus problem.
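The majority vote idea can be sketched as follows. This is a minimal illustration, assuming each participant simply reports the balance it believes to be correct; the function name `majority_balance` and the `reports` dictionary are hypothetical.

```python
from collections import Counter

def majority_balance(reports):
    """Return the value reported by a strict majority of participants, else None."""
    value, votes = Counter(reports.values()).most_common(1)[0]
    return value if votes > len(reports) / 2 else None

# Cameron claims a billion, but Daniel and Ben both report 80.
reports = {"Daniel": 80, "Ben": 80, "Cameron": 1_000_000_000}
print(majority_balance(reports))  # prints: 80
```

With three participants and one cheater, the two honest reports outvote the false claim.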

A typical way in which this occurs in the cryptocurrency space is the so-called double-spend problem. That is the name that you sometimes come across. The way that the double-spend problem occurs is when I might want to buy a pizza from you, you see the payment happening on the blockchain or in whatever system we are using and you give me the pizza. After the transaction has taken place, I convince the network that before that I actually transferred all the money that I have on that account to another account that I also own. So at the time when our transaction took place, I didn't actually have any money to begin with. So that transaction couldn't have happened and you don't get any money but I received the pizza. Byzantine consensus is basically trying to solve that, the consensus in the face of malicious actors.

What is a consensus protocol?

A protocol in general is a convention of how to exchange messages. For instance, when you say, "over" on the radio to end your message and to signal to the other person that now is your turn to speak, that's in a sense, a protocol and a consensus protocol is basically a way of exchanging messages in such a way that consensus is established and that everyone at the end of the exchange agrees on the same view of the world.

How is consensus related to blockchain?

So finally we are approaching blockchain. The thing that Satoshi Nakamoto wanted to solve is having a currency that does not rely on any central entity that everyone needs to trust. This means that you really want to do what we just talked about. We want to take that Excel sheet and everyone has a copy, and we need a way of exchanging messages that make sure that we all agree on the contents of the same sheets even if one of the participants tries to cheat, because we don't have any custodian of the data that handles this for us.

Vitalik Buterin, the founder of Ethereum took this a bit further and said, "Well, instead of an Excel sheet with all the transactions, we have a sheet that contains any kind of data, and we also have some macros that can be executed automatically." But the idea is generally the same. We have this problem of everyone having a copy of the data, and we need to make sure that even if someone tries to cheat, we all agree on what this data is. Now we have seen that there are algorithms for the Byzantine consensus problem like I introduced it earlier and we said, you could potentially try a majority vote to solve that. However, these algorithms that we know, they don't really protect against all kinds of attacks.

When you're talking about Byzantine consensus, you really need to ask what is the threat model that I want to protect against? How many malicious actors are there in my system? What kind of cheating behavior am I expecting and what kind of cheats do I need to resist? So one important term in this context is the concept of a Sybil attack. So we said earlier, Cameron, you want to convince the network that actually you have a billion dollars and we have now introduced the majority vote. But what you could try to do is create many, many clones of yourself, meaning machines, that signal to everyone, yes, yes, I agree, Cameron has a billion dollars on his account and now there goes our majority vote. These Sybil attacks are something that many traditional Byzantine consensus algorithms do not really solve, but for the thing that Satoshi Nakamoto wanted to achieve, this is really necessary, and that's where blockchains really come in.
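A short sketch shows why a plain majority vote breaks down once identities are free to create. It assumes the same simple counting rule as a naive vote, and the clone names are made up for illustration.

```python
from collections import Counter

def majority_balance(reports):
    """Naive vote: one identity, one vote, with no check on who is behind it."""
    value, votes = Counter(reports.values()).most_common(1)[0]
    return value if votes > len(reports) / 2 else None

honest = {"Daniel": 80, "Ben": 80}
# Hypothetical clones: creating an identity costs the attacker nothing.
sybils = {f"cameron-clone-{i}": 1_000_000_000 for i in range(3)}

print(majority_balance({**honest, **sybils}))  # prints: 1000000000
```

Three free identities are enough to outvote two honest participants, which is why the vote needs to be tied to something that is costly to fake.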

So we've gotten into blockchains now, we talked a little bit about consensus. Can you do like an overview of the key components that make a blockchain work?

Of course, this is a bit oversimplified, but I think the two main components that are very fundamental to blockchain technology are hashing and digital signatures. There is a bit more that goes into that, but I think if you have these two, you already have like 80% of the ingredients. A hash, or more specifically a hash function, is a way of generating a small fingerprint for a piece of data. Data can be a document or music or a number, anything, and the thing that is special about a hash function is that it needs to satisfy certain properties.

One property is that there mustn't be any easy way of determining the original data that produced the hash. More technically, you should not be able to find the pre-image of a hash. If you get a hash and you want to know what data went into the function, the most efficient way of doing that would be to guess a lot of potential inputs and then compare the output with the original hash that you got, which is of course very, very inefficient, but that's the most efficient thing you can do to reverse this function. Computing the fingerprint itself is easy. Furthermore, it should also be almost impossible to find two pieces of data that have the same fingerprint, the same hash. They exist, but it should be very hard to find them.
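These properties can be tried out with Python's standard `hashlib` module. The sketch below uses SHA-256 as one example of a hash function; the helper names `fingerprint` and `guess_preimage` are hypothetical, and the guessing loop only works here because the list of candidates is tiny.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Computing the fingerprint is cheap: one call, fixed-size output."""
    return hashlib.sha256(data).hexdigest()

def guess_preimage(target: str, candidates):
    """Reversing the function means guessing inputs and comparing outputs."""
    for candidate in candidates:
        if fingerprint(candidate) == target:
            return candidate
    return None

a = fingerprint(b"Ben pays Cameron 30")
b = fingerprint(b"Ben pays Cameron 31")
print(len(a), a != b)  # prints: 64 True

# Guessing only succeeds if the right input happens to be among the guesses.
print(guess_preimage(a, [b"hello", b"Ben pays Cameron 30"]))
```

A one-character change in the input gives an unrelated fingerprint, and the only generic way back from a fingerprint to its input is trial and error.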

So that's hashes. Digital signatures are a mechanism that allows you to prove that you acknowledge a certain piece of data. So you can provide me with a very big number, and by looking at that number, I can say with a large degree of certainty that this proves that Cameron had knowledge of a certain piece of data and no one else could have falsified this trying to pretend to be you. This is a bit abstract and it's a bit magic the first time you hear it, but it basically relies on the idea that you have a secret on your side, and you can prove to me that you have knowledge of that secret without actually revealing that secret to me.

Wow.

That is how digital signatures work. These two things are very, very useful techniques and they're used really all over the place, they're not specific to blockchains. In the case of blockchains, simplifying a little bit again, here is how they're used. The blockchain is a sequence of blocks, and that's where the name comes from, and each of the blocks contains a bunch of transactions. So basically the Excel sheet that we talked about earlier is cut up into different blocks, and one block contains one sequence of transactions. Each transaction in one of the blocks is signed digitally by the "account" which is sending the money. What "account" means depends a bit on the blockchain that we're talking about. But the signature basically guarantees that I cannot spend your money, which is a very useful property for a financial system.

Similarly for hashes, these blocks form a chain which is called a hash chain, which means that each block contains the hash of the preceding block. More technically it's the hash of the header of the block, but it doesn't really matter. So by adding the fingerprint of the preceding block to my current block, I enforce a certain temporal ordering, because by adding this fingerprint, I can prove that I had seen the previous block at the moment when I generated the current block.
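A hash chain of this kind can be sketched in a few lines. As a simplification, this example hashes a whole block serialized as JSON rather than a block header, and the field names `prev_hash` and `transactions` are hypothetical.

```python
import hashlib
import json

def block_hash(block):
    """Fingerprint of a block, computed over a stable JSON serialization."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, transactions):
    """Each new block stores the fingerprint of the block before it."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "transactions": transactions})

def is_valid(chain):
    """The chain is intact if every stored fingerprint still matches."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

chain = []
append_block(chain, ["Ben pays Cameron 30"])
append_block(chain, ["Cameron pays Daniel 5"])
print(is_valid(chain))  # prints: True

# Rewriting history in an old block breaks the link stored in its successor.
chain[0]["transactions"][0] = "Ben pays Cameron 3000"
print(is_valid(chain))  # prints: False
```

Because each block commits to its predecessor, changing an old block means recomputing every block after it.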

Wow.

These things existed way before blockchain. They're extremely useful and used in all kinds of computer systems and I think that also none of the blockchain minimalists would argue that these components aren't useful. They're very useful.

Interesting. You talked about hash functions, digital signatures, linking blocks together. We've also talked about distributed systems, which I think is probably also important as a component of blockchain. Is that true?

Exactly. Yes, because that's the problem that we try to solve. So now we have some technologies and we have the problem that we try to solve, namely establishing consensus in the face of adversarial behavior, and now the question is how do we actually achieve that?

Interesting. We've got hash functions, digital signatures, distributed database. We've got consensus. You just mentioned something that I think is important. All of these things have existed for a long time. That's one of the things that surprised me when I started researching this stuff is that it goes back to the what? '60s even for some of the technology, or maybe not quite that early.

For some of the things. Yes. A lot of progress on distributed systems was made in the '80s. Leslie Lamport is a big name. Cryptography, it goes back before the age of computing, we've had cryptography. So yeah, these are venerable techniques.

This is a normal thing with innovations, I think where it's tying a bunch of existing things together in a new way. So what was unique or what was the innovation of Satoshi Nakamoto's introduction of the public blockchain?

It's exactly what you say. Some people argue, well, it's all already been there, but I think this is true for all innovation, that usually it's just people recombining existing knowledge or existing techniques in some way that happens to match a certain problem that they had at that time. So we said that the goal of Satoshi Nakamoto was to build a system which resists Sybil attacks and solves the Byzantine consensus problem. I think the key insight here, the thing which is really new as far as I see it, which was introduced in the 2008 Bitcoin paper that the mysterious Satoshi Nakamoto wrote, is the idea of creating an economic incentive to comply with the protocol.

A miner in blockchain is basically a machine or a person which is generating the next block, taking all these transactions together, bundling them and proposing a new block to the network, and a miner needs to spend money to create a new block. It's very expensive to do that. However, you're also allowed to give yourself some money in the block that you are creating, because there are transactions in the block and you can create a new transaction where you create some money out of thin air and assign that to yourself. The rest of the network can decide whether they accept your block or not, and if they accept your block, you have a net positive transaction because you spend a lot of money trying to generate that block, but you get more money back through this so-called coinbase transaction.

However, if you try to attack the network and people don't accept your blocks, and you try to do that unilaterally, then you have to spend a lot of money without getting anything in return, unless of course the attack is actually successful. The most interesting part here is how do you actually make it expensive to find a new block? This is where this entire system of proof-of-work comes in. The idea is, proof-of-work is again using hash functions. We said it should be very difficult to find a piece of data which maps to a certain hash, and to create a new block, a miner needs to find some data that produces a hash that has certain properties. In the case of Bitcoin, and Ethereum as well, it needs to start with a lot of zeros, and we know that the best way of achieving that is to guess a lot of data and compute the hash and see whether that hash matches your expectations, and that is very expensive because you have to literally spend energy and electricity to do that.
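The guessing loop behind proof-of-work can be sketched like this. It is a toy version under stated assumptions: a single SHA-256 over a string, a difficulty counted in leading hexadecimal zeros, and a hypothetical `mine` function. Bitcoin itself applies SHA-256 twice to a block header and compares the result against a numeric target, which has the same effect of requiring many leading zeros.

```python
import hashlib

def mine(block_data: str, difficulty: int):
    """Try nonces one by one until the hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

nonce, digest = mine("block with transactions", difficulty=4)
print(nonce, digest)

# Checking the work takes a single hash, however long the search took.
check = hashlib.sha256(f"block with transactions{nonce}".encode()).hexdigest()
print(check == digest)  # prints: True
```

Each extra hexadecimal zero multiplies the expected number of guesses by 16, so finding a block can be made arbitrarily expensive while verifying it stays cheap.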

What is the difference between a public blockchain and a permissioned blockchain?

I think this is sometimes a bit confusing. The main issue that I have with permissioned blockchains is their name, because they do share 12 letters or so with the public blockchain. I think in public discourse, when you read Twitter, the word blockchain usually implies Nakamoto consensus, the idea of building something which is Sybil-attack-proof and can achieve consensus in such a distributed system. I think that is the thing which is also really new here. Permissioned systems like, say, Hyperledger Fabric, which is the system that IBM developed, say, "Well, what if we could control participation in the network?" So you could do that by literally checking all the transactions that are proposed or all the blocks that are proposed, or you can limit who can actually write to the chain.

Once you limit access to the system, you don't have that Sybil attack problem anymore because you cannot just replicate a million instances of yourself, because the authority that decides who can participate will say, "No, they can't, they are not allowed to vote." Now we can, again, go back to simpler mechanisms like that majority vote that we talked about earlier, or other consensus protocols, like the ones that Leslie Lamport proposed in the '80s, but it's quite different I think from what the public blockchain is doing.

How is a public blockchain specifically different from a database?

A database, that's a generic term for a system that is designed to store and retrieve data. Different databases are designed for different environments. Some are distributed, some are designed for small data sets. Some are designed for big data sets. Some give very strong consistency guarantees through something like consensus for instance, some don't. Also, different database systems are designed for different ways of accessing the data. If you squint a little bit, you could see blockchains, especially very flexible systems like Ethereum as kind of sort of a database. However, they are pretty bad at most common data storage and retrieval tasks, because that's also not really what they're designed for.

You can do hundreds of thousands of transactions per second with a traditional database system. No problem. A system like Ethereum is nowhere close to that. In the end, the existing blockchain systems are not really designed for arbitrary data lookup, they're designed to solve this consensus problem that we talked about earlier. Sometimes if advanced lookup is necessary, you can build derived data structures. So you can use traditional databases in combination with the blockchain. Bitcoin is doing that, for instance, with the UTXO set. The UTXO set is, simplifying a bit again, an overview of all the money that is there ready to be spent, and because it would be very expensive to look up this information on the blockchain itself, they kind of derive a database from the information which is contained in the blockchain.
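The idea of deriving a lookup structure from the chain can be sketched as follows. This is a simplified model, assuming each transaction lists the outputs it spends and the outputs it creates; the field names and the function `build_utxo_set` are hypothetical and do not reflect Bitcoin's actual data format.

```python
def build_utxo_set(blocks):
    """Replay every block once and keep only outputs that are still unspent."""
    utxos = {}
    for block in blocks:
        for tx in block:
            for spent in tx["inputs"]:
                # Spending an output twice raises KeyError here.
                del utxos[spent]
            for index, (owner, amount) in enumerate(tx["outputs"]):
                utxos[(tx["id"], index)] = (owner, amount)
    return utxos

blocks = [
    [{"id": "tx1", "inputs": [], "outputs": [("Ben", 50)]}],
    [{"id": "tx2", "inputs": [("tx1", 0)],
      "outputs": [("Cameron", 30), ("Ben", 20)]}],
]
utxos = build_utxo_set(blocks)
print(sum(amount for owner, amount in utxos.values() if owner == "Ben"))  # prints: 20
```

Once the set is built, answering "what can Ben spend?" is a lookup in a small table instead of a scan over the whole chain.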

That's really interesting. So to follow on to Ben's question, so under what circumstances does a public blockchain solve a problem that cannot be solved with a centralized database?

I would be a bit careful with the word centralized here because traditional databases can be distributed. It's just, as we said earlier, that it's still one entity which is owning all of these machines and making sure that each of the individual machines that contribute actually behaves correctly. I think that's really the big difference: can it work without having one entity that everyone trusts? Without taking a stance on whether or not a peer-to-peer currency like Bitcoin is actually a good idea, I think this is one of the use cases that is really impossible to achieve with existing database technology, because somebody would have to act as the bank, which is making sure that the database is behaving correctly.

You kind of already touched on this I think, but a permissioned blockchain does not solve the problem that you just described. Is that correct?

Yeah. Yeah. So what Nakamoto really wanted to achieve was not having this central entity, and if you have an entity that controls access to the system, you again have someone that you need to trust. If you were a cynic, you could maybe think that the way that these permissioned systems are created is that marketing asked, "We need something with blockchain because blockchain is in fashion," and then engineering tried to come up with something that still looks like a blockchain, but actually solves the problem of their customer. I'm not that cynical. So maybe the more good faith view on permissioned systems is that they are a bet on the data model of blockchain. So you basically say this entire idea of having a ledger of transactions is a useful building block in itself, like a relational database would be, and that this would be a useful technology to build other solutions on top of, but it's quite different from what Satoshi Nakamoto wanted to solve.

Super interesting, because there are companies that have big fancy press releases about their blockchain. I've always wondered about that.

Well, I guess we'll only see in a couple of decades what came out of that.

That's a good point. Maybe it is a really good idea.

Maybe. Maybe.

Maybe. We've talked about consensus, I think you might have mentioned proof-of-work. I don't know if you mentioned it by name or not, but we kind of touched on it. How does Bitcoin achieve consensus?

So we already said that we now have this data structure where we have a sequence of blocks and each block points to its parent block through a hash, and each of these blocks contains all of the signed transactions that we want to store. Now the question is the network needs to agree on what the next block is going to be. This is really the consensus problem that we are trying to solve, and the miners are those participants that are going to propose a new block and to do this, they have to spend real money by solving this proof-of-work, computing a hash, and they get rewarded only if they're successful.

It can happen that two blocks are suggested at the same time, which is then called a fork, which is two blocks that point to the same parent hash. However, it is part of the protocol that in such cases, the chain with the most work, or you could say the longest chain to simplify it a bit is the chain that counts, and that means that all the money that you might potentially have on a chain, which is not the longest chain is gone. So everyone has an incentive to keep working on the longest chain, and that's how the consensus is achieved.
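
As a rough illustration of the mechanism described above, the following Python sketch builds a hash-linked chain, mines blocks with a toy proof-of-work, and resolves a fork by picking the longest valid chain. The names (`Block`, `mine`, `pick_chain`), the three-zero difficulty target, and the string transactions are assumptions made for this example; real Bitcoin uses a different block format, signed transactions, and a numeric difficulty target.

```python
import hashlib
from dataclasses import dataclass

# Toy difficulty (assumption): a block hash must start with three hex zeros.
DIFFICULTY_PREFIX = "000"


@dataclass(frozen=True)
class Block:
    parent_hash: str       # pointer to the parent block
    transactions: tuple    # stand-in for the signed transactions
    nonce: int             # value the miner varies to find a valid hash

    def digest(self) -> str:
        payload = f"{self.parent_hash}|{self.transactions}|{self.nonce}"
        return hashlib.sha256(payload.encode()).hexdigest()


def mine(parent_hash: str, transactions: tuple) -> Block:
    """Try nonces until the block hash meets the toy difficulty target."""
    nonce = 0
    while True:
        block = Block(parent_hash, transactions, nonce)
        if block.digest().startswith(DIFFICULTY_PREFIX):
            return block
        nonce += 1


def is_valid_chain(chain: list) -> bool:
    """Each block must meet the target and point at its parent's hash."""
    for parent, child in zip(chain, chain[1:]):
        if child.parent_hash != parent.digest():
            return False
    return all(b.digest().startswith(DIFFICULTY_PREFIX) for b in chain)


def pick_chain(candidates: list) -> list:
    """Fork choice: with equal difficulty per block, most work means longest."""
    valid = [chain for chain in candidates if is_valid_chain(chain)]
    return max(valid, key=len)


genesis = mine("0" * 64, ("genesis",))
a1 = mine(genesis.digest(), ("alice->bob:1",))
b1 = mine(genesis.digest(), ("alice->carol:1",))  # fork: same parent as a1
a2 = mine(a1.digest(), ("bob->dave:1",))

winner = pick_chain([[genesis, a1, a2], [genesis, b1]])
print(len(winner), "blocks on the winning chain")
```

In this sketch the transaction in `b1` simply does not count once the other branch is longer, which is the incentive to keep building on the longest chain.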

What are the downsides to proof-of-work?

The very well known problem of proof-of-work is that it's extremely wasteful. The entire point of proof-of-work is that you have to spend money, meaning that the computation that needs to happen needs to be a computation that is not in itself economically valuable. So in this case, we're just generating a lot of hashes based on random data and that has no use whatsoever, it's just there to make sure that you spend money in order to generate the next block. There are these comparisons that are saying that Bitcoin has the same energy consumption as a mid-sized country, and this is the reason why that is.

One of the things that we've heard from various projects. I mean, Ethereum's been pretty vocal about moving to proof-of-stake, but there's all sorts of other consensus protocols. Are these surefire solutions to the problems of proof-of-work or are there trade offs?

So this is a bit of a leading question because there are hardly any surefire solutions at all. The idea behind proof-of-stake is, well, we said we want to build an economic incentive to comply with the protocol, and in proof-of-work you have to spend energy first, and you pay yourself back through fees and this so-called coinbase transaction. In proof-of-stake the idea is, what if we can make the part where you have to spend the money virtual? So you will have your assets, or some assets, on the chain frozen, and if you misbehave, you lose these assets. Then we can avoid all that energy burning and have the same effect at the end of the day.
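
The "frozen assets that you lose if you misbehave" idea can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the `StakeRegistry` class, the rule that proposing two different blocks at the same height counts as misbehavior, and the choice to forfeit the whole stake are assumptions for the example, not the rules of Ethereum or any other specific chain.

```python
from dataclasses import dataclass, field


@dataclass
class StakeRegistry:
    """Toy ledger of frozen stakes (illustrative names and rules only)."""
    stakes: dict = field(default_factory=dict)

    def deposit(self, validator: str, amount: int) -> None:
        # Freeze assets on the chain as the validator's stake.
        self.stakes[validator] = self.stakes.get(validator, 0) + amount

    def slash(self, validator: str) -> int:
        # Misbehavior detected: the validator forfeits the whole stake.
        return self.stakes.pop(validator, 0)


def report_double_proposal(registry, validator, block_a, block_b) -> int:
    """Assumed rule: two different blocks at the same height are misbehavior."""
    same_height = block_a["height"] == block_b["height"]
    different = block_a["hash"] != block_b["hash"]
    if same_height and different:
        return registry.slash(validator)
    return 0


registry = StakeRegistry()
registry.deposit("validator-1", 32)
registry.deposit("validator-2", 32)

lost = report_double_proposal(
    registry,
    "validator-1",
    {"height": 10, "hash": "aa"},
    {"height": 10, "hash": "bb"},
)
print("slashed:", lost, "remaining stakes:", registry.stakes)
```

Note that the penalty logic lives inside the system itself, which is exactly the source of the added complexity discussed next.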

The thing is that the fact that the stake now becomes part of the protocol and is not something external to the system makes the entire system much more complex, and there are a lot of new attack vectors that people can look at in a proof-of-stake system. So what happens, for example, if you try to reproduce the entire history since the beginning of time in the chain? In proof-of-work, this is not really possible because it would be way too expensive. In proof-of-stake, that is a potential attack that needs to be considered, and if you look at this so-called "merge," which is the move of Ethereum to proof-of-stake, it appears to me that this is really a very long game of Whac-A-Mole where each of these possible attacks and complications that pop up needs to be solved, and then the solution opens up new problems and new weaknesses.

I'm not saying that this is fundamentally impossible. I'm not aware of any impossibility results with regard to proof-of-stake, and maybe it does actually solve the inefficiency problem of blockchains at least to some degree. I still think that the complexity and security risks that this opens up should not be underestimated. Also, we don't quite know yet how proof-of-stake will change the economics of the blockchain. But I guess that's just because we haven't really seen it rolled out on a very large scale and that remains to be seen.

Do proof-of-work and proof-of-stake consensus still work where there's collusion in the system?

They support some degree of collusion, but there is an important known weakness which is also one of these words, which is thrown around a lot, which is the 51% attack. A way of looking at this is that now we have baked economic incentives into our algorithm design, and that means that next to the typical tools of computer science to analyze our algorithm, we also need to use the tools of economics to analyze the behavior of the system. So you can make statements like, well, the system works in an equilibrium state where everyone's dominant strategy is to comply with the protocol, et cetera, et cetera. But we also need to analyze what happens in the case of a "market failure." One of the questions is can the participants of the network arrange side payments between each other and then collude to break the system once they arrive at the conclusion that their collusion turns out to be more profitable than what they could gain from remaining inside the system?

The answer here is yes, this is absolutely a risk. The somewhat oversimplified version of this is that if you can group together 51% of the hash power in proof-of-work or the equivalent in proof-of-stake, you can basically force your version of the truth to be accepted. That means you can try to impose certain changes to the protocol. You can try to double spend, or you can block certain transactions from going through. We see that collusion happens because mining pools exist where multiple miners collaborate. That is a form of collusion where they arrange for side payments. It's just that in the Bitcoin network we haven't yet seen a mining pool that surpasses the 51%.

Also, another reason why we know that this can happen is the canonical example in the Ethereum network, where they rewrote history after the famous DAO hack. The DAO was a decentralized organization that was running on the Ethereum network; it had some weakness, and it was exploited. So there were a lot of transactions on the Ethereum chain that drained money from, let's say, other people's accounts, to simplify, and the network basically agreed to undo these transactions. Because everyone, or a large number of the miners, agreed to revert this, this is basically a 51% attack on the Ethereum chain.

I want to follow up on the things that you're just saying. You mentioned mining pools. You mentioned the DAO with Ethereum, in their current setup are blockchains like Bitcoin and Ethereum blockchains, are they really decentralized?

It depends a bit on how you interpret the word decentralized. I think the intention behind a system like Bitcoin was really to build something which is decentralized, and to some degree it works. You have different players in Bitcoin that have various degrees of influence, but currently they still need to, if they really want to impose their version of the truth, multiple of them would need to come together to decide together that this needs to happen. Maybe that's the best degree of decentralization that you can get. There is a lot of other applications that are running on top of blockchains, which are not decentralized at all, and which kind of defeat the purpose of blockchain. But I think depending on your definition of decentralized, you could say that yeah, at least Bitcoin is to some degree decentralized.

If two of the largest mining pools got together and decided that they wanted to change history or not allow certain transactions to happen, could they do that?

Well yeah, but I think decentralization again, depends on your model and on what players there are on your chain. Indeed, if you have mining pools, which in themselves consist of many individual miners, decide to collaborate and to execute a 51% attack on the Bitcoin network, they would. But maybe that's just the best degree of decentralization that you can achieve. In theory, you could have one miner that just owns 51% of the hash power in Bitcoin, and then this miner can call all the shots. So in that scenario, the system is not decentralized. I'm not sure how much decentralization you can really get, and whether it's not something that we will also see when moving to proof-of-stake, that these systems will become more and more centralized because more and more of the decision power gets concentrated in the hands of a few. Maybe that's just something that is inherent in the system, but it's already, let's say more decentralized than the systems that Nakamoto wanted to replace. So I guess that that's an achievement.

Interesting. In the case of a 51% attack they probably wouldn't be incentivized to do that. Assuming that they hold a cryptocurrency, they probably wouldn't want to do that because it could make it be worth less, I guess.

Yeah. So that's one of the typical arguments that is brought forward, that it's not in their interest to break confidence in the working of the currency because they're benefiting from the confidence that users have in the currency. Maybe that's true. There is some incentive there. I'm a little bit skeptical, I find that argument a little bit hand wavy because who knows, it might just turn out that there is an attack that happens to be so profitable and they get to sell off the Bitcoins or the cryptocurrencies that they obtain that way so quickly that the long term gain they have from this attack is larger than the long term gain they might have from keeping and operating their mining equipment. I don't know. It's a bit difficult. It hasn't happened yet, but I don't think we really have a guarantee that it won't happen.

I guess I just made an assumption there that they do hold Bitcoin or whatever, but that's not necessarily true, miners could be selling off their Bitcoin.

Yes, but they are benefiting from the fact that the Bitcoin network is running, because every time they mine a block, they get some Bitcoin which they can then hold or decide to sell. It doesn't really matter. But when trust erodes in the Bitcoin network, they can no longer do that because nobody will use Bitcoin anymore.

Even if they didn't hold Bitcoin, they have all of their mining equipment, which is worth a lot of money, and that would become worthless.

Yes, that is correct.

Very interesting. How much control do miners have over the actual transactions that get processed?

The transactions come in. So everyone, every participant in the network broadcasts their transaction to the entire network, and then you basically have a pool of transactions which are not yet included in the chain and the miners can decide which transactions they want to include in their block or not, and the block is only valid and will only be approved by the rest of the network if there are no invalid transactions in that block. A miner could in theory decide to not include any transactions in that block or they could decide to not process certain transactions. The idea is that I guess some miner will eventually process your transaction. There's also another economic incentive here, which is that you can attach a fee to the transaction that you propose to the network and the miner can take that fee. So if you propose larger fee to the miner, your chances of getting your transaction included in the next block will increase.
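
The pool of pending transactions and the fee incentive can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Tx` fields, the `valid` flag standing in for signature and balance checks, and the greedy "best fee per byte first" ordering are choices made for the example. The discussion above only says that a larger fee improves your chances, not how a particular miner ranks transactions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    txid: str
    fee: int             # fee offered to the miner, in arbitrary units
    size: int            # size in bytes
    valid: bool = True   # stand-in for signature and balance checks


def select_for_block(mempool, max_block_size):
    """Greedy selection: best fee per byte first, skipping invalid transactions."""
    chosen, used = [], 0
    for tx in sorted(mempool, key=lambda t: t.fee / t.size, reverse=True):
        if not tx.valid:
            continue  # one invalid transaction would make the whole block invalid
        if used + tx.size <= max_block_size:
            chosen.append(tx)
            used += tx.size
    return chosen


mempool = [
    Tx("low-fee", fee=10, size=250),
    Tx("high-fee", fee=100, size=250),
    Tx("mid-fee", fee=50, size=250),
    Tx("invalid", fee=500, size=250, valid=False),
]

block = select_for_block(mempool, max_block_size=500)
print([tx.txid for tx in block])
```

With room for only two transactions, the low-fee transaction stays in the pool and has to wait for a later block or a different miner.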

That's so interesting. A dominant miner, they'd have to have all the hash power, I guess. If a dominant miner decided they wanted to censor a transaction to a specific address, they could do that, but if they didn't mine the next block, then some other miner might accept that transaction.

That's the idea, indeed. One miner might discriminate against you, and I think the idea of the network is that some miner will eventually process your transaction and to not have your transaction included at all in any block, basically everyone would need to agree to discriminate against you.

That's super interesting. I'm just thinking about censorship because that's one of the things that Bitcoin is supposed to be is censorship resistant, but there could still be if there's three mining pools and they all decided they don't like you.

Indeed. So that would be a form of collusion. If you have 50% of the hash power saying, "I don't like Ben and we are not going to accept any of his transactions." Okay, there's a little caveat here. They need to be able to identify which ones are actually your transactions, because they're under a pseudonym on the chain. But let's assume they can do that, then they could in theory censor the chain because they can make sure that they're always on the longest chain. If somebody ever produced a block that contains one transaction signed by you, they would always not follow or build on top of that block, but they will make their own competing block, not containing your transaction.

Wow.

Because they have 51% of the hash power, they can keep going on that chain. But that would be another example of collusion. Now I think here there wouldn't really be any economic incentive of doing that, of discriminating against one particular person. Maybe there could be at some point, who knows?

It doesn't have to be an economic incentive either I don't think. Like if a state for example took control of the miners in their country and they wanted to censor transactions, but again, it's all pseudonymous, you don't know who is attached to which address.

That's a bit the second line of defense: the fact that the transactions are not linked to a real name. Then again, at some point you need to get money into the blockchain or out of the blockchain, and these are potential places where somebody might reconstruct your real identity, unless you're taking special precautions, which also exist. But this is quite a rabbit hole to get into.

It is. I do want to keep going though on the idea of things off the blockchain, what happens when a blockchain needs to interact with something like a stock price or a legal system?

So I think that is something that is really also sometimes overlooked because blockchains are this nice system in which nodes do not need to trust one another or people do not need to trust one another. However, that only really works if you stay inside this bounded context, in the bounded world of the blockchain. For example, there are some people that are suggesting that we should store land title registers on blockchain. All right. So you claim you own some land near Toronto, and this claim to that land is more or less immutably stored on the chain, but what happens if I now go there and build a little hut and say, "This is mine." What are you going to do? Are you going to wave the blockchain at me?

This is a bit the problem of enforcing the claims that are encoded on the chain. Here, in practice, you still need to trust some entity, in this case maybe the police or the legal system or the state, to enforce whatever is registered on the chain. The flip side, the other direction of this problem, is what is commonly called the oracle problem in the blockchain community. So for instance, let's say you have a smart contract which does something depending on the current price of a barrel of oil, pays you out if the price reaches some value. Somebody needs to encode what the current price of oil is and you need to have this information available on the blockchain. So again, you kind of trust this entity to do that correctly.

Now you can say, "Well, I can have multiple oracles, which independently encode this information, or I can work with a reputation system, or I can try to limit the blast radius of what happens if one of these oracles acts maliciously." But these are a bit the same techniques that we're also using with traditional intermediaries. So I think if your blockchain solution really heavily depends on third party entities that enforce information that is on the chain or a third party oracle, then it is a good question to ask whether this same entity couldn't also be trusted with storing this data in the first place.
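
One way to picture the "multiple oracles" mitigation is to take the median of several independent price reports, so that a single malicious reporter cannot move the result much. The Python sketch below is only an illustration: the oracle names, the prices, the `min_reports` threshold, and the use of a median are assumptions for the example, not a description of any particular oracle network.

```python
from statistics import median


def aggregate_price(reports: dict, min_reports: int = 3) -> float:
    """Combine independent oracle reports; the median resists a single outlier."""
    if len(reports) < min_reports:
        raise ValueError("not enough independent oracle reports")
    return median(reports.values())


# Hypothetical oil price reports; oracle_e is malicious or broken.
reports = {
    "oracle_a": 101.0,
    "oracle_b": 99.5,
    "oracle_c": 100.0,
    "oracle_d": 100.5,
    "oracle_e": 1_000_000.0,
}

price = aggregate_price(reports)
pays_out = price >= 120.0  # toy contract condition: pay out above 120
print(price, pays_out)
```

The aggregation only helps while most reporters are honest, so the contract still trusts the oracle set as a whole, which is the point being made here.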

Are blockchains immutable?

The answer here is it depends. I think the entire idea of blockchain is that the record of everything that has changed and the order in which things have changed is replicated to everyone. So this record by replication is somewhat secured. Also, it should be very expensive as we've said earlier, for one party to unilaterally try to change this state. But we've also seen that in the case of a 51% attack, for instance, history can be changed. We've already talked about the aftermath of the DAO hack on the Ethereum network, where history has been modified. I think it's not really useful to say something is immutable. I think that what we should say instead is what are the conditions that allow changing history in such a system, and are these the conditions that we want for our application?

Well, I don't know if my next question makes sense based on what you just said, but is immutability as it's presented as a benefit of blockchains, is that unique to blockchains?

Indeed. So as I said, immutability itself is a bit of a difficult property because you could carve your ledger into the Rocky Mountains and there could be some seismic event that gets in the way of immutability even with that. Generally, I think we actually want to be able to undo things. I guess the undo after the DAO hack was maybe in some sense a good thing, because that was not a legit transaction that took place. We just want to determine what are the rules that we want to apply, who can undo and when. There are legal reasons why we want to undo, for instance, here in the EU, we have things like GDPR, which require companies to delete data if requested.

There are situations where something horrible happens and you just want to undo that, so I think there are generally alternative approaches that we need to look at. I think a worthwhile question to ask is why do we think we want immutability in the first place? There are systems that you can design that allow you to prove that a certain transaction took place. So for instance, if I sent a signed transaction to my bank, my bank could in theory provide me a sequence of hashes that prove that my transaction is included in their ledger.

So basically this is an auditing mechanism. I don't know what is exactly in their ledger, but I am able to verify that they have not silently dropped my transaction. Maybe this is all that you need, and this is much simpler than the blockchain technology in itself, and the blockchain might just be overkill if that's already sufficient.
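
A "sequence of hashes" that proves inclusion is commonly built as a Merkle tree. The Python sketch below shows one way such an audit could work, assuming the bank publishes a single root hash and hands each customer the sibling hashes along the path from their transaction to that root. The function names and the ledger entries are hypothetical, and the sketch omits hardening that production designs add, such as distinguishing leaf hashes from internal node hashes.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves: list, index: int):
    """Return (root, proof); proof is a list of (sibling_hash, sibling_is_left)."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        sibling = index ^ 1          # neighbor in the same pair
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify(leaf: bytes, proof: list, root: bytes) -> bool:
    """Recompute the path to the root from one leaf and its sibling hashes."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Hypothetical ledger held by the bank; the customer only knows their own entry.
ledger = [b"tx-alice", b"tx-bob", b"tx-me", b"tx-carol", b"tx-dave"]
root, proof = merkle_root_and_proof(ledger, index=2)

print(verify(b"tx-me", proof, root))
print(verify(b"tx-forged", proof, root))
```

The customer learns nothing about the other entries beyond a few hashes, yet can check that their transaction was not silently dropped from the ledger behind the published root.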

So to follow onto that from an engineering perspective, what are the downsides of using blockchain?

You could say, "Great, I have blockchains as the system, which solve the Byzantine consensus problem. I may or may not need this property, but I get it for free. So I take it." That might be an argument. Well, of course there is one argument that we need to talk about is the wastefulness of solving this problem that you don't really need to solve, especially with proof-of-work, which is, as we said, incredibly wasteful, but even with proof-of-stake, there is some overhead attached to that. But I also want to argue two more points here, which is A, running a system on a blockchain adds complexity, and I think complexity is a real killer for software projects. To understand what is going on when the system is not doing what it's supposed to be doing, and you need to look at more moving parts that just makes it more difficult and more expensive in the economic sense of the word, expensive to run your software project. Also, generally, if you have more moving parts, there are more ways to shoot yourself in the foot and make mistakes.

A second problem that I want to talk about is the problem of migrations. So migration means updating your system if you have a new version, and this happens all the time, migration and update could happen for instance, if your system contains a critical vulnerability that you need to fix and you want to roll out the new version of your software, that includes that fix and migrations are already a hard problem to solve, especially on distributed systems, but using the blockchain, makes it even more difficult. How difficult? That depends a bit on what you mean by "using blockchain." So consider for example, what it means to update the blockchain protocol itself. You want to roll out a new version of the Bitcoin protocol.

You kind of need a majority of the nodes on your side, a majority of the mining power on your side, to make that change. Also, there are a number of compatibility issues that you need to take into account because depending on how you design your protocol change, you might get a hard fork where the entire chain kind of splits in two, or you might get a soft fork where everyone eventually aligns on the new version of the protocol, depending on how convinced people are about that change. What about updating smart contracts? So basically code which is stored on the blockchain. Here it depends a bit on the chain itself and also what your governance process looks like.

You could just unilaterally decide to deploy a new version of the contract and make it do something completely different. But then again you're kind of becoming the party that everyone needs to trust because you are unilaterally making all the changes and you might just include a malicious piece of code in your new version of the smart contract. Another governance process that you might consider is to include voting. So to have people that are inside of your community vote on whether or not they want the new version of your smart contract. That potentially solves some of the trust issues, but then rolling out a quick fix for a critical vulnerability is problematic because you first need to get the approval from your community. I think these points are all things that add complexity and significant cost to a software project. So I would say if you don't really need the blockchain, don't use the blockchain.

Is that what happened in some of those big hacks where they had a vulnerability, but they couldn't fix it in time or something like that?

I think they're really different situations there. So the specific situation here is that generally most of the smart contract code is open source and visible to people, and if there is a bug in one of these smart contracts, a malicious actor could find that bug and try to exploit it right away. I think that is, as far as I can see, what happens in most of the cases. Usually what happens in traditional software systems is there is a procedure called responsible disclosure. So somebody, a security expert that finds a vulnerability in one of your systems, would tell you discreetly, "Look, there is a problem here." Then you have time to fix that problem, but not a lot of time because at the moment where somebody discovered the problem, you know that the problem is known and other people might have knowledge of the problem as well. So you need to be really quick in rolling out that fix.

Interesting. Given the specific problem that blockchains solve, do you think that the current applications like NFTs and DAOs and smart contracts are suitable for the technology?

Smart contracts are once again one of these concepts which I think have a very misleading name, because they're not really contracts, they don't have any legal value. They're just code which is run on top of your data somewhat automatically when a transaction triggers that code. People familiar with databases might recognize this as kind of a stored procedure, which is kind of the equivalent concept in traditional databases, with minor differences I'd say. Is it useful to run these on blockchains? I know there's now quite a variety of smart contracts out there. I have not seen all of them, so I should be careful with too broad a judgment there. I think in general, it's a good idea to automate contract execution and we are already doing that.

I bought a train ticket last weekend. I bought it through an app. Payment was handled automatically. I got on board the train. I checked in at my seat. I talked to no one. I got out at the destination. Well, there was still a person driving the train, but this was basically an automated execution of the contract between me and the railway company. Of course there was no blockchain involved here. I guess the smart contracts are really the same idea but more integrated in the entire crypto ecosystem. So if you think that cryptocurrencies are a good idea, maybe these smart contracts have some value because they directly integrate with these cryptocurrencies.

NFTs, well, NFTs are basically just receipts that are registered on the blockchain. So you have a piece of data that proves that a certain transaction has taken place. I really tried and I really failed to see their use. If I want to buy art, just pay the artist. You might get some form of ownership or usage rights or whatever, depending on the deal that you make with the artist, which is the same that you need to do if you buy an NFT only that in addition to that exchange, you have that token registered on the blockchain. I don't really see what additional value this token provides. So the only explanation that I can come up with for NFTs is they're maybe just popular because of speculation and people want to ride the wave.

So what application do you see blockchain being useful for?

So blockchains, at least the Nakamoto variety of blockchains are promoted as techniques that can cut out the middle man. The fact that as we said earlier, blockchains are built to solve this Byzantine consensus problem, et cetera, et cetera, with a lot of fine print and assumptions that we've also talked about kind of suggests that this might be a good use case. I think in practice, every application needs to be carefully analyzed, whether they really fall in this category because there's a risk that you try to remove one intermediary and you place another intermediary. Coinbase comes to mind for instance, also there are some cases like what we talked about with the Oracle problem and the enforcement problem that you need the intermediary anyway.

But generally I think trying to find efficiencies to reduce rent seeking in areas such as finance seems like a desirable goal. I can kind of see this working, maybe in things like international money transfers, where there's generally less trust available through the legal framework, maybe an entity in Canada doesn't want to get into the details of the legal framework in Mongolia and still mitigate the counterparty risk of interacting with such a party. All of that can be handled in a different way, contractually, et cetera, et cetera. But maybe in some cases, the technical overhead is cheaper than the legal work that would be required to set this up.

There are projects like Ripple, for instance, that try to achieve something like that. I absolutely don't want to endorse any particular solution, and we also know that the current generation of blockchains has a lot of technical challenges, but I can kind of see the general approach of having multiple participants share a view on the same data through some form of consensus. I kind of see that as a potentially useful thing when no participant would accept the other participant as the custodian of the data. Also, I generally find that Nakamoto consensus deserves some credit for the idea of baking economic incentives into algorithm design. I think that's not really something that we've done before, and I think that might have uses even beyond blockchain, but these areas, maybe they have nothing to do with cryptocurrencies.

Doesn't it have to though? Doesn't the cryptocurrency have to exist in Nakamoto consensus?

In Nakamoto consensus, yes. Indeed, because you need a native token, a native currency, because otherwise you cannot compensate the miner for the work that they spent. So you can have the cost, using the energy, without the currency, but you need to compensate them afterwards. So for that, the currency is necessary, but generally the idea of having an economic incentive for things that you do algorithmically, I think that's an interesting idea. There have been projects like this entire Filecoin idea. Actually, I don't know whether that went anywhere. I think that was one of these ICO hypes in the 2018 wave, but just generally the idea of saying, "Look, we have different machines that can offer to other participants storage capacity," and you get compensated for that, that is potentially an interesting idea, and that also relies on economic incentives. I think just having that tool in our toolbox might be interesting even if we're not talking about cryptocurrencies and blockchains at all.

Couldn't you do Filecoin without a blockchain or something similar?

Well, yes. I mean, I could offer my storage space on my machine and I get compensated for that, and I get compensated through a central entity who is then maybe getting paid by the person who wants to use the storage and I get paid by that central entity, and then you don't need blockchain.

I want to ask you about a Twitter exchange that you had. If you don't remember it, no worries. But it was with Jorge Stolfi, who's a computer scientist, who's been very critical of cryptocurrency. He tweeted that every computer scientist should be able to see that cryptocurrencies are totally dysfunctional payment systems and that blockchain technology, including smart contracts, is a technological fraud. You responded saying something similar to what you just described to us about the specific use case that they have. What was Stolfi's pushback to you saying that?

I don't remember his pushback. I don't remember the conversation.

Okay. Okay.

I can comment on the sentiment in general. I can say that there is a lot of very strong rhetoric in these debates from both sides, and I can kind of understand that because there's all these aspects of fraud and the energy consumption and all of that, and there are actual people losing money in this, and I understand why the debate is emotional. I think that we wouldn't really care so much about that if blockchains and Bitcoin technology was just some fringe research interest, and then we could even say, "Look, this is actually really cool that they achieve that and that it kind of works." I think the problem is more that this is maybe hyped more than it should be, and that means we can also not really calmly and rationally discuss the merits and pros and cons of the technology, because of course there are all these abuses and all these problems that the technology causes.

That's a good perspective. I've got a friend who works for CockroachDB. They make a distributed database project. It seems pretty cool, but nobody's talking about that.

Yeah, exactly. There is a lot of cool innovation going on in data systems and distributed systems, but everyone is focused on the hype, and that's a pity for those that also do great work and very useful work, but that are not in the spotlight.

As an engineer, can you describe the process that you would go through to determine whether a blockchain is the right solution to a problem?

My general view on software projects is that normally you have a space of problems that need to be solved on one side and the space of candidate solutions on the other side, which contains the technologies and techniques, et cetera, et cetera, and the magic in the software project really happens when you find a match between the two. The problem with hypes and hype technologies is that people have the tendency to put the cart before the horse, meaning they ask, "What can I do with blockchain?" instead of asking something like, "How can I improve trade finance?" and then choosing the appropriate technology. It's a bit like when you are consulting a client, your answer is not going to be, "Just do small cap value indexing," but your answer is going to be, "Well, that depends on what your goals are, what your situation is," et cetera, et cetera.

I think in engineering projects that is a bit similar. Usually for data systems, questions that I would ask are things like, what data needs to be stored? How large is that data? What load is expected in the system in terms of reads and writes? We know that blockchains are a bit limited in that regard. What are the access patterns? Meaning how applications need to access that data. Who can access that data? Who can read and who can write? Also, an interesting question is what is the timeframe in which the system should settle? This is a bit of a weakness of the Nakamoto consensus, because you only know that your transaction has been accepted by the system once it is a couple of blocks deep, because then you're relatively confident that there will be no alternative fork which will end up longer and not include your transaction.

If you want the settlement to happen relatively quickly, maybe that's not the right technology to use. Also, questions like, we touched on that, do I really need the data to be public or do I just want it to be auditable? That would also open up new alternative techniques. I think just saying that my data has a ledger-like structure is not really sufficient; there are a lot of alternative techniques that you can use if you have data that has that shape.
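
The remark about waiting until a transaction is a couple of blocks deep can be made concrete with the gambler's-ruin estimate from the Bitcoin whitepaper: an attacker with a share q of the hash power who is z blocks behind catches up with probability (q/p)^z when q < p, where p = 1 - q, and with probability 1 otherwise. The Python sketch below evaluates only this simplified term; it is an assumption-laden illustration, not the whitepaper's full calculation, which also accounts for the attacker's progress while the recipient waits. The function name is made up for this example.

```python
def catch_up_probability(attacker_share: float, blocks_behind: int) -> float:
    """Chance that an attacker z blocks behind ever overtakes the honest chain."""
    q = attacker_share
    p = 1.0 - q            # share of hash power held by honest miners
    if q >= p:
        return 1.0         # with half or more of the hash power, catching up is certain
    return (q / p) ** blocks_behind


for share in (0.10, 0.30, 0.51):
    for depth in (1, 2, 6):
        prob = catch_up_probability(share, depth)
        print(f"attacker share {share:.2f}, {depth} blocks deep: {prob:.6f}")
```

The estimate drops quickly with depth for a minority attacker, which is why waiting helps, and it stays at 1 for a majority attacker, which is the 51% scenario discussed earlier. The price of that confidence is settlement time.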

Now we've mostly been talking about public blockchains. We touched on permissioned blockchains. Are there useful applications for private blockchains?

So we said that public blockchains, with some caveats and fine print attached, solve this Byzantine consensus problem under Sybil attacks. With permissioned systems, we said, you introduce an authority, so they're more in the space of traditional databases, but they are maybe useful if the structure of the ledger and the building blocks that the vendor offers are very close to the problem domain that you have. I don't know, could be. I'm generally very skeptical about projects where a system runs on a blockchain which is exclusively run on your own hardware. That also exists, where you just have your own machines and all of these machines run a different instance of the blockchain. I'm interested in counterexamples if anyone has any, but to me it appears that with such a system, you get all the complexity of blockchain, but not any of the benefits. In such a system I guess you'd be better off with an off-the-shelf traditional database system.

That makes sense. All right, Daniel, that's the end of our questions. We really appreciate you coming on. This was great, and even more so because you're a podcast listener and part of our community, so that just made it even that much better for us to be talking to you.

I very much enjoy your podcast. It was a pleasure.

Thanks Daniel.

Thank you.