Build Beyond: Scaling the Network
Mysten Labs Chief Scientist George Danezis explains intricacies around Sui's infrastructure that lets it scale to handle large amounts of traffic.
We sat down with George Danezis, Co-founder and Chief Scientist at Mysten Labs (initial contributor to Sui) and Professor of Security and Privacy Engineering at University College London, to get a look at how Sui's transaction processing system contributes to a performant network.
You come from an academic background. What was the focus of your research?
I am a professor at University College London (UCL) and the focus of my research is, broadly speaking, security and privacy. But I worked quite a bit back in the early 2000s on peer-to-peer systems, and also anonymity systems. A lot of these systems were large distributed systems focused on storage. When the whole blockchain thing became more about execution, with Ethereum, I got interested in distributed ledgers and blockchains and how to execute smart contracts. The permissionless aspect was quite familiar to me from my early work on peer-to-peer systems. At that time, my research group at UCL and I started looking at how to make higher performance systems. We started a company, Chainspace, to commercialize some of our ideas. The team was acquired by Facebook. We then helped come up with solutions to scale the Facebook blockchain, Libra/Diem. But when that didn't go anywhere, I left for other opportunities to realize the idea of high performance blockchains.
You're still a professor. What is the difference between the research side of your work and the application side?
There really isn't much difference. When we do research, we think of all the possibilities that one could have to do a particular thing, like having a high performance chain or particular features. But of course, when it comes to contributing to building a blockchain or picking particular features to put in a real system, one has to pick one option. One has to always judge, out of all these great ideas, which one is actually most useful to people. Which is the one that people are asking for? What is the bottleneck for blockchain adoption or what prevents people from doing what they want to do? When building systems, you still look at all of the possibilities and try to understand from the academic literature what is possible, but then you have to pick the most relevant ones. It's not just about intellectual interest anymore, but about value for users.
How do you determine what problems to solve when transitioning from the theoretical to the applied?
The main problem that I solved in my research is how to scale up different aspects of the blockchain. How to increase transactions and have lower latency. I'm deep into the systems aspect of the blockchain. And there, the problems were self-evident in the sense that every time we saw a contract on Ethereum becoming very popular, the Ethereum platform could not sustain the volume. It would run out of capacity and the fees would skyrocket. Every time there was a success story in blockchain, we saw that it required more capacity than was available. So it was obvious to us that the problem was that there isn't enough capacity to do what people want to do on these blockchains. It didn't come just out of our head, we saw it happen again and again and again. This was considered for a while, not just on my team, but actually across the whole academic community looking at blockchains, as a worthy challenge. Nowadays, quite a few techniques have been developed to scale them up so, of course, we can look at other challenges. But at the time, this was a well understood problem that lots of people were working on in different ways.
L2s are one answer people brought to this issue of scale. What are the differences and what is the benefit of building a new L1 like Sui?
L2s are the solution in the Ethereum ecosystem to scaling. But L2s are a bit awkward to use as an application developer. When an L2 tries to interact with Ethereum, although this is true for any L2/L1 relationship, there is a bridging activity that has to happen. The state that is in the L1 representing coins or assets or whatever has to be mirrored in the L2, and vice versa. In addition to that, of course, the L2 has to have some mechanism so that the L1 can verify everything that happens there. But even just that first part, namely that any assets that exist on the L1 need to be transferred to the L2, something has to happen on the L2, and then somehow the assets have to be transferred back. That is awkward.
That bridging activity works okay for tokens, for fungible assets, because people have two accounts and a bridge in between. But when it comes to more general assets, it doesn't work very well. To actually use an L2 on Ethereum to develop more complex applications than just tokens, you need to have a contract on both sides. One contract has to mint and the other one has to burn. They have to know about the fact that they're across two different ecosystems. It’s a custom activity for each contract. You can't trivially just say, I'll make an L2 and take all the assets and do whatever I want and bring them back. There is no such concept. It's a very manual process, and error-prone as well. So it's not a great experience. Then imagine that you have assets across multiple different L2s and you have these custom contracts across different L2s. Every time you want to do something with some other state that is on another L2, you have to bridge all the way back down to the L1 and then back to the L2. You can't just easily say, I just did something on this one blockchain, and then I'm going to do something else on the one blockchain. I don't have to think about which L1 or L2 it is on. It's all here. I have it in my hands right now and it's ready to take more transactions on whatever state I want to access. This is why it's not a great experience to split that state across L2s. Moving assets around is super awkward and quite visible to users. So this is why L2s never really captured my imagination.
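To make the mint-and-burn round trip concrete, here is a minimal Python sketch. The `Chain` and `Bridge` classes and their methods are hypothetical and do not correspond to any real bridge contract. The sketch assumes a simple escrow-and-mint design and ignores proofs, fees, and delays. It only shows that moving a balance from one L2 to another means going back down through the L1.

```python
class Chain:
    """Toy ledger mapping an owner to a token balance (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.balances = {}

    def credit(self, owner, amount):
        self.balances[owner] = self.balances.get(owner, 0) + amount

    def debit(self, owner, amount):
        if self.balances.get(owner, 0) < amount:
            raise ValueError(f"insufficient balance on {self.name}")
        self.balances[owner] -= amount


class Bridge:
    """Toy bridge between one L1 and one L2 (assumed escrow-and-mint design)."""

    def __init__(self, l1, l2):
        self.l1, self.l2 = l1, l2
        self.escrow = 0  # tokens held on the L1 while mirrored on the L2

    def deposit(self, owner, amount):
        self.l1.debit(owner, amount)   # take the tokens out of circulation on the L1
        self.escrow += amount
        self.l2.credit(owner, amount)  # mint the mirrored tokens on the L2

    def withdraw(self, owner, amount):
        self.l2.debit(owner, amount)   # burn the mirrored tokens on the L2
        self.escrow -= amount
        self.l1.credit(owner, amount)  # release the originals on the L1


l1 = Chain("L1")
l2_a, l2_b = Chain("L2-A"), Chain("L2-B")
bridge_a, bridge_b = Bridge(l1, l2_a), Bridge(l1, l2_b)

l1.credit("alice", 10)
bridge_a.deposit("alice", 10)   # L1 -> L2-A
# To use an app on L2-B, Alice first has to bridge back down to the L1.
bridge_a.withdraw("alice", 10)  # L2-A -> L1
bridge_b.deposit("alice", 10)   # L1 -> L2-B
```

Even in this stripped-down model, every hop between L2s costs two bridge operations, and a real asset more complex than a token would need custom logic on both sides of each bridge.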
Another example, Cosmos, which has a very interesting ecosystem, takes another approach, which says that to scale up we'll just have different blockchains for different apps. And we can basically have different transaction rates going on different chains, and again, bridge the assets from one chain to the other when they need to do things across the different apps. But it has the same problem. Every time you want to use a different app, you first have to go through a bridging exercise, which is delicate and quite visible to the user. Then you can use the app and bridge back. You find yourself spending more time moving assets from one chain to the other, than actually doing what you want to do.
On Sui, the view is there’s one big database, effectively, that contains all of the state that is replicated across validators. As soon as you're done doing one transaction, everything is available in the same database to do the next transaction. You don't have to keep moving things around and deal with that complexity as a user.
Sui Lutris is the underpinning to Sui’s protocol. What are its key innovations that allow Sui to function with high throughput and low latency?
Sui Lutris is composed of two key ideas: (1) for many actions on a blockchain, you actually don't need to do consensus and (2) when you do need to do consensus, there is a very high throughput method. It takes these two approaches and combines them. Sui Lutris is the heart of Sui’s distributed system. It ensures that when you do transactions on the network, two different validators who follow the protocol are never in an inconsistent state. It's never the case that one validator thinks you spent a coin and sent it to Alice and another thinks the same coin actually went to Bob.
There are two different paths, one that does not require consensus and one that does. When the objects you want to operate on belong only to you, for example your own NFT character and your own hat that you want to combine so your character can wear a hat, in theory no one else should be operating on them. In those cases, Sui uses a fast path. It says you can operate on your own objects. You can get finality on this transaction, make sure it happened, and the hat is on your character, all without waiting for consensus.
But in some cases, transactions involve objects that are not just yours. They are shared by many, many people. For example, if there is an auction for selling little hats for characters, that kind of auction is represented in Sui as a shared object. People can place bids and whoever offers the highest bid wins the hat. The auction is a kind of object that is not owned by a single entity. Everybody has to be able to place bids, share it, and update its state about the latest bid. Those kinds of actions require extra consensus. Sui Lutris allows you to have shared objects and do transactions on them that lead to other objects you own, change the state of the shared objects, or create new shared objects. It allows for two paths to coexist and objects that are owned by particular individuals or objects that are shared have an interplay with each other.
The two different paths have different strengths. The owned object path is extremely low latency and it can scale quite extensively. The consensus path is higher latency. So the first path takes less than a second, super fast. The consensus path for shared objects takes more than a second and is also quite high capacity, but it's more difficult to scale than the first path. Applications on Sui that really hammer the chain with millions of transactions per day usually use the first path and structure their applications so that the greatest number of transactions are on owned objects, not shared. On the other side, protocols that do complex work, such as DeFi, usually have the second type of transactions because they have to combine lots of different people's bids or liquidity in order to perform operations.
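The routing rule described in this answer can be sketched in a few lines of Python. This is a toy model, not Sui's implementation: `Ownership`, `ObjectRef`, and `choose_path` are hypothetical names introduced for illustration, and the only assumption is the one stated above, that a transaction touching any shared object needs consensus while one touching only owned objects does not.

```python
from dataclasses import dataclass
from enum import Enum


class Ownership(Enum):
    OWNED = "owned"    # belongs to a single owner
    SHARED = "shared"  # many parties may submit transactions that touch it


@dataclass(frozen=True)
class ObjectRef:
    object_id: str
    ownership: Ownership


def choose_path(inputs):
    """Return 'consensus' if any input object is shared, otherwise 'fast'."""
    if any(obj.ownership is Ownership.SHARED for obj in inputs):
        return "consensus"
    return "fast"


# The examples from the interview: a character and a hat you own, and a shared auction.
character = ObjectRef("character-1", Ownership.OWNED)
hat = ObjectRef("hat-7", Ownership.OWNED)
auction = ObjectRef("hat-auction", Ownership.SHARED)

equip_hat_path = choose_path([character, hat])  # only owned objects
place_bid_path = choose_path([auction, hat])    # touches the shared auction
```

Putting the hat on the character stays on the fast path, while placing a bid is routed through consensus because the auction is shared.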
Can app developers on Sui design their apps to take advantage of the fast path?
Yeah, absolutely. I think that is the core job of a designer of an app that needs to scale. A smart contract developer has full control over whether the objects they manipulate in their contract are objects that are owned by a single entity at any particular time or are shared. A trick for scaling up applications in Sui is to make sure that the most plentiful actions are largely on single-owner objects because Sui can basically manage as many of those as you want with very low latency. That's a great experience. Actions that are necessary for games should be in that category because they're very low latency. As soon as you click, the transaction finalizes on the network. That, versus actions that have to be mediated through shared state and shared objects.
The smart contract designer has full control over that. They can basically specify exactly which transactions are in each category. Of course, the first version of the contract can have everything as a shared state, and everything goes through the consensus path that is higher latency, but then, as there’s a need to scale up, the developer needs to think about how much can be done to not require those parts.
How do Programmable Transaction Blocks fit in here?
Programmable Transaction Blocks can be in the fast path or the consensus path. If a Programmable Transaction Block only touches your own objects, that means that you can do a lot of actions on-chain in one go. Let's take an example. Imagine you're a centralized exchange and lots of people have bought and sold different coins. You can do one transaction on-chain with the objects that conceptually correspond to what people have bought and sold but, because you're an exchange, they all belong to you and a thousand of them can settle at the same time. That’s the fast path. On the other hand, if within a Programmable Transaction Block some of the objects are shared, that goes into the consensus path, and has a bit of a higher latency—a few seconds, rather than less than a second.
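The exchange example can be modeled with a small Python sketch. `Command` and `TransactionBlock` are hypothetical names and not the Sui SDK; the sketch assumes only what is said above, namely that a block is a batch of actions and that a single shared object anywhere in the block sends the whole block down the consensus path.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    name: str
    object_ids: list


@dataclass
class TransactionBlock:
    """Toy batch of commands submitted together (hypothetical model)."""
    commands: list = field(default_factory=list)

    def add(self, name, *object_ids):
        self.commands.append(Command(name, list(object_ids)))
        return self  # allow chaining

    def path(self, shared_ids):
        # The path is decided for the block as a whole, not per command.
        touched = {oid for cmd in self.commands for oid in cmd.object_ids}
        return "consensus" if touched & shared_ids else "fast"


SHARED_IDS = {"liquidity-pool"}  # assumed set of shared object IDs

# An exchange settling a thousand of its own objects in one block.
settlement = TransactionBlock()
for i in range(1000):
    settlement.add("settle", f"exchange-balance-{i}")

# The same kind of block, plus one command on a shared object.
mixed = TransactionBlock()
mixed.add("settle", "exchange-balance-0").add("swap", "liquidity-pool")

settlement_path = settlement.path(SHARED_IDS)
mixed_path = mixed.path(SHARED_IDS)
```

In this model the thousand-command settlement stays on the fast path, and the two-command block is routed through consensus because of its one shared input.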
Mainnet went live just over 100 days ago. What have you seen happen on Sui that confirmed the theoretical hypothesis from your research? What has surprised you?
Several things confirmed Sui’s design, but a few things have given food for thought. One thing that did confirm the design is that on the days that had special promotions, even 60-plus million transactions a day during one, the majority of those transactions were indeed in the fast path, the very scalable, very low latency path of Sui Lutris. Up to that point, it wasn’t clear that anyone was going to use this path, but when volume and low latency were needed, it was used. And it works! That's quite nice to know. And that's the way to do it. On those days, Sui saw more transactions than all other blockchains put together. That's an interesting validation that Sui’s design makes sense.
At the same time, the Sui community has found that this fast path is a little bit delicate. Because the owner of the objects has to manage, to some extent, the sequence of actions on their own objects, sometimes they might get it wrong. Sometimes they may even use libraries that don't help them, and the libraries themselves get it wrong, so that sometimes the objects get locked up. Usually they get unlocked at the end of the day, at the end of the epoch. But that is not a great experience. People designing smart contracts are quite terrified that this may happen by mistake and this prevents them from using the facilities that are low latency and scalable to their full extent. A whole suite of techniques is being developed to allow people that mistakenly lock up objects to unlock them very quickly, within a few seconds. So if you try to use the fast path, a mistake happens, your object locks up, then you can immediately use a consensus path to unlock it without having to wait until the end of the day.
And that, strangely enough, is not just about avoiding mistakes. It also allows developers to express more things through the fast path. There are potential techniques where some objects are not just owned by one party. Maybe there is an object that you and I own together. And, traditionally, transactions on that object would have to go through the consensus path because it's shared. However, if Sui has a quick way to unlock objects, a developer could actually try optimistically to put transactions through the fast path. If you and I happen to make a transaction on the same object at exactly the same time, and the system locks up because it cannot decide which one happens next, Sui can unlock the object, make it shared, and resolve the conflict through the consensus path. But this situation is very unlikely unless people are trying on purpose to race each other. Hopefully, once Sui has the facilities to allow objects to be unlocked, it should be able to have objects that belong to multiple people go through the fast path. It's a game of trying to put as much volume as possible through the fast path, and this is the kind of thing being developed to help the builder community.
Can you share in more detail what is causing objects to lock up currently?
If an object belongs to you, the reason it doesn't need to go through consensus to tell Sui what sequence of actions happens is because there is no one else that could act upon your objects. Sui relies on you to tell the system that action A is going to happen first, that action B is going to happen second, and action C is going to happen third. The system still has to check that A then B then C is seen by everybody in the same sequence. This system does that through a distributed protocol that only checks, is it the case that we all have seen A then B then C. The problem is if you make a mistake or your software makes a mistake. What if, for example, you have your phone that controls your asset and your computer that controls your asset and your phone says the first action to happen is A and your computer says the first action to happen is B. You, by mistake, sequenced two different things to happen first. There is a contradiction. In that case, Sui says, ok, the person I was entrusting to tell me the sequence seems to have given me two contradictory things so I don't know what to do. I don't know how to resolve this. Because Sui’s way of resolving this normally would be through a consensus path. But here, you’re trying to use the fast path. So Sui puts its hands up in the air and says, okay, there was a mistake here.
The original assumption was that this situation would be extremely rare, but it happens because people use different devices, or try to do multiple transactions on the same objects at the same time. It turns out that it happens quite often. Right now, when these objects get locked up, Sui waits until the end of the epoch to unlock them. This is a real worry. Imagine if your assets cannot be used for a day. That could actually be a serious problem.
So now Sui needs to evolve to do the right thing when something is locked up. If the entity that’s entrusted to give a correct sequence has given an ambiguous sequence, Sui will take this whole situation and put it through consensus to resolve it. And this will happen within a few seconds rather than at the end of the day.
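The lock-up described in this answer can be illustrated with a toy Python model. It is a sketch under stated assumptions, not Sui's protocol: the `Validator` class and `collect_votes` helper are hypothetical, there are four equally weighted validators, and a transaction needs three of them to go through. Each validator commits an object version to the first transaction it sees and refuses any conflicting one until the epoch ends.

```python
class Validator:
    """Toy validator that locks each object version to one transaction."""

    def __init__(self):
        self.locks = {}  # (object_id, version) -> transaction it committed to

    def sign(self, object_id, version, tx):
        # Commit to the first transaction seen; refuse a conflicting one.
        locked_tx = self.locks.setdefault((object_id, version), tx)
        return locked_tx == tx

    def end_epoch(self):
        self.locks.clear()  # locks are released when the epoch changes


def collect_votes(validators, object_id, version, tx):
    return sum(v.sign(object_id, version, tx) for v in validators)


QUORUM = 3  # assumed threshold for 4 equally weighted validators
validators = [Validator() for _ in range(4)]

# The phone sends A to two validators, the computer sends B to the other two.
votes_a = collect_votes(validators[:2], "coin", 1, "A")
votes_b = collect_votes(validators[2:], "coin", 1, "B")

# Retrying either transaction with everyone does not help: each side keeps its lock.
retry_a = collect_votes(validators, "coin", 1, "A")
retry_b = collect_votes(validators, "coin", 1, "B")
locked_up = retry_a < QUORUM and retry_b < QUORUM

# After the epoch boundary the locks are gone and a single transaction succeeds.
for v in validators:
    v.end_epoch()
votes_after_epoch = collect_votes(validators, "coin", 1, "A")
```

Neither A nor B can reach the threshold, so the object is stuck until the epoch-end reset. The sketch stops there and does not model the faster consensus-based unlock.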
A lot of your research was around privacy. What’s your perspective on how a public blockchain can best balance transparency and traceability with privacy?
My take on privacy and blockchains is that what needs to be private is very application dependent. For example, on Sui it is left to the app developer to develop contracts that provide the right protections when it comes to the privacy of its users. Because some people just want to develop games, and maybe their privacy concerns are not so great. Some people want to use blockchains for financial matters, where privacy may be more concerning, but then there are also other kinds of concerns around regulation. So Sui says, look, we're gonna give you a great platform and you have to build privacy on top of that platform.
To help folks build privacy, Sui offers some cryptographic primitives that could be useful for them when designing smart contracts. The most important one of them is the ability to verify Zero Knowledge Proofs on Sui. There is a native function that does the verification of one of the most widely used and understood schemes, the Groth16 scheme that was developed by my colleague Jens Groth. That means, in effect, the designer of an app can consume proofs that some things have happened off chain without revealing what those things are. And this is the kind of fundamental building block for building a whole category of privacy-friendly apps, that keep some state off-chain but then on-chain you can verify that whatever happened off-chain is correct.
App developers determine what kind of privacy needs their apps have and use the primitives to combine an on-chain, off-chain, encrypted on-chain, whatever strategy they need in order to deal with the privacy concerns they're going to have.
Are there more privacy primitives on Sui?
The community is thinking about what primitives developers need to write smart contracts in a more privacy-friendly way. Zero Knowledge Proofs are one of them. Some people may argue that Sui needs more general mathematics or cryptographic functions on-chain. It would be great to see designers of smart contracts provide feedback on what is missing. There are whole families of other techniques that one could use to protect privacy, such as multi-party computation or trusted hardware. Different chains have gone into those directions. These require very complex additional systems. There would need to be enough evidence in the community that people want these technologies because they represent some fundamental changes to the Sui architecture. But if the community wants to take it in that direction, there is a process to propose additions for how to protect privacy.
How do you see Sui evolving in the next 6 to 12 months?
It depends what kind of apps people develop on Sui. In the short term, a lot of the improvements are going to be geared towards the apps that people are actually building. In the very long term, and by very long, I mean like six to 12 months, which is very long term by blockchain standards, there is work to improve the Sui Lutris protocols to have even lower latency, to be simpler so that Sui can scale even better, and to be more economical so that validators can run on more restricted hardware and use the hardware they have to actually execute transactions rather than do cryptography or all the other overhead of a blockchain. This is the kind of stuff I expect to see.