This is a story about how three developers, inspired by the movement towards decentralization, built an experimental, decentralized cloud storage system. My goal in writing this is not only to explain some of the core challenges we faced and how we solved them, but to introduce readers to the problem space of decentralized cloud storage in general.
Our process of development was, and continues to be, iterative. For each challenge we faced, we followed a systematic approach consisting of research, tradeoff analysis, design, and implementation phases. The system described in this article is the result of our first complete iteration of development.
As a brief aside: we are all open to new opportunities, and if you’re impressed by what you read here, we encourage you to reach out to us!
With that out of the way, let’s talk about cloud storage.
There’s no arguing that cloud storage is important. It’s convenient, it’s cost-effective, and we tend to assume it’s reliable and secure.
We are all familiar with Dropbox, Google Drive, and Amazon S3. Although these providers have significant differences, they also share at least one significant commonality: they are centralized services.
A centralized service is one in which a single party controls all state changes. Think of “state” as the collection of all the data stored by the entity. To upload a file to Google Drive, a user asks Google Drive to store that file, and it is up to Google Drive to accept or decline that request. If Google declines, the state change the user requested is rejected, and the user cannot make that request of another provider of the Google Drive service: there is only one provider.
Criticisms of centralized storage abound. When we store our data with centralized providers, we trust them to be good stewards of our data. They control what happens to our data. We trust them to keep our data secure and private. We also trust them not to lose it. Regardless of the ethics that they abide by, the fact is that they can use our data at their discretion, for their own gain, at the expense of our privacy. The very architecture of centralized services confers providers with this power.
Recently, we saw Mark Zuckerberg’s congressional hearing about the ways in which Facebook uses and stores users’ data. When Senator John Neely Kennedy asked him whether Facebook “had the right” to access users’ personal data and share it with third parties, Zuckerberg replied that they absolutely do not have the right. When Sen. Kennedy asked whether Facebook had the ability to access and share users’ private data, Zuckerberg replied that they have the ability “in theory,” but that would be a “massive breach, so [they] would never do that.” The fact is that the system’s design enables providers to do what they wish with our data and it’s up to us to trust that they remain ethical. Wouldn’t it be better if we didn’t need to trust providers to abide by their code of ethics?
Centralized services are also hot targets for malicious parties looking to steal data. It is naive to assume that all the best and brightest in the world are working to protect your data; the best and brightest exist on both sides, and some are actively looking to exploit you.
Not all storage systems suffer from these deficiencies. There is a class of systems — decentralized systems — that aim to do what centralized systems cannot: offer reliable, secure, private, even anonymous service without taking for granted that the providers of the service are trustworthy. Users can trust the system at large, despite the fact that the system is composed of low-trust actors.
A decentralized network is composed of many processes, nodes, or peers, each of which possesses local state and offers that state up as a resource for other peers on the network. The state of the entire network is the combination of all these local states, and the act of many peers each sharing a portion of their resources is what allows the overall service to emerge from a multitude of small, interacting pieces. For example, many peers each offering up a small portion of their computing power can create something of a decentralized supercomputer; this is exactly what the Golem Project does. A decentralized service is built from a network of independent service providers.
The power of decentralized systems is that no single provider controls all the state in the network. If the provider of a centralized service decides to betray users’ trust, the service itself is compromised: a bad provider entails a bad service. In a decentralized system composed of many service providers (peers) who each control only a portion of the service’s resources, a bad provider is not equivalent to a bad service: there are new providers at the users’ fingertips. A decentralized service can be realized through a network of independent providers who compete to provide quality service.
Our work began by constructing a minimum working definition (MWD) of cloud storage. This definition would serve as our launching point for development.
We operationally defined a cloud storage system as any system that allows its users to upload files, delete them from their local device(s), and later download those files.
Now that’s what a cloud storage system is at its core. However, it’s far from being a quality cloud storage system. To make a quality cloud storage system, we needed to set some goals beyond our MWD. Namely, a quality cloud storage system had to:
- Enforce the concept of a file owner who has sole read access to the files they are storing on the network.
- Be highly available.
- Prevent data loss.
- Provide incentives for participation and cooperation (potential file hosts should be motivated to join the network in the first place and, after joining, to responsibly store others’ files over time).
Our first step in development was to implement a p2p version of our MWD of cloud storage. Our implementation of the MWD was a pretty simple system: two peers directly sending files to each other.
In a two-peer network, each node can decide to upload a file to the network by sending that file to the other peer, deleting it locally, and later retrieving that file from their peer.
And just like that, we had realized our MWD of cloud storage.
Although this p2p system satisfied our MWD, it did not satisfy the more demanding criteria of quality cloud storage. If peer1 sent their file to peer2, peer2 could easily read peer1’s file. If peer2 went offline, or deleted or even overwrote peer1’s file, then that file would no longer be retrievable. There were also no incentives in place to encourage hosts to participate. The only participants incentivized to join the network were file owners hoping to store files on it. Hosts would have no reason to join the network other than the altruistic satisfaction of storing other people’s files for free, which isn’t a very reassuring model.
So, we needed to enforce privacy, increase file availability, and introduce incentives for hosts to participate.
The first problem we solved was the privacy problem. According to our definition of quality cloud storage, a file owner should have privileged read access to the files they upload to the network. The file host should not be able to view the contents of the file.
We added a preprocessing step to the file upload process to solve the privacy problem. Instead of sending raw file data to the host, a file owner would first encrypt the file data with a secret key. Since the owner is the only peer who should be able to decrypt the data, and no key ever needs to be shared with another party, we opted for symmetric-key encryption.
At this point, we still had a two-peer network, but peers exchanged encrypted files rather than raw file data. A host could no longer read an owner’s file data.
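To make the preprocessing step concrete, here is a minimal sketch of symmetric encryption. The keystream construction below is a toy for illustration only — a real system would use a vetted cipher such as AES through a maintained crypto library — and none of these names come from Layr’s codebase:

```python
import hashlib

def _keystream(key: bytes, length: int) -> bytes:
    # Derive a pseudo-random keystream by hashing the key with a counter.
    # Toy construction for illustration; real systems use vetted ciphers.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    # XOR the data with the keystream. Because XOR is its own inverse,
    # applying the same operation with the same key decrypts.
    stream = _keystream(key, len(plaintext))
    return bytes(a ^ b for a, b in zip(plaintext, stream))

decrypt = encrypt  # symmetric: the same key both encrypts and decrypts

key = b"owner-secret-key"
ciphertext = encrypt(key, b"file contents")
assert ciphertext != b"file contents"                 # host sees only ciphertext
assert decrypt(key, ciphertext) == b"file contents"   # owner can recover the file
```

The key property for Layr is simply that the same secret key, held only by the owner, is required both to produce and to reverse the ciphertext.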
File owners should be able to delete their files after upload. On a two-peer network, if the file owner uploads their file and then deletes it locally, there is only one copy of that file on one remote host. Since p2p networks are densely populated with casual users who may frequently disconnect from the network, we had to deal with the situation in which an owner wanted to retrieve a file that was stored on an offline host.
Luckily, p2p networks can grow very large, and the power of a p2p network emerges as the network grows. So, we abandoned our two-peer model for a more realistic, multi-peer model, and looked at how we could increase a file’s availability after upload.
This was a classic use case for a redundancy scheme: in a network of unreliable peers, we absolutely could not trust a single peer to remain online. The two primary classes of redundancy scheme we identified in our research were simple replication and erasure coding.
We chose replication (which is just the act of storing multiple copies of a file across different nodes) over erasure coding for the following reasons. First, erasure coding introduces increased overhead to the upload and download process. Second, as I will discuss later in this article, when host nodes go offline, redundancy decreases and Layr needs to respond to these decreases by redistributing a new copy of whatever data was lost. This is straightforward with replication: just download a copy of a file, give it a new ID, and then re-upload it to a new node.
Redistribution, or maintaining a baseline redundancy, is not nearly as simple or computationally cheap in an erasure coding scheme. This is because erasure coding splits files into encoded pieces such that any subset of k pieces can be used to rebuild the entire file. Think of each piece as a point on a straight line. If a file is split into five encoded pieces (5 points on a line), any two points can be used to rebuild the file (redraw the line). If all those points had the exact same coordinates, however, you would be unable to redraw the original line. The same principle applies in an erasure coding scheme: if every encoded piece is different, then any two pieces can be used to rebuild the file. If half of the pieces are identical (if half of the points share the same coordinates), then it is no longer the case that any two pieces can rebuild the original file: only non-identical pieces will suffice. Therefore, the redistribution process cannot simply download and re-upload a copy of an encoded piece that already exists: it must derive a new piece (reconstruct the line and choose a new point on it) to upload.
The final reason we opted for replication was that there are multiple erasure coding strategies available, each with its own set of tradeoffs, which we would need to examine in greater depth before moving forward with erasure coding.
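The points-on-a-line analogy can be made concrete with a few lines of code. In this hypothetical sketch, the “file” is a pair of line coefficients, each encoded piece is a point on that line, and any two distinct pieces recover the original (Fraction keeps the arithmetic exact):

```python
from fractions import Fraction

# The "file" is the pair of line coefficients (slope, intercept).
slope, intercept = Fraction(3), Fraction(7)

def encode(xs):
    # Each encoded piece is a point (x, y) on the line y = slope*x + intercept.
    return [(Fraction(x), slope * x + intercept) for x in xs]

def reconstruct(p1, p2):
    # Any two DISTINCT points determine the line, i.e., rebuild the "file".
    (x1, y1), (x2, y2) = p1, p2
    s = (y2 - y1) / (x2 - x1)
    return s, y1 - s * x1

pieces = encode([1, 2, 3, 4, 5])  # five encoded pieces; any two suffice
assert reconstruct(pieces[0], pieces[4]) == (slope, intercept)
assert reconstruct(pieces[2], pieces[3]) == (slope, intercept)
# Two copies of the SAME piece are two identical points, which cannot
# determine the line -- this is why redistribution must derive a new piece
# rather than duplicating an existing one.
```

Real erasure codes such as Reed–Solomon generalize this idea to higher-degree polynomials over finite fields, but the redistribution constraint is the same.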
Introducing replication was also an opportunity for file owners to take greater control over their data: we could allow them to choose the number of copies they wanted to store. We called this quantity a file’s baseline redundancy.
At this point, we had a multi-peer network that encrypted file contents with a secret key before upload and implemented a simple file replication redundancy scheme so that a file would still be retrievable if one of its host nodes disconnected.
What if a file owner wanted to upload a large file, say, 100GB? Each host node would have to store all 100GB. A 100GB file is a substantial storage and bandwidth burden for hosts. It’s possible that no host on the network even has that much free storage space. Further, hosts are probably casual users without top-tier bandwidth capabilities. Asking them to download an entire 100GB file is not only a storage burden: it’s also a bandwidth burden.
It would be far better if the network could distribute the bandwidth and storage burdens of storing large files across multiple nodes on the network. We accomplished this with a method called file sharding.
File sharding allowed us to prevent an entire file from being stored on a single host by splitting that file into pieces and sending each piece to a different host on the network.
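A minimal shard-by-size sketch looks like the following (illustrative only; Layr’s actual sharding logic may differ in its details):

```python
def shard(data: bytes, shard_size: int) -> list[bytes]:
    # Split file data into fixed-size pieces; the last piece may be shorter.
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]

pieces = shard(b"abcdefghij", 4)
assert pieces == [b"abcd", b"efgh", b"ij"]
assert b"".join(pieces) == b"abcdefghij"  # order must be preserved to rebuild
```

Each piece can then be sent to a different host, so no single host bears the full storage or bandwidth cost.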
At this stage in development, we had a multi-peer network in which file owners’ files were encrypted, sharded into pieces, and then distributed to different hosts on the network. Copies of each shard would be distributed to different hosts to satisfy a pre-set level of baseline redundancy.
With the introduction of file sharding, we introduced a one-to-many relationship between an owner’s file and the shard-files distributed across the network.
As far as host nodes on the network were concerned, each shard was a distinct file. There was nothing inherent in the shards themselves that preserved the relationship between a file and each of its shards. Further, there was nothing indicating the order in which the shards should be combined to reconstruct the encrypted file.
We solved this problem by generating a manifest file for each file uploaded to the network. The purpose of the manifest file was to keep track of each shard and its duplicates, as well as the order in which the shards must be combined to reconstruct the original file’s contents.
To accommodate all the shards, we modified the upload process so that a shards directory stored all shard copies, which were then concurrently uploaded to different hosts across the network. If a file is sharded into 5 pieces and 3 copies of each shard are made, then the shards directory stores 15 pieces, which would be sent off to 15 different hosts (assuming there are at least 15 different hosts on the network).
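A sketch of manifest generation might look like this. The field names and the ID scheme (a 20-byte SHA-1 content hash, salted with a copy index so each duplicate gets a distinct ID) are illustrative assumptions, not Layr’s actual format:

```python
import hashlib

def build_manifest(file_name: str, shards: list[bytes], copies: int) -> dict:
    # For each shard position, record the IDs of all of its copies.
    # The outer list order preserves the order in which shards must be
    # recombined to reconstruct the encrypted file.
    manifest = {"fileName": file_name, "shards": []}
    for shard in shards:
        ids = [hashlib.sha1(shard + bytes([i])).hexdigest() for i in range(copies)]
        manifest["shards"].append(ids)
    return manifest

m = build_manifest(
    "report.pdf",
    [b"piece1", b"piece2", b"piece3", b"piece4", b"piece5"],
    copies=3,
)
assert len(m["shards"]) == 5                       # 5 distinct shards...
assert sum(len(ids) for ids in m["shards"]) == 15  # ...15 total copies to upload
```

The owner keeps only this small manifest locally; everything else lives on the network.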
With this new approach, a file owner was essentially “trading” their uploaded file for an (ideally) much smaller manifest file.
Managing the size of the manifest file was a fascinating challenge to explore. Since the manifest file stores a record of shards, increasing the shard count increases the size of the manifest. However, recall that the purpose of sharding was to reduce the storage and bandwidth burden of large files on hosts.
Keeping the shard count constant holds the size of the manifest file constant, benefitting the file owner; however, each of the k shards grows as the uploaded file grows.
Conversely, keeping the shard size constant holds the size of each shard constant, benefitting the file host; however, the shard count, and with it the manifest file, grows as the uploaded file grows. The question is:
Which method is more space efficient?
For each sharding method, one party’s storage burden grows linearly with file size, while the other party’s storage burden remains constant.
In the shard-by-k-size method, the storage cost of the file owner, or the size of the resulting manifest file, is represented by the following function:
manifestSize = 20 bytes * (fileSize / shardSizeConstant) * duplicationConstant
In plain English, this states that the size of the manifest is roughly the size of a single shard ID (20 bytes), multiplied by the number of shards, multiplied by the number of duplicates of each shard. The only non-constant value is fileSize, indicating a linear relationship between file size and manifest size.
In the shard-by-k-quantity method, the storage cost of the file host, or the size of each shard, is represented by the following function:
shardSize = sizeOfFile / numOfShardsConstant
In plain English, this states that the size of each shard is the size of the file divided by the pre-set number of shards. If the number of shards was set to 10, for example, then each shard would be 1/10th the size of the file. It is again clear that shardSize grows linearly with the size of the file.
It turns out that a pure shard-by-k-size and a pure shard-by-k-quantity method both present significant drawbacks.
A shard-by-k-quantity approach is sure to fail. If the constant quantity is too low, then large files will produce shards that are unreasonably large for file hosts. We could set the constant quantity really high, but that would produce issues for small files.
Sharding-by-k-size presents similar problems. We could choose a constant shard size that could work for many files, but not for large files, where the risk would be producing so many shards that the manifest file actually grows to an unreasonable size.
For example, perhaps our duplication constant was 500, meaning that for each shard, the manifest stored 500 * 20 bytes, or 10KB. Perhaps our constant shard size was 500KB (a common shard size for files uploaded in BitTorrent). Imagine a file owner was uploading a 1GB file:
- A 1GB file sharded into 500KB pieces would result in 2000 shards.
- At 10KB per shard in the manifest, that ends up producing a manifest of about 20MB in size.
- If we doubled the shard size from 500KB to 1MB, we could halve the size of the manifest (saving the file owner 10MB of storage) whilst only costing each file host 0.5MB of extra storage.
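The arithmetic above can be checked with a short script (decimal units, 1KB = 1000 bytes):

```python
ID_BYTES = 20                   # size of one shard ID in the manifest
duplication_constant = 500      # copies of each shard
shard_size = 500 * 1000         # 500KB
file_size = 1000 ** 3           # 1GB

shard_count = file_size // shard_size
manifest_size = ID_BYTES * shard_count * duplication_constant

assert shard_count == 2000
assert manifest_size == 20_000_000  # ~20MB of manifest for a 1GB file

# Doubling the shard size halves the shard count, halving the manifest:
assert file_size // (2 * shard_size) == 1000  # manifest shrinks to ~10MB
```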
The best sharding method is, therefore, one that chooses values dynamically to minimize the storage burdens on each party, within certain limits (for example, a shard probably shouldn’t climb to 10GB in size in an attempt to reduce the size of the manifest file; if it came down to it, a larger manifest is better than a huge shard, since the manifest doesn’t need to be transmitted or downloaded). Both BitTorrent and Storj shard files differently based on the size of the file.
At this point, our system looked like this: we had a multi-peer network, where uploading entailed encrypting, sharding, replicating, and dispersing. The encryption kept the file content private. The sharding kept the storage and bandwidth burden low. The replication increased the file’s availability.
However, replication only solved part of the availability problem. The likelihood that hosts have gone offline, or have lost or corrupted an owner’s file, increases with time. Ideally, Layr would offer a way for file owners to maintain their files’ baseline redundancy (the number of copies they have chosen to store on the network).
To respond to decreases in a file’s baseline redundancy, an owner needs to detect when any copy of any shard of their file(s) has been corrupted or lost (either because the host is offline or because the host deleted it).
A naive way to test whether a shard is retrievable is merely to attempt to retrieve each shard. If the host is offline, the test automatically fails. Otherwise, if the host returns the shard, the tester (file owner, in our case) can make sure the shard’s contents are intact. This method did not suit our needs for a couple of reasons. First, it produces a lot of communication overhead. Second, the owner would need to keep a local, authoritative copy of each shard for comparison, which violates our requirement that a cloud storage system should allow users to delete their local copies of uploaded files.
A second method we explored was requesting the hash of the shard’s contents instead of the contents themselves. This minimizes the communication overhead and requires the file owner to store only the hash of each shard (something they already store in the manifest file). Unfortunately, this method proved insufficient as well: a host could easily cheat by storing the hash of the shard’s content rather than the content itself.
The third method we considered was hash-based as well but precluded the host from generating an answer ahead of time. Instead of asking the host for the hash of the content itself, we asked for the hash of the content + a randomly generated salt. Since the salt would not be revealed until the file owner performed the audit, the host would need the shard’s contents at the time of the audit as well if they were to pass.
Because the owner should be able to delete their file and its shards after the upload process, these challenge salts, along with the hash of each shard’s content plus salt, needed to be pre-generated during the upload process. That also meant an upper bound was placed on the number of audits an owner could perform. Despite these limitations and others, this approach provided a strong level of security, which is why we chose it.
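A sketch of this salted-hash audit scheme follows. The function names are illustrative, not Layr’s API; the essential idea is that the challenge pairs are generated while the owner still holds the shard, and each salt is revealed only at audit time:

```python
import hashlib
import os

def generate_challenges(shard: bytes, count: int) -> list[tuple[bytes, str]]:
    # Pre-generate (salt, expected hash) pairs during upload, before the
    # owner deletes the shard locally. The audit count is bounded by `count`.
    challenges = []
    for _ in range(count):
        salt = os.urandom(16)
        expected = hashlib.sha256(shard + salt).hexdigest()
        challenges.append((salt, expected))
    return challenges

def host_answer(stored_shard: bytes, salt: bytes) -> str:
    # The host can only compute this if it still holds the shard's contents,
    # since the salt was unknown to it before the audit.
    return hashlib.sha256(stored_shard + salt).hexdigest()

shard = b"encrypted shard contents"
challenges = generate_challenges(shard, count=3)

salt, expected = challenges.pop()                  # each challenge is used once
assert host_answer(shard, salt) == expected        # honest host passes
assert host_answer(b"tampered", salt) != expected  # modified/lost shard fails
```

Note that each (salt, hash) pair is single-use: once a salt is revealed, a dishonest host could cache the answer, which is exactly why the number of audits is bounded by the pre-generated set.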
The benefit of proofs of retrievability was that they allowed file owners to detect when the redundancy of their file(s) decreased. However, this knowledge is far more valuable if the owner can do something about it. To that end, we implemented an operation called patching. If an audit revealed that some copy or copies of a particular shard were unretrievable, the patch operation would identify a copy that was still retrievable, download it, generate a new unique ID for it, and send it to a new host. It would then update the manifest file, replacing the unretrievable copy with the newly uploaded one.
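The patch operation can be sketched as follows. All the injected callables (is_retrievable, download, upload, new_id) stand in for network helpers and are hypothetical names, not Layr’s actual interface; the in-memory dict below simulates the network:

```python
def patch(manifest, shard_index, is_retrievable, download, upload, new_id):
    # Replace every unretrievable copy of a shard with a freshly uploaded
    # copy under a new ID, restoring the file's baseline redundancy.
    copies = manifest["shards"][shard_index]
    dead = [cid for cid in copies if not is_retrievable(cid)]
    alive = [cid for cid in copies if is_retrievable(cid)]
    if not dead:
        return manifest  # baseline redundancy is intact
    if not alive:
        raise RuntimeError("no retrievable copy left; shard is lost")
    data = download(alive[0])  # fetch a surviving copy
    for cid in dead:
        replacement = new_id(data)
        upload(replacement, data)                 # send the copy to a new host
        copies[copies.index(cid)] = replacement   # update the manifest
    return manifest

# Simulated network: copies "a" and "b" survive, copy "c" was lost.
storage = {"a": b"shard", "b": b"shard"}
manifest = {"shards": [["a", "b", "c"]]}
fresh_ids = iter(["c2"])

patched = patch(
    manifest, 0,
    is_retrievable=lambda cid: cid in storage,
    download=storage.__getitem__,
    upload=storage.__setitem__,
    new_id=lambda data: next(fresh_ids),
)
assert patched["shards"][0] == ["a", "b", "c2"]  # manifest updated in place
assert storage["c2"] == b"shard"                 # new copy lives on a new host
```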
At this point in our development process, our system had come a long way. We had a redundancy scheme, file sharding, manifest files to track shards, proofs of retrievability, and file patching. There was still a burning question: Why would people participate in the first place? Sure, file owners had a strong incentive to join the network, but what about potential hosts who had free storage but didn’t necessarily have anything to store? A network composed entirely of peers who only wanted to upload their files wouldn’t be a useful network.
We thought of the most straightforward incentive scheme we could: pay-by-upload. In a pay-by-upload scheme, file owners pay hosts for storing their files during the upload process. Choosing a simple incentive scheme allowed us to home in on the technical aspects of implementing transactions in a no-trust environment, even though a simple incentive scheme is probably not economically robust (indeed, a pay-by-upload scheme has important drawbacks; see Limitations and Future Steps for details).
The first implementation strategy we thought of was: pay-before-upload. In a pay-before-upload strategy, the file owner would first identify the host to which they plan to send their file. Then, they would pay the host. They would then send the file to the host after the payment went through.
The problem with this approach was that the file owner couldn’t trust that their data would make it to the host after the payment went through. Because payment and file-transfer were separate operations, the owner could not be sure that both succeeded: they ran the risk of paying for space that their file(s) never actually occupied. Similar problems arose when we explored a pay-after-upload implementation strategy: the problem remained that the file owner cannot be sure that the host has their file intact at the time of payment.
Since a p2p network is a distributed system composed of peers who all possess a local state and can only update each other’s state through passing messages, it made sense that these methods did not provide atomicity between acknowledging receipt of data and payment for that data. Both actions were taking the form of distinct messages. A message is not guaranteed to reach its destination within a particular time frame, nor is it guaranteed to reach its destination at all. We needed a way to “combine” these messages so that one could not succeed without the other succeeding as well.
We accomplished this with smart contracts.
Smart contracts provided us with a means to perform transactions only when all parties involved have satisfied certain conditions.
Smart contracts batch transactions together and apply constraints to these transactions. If one transaction fails, the whole batch fails. If one party wants to pay another party only if certain conditions are met, smart contracts allow us to accomplish this.
Imagine I wanted to hire you to do some work on my house. I didn’t want to pay you ahead of time because I worried you would run off with the money. You didn’t want to work without getting paid first because you didn’t trust me to pay you once you were finished. So we decide to involve an attorney. With the attorney present, you and I agree on a set of specifications the renovations must meet. I give the payment to the attorney, who holds onto it so that I can’t take it back while you’re working. Once you finish your work, you submit proof that the work was completed to our agreed-upon specifications. If your work does indeed meet the specifications, the attorney gives you the money I deposited earlier. Otherwise, the money is refunded to me.
Smart contracts can replace the attorney in the above scenario.
Before I go into how we used smart contracts to accomplish the above goal, it is worth taking a small detour to cover pre-requisite knowledge about how smart contracts work in Stellar. From the Stellar developer’s guide:
A Stellar Smart Contract (SSC) is expressed as compositions of transactions that are connected and executed using various constraints.
Further, transactions in the Stellar network are “commands that modify the ledger state,” somewhat akin to SQL queries in a database. Transactions can be composed of multiple operations, which are what actually manipulate the ledger’s state. Transactions execute their operations sequentially as one batched, atomic event: if one operation fails, the entire transaction fails.
All operations have a particular threshold, and all signatures a particular weight. A transaction automatically fails if the sum of the weights of its signatures does not meet the threshold of each operation within the transaction.
Stellar accounts can also create more accounts with the create_account operation. This operation can be combined with other operations to set a variety of constraints on the newly created account. In essence, by combining Stellar operations, developers can create complex relationships between users, as well as independent places to hold funds and release those funds only under certain conditions, to certain parties. Importantly for Layr, Stellar transactions can use a combination of operations to create an automated escrow contract between file hosts and file owners.
In our scenario, we wanted the funds from the file owner to be released to the host only if the host could prove it had the owner’s data.
The way we accomplished this was with the following workflow:
- File owner creates and deposits funds into a smart contract
- File owner hashes the encrypted file data
- File owner adds this hash as a hashX signer to the contract
- File owner transmits file and contract ID to the host
- The host attempts to unlock the funds in the contract by submitting a transaction and signing it with the file data it received
- Stellar then hashes the file data and checks to make sure the hash matches the hashX signer
- If the hashes match, then the transaction goes through. Otherwise, it is halted.
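The condition the hashX signer enforces can be simulated in plain Python. This is not the Stellar SDK — just a toy stand-in that captures the rule: funds unlock only when the pre-image (the file data itself) is submitted, never its hash:

```python
import hashlib

class EscrowContract:
    # Toy stand-in for a Stellar account with a hashX signer (not the
    # actual Stellar API). Funds are released only to a party that can
    # submit the pre-image of the registered hash.
    def __init__(self, funds: int, hash_x: str):
        self.funds = funds
        self.hash_x = hash_x

    def submit_payment(self, pre_image: bytes) -> int:
        if hashlib.sha256(pre_image).hexdigest() != self.hash_x:
            raise PermissionError("signature does not match hashX signer")
        released, self.funds = self.funds, 0
        return released

file_data = b"encrypted file contents"
contract = EscrowContract(funds=100,
                          hash_x=hashlib.sha256(file_data).hexdigest())

# A host that kept only the hash cannot sign: signing requires the data.
try:
    contract.submit_payment(hashlib.sha256(file_data).digest())
    raise AssertionError("hash alone should not unlock the funds")
except PermissionError:
    pass

assert contract.submit_payment(file_data) == 100  # possession releases payment
```

Submitting the payment and proving possession of the data are thus a single atomic act, which is exactly what the separate pay-before-upload and pay-after-upload messages could not guarantee.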
An important note here is that although the hash(fileData) is added as the signer, someone cannot sign with the hash(fileData): they must submit the fileData itself. This is important because it ensures that the host cannot pre-generate the hash(fileData), delete the fileData, and then sign off on the transaction. They must sign with the fileData itself, thereby guaranteeing atomicity between proof of data possession and payment.
Our smart-contract payments with hashX signers took care of the basic batching of proof of data possession and payment, but there were potential weaknesses to that approach.
- If the hashX signer has enough weight to sign a payment transaction, then its weight meets the medium threshold. Anyone with the pre-image of the hash can therefore sign any transaction requiring a medium threshold, including further payments.
- The pre-image of the hash is made public once a transaction signed with the pre-image is submitted to the Stellar network.
Regarding problem 1, the most destructive action a signer meeting the medium threshold can perform is creating a payment. Adding other signers or merging the account requires signers with greater weight. Since the account is funded with only enough money to pay for the file data, there is no risk of the funder (file owner) losing extra money. If we were to use a different incentive scheme, such as pay-by-passed-audit, we would keep the same escrow account between the file owner and each host, funding it for each audit, and update the hashX signer to match the hash of the fileData + challenge salt for each subsequent audit. In this case, we would need to prevent the hashX signer from withdrawing more than they were allowed to. To prevent this, we would:
- Set the weight of the hashX signer to a weight < the low threshold
- Create a pre-authorized transaction that authorizes withdrawal of only enough funds to cover the passed audit, and add this pre-authorized transaction as a signer.
- Set the weight of the preAuth transaction such that hashX weight + preAuth weight = medium threshold.
This would prevent the host from withdrawing more than they were supposed to.
Problem 2 is a non-issue for us because the smart contracts’ sole purpose is to facilitate a single payment between the host and owner during the upload process. It’s not a privacy issue either since the pre-image (or file data) is encrypted.
However, if the owner needed to issue multiple payments to the host over the course of service — e.g., if the owner was paying the host when the host passed file audits — then this could become an issue because the hashX signature is publicly visible.
The use of pre-authorized transactions, combined with reducing the weight of the HashX signer, mitigates this risk as well.
A pay-by-upload incentive scheme is very simple, which allowed us to focus on the nuances and technical challenges of payments in a p2p environment. But designing economically sound (rather than merely technically sound) incentive schemes is an important challenge for any decentralized product.
Although a pay-by-upload scheme incentivizes participation, it doesn’t incentivize some of the most important behaviors for file hosts: hosts could easily delete the files immediately after getting paid without any repercussions. In fact, they are incentivized to do so because they would make the same amount of money whilst also preserving their storage space.
An effective incentive scheme should promote the following attributes on the part of hosts:
- Host participation (which pay-by-upload successfully incentivizes)
- Uptime (being available on the network)
- File preservation (keeping the files safe from loss or corruption)
Both uptime and file preservation can be encouraged with a pay-by-passed-audit incentive scheme. In such a scheme, hosts are paid only when they successfully pass audits (i.e., when the files they are storing are both present and unmodified). Since a file owner initiates audits at their discretion, hosts are incentivized to remain online as much as possible (remember that an audit automatically fails if the target host is offline).
Erasure coding is a class of redundancy schemes that produce the same levels of file availability as replication without requiring as much storage. We plan to explore erasure coding in greater depth moving forward before deciding whether its benefits are worth its potential drawbacks.
In our simple replication model, at least one copy of each distinct shard is necessary to reconstruct the original file.
That means the availability of the file is equal to the availability of its least available set of shard copies. All copies of shards 2, 3, and 4 could be online, but if no copy of shard 1 is available, then the file as a whole is unavailable.
Erasure coding removes this limitation by creating something called parity shards. Parity shards are like “wildcards” that can fill in for other missing shards. If the file is split into 4 shards and then 8 parity shards are created, that provides a total of 12 shards. However, any 4 out of the 12 total shards can be used to construct the original file. As long as any 4 shards are available, the file is also available.
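The availability difference can be quantified with a small model. Assume each host is independently online with probability p, and compare storing 4 shards with 3 copies each (12 pieces total, at least one copy of every shard required) against a 4-of-12 erasure code (the same 12 pieces, any 4 of which suffice). The parameters are illustrative, not measurements:

```python
from math import comb

def replication_availability(p: float, shards: int = 4, copies: int = 3) -> float:
    # File is available only if at least one copy of EVERY shard is online.
    return (1 - (1 - p) ** copies) ** shards

def erasure_availability(p: float, k: int = 4, n: int = 12) -> float:
    # File is available if ANY k of the n total pieces are online
    # (tail of a binomial distribution).
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

p = 0.5  # each host independently online with probability 0.5
# With the same 12 stored pieces, erasure coding yields higher availability:
assert erasure_availability(p) > replication_availability(p)
```

At p = 0.5 the replication scheme is available roughly 59% of the time, while the 4-of-12 erasure code is available roughly 93% of the time, with identical total storage; equivalently, erasure coding can match replication’s availability with fewer pieces.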
We are currently exploring more sophisticated proof of retrievability methods that:
A. Provide an unbounded number of audits
B. Do not require the host to process the entire file to respond to the audit
C. Do not require the owner to store extra information
D. Still provide high levels of security
I hope you’ve enjoyed reading about our experience building Layr. Layr is experimental software, and there are many challenges and improvements ahead. I also hope you’ve developed a clear mental model of some of the core challenges involved in building a decentralized storage system, and I urge you to give it a shot! It’s a wonderful learning experience.
Once again, we are all open to full-time opportunities at the time of this article’s writing. If you are interested in having one of us join your team, please reach out!
Please also feel free to contact any one of us if you have further questions about decentralized cloud storage in general, or the process of building Layr specifically.