Data Availability
The deprecated module `x/blob` is the Celestia-compatible module of Sunrise.
This module allows L2 operators to post data to the Sunrise network. The data is stored in the Sunrise network until the L2 transactions are finalized on the L1 blockchain.
Off Chain Blob Data (Data Availability v2)
After successfully launching Sunrise v1 as a specialized Data Availability Layer for Proof of Liquidity, we will introduce an upgrade to the Blob features in Sunrise v2 to enable Data Availability use cases such as fully on-chain AI, gaming, and social applications. Gluon will be the first place to realize fully on-chain AI with Sunrise DA.
See GitHub for details.
In the Sunrise v1 architecture, `data_hash` is replaced with the Merkle root of the erasure-coded data, which is produced with 2-dimensional Reed-Solomon encoding. Here, "data" means the transaction data in the block. Data Availability Sampling mitigates the running costs of full nodes with big blocks by enabling light nodes to verify data availability without downloading the entire block data.
In this design, all full nodes have to transfer and download the transaction data in the mempool. As `BlobTx`s get larger, the throughput of the network becomes limited by transaction transfer in the mempool. This is an obstacle to applying Data Availability technology to large BLOB data in decentralized applications such as fully on-chain AI, gaming, and social.
To mitigate this bottleneck, we will make two changes:
Off-chain execution of erasure encoding to generate the erasure-coded BLOB data
Use of an off-chain distributed file transfer or storage system such as IPFS or Arweave
In this new design, `MsgPublishData` will carry the URI of a metadata file that lists the URIs of the erasure-coded data shares. The value is assumed to be a URI on a decentralized storage or file transfer system, such as IPFS (`ipfs://[ipfs-cid]`) or Arweave (`ar://[hash]`). The data itself will not be contained in a `BlobTx`, hence the blob data will not be stored on-chain in Sunrise.
In the consensus network, erasure encoding is no longer executed. Only the double hash of the erasure-coded shard data will be included in `MsgPublishData`.
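The double-hash commitment described above can be sketched as follows. This is a minimal illustration, not the Sunrise implementation: SHA-256 is assumed as the hash function (the text does not name one), the shards are placeholder byte strings rather than real erasure-coded output, and the `metadata_uri` and `shard_double_hashes` field names are hypothetical.

```python
import hashlib


def double_hash(shard: bytes) -> bytes:
    """Return H(H(shard)), with SHA-256 assumed as H."""
    return hashlib.sha256(hashlib.sha256(shard).digest()).digest()


def build_publish_data(metadata_uri: str, shards: list[bytes]) -> dict:
    """Build a hypothetical MsgPublishData-like payload.

    Only the metadata URI and the double hashes are included;
    the shard bytes themselves stay in off-chain storage.
    """
    return {
        "metadata_uri": metadata_uri,
        "shard_double_hashes": [double_hash(s).hex() for s in shards],
    }


# Placeholder shards standing in for erasure-coded output.
shards = [b"shard-0", b"shard-1", b"shard-2"]
msg = build_publish_data("ipfs://[ipfs-cid]", shards)
```

The payload size depends only on the number of shards, not on their contents, which is the property the off-chain design relies on.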
Data Availability will be attested through a zero-knowledge proof using `shard_double_hashed`, by proving that the validators know the hash of the shard data without disclosing it.
Currently, this process is assumed to run in the Vote Extension of ABCI 2.0.
In this design, long-term Data Retrievability is easy to control by using external storage or file systems such as IPFS and Arweave, whereas other ecosystems that provide Data Availability do not guarantee it. They do not guarantee it because the transaction data of Optimistic Rollups does not need to be preserved after the challenge period for fraud proofs, and the transaction data of ZK Rollups does not need to be preserved after the validity proofs are submitted.
In conclusion, this design has the following benefits:
The throughput of the network will increase, because blob data no longer takes up block space or mempool bandwidth
Long-term Data Retrievability is easy to control
Applications such as fully on-chain AI, gaming, and social can be realized
The decentralization of the network will be improved
Specification for Zero-Knowledge Proof
Terms and Notation
The hash function:
Set of validators:
Set of data shards:
Set of parity shards:
Set of shards:
Overview
This system verifies possession of the data shard hash without exposing the hash itself.
Zero-Knowledge Proof System
The circuit is defined for a single shard.
Public Inputs
Private Inputs
Circuit Constraints
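Based on the surrounding description, the statement proven for one shard can be read as: the prover knows a private value whose hash equals the public double hash. The sketch below checks only that relation in plain Python. It is not a zero-knowledge proof, since it reveals the witness to whoever runs it, and a real circuit would likely use a circuit-friendly hash. SHA-256 and the names `relation_holds`, `public_double_hash`, and `private_shard_hash` are assumptions for illustration.

```python
import hashlib


def H(data: bytes) -> bytes:
    """Hash function H; SHA-256 is an assumption."""
    return hashlib.sha256(data).digest()


def relation_holds(public_double_hash: bytes, private_shard_hash: bytes) -> bool:
    """Relation a circuit would enforce: H(private) == public."""
    return H(private_shard_hash) == public_double_hash


shard = b"example shard bytes"
shard_hash = H(shard)                # private input, known to the validator
shard_double_hashed = H(shard_hash)  # public input, published on-chain

# A validator holding the right shard hash satisfies the relation.
honest = relation_holds(shard_double_hashed, shard_hash)
# A validator holding the hash of a different shard does not.
cheating = relation_holds(shard_double_hashed, H(b"some other shard"))
```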
The condition of Data Availability
Notations
Replication Factor (Based only on data shards):
Replication Factor (Based on including parity shards):
Set of proofs submitted by a validator v:
Set of valid proofs for a shard s:
Requirements for each shard to prove Data Availability
Set of shards which satisfy this condition will be
Requirements for tally to prove Data Availability
Example parameters
10 validators:
20 shards:
10 data shards
10 parity shards
Each validator submits 6 shards proofs
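As a quick check of these parameters, the arithmetic below counts how many proofs cover each shard on average. Mapping the two ratios onto the two replication factors named in the notation is an assumption, since the original formulas are not shown in this section.

$$
\frac{10 \text{ validators} \times 6 \text{ proofs}}{20 \text{ shards}} = 3
\qquad
\frac{10 \text{ validators} \times 6 \text{ proofs}}{10 \text{ data shards}} = 6
$$

The first ratio matches the cases that follow, in which three validators' proofs cover each shard.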
Case A: valid shard s_1
Three validators' proofs each cover shard s_1 and 5 other shards
One of these validators failed to include a valid proof for shard s_1
However, the other two validators succeeded in including a valid proof for shard s_1, so s_1 has 2 valid proofs
It satisfies the requirement for each shard
Case B: invalid shard s_2
Three validators' proofs each cover shard s_2 and 5 other shards
Two of these validators failed to include a valid proof for shard s_2
Only one validator succeeded in including a valid proof for shard s_2, so s_2 has 1 valid proof
It doesn't satisfy the requirement for each shard
Case X: shards s_1 and s_3 to s_11 are valid under the condition above
It satisfies the requirement for the tally
Case Y: only shards s_1 and s_3 are valid under the condition above
It doesn't satisfy the requirement for the tally
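The four cases above can be replayed with a small tally sketch. The formulas for the two requirements are not shown in this section, so the thresholds here are assumptions chosen only to be consistent with the examples: a shard passes with at least 2 valid proofs (Case A passes with 2, Case B fails with 1), and the tally passes with at least 10 valid shards, the number of data shards (Case X passes with 10, Case Y fails with 2). All function names and validator labels are hypothetical.

```python
# Assumed thresholds, inferred from the worked cases rather than from a formula.
MIN_VALID_PROOFS_PER_SHARD = 2
MIN_VALID_SHARDS = 10  # equals the number of data shards in the example


def valid_shards(valid_proofs: dict[str, set[str]]) -> set[str]:
    """Return the shards that have enough valid proofs.

    valid_proofs maps a shard id to the set of validators whose
    proof for that shard verified.
    """
    return {
        shard
        for shard, validators in valid_proofs.items()
        if len(validators) >= MIN_VALID_PROOFS_PER_SHARD
    }


def data_available(valid_proofs: dict[str, set[str]]) -> bool:
    """Tally check: enough shards individually pass."""
    return len(valid_shards(valid_proofs)) >= MIN_VALID_SHARDS


# Case A: two of the three covering validators prove s_1.
# Case B: only one of the three covering validators proves s_2.
proofs = {"s_1": {"v_a", "v_b"}, "s_2": {"v_c"}}

# Case X: s_1 and s_3..s_11 pass; s_2 does not.
case_x = dict(proofs)
case_x.update({f"s_{i}": {"v_a", "v_b"} for i in range(3, 12)})

# Case Y: only s_1 and s_3 pass.
case_y = dict(proofs)
case_y["s_3"] = {"v_a", "v_b"}
```

Calling `data_available(case_x)` and `data_available(case_y)` reproduces the outcomes of Case X and Case Y under these assumed thresholds.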
Comparison Between On-chain DA attestation and Off-chain DA attestation
| | On-chain DA attestation | Off-chain DA attestation |
|---|---|---|
| Data Corruption Durability | 〇 | 〇 |
| Tx Mempool Scalability | × | 〇 |
| Data Retrievability Control | × | 〇 |
| Validators Load Mitigation | × | 〇 |
| False-Positive DA Attestation Resistance | 〇 | 〇※ |
| Examples | Celestia, Avail, EigenDA, Sunrise V1 | Sunrise V2, Walrus, 0G |
Data Corruption Durability
Tx Mempool Scalability
Data Retrievability Control
Validators Load Mitigation
False-Positive DA Attestation Resistance