Data Availability
Off-Chain Blob Data (Data Availability v2)
Sunrise v2 is designed for high-throughput data availability, offering the scalability and flexibility needed by demanding applications such as gaming. To achieve this, Sunrise combines Data Availability Sampling (DAS) with zero-knowledge proof (ZKP) verification, optimizing both network performance and data availability.
Key Features of Sunrise v2
Several enhancements in Sunrise v2 increase throughput, decentralization, and long-term data retrievability:
Off-chain Erasure Encoding: Blob data is erasure-coded off-chain, minimizing the computational and storage load on validators.
Off-chain Storage Integration: Data shards are stored externally on decentralized storage solutions such as IPFS and Arweave. MsgPublishData includes only a metadata URI pointing to these erasure-coded data shares, which reduces the on-chain block space required for Blob transactions and improves scalability.
The architecture of Sunrise can be seen as a blend of a Data Availability Committee (DAC) and a DA layer with DAS, balancing throughput and trust assumptions for a range of applications.
Design for Optimized Data Transfer
When blob data is carried inside transactions, as in on-chain DA designs, full nodes must transfer and download that transaction data through the mempool.
As BlobTx sizes grow, these mempool transfers can limit the network's throughput, creating challenges for applications that handle large blob data.
To address this, Sunrise v2 implements the following solutions:
Off-chain Erasure Encoding: Blob data is encoded off-chain to reduce validator load.
External Storage: Blob data is stored on decentralized storage platforms like IPFS and Arweave. Rather than containing blob data on-chain, MsgPublishData holds a metadata URI pointing to erasure-coded data shares.
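To make the size argument concrete, the sketch below compares a blob with an on-chain record that carries only a metadata URI. The class name, its fields, the placeholder address, and the placeholder URI are all hypothetical and are not taken from the actual MsgPublishData definition; JSON is used only as a convenient way to measure an approximate encoded size.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PublishDataSketch:
    """Hypothetical stand-in for the on-chain message (not the real schema)."""
    sender: str              # placeholder account address
    metadata_uri: str        # points at off-chain metadata describing the shards
    data_shard_count: int    # assumed field, for illustration only
    parity_shard_count: int  # assumed field, for illustration only


def on_chain_size(msg: PublishDataSketch) -> int:
    """Approximate encoded size of the message in bytes (JSON, for illustration)."""
    return len(json.dumps(asdict(msg)).encode("utf-8"))


# A 1 MB blob that stays off-chain after erasure coding.
blob = bytes(1_000_000)

msg = PublishDataSketch(
    sender="sunrise1example",
    metadata_uri="ipfs://<metadata-cid>",
    data_shard_count=10,
    parity_shard_count=10,
)

print(on_chain_size(msg), "bytes on-chain vs", len(blob), "bytes of blob data")
```

The on-chain record stays roughly constant in size no matter how large the blob is, which is the property the design relies on.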
Data availability is confirmed through zero-knowledge proofs (ZKP) using double-hashed shard data (shard_double_hashed), allowing validators to prove that they hold shard data without revealing it. This verification is integrated through the Vote Extensions feature of ABCI 2.0.
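The double-hash relation can be sketched as follows. This is a minimal illustration under stated assumptions: the public value is taken to be the hash of the hash of a shard, the private witness is taken to be the single hash, and SHA-256 is used as a stand-in because this page does not name the hash function. The helper names `h` and `check_claim` are hypothetical. A real zero-knowledge circuit would prove this relation without disclosing the witness, whereas the plain check below reveals it.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Stand-in hash (SHA-256); the hash used by the real circuit is an assumption."""
    return hashlib.sha256(data).digest()


def shard_double_hashed(shard: bytes) -> bytes:
    """Public value: the hash of the hash of the shard."""
    return h(h(shard))


def check_claim(shard_hash: bytes, public_double_hash: bytes) -> bool:
    """Relation a proof would attest to: h(shard_hash) == shard_double_hashed."""
    return h(shard_hash) == public_double_hash


shard = b"example shard bytes"
public_value = shard_double_hashed(shard)  # assumed to be known to everyone
private_value = h(shard)                   # known only to a holder of the shard

print(check_claim(private_value, public_value))      # holder of the shard
print(check_claim(h(b"other shard"), public_value))  # someone without the shard
```

Only a party that can compute the single hash of the shard can satisfy the relation, so a valid proof serves as evidence that the prover had access to the shard data.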
Benefits of Sunrise v2
Increased Network Throughput: Larger block sizes are achievable by reducing on-chain storage needs.
Enhanced Data Retrievability: Storing data off-chain allows for flexible, long-term data retention.
Improved Network Decentralization: By reducing validator load, Sunrise supports a more decentralized network structure.
Specification for Zero-Knowledge Proof
Terms and Notation
Overview
Zero-Knowledge Proof System
Public Inputs
Private Inputs
Circuit Constraints
The condition of Data Availability
Notations
Requirements for each shard to prove Data Availability
Requirements for tally to prove Data Availability
Example parameters
10 data shards
10 parity shards
Each validator submits 6 shard proofs
Case A: valid shard s_1
Case B: invalid shard s_2
Case X: shards s_1 and s_3 through s_11 are valid under the condition above
Case Y: only shards s_1 and s_3 are valid under the condition above
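Cases X and Y can be checked with a small tally sketch. The formal tally condition is not reproduced on this page, so the code assumes the usual erasure-coding property: with 10 data shards and 10 parity shards, any 10 valid shards are enough to reconstruct the blob. Per-shard validity (as in Cases A and B) is taken as given input, and the function name `is_available` is hypothetical.

```python
DATA_SHARDS = 10    # from the example parameters above
PARITY_SHARDS = 10  # from the example parameters above


def is_available(valid_shards: set, data_shards: int = DATA_SHARDS) -> bool:
    """Assumed tally rule: enough valid shards exist to reconstruct the blob."""
    return len(valid_shards) >= data_shards


# Case X: s_1 and s_3 through s_11 are valid (10 shards in total).
case_x = {1} | set(range(3, 12))

# Case Y: only s_1 and s_3 are valid (2 shards in total).
case_y = {1, 3}

print("Case X available:", is_available(case_x))
print("Case Y available:", is_available(case_y))
```

Under this assumed rule, Case X meets the threshold exactly, while Case Y falls well short of it.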
Slashing condition for each validator
Comparison Between On-chain DA attestation and Off-chain DA attestation
Data Corruption Durability
In both on-chain and off-chain DA attestations, data corruption durability refers to the ability of the system to detect and prevent corruption of the data.
On-chain attestation, as used by Celestia or Sunrise v1, ensures that data is durably available because it is stored directly on-chain, and any tampering or loss of data can be detected immediately by validators.
Off-chain attestation (e.g., Sunrise v2) relies on external systems (like IPFS or Arweave) but can still achieve similar durability by verifying the integrity of the data through erasure coding and zero-knowledge proofs.
Tx Mempool Scalability
Transaction mempool scalability is a major limitation in on-chain DA systems. As the size of transactions (such as BlobTxs) grows, the transaction mempool, which temporarily holds pending transactions, can become overloaded, limiting throughput and scalability.
In off-chain DA systems, this limitation is mitigated by storing large amounts of data externally, with only the necessary hashes or metadata being stored on-chain. This allows for greater scalability and the ability to process larger volumes of data without congesting the mempool.
Data Retrievability Control
In on-chain DA systems, data retrievability is often tied to the consensus mechanism, which means the data must remain available as long as it is needed for consensus (e.g., fraud proofs or validity proofs). However, long-term data retrievability is not always guaranteed once the consensus is finalized.
Off-chain DA systems, such as Sunrise v2, provide more flexible control over data retrievability because the data is stored in decentralized storage systems (like IPFS or Arweave). This allows for longer-term retention of data and better control over how long data remains accessible.
Validator Load Mitigation
On-chain DA attestation places a heavier load on validators since they are responsible for verifying the data availability directly on-chain. As transaction sizes grow, the computational and storage demands on validators increase, potentially limiting decentralization.
In contrast, off-chain DA attestation significantly reduces the load on validators by outsourcing data storage and retrieval to external systems. Validators only need to verify the availability of data shards through erasure coding and zero-knowledge proofs, which lightens their processing and storage requirements.
False-Positive DA Attestation Resistance
False-positive DA attestation refers to situations where a system incorrectly attests that data is available when, in reality, it is not.
On-chain DA attestation, used by systems like Celestia and Sunrise v1, has strong resistance to false positives since all the data is stored and verified directly on-chain, making it difficult to falsely claim that data is available when it is not.
In off-chain DA attestation (e.g., Sunrise v2), false-positive resistance is maintained through zero-knowledge proofs over hashed shard data, combined with the redundancy of erasure coding. By verifying the double-hashed values of shard data, validators can ensure that the data is indeed available without needing to store or directly access the entire data set.
However, there may still be edge cases where off-chain storage solutions or network latency could introduce opportunities for false-positive attestations, though these are minimized by careful design and redundancy in the verification process.