Data Availability

The deprecated x/blob module is the Celestia-compatible module of Sunrise.

This module allows L2 operators to post data to the Sunrise network. The data is stored in the Sunrise network until the L2 transactions are finalized on the L1 blockchain.

Off-Chain Blob Data (Data Availability v2)

After successfully launching Sunrise v1 as a specialized Data Availability layer for Proof of Liquidity, we will introduce an upgrade to the blob features in Sunrise v2 to realize Data Availability use cases for fully on-chain AI, gaming, social applications, and so on. Gluon will be the first place to realize fully on-chain AI with Sunrise DA.

See GitHub for details.

In the Sunrise v1 architecture, data_hash is replaced with the Merkle root of the erasure-coded data produced by 2-dimensional Reed-Solomon encoding, where the data is the transaction data in the block. Data Availability Sampling mitigates the running costs of full nodes with large blocks by enabling light nodes to verify data availability without downloading the entire block data.
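
For intuition on why sampling works (a standard property of the 2-dimensional Reed-Solomon construction, not specific to Sunrise): if the original data fills a $k \times k$ square of shares, the extended square is $2k \times 2k$, and at least $(k+1)^2$ of the $(2k)^2$ shares must be withheld to make the data unrecoverable. A single uniformly random sample therefore detects withholding with probability greater than $1/4$, and $n$ independent samples miss it with probability below $(3/4)^n$.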

CometBFT types.proto

// Header defines the structure of a block header.
message Header {
  // basic block info
  cometbft.version.v1.Consensus version  = 1 [(gogoproto.nullable) = false];
  string                        chain_id = 2 [(gogoproto.customname) = "ChainID"];
  int64                         height   = 3;
  google.protobuf.Timestamp     time     = 4 [(gogoproto.nullable) = false, (gogoproto.stdtime) = true];

  // prev block info
  BlockID last_block_id = 5 [(gogoproto.nullable) = false];

  // hashes of block data
  bytes last_commit_hash = 6;  // commit from validators from the last block
  bytes data_hash        = 7;  // transactions

  // hashes from the app output from the prev block
  bytes validators_hash      = 8;   // validators for the current block
  bytes next_validators_hash = 9;   // validators for the next block
  bytes consensus_hash       = 10;  // consensus params for current block
  bytes app_hash             = 11;  // state after txs from the previous block
  bytes last_results_hash    = 12;  // root hash of all results from the txs from the previous block

  // consensus info
  bytes evidence_hash    = 13;  // evidence included in the block
  bytes proposer_address = 14;  // original proposer of the block
}

In this design, all full nodes trivially have to transfer and download the transaction data in the mempool. As BlobTxs grow larger, the throughput of the network becomes limited by transaction transfer in the mempool. This is an obstacle to applying Data Availability technology to large blob data in decentralized applications such as fully on-chain AI, gaming, and social applications.

To mitigate this bottleneck, we will do the following:

  1. Execute erasure encoding off-chain to generate the erasure-coded blob data

  2. Use an off-chain distributed file transfer system / storage such as IPFS, Arweave, etc.

In this new design, MsgPublishData carries the URI of a metadata object that contains the URIs of the erasure-coded data shares. The value is assumed to be a URI on a decentralized storage / file transfer system such as IPFS ("ipfs://[ipfs-cid]") or Arweave ("ar://[hash]"), and it is not contained in a BlobTx, hence the blob data itself is not stored on-chain in Sunrise.

In the consensus network, erasure encoding is no longer executed. Only the double hashes of the erasure-coded shard data are included in MsgPublishData.

message MsgPublishData {
  option (cosmos.msg.v1.signer) = "sender";
  string sender = 1;
  string metadata_uri = 2;
  repeated bytes shard_double_hashes = 3;
}

message Metadata {
  uint64 shard_size = 1;
  uint64 shard_count = 2;
  repeated string shard_uris = 3;
}
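
A minimal sketch of the publisher-side flow, assuming Go, the klauspost/reedsolomon library, and SHA-256 as the shard hash (the actual encoding parameters and hash function used by Sunrise may differ). It erasure-encodes a blob off-chain, computes the shard double hashes for MsgPublishData, and prepares placeholder URIs for Metadata; doubleHash and the URI strings are illustrative only:

package main

import (
    "crypto/sha256"
    "fmt"

    "github.com/klauspost/reedsolomon"
)

// doubleHash returns H(H(shard)); only this value goes on-chain,
// so the raw shard hash never appears in MsgPublishData.
func doubleHash(shard []byte) []byte {
    first := sha256.Sum256(shard)
    second := sha256.Sum256(first[:])
    return second[:]
}

func main() {
    blob := []byte("example blob data to be published via Sunrise DA")

    const dataShards, parityShards = 10, 10

    // Off-chain erasure encoding: split the blob into data shards
    // and generate parity shards (1-dimensional Reed-Solomon here).
    enc, err := reedsolomon.New(dataShards, parityShards)
    if err != nil {
        panic(err)
    }
    shards, err := enc.Split(blob)
    if err != nil {
        panic(err)
    }
    if err := enc.Encode(shards); err != nil {
        panic(err)
    }

    // Compute the double hash of every shard; these values fill
    // MsgPublishData.shard_double_hashes.
    shardDoubleHashes := make([][]byte, len(shards))
    for i, s := range shards {
        shardDoubleHashes[i] = doubleHash(s)
    }

    // The shards themselves are uploaded to external storage
    // (e.g. IPFS or Arweave); only their URIs go into Metadata.
    // The URIs below are placeholders.
    shardURIs := make([]string, len(shards))
    for i := range shards {
        shardURIs[i] = fmt.Sprintf("ipfs://example-shard-cid-%d", i)
    }

    fmt.Printf("shard_count=%d shard_size=%d double_hashes=%d uris=%d\n",
        len(shards), len(shards[0]), len(shardDoubleHashes), len(shardURIs))
}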

Data Availability will be attested through zero-knowledge proofs using shard_double_hashes, by proving that the validators know the hashes of the shard data without disclosing them.

Currently, this process is assumed to be done in the Vote Extension of ABCI 2.0.
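
A hypothetical sketch of how such a vote-extension handler could look with Cosmos SDK v0.50-style ABCI 2.0 hooks; NewExtendVoteHandler and proveShards are illustrative names, not part of the Sunrise codebase:

package keeper

import (
    abci "github.com/cometbft/cometbft/abci/types"
    sdk "github.com/cosmos/cosmos-sdk/types"
)

// NewExtendVoteHandler returns an ABCI 2.0 vote-extension handler that
// attaches this validator's zero-knowledge shard proofs to its precommit.
// proveShards is a placeholder for the proof generation over the shards
// the validator is responsible for at the given height.
func NewExtendVoteHandler(proveShards func(height int64) ([]byte, error)) sdk.ExtendVoteHandler {
    return func(ctx sdk.Context, req *abci.RequestExtendVote) (*abci.ResponseExtendVote, error) {
        proofs, err := proveShards(req.Height)
        if err != nil {
            // Returning an empty extension signals that no proofs were produced.
            return &abci.ResponseExtendVote{}, nil
        }
        return &abci.ResponseExtendVote{VoteExtension: proofs}, nil
    }
}

In a real application, this handler would be registered on the base app, and the submitted proofs would be checked in the corresponding vote-extension verification step.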

In this design, "long term Data Retrievability" is easy to control by using external storage / file system like IPFS and Arweave whereas the Data Retrievability is not guaranteed by other ecosystem which serve Data Availability. The reason why long term Data Retrievability is not guaranteed by other ecosystem which serve Data Availability is that it is not needed to preserve the tx data of Optimistic Rollups after the challenge period for fraud proofs, or ZK Rollups after the submission of validity proofs.

In conclusion, the benefits are:

  • The throughput of the network will increase, because blob data no longer has to pass through the mempool and blocks, keeping block sizes small

  • Long-term Data Retrievability is easy to control

    • Applications such as fully on-chain AI, gaming, and social can be realized

  • The decentralization of the network will be improved

Specification for Zero-Knowledge Proof

Terms and Notation

  • The hash function: $H$

  • Set of validators: $V$

  • Set of data shards: $S_d$

  • Set of parity shards: $S_p$

  • Set of shards: $S$

$$S = S_d \cup S_p$$

Overview

This system verifies possession of the data shard hash $H(s_i)$ without exposing $H(s_i)$.

Zero-Knowledge Proof System

The circuit is for one shard $s \in S$.

Public Inputs

  • $H_{\text{public}}^2(s)$

Private Inputs

  • $H_{\text{private}}(s)$

Circuit Constraints

$$H_{\text{public}}^2(s) = H(H_{\text{private}}(s))$$
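
A minimal sketch of this circuit using the gnark library, with a SNARK-friendly hash (MiMC) standing in for $H$; the proving system and hash actually used by Sunrise may differ:

package circuit

import (
    "github.com/consensys/gnark/frontend"
    "github.com/consensys/gnark/std/hash/mimc"
)

// ShardPossessionCircuit proves knowledge of H_private(s) such that
// H_public^2(s) = H(H_private(s)), without revealing H_private(s).
type ShardPossessionCircuit struct {
    // Public input: the double hash committed on-chain in MsgPublishData.
    PublicDoubleHash frontend.Variable `gnark:",public"`
    // Private witness: the single hash of the shard held by the prover.
    PrivateShardHash frontend.Variable
}

// Define enforces the single constraint of the circuit.
func (c *ShardPossessionCircuit) Define(api frontend.API) error {
    h, err := mimc.NewMiMC(api)
    if err != nil {
        return err
    }
    h.Write(c.PrivateShardHash)
    api.AssertIsEqual(c.PublicDoubleHash, h.Sum())
    return nil
}

Each validator would generate one such proof per shard it holds, and the proofs are then verified against the on-chain shard_double_hashes.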

The condition of Data Availability

Notations

  • Replication factor (based only on data shards): $r$

  • Replication factor (including parity shards): $r_p$

$$r_p = r \frac{|S_d|}{|S_d| + |S_p|}$$

  • Set of proofs submitted by a validator $v$: $Z_v$

$$\forall v \in V, \ |Z_v| = r_p \frac{|S_d| + |S_p|}{|V|} = r \frac{|S_d|}{|V|}$$

  • Set of valid proofs for a shard $s$: $Z_s$

Requirements for each shard to prove Data Availability

$$\frac{|Z_s|}{r_p} \ge \frac{2}{3}$$

The set of shards which satisfy this condition is denoted $S^{\text{available}}$.

Requirements for tally to prove Data Availability

$$\begin{aligned} \frac{|S^{\text{available}}|}{|S|} &\ge \frac{|S_d|}{|S_d| + |S_p|} \\ \Rightarrow |S^{\text{available}}| &\ge |S_d| \end{aligned}$$

Example parameters

  • 10 validators: $v_1, \dots, v_{10}$

  • 20 shards: $s_1, \dots, s_{20}$

    • 10 data shards

    • 10 parity shards

  • $r = 6$

  • $r_p = 6 \times \frac{10}{10 + 10} = 3$

  • Each validator submits proofs for 6 shards

    • $3 \times \frac{20}{10} = 6$

Case A: valid shard $s_1$

  • The proofs of validators $v_1$, $v_3$, and $v_9$ cover shard $s_1$ and 5 other shards

  • Validator $v_3$ failed to prove the validity of shard $s_1$

  • However, validators $v_1$ and $v_9$ succeeded in proving the validity of shard $s_1$, so

    • $|Z_{s_1}| = 2$

    • It satisfies $\frac{|Z_{s_1}|}{r_p} \ge \frac{2}{3}$

Case B: invalid shard $s_2$

  • The proofs of validators $v_2$, $v_4$, and $v_{10}$ cover shard $s_2$ and 5 other shards

  • Validators $v_2$ and $v_4$ failed to prove the validity of shard $s_2$

  • Only validator $v_{10}$ succeeded in proving the validity of shard $s_2$, so

    • $|Z_{s_2}| = 1$

    • It doesn't satisfy $\frac{|Z_{s_2}|}{r_p} \ge \frac{2}{3}$

Case X: shards $s_1$ and $s_3$ to $s_{11}$ are available under the condition above

  • $|S^{\text{available}}| = 10$

  • $|S_d| = 10$

  • It satisfies $|S^{\text{available}}| \ge |S_d|$

Case Y: only shards $s_1$ and $s_3$ are available under the condition above

  • $|S^{\text{available}}| = 2$

  • $|S_d| = 10$

  • It doesn't satisfy $|S^{\text{available}}| \ge |S_d|$
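
A minimal sketch that checks the per-shard and tally conditions against the cases above, assuming Go; the function names are illustrative only:

package main

import "fmt"

// shardAvailable checks the per-shard condition |Z_s| / r_p >= 2/3,
// using integer arithmetic to avoid floating-point comparison.
func shardAvailable(validProofs, rp int) bool {
    return 3*validProofs >= 2*rp
}

// tallyAvailable checks the tally condition |S_available| >= |S_d|.
func tallyAvailable(availableShards, dataShards int) bool {
    return availableShards >= dataShards
}

func main() {
    const dataShards, parityShards, r = 10, 10, 6
    rp := r * dataShards / (dataShards + parityShards) // r_p = 3

    fmt.Println(shardAvailable(2, rp))          // Case A: true  (2/3 >= 2/3)
    fmt.Println(shardAvailable(1, rp))          // Case B: false (1/3 <  2/3)
    fmt.Println(tallyAvailable(10, dataShards)) // Case X: true  (10 >= 10)
    fmt.Println(tallyAvailable(2, dataShards))  // Case Y: false (2 < 10)
}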

Comparison Between On-chain DA attestation and Off-chain DA attestation

                                            On-chain DA attestation                Off-chain DA attestation
Data Corruption Durability                  〇                                     〇
Tx Mempool Scalability                      ×                                     〇
Data Retrievability Control                 ×                                     〇
Validators Load Mitigation                  ×                                     〇
False-Positive DA Attestation Resistance    〇                                     〇※
Examples                                    Celestia, Avail, EigenDA, Sunrise V1   Sunrise V2, Walrus, 0G

