light/availability: reevaluate light availability success condition #2780

Closed
walldiss opened this issue Sep 29, 2023 · 3 comments · Fixed by #3239
Labels
enhancement New feature or request

Comments

@walldiss
Member

walldiss commented Sep 29, 2023

Implementation ideas

Let's evaluate the light sampling algorithm for the withholding attack case, since there seems to be confusion about how this attack should be simulated. #2697 (comment)

  1. The initial simulation script by @distractedm1nd assumes that k shares are selected for sampling. If any of the selected shares are not available, the light node will store only those it was able to sample from the initial selection.

  2. Another simulation, made by me, assumes that the light node stores k shares regardless of the initial selection. In other words, it keeps sampling new shares until k shares are stored. (See the sketch below.)

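A minimal sketch contrasting the two assumptions, using a hypothetical `available` predicate and plain share indices rather than the actual celestia-node types:

```go
// strategies.go: toy comparison of the two sampling assumptions above.
// Everything here is illustrative and not the celestia-node implementation.
package main

import (
	"fmt"
	"math/rand"
)

// available reports whether a share index can be retrieved.
// In the withholding scenario used below, shares with index > 9 are withheld.
func available(idx int) bool { return idx <= 9 }

// strategy1 selects k indices once and keeps only those it could sample (assumption 1).
func strategy1(k, total int, rng *rand.Rand) []int {
	var stored []int
	for _, idx := range rng.Perm(total)[:k] {
		if available(idx) {
			stored = append(stored, idx)
		}
	}
	return stored
}

// strategy2 keeps drawing fresh indices until k shares are actually stored (assumption 2).
// Note: it would loop forever if fewer than k shares were available at all.
func strategy2(k, total int, rng *rand.Rand) []int {
	stored := make(map[int]bool)
	for len(stored) < k {
		if idx := rng.Intn(total); available(idx) {
			stored[idx] = true
		}
	}
	out := make([]int, 0, k)
	for idx := range stored {
		out = append(out, idx)
	}
	return out
}

func main() {
	rng := rand.New(rand.NewSource(1))
	fmt.Println("strategy 1 stored:", strategy1(4, 16, rng)) // may hold fewer than k shares
	fmt.Println("strategy 2 stored:", strategy2(4, 16, rng)) // always holds exactly k shares
}
```
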
The current implementation of light availability is different and does not satisfy either assumption. It selects 16 random shares from the EDS and tries to sample them. It stores all successful samples in the blockstore, but if any sample fails, the whole sampling operation fails. Having some samples unavailable is highly probable in the withholding attack case. A failed sampling operation causes the DASer to retry sampling for the same height. The new attempt selects a new set of 16 random shares regardless of previous results, and all successfully sampled shares from subsequent attempts are stored in addition to those from previous attempts. Since each subsequent attempt has the same probabilistic nature of success/failure, this can result in multiple retries, and with every attempt the number of successfully sampled shares will likely grow beyond the initially preset sampling limit k.
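
To make those retry dynamics concrete, here is a toy model of the behavior described above (a fresh random selection per attempt, successful samples accumulating in the store). The names and parameters are illustrative, not the actual DASer/availability code, and the selection size is lowered to 4 to match the example below:

```go
// retry.go: toy model of the current retry behavior described above. A new
// random set of shares is selected on every attempt; successful samples
// accumulate in the store, and the whole attempt fails if any single sample fails.
package main

import (
	"fmt"
	"math/rand"
)

const (
	totalShares  = 16 // 4x4 EDS, as in the example below
	samplesPerOp = 4  // lowered from the default of 16 to match the example
)

func withheld(idx int) bool { return idx > 9 }

func main() {
	rng := rand.New(rand.NewSource(42))
	store := make(map[int]bool)

	for attempt := 1; ; attempt++ {
		selected := rng.Perm(totalShares)[:samplesPerOp]
		failed := false
		for _, idx := range selected {
			if withheld(idx) {
				failed = true // the whole operation will fail...
				continue
			}
			store[idx] = true // ...but successful samples are still persisted
		}
		fmt.Printf("attempt %d: selected %v, stored so far: %d\n", attempt, selected, len(store))
		if !failed {
			break // retries stop only once a full attempt succeeds
		}
	}
	fmt.Println("total stored:", len(store), "- can exceed the sampling limit k")
}
```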

Let's walk through an example. For the sake of simplicity, some values are lowered from our defaults:

  • sampling amount (k) is set to 4
  • EDS size is 4×4 (16 shares)
  • shares with index > 9 are assumed to be withheld
Attempt 1 (fail): 
  selected to be sampled: 3,6,8,12
  sampled: 3,6,8
  failed: 12
  in store:  3,6,8 (3)
  
Attempt 2 (fail): 
  selected to be sampled: 1,4,3,15
  sampled: 1,4,3
  failed: 15
  in store:  1,3,4,6,8 (5)
  
Attempt 3 (success): 
  selected to be sampled: 2,3,6,9
  sampled: 2,3,6,9
  failed: 
  in store:  1,2,3,4,6,8,9 (7)

In this example you can see how more than k samples end up stored by the light node.

Let's evaluate how many shares need to be sampled in the withholding attack case and modify light availability accordingly if needed.
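
As a rough starting point, under the example's parameters (4 shares drawn without replacement from 16, 6 of them withheld), a single attempt succeeds with probability C(10,4)/C(16,4) = 210/1820 ≈ 11.5%, so on average almost 9 attempts are needed before the DASer stops retrying, and the store keeps growing on each one. A small sketch of that calculation (the hypergeometric model is my assumption; it ignores networking effects such as timeouts):

```go
// attemptprob.go: probability that a single sampling attempt succeeds when s
// shares are drawn without replacement from n shares and w of them are withheld.
package main

import (
	"fmt"
	"math/big"
)

// successProb returns C(n-w, s) / C(n, s): all s draws must land on available shares.
func successProb(n, w, s int64) *big.Rat {
	num := new(big.Int).Binomial(n-w, s)
	den := new(big.Int).Binomial(n, s)
	return new(big.Rat).SetFrac(num, den)
}

func main() {
	p := successProb(16, 6, 4) // the example: 4x4 EDS, shares with index > 9 withheld
	f, _ := p.Float64()
	fmt.Printf("P(attempt succeeds) = %s ≈ %.3f, expected attempts ≈ %.1f\n", p, f, 1/f)
}
```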

@distractedm1nd
Collaborator

Light nodes should clearly be doing (2), as it allows for the lowest c in a withholding scenario. If we simulate this case, how many light nodes are needed for BarelyRecoverable to hit 99%?

What is confusing to me is that I have roughly seen the numbers from the simulation match up in the BarelyRecoverable case - which is before their sampling routines time out. If they don't store anything until after timing out, how is this test case consistently passing with the given parameters?

@walldiss
Member Author

Honestly, I had just thought we had assumption (2) until I checked the code. I just wonder if I might be missing some security concerns that led to the current implementation being different.

Assumption (2) gives a nice probability distribution for reconstruction too, but it is also a bit lower than the one mentioned in the paper, which makes me think there are some details missing somewhere.
[image: probability distribution plot]

@distractedm1nd
Collaborator

I think the missing detail here is that the paper is measuring something different in that section: namely, "the probability that the players collectively sample at least k(3k - 2) shares". This amount of shares ensures recoverability. The issue here is that it also assumes all shares are available for sampling. When we actually carry out the BarelyRecoverable case, we are limiting the number of possible shares, which the equation does not account for.
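
For reference, here is a rough Monte Carlo sketch of the quantity that section of the paper measures: the probability that c light nodes, each sampling s shares uniformly from the (2k)×(2k) extended square, collectively sample at least k(3k-2) distinct shares. It assumes all shares are available, which is exactly the assumption that breaks down in the BarelyRecoverable case; all parameter values below are illustrative:

```go
// coverage.go: Monte Carlo estimate of the probability that c light nodes,
// each sampling s distinct shares uniformly from a (2k)x(2k) extended square,
// collectively sample at least k*(3k-2) distinct shares. All shares are assumed
// available, mirroring the paper's setting rather than a withholding attack.
package main

import (
	"fmt"
	"math/rand"
)

func collectiveCoverage(k, s, c, trials int, rng *rand.Rand) float64 {
	total := (2 * k) * (2 * k)
	target := k * (3*k - 2)
	hits := 0
	for t := 0; t < trials; t++ {
		seen := make(map[int]bool)
		for node := 0; node < c; node++ {
			for _, idx := range rng.Perm(total)[:s] {
				seen[idx] = true
			}
		}
		if len(seen) >= target {
			hits++
		}
	}
	return float64(hits) / float64(trials)
}

func main() {
	rng := rand.New(rand.NewSource(7))
	// k=16 (32x32 extended square), 16 samples per node, 100 nodes: all illustrative values.
	fmt.Printf("P(collectively >= k(3k-2) distinct shares) ≈ %.3f\n",
		collectiveCoverage(16, 16, 100, 200, rng))
}
```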

walldiss added a commit that referenced this issue Mar 15, 2024
…ility call (#3239)

This PR introduces persistence for sample selection in random sampling.
It addresses the issue by storing all failed samples into the datastore,
allowing them to be reloaded on the next sampling attempt. This ensures
that if the availability call fails fully or partially during the last
sampling attempt, the sampling retry will use the same preselected
random coordinates of shares.

The provided solution is backwards compatible with the previously stored empty
byte slice on sampling success, allowing the change to be non-breaking
for existing storage.


Additionally, this PR includes basic refactoring to simplify concurrency
logic in availability. It also ensures that errors returned by the call
are aligned with the interface declaration in
[availability.go](https://github.com/celestiaorg/celestia-node/blob/main/share/availability.go),
enhancing code consistency and maintainability.

Resolves #2780
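
For illustration, here is a hedged sketch of the persistence idea described in the commit message above. The types and the JSON encoding are hypothetical and do not reflect the actual #3239 datastore format; they only show how failed coordinates can be reloaded on retry while an empty value (the legacy success marker) stays readable:

```go
// samplepersist.go: illustrative sketch of persisting failed sample coordinates
// between availability attempts. The types and encoding are hypothetical and do
// not reflect the actual celestia-node datastore format.
package main

import (
	"encoding/json"
	"fmt"
)

// Sample is a row/column coordinate of a share selected for sampling.
type Sample struct {
	Row, Col int
}

// encodeFailed serializes the coordinates that still need to be sampled.
// On full success we store an empty value, matching the previously stored
// empty byte slice so that old entries remain valid.
func encodeFailed(failed []Sample) ([]byte, error) {
	if len(failed) == 0 {
		return []byte{}, nil
	}
	return json.Marshal(failed)
}

// decodeFailed reloads previously failed coordinates. An empty value (the
// legacy "success" marker) simply means there is nothing left to retry.
func decodeFailed(data []byte) ([]Sample, error) {
	if len(data) == 0 {
		return nil, nil
	}
	var failed []Sample
	err := json.Unmarshal(data, &failed)
	return failed, err
}

func main() {
	stored, _ := encodeFailed([]Sample{{Row: 3, Col: 0}, {Row: 3, Col: 3}})
	reloaded, _ := decodeFailed(stored)
	fmt.Println("retry with the same coordinates:", reloaded)

	legacy, _ := decodeFailed([]byte{}) // entry written before this change
	fmt.Println("legacy success entry -> nothing to retry:", legacy)
}
```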