-
Notifications
You must be signed in to change notification settings - Fork 379
feat: soc dispersed replica v2 #5279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| headers := struct { | ||
| OnlyRootChunk bool `map:"Swarm-Only-Root-Chunk"` | ||
| OnlyRootChunk bool `map:"Swarm-Only-Root-Chunk"` | ||
| RedundancyLevel redundancy.Level `map:"Swarm-Redundancy-Level"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use here Paranoid as default as on other endpoints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about all other places where Swarm-Redundancy-Level is used as header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good question. In all other handlers getRedundancyLevel is used to get the redundancy from header and if it is not set, PARANOID is used. Here in feeds, replicas getter is not used if the header is not set. I do not know it that is the expected behaviour.
Maybe Viktor should know when he reviews this PR.
I am not sure if we need headers.RedundancyLevel > redundancy.NONE at all, as replicas getter should not make any replicas requests for NONE redundancy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding redundancy.NONE, yes, that make sense, maybe we can directly use:
getter = replicas.NewSocGetter(s.storer.Download(true), rLevel)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
|
Hi @zelig. I would kindly ask you to review this PR for correctness and in general. It is a combined effort from multiple developers and we have tried to validate the behaviour through unit tests, but we would like to have your validation because of potential risks. Would you check also what should be default redundancy levels in api handlers if Swarm-Redundancy-Level is not set in the request? |
| mu.Unlock() | ||
| return | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zelig Would it make sense to check if the chunk data content validates with the requested address addr? Like this:
cac.Valid(swarm.NewChunk(addr, ch.Data()))
As the replica's data hash should be equal with the chunk original address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes absolutely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it is added.
zelig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try to keep PRs focussed and clean please. More comments on the concurrency and cancellation needed. How is this used in feeds?
| SETUP_CONTRACT_IMAGE_TAG: "0.9.4" | ||
| BEELOCAL_BRANCH: "main" | ||
| BEEKEEPER_BRANCH: "master" | ||
| BEEKEEPER_BRANCH: "feat/soc-dispersed" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for CI to use the updated beekeeper checks. Before this PR is merged, beekeeper main will be up to date. If beekeeper main has the changes for this PR, other PRs CI checks would fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the beekeeper branch (PR) where are modifications that are able to test soc disparsed replica on CI. This needs to be reverted to master before merge. Also, beekeeper PR should be merged as well.
pkg/api/pin.go
Outdated
|
|
||
| getter := s.storer.Download(true) | ||
| traverser := traversal.New(getter, s.storer.Cache(), redundancy.DefaultLevel) | ||
| traverser := traversal.New(getter, s.storer.Cache(), redundancy.PARANOID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted.
pkg/hive/hive_test.go
Outdated
| } else { | ||
| n := (i % 3) + 1 | ||
| for j := 0; j < n; j++ { | ||
| for j := range n { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this change have anyhting to do with the PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted.
| defer cancel() | ||
|
|
||
| var wg sync.WaitGroup | ||
| defer wg.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would you want to wait for all routines to terminate? unless you cancel them each when you get one response
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually needed as a test hack (to use replicas.Wait only avalilable in tests) to avoid data races is removed by nugaon. The current code is race free in production, as well.
Canceling is done when when the chunk is found by calling cancel() on line 92.
If we do not care about other goroutines, we can revert changes in getter.go and getter_test.go and export_test.go so that tests do not report data race. Should we do that?
| mu.Unlock() | ||
| return | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes absolutely
| underlays := make([]ma.Multiaddr, n) | ||
|
|
||
| for i := 0; i < n; i++ { | ||
| for i := range n { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why here, why now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted.
janos
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try to keep PRs focussed and clean please.
Will do.
More comments on the concurrency and cancellation needed.
Added.
How is this used in feeds?
SocGetter is used in API handlers feedGetHandler, socGetHandler and serveReference. This is left as it is from commits from the nugaon's PR. What is else required to be added?
| SETUP_CONTRACT_IMAGE_TAG: "0.9.4" | ||
| BEELOCAL_BRANCH: "main" | ||
| BEEKEEPER_BRANCH: "master" | ||
| BEEKEEPER_BRANCH: "feat/soc-dispersed" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for CI to use the updated beekeeper checks. Before this PR is merged, beekeeper main will be up to date. If beekeeper main has the changes for this PR, other PRs CI checks would fail.
pkg/api/pin.go
Outdated
|
|
||
| getter := s.storer.Download(true) | ||
| traverser := traversal.New(getter, s.storer.Cache(), redundancy.DefaultLevel) | ||
| traverser := traversal.New(getter, s.storer.Cache(), redundancy.PARANOID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted.
pkg/hive/hive_test.go
Outdated
| } else { | ||
| n := (i % 3) + 1 | ||
| for j := 0; j < n; j++ { | ||
| for j := range n { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted.
| defer cancel() | ||
|
|
||
| var wg sync.WaitGroup | ||
| defer wg.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually needed as a test hack (to use replicas.Wait only avalilable in tests) to avoid data races is removed by nugaon. The current code is race free in production, as well.
Canceling is done when when the chunk is found by calling cancel() on line 92.
If we do not care about other goroutines, we can revert changes in getter.go and getter_test.go and export_test.go so that tests do not report data race. Should we do that?
| mu.Unlock() | ||
| return | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it is added.
| underlays := make([]ma.Multiaddr, n) | ||
|
|
||
| for i := 0; i < n; i++ { | ||
| for i := range n { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted.
Checklist
Description
This PR is a continuation of the original PR #5057.
Original description
Add support for dispersed replicas on Single Owner Chunks (SOCs) to improve data availability and retrieval reliability in the Swarm network.
SOC replicas are generated by allowing additional addresses to represent the same SOC. It is achieved by lighten the validation of SOCs which ignores the first byte of the address. This makes possible to saturate dispersed replicas evenly across the whole network since nodes arrange into neighborhoods based on address prefix. The addresses created in a way to iterate over all variations in the given depth of the redundancy level + 1 (e.g. level is 2, and the original address starts with 101, then it uploads SOCs with addresses same after the 3 first bits where the first 3 bit variations are 001, 011, (101 is not because the original address has it), 111 and 100 (flipping the last bit).
New changes
Fixes SOC dispersed replica functionality by setting explicit default redundancy levels and fixing implementation issues.
PARANOIDfor handlerswg.Go()instead of manual goroutine managementOpen API Spec Version Changes (if applicable)
feed and soc replica PUT endpoints
swarm-redundancy-levelheader: create and push dispersed replicas according to the passed level: MEDIUM 2, STRONG 4, INSANE 8, PARANOID 16.feed and soc replica GET endpoints
swarm-redundancy-levelheader: for calibrating how deeply dispersed replicas should be checked. By default it is zero.Motivation and Context (Optional)
Related Issue (Optional)
Screenshots (if appropriate):