Conversation

@gacevicljubisa (Member) commented Nov 6, 2025

Checklist

  • I have read the coding guide.
  • My change requires a documentation update, and I have done it.
  • I have added tests to cover my changes.
  • I have filled out the description and linked the related issues.

Description

This PR is a continuation of the original PR #5057.

Original description

Add support for dispersed replicas on Single Owner Chunks (SOCs) to improve data availability and retrieval reliability in the Swarm network.

SOC replicas are generated by allowing additional addresses to represent the same SOC. This is achieved by relaxing SOC validation to ignore the first byte of the address. That makes it possible to spread dispersed replicas evenly across the whole network, since nodes arrange themselves into neighborhoods based on address prefix. The replica addresses are created by iterating over all prefix variations at a bit depth of redundancy level + 1 (e.g. if the level is 2 and the original address starts with 101, SOCs are uploaded with addresses that are identical after the first 3 bits, the first-3-bit variations being 001, 011, 111 (101 is skipped because the original address already has it) and 100 (flipping the last bit)).

New changes

Fixes SOC dispersed replica functionality by setting explicit default redundancy levels and fixing implementation issues.

  • Set default redundancy level to PARANOID for handlers
  • Use wg.Go() instead of manual goroutine management
  • Change redundancy level header to pointer type for proper optional handling
  • Refactor feeds factory to use functional options pattern
  • Add comprehensive tests for SOC replicas
  • Introduce an iterator that yields replica addresses one by one, for better concurrency management and because the original implementation had bugs

Open API Spec Version Changes (if applicable)

feed and soc replica PUT endpoints, swarm-redundancy-level header: create and push dispersed replicas according to the passed level: MEDIUM 2, STRONG 4, INSANE 8, PARANOID 16.

feed and soc replica GET endpoints, swarm-redundancy-level header: calibrates how deeply dispersed replicas should be checked. By default it is zero.


@gacevicljubisa changed the base branch from feat/soc-dispersed to master November 12, 2025 13:54
@gacevicljubisa changed the title from "fix: soc dispersed replica" to "feat: soc dispersed replica v2" Nov 12, 2025
headers := struct {
	OnlyRootChunk   bool             `map:"Swarm-Only-Root-Chunk"`
+	RedundancyLevel redundancy.Level `map:"Swarm-Redundancy-Level"`
Member Author:

Should we use PARANOID as the default here, as on the other endpoints?

Member Author:

What about all other places where Swarm-Redundancy-Level is used as header?

Member:

This is a good question. In all other handlers, getRedundancyLevel is used to get the redundancy from the header, and if it is not set, PARANOID is used. Here in feeds, the replicas getter is not used if the header is not set. I do not know if that is the expected behaviour.

Maybe Viktor should know when he reviews this PR.

I am not sure we need the headers.RedundancyLevel > redundancy.NONE check at all, as the replicas getter should not make any replica requests for NONE redundancy.

Member Author:

Regarding redundancy.NONE: yes, that makes sense. Maybe we can directly use:

	getter = replicas.NewSocGetter(s.storer.Download(true), rLevel)

Member:

I agree.

@janos (Member) commented Nov 14, 2025

Hi @zelig. I would kindly ask you to review this PR for correctness and in general. It is a combined effort from multiple developers, and we have tried to validate the behaviour through unit tests, but we would like your validation because of the potential risks. Would you also check what the default redundancy levels should be in the api handlers if Swarm-Redundancy-Level is not set in the request?

@janos requested a review from zelig November 14, 2025 11:11
mu.Unlock()
return
}

Member:

@zelig Would it make sense to check that the chunk data content validates against the requested address addr? Like this:

cac.Valid(swarm.NewChunk(addr, ch.Data()))

As the replica's data hash should be equal to the chunk's original address.

Member:

yes absolutely

Member:

Thanks, it is added.

@zelig (Member) left a comment:

try to keep PRs focussed and clean please. More comments on the concurrency and cancellation needed. How is this used in feeds?

SETUP_CONTRACT_IMAGE_TAG: "0.9.4"
BEELOCAL_BRANCH: "main"
- BEEKEEPER_BRANCH: "master"
+ BEEKEEPER_BRANCH: "feat/soc-dispersed"
Member:

Why?

Member:

This is for CI to use the updated beekeeper checks. Before this PR is merged, beekeeper main will be up to date. If beekeeper main had the changes for this PR now, other PRs' CI checks would fail.

Member Author:

This is the beekeeper branch (PR) containing the modifications needed to test SOC dispersed replicas on CI. It needs to be reverted to master before merge. The beekeeper PR should be merged as well.

pkg/api/pin.go Outdated

getter := s.storer.Download(true)
- traverser := traversal.New(getter, s.storer.Cache(), redundancy.DefaultLevel)
+ traverser := traversal.New(getter, s.storer.Cache(), redundancy.PARANOID)
Member:

Why this change?

Member:

Reverted.

} else {
	n := (i % 3) + 1
-	for j := 0; j < n; j++ {
+	for j := range n {
Member:

What does this change have to do with the PR?

Member:

Reverted.

defer cancel()

var wg sync.WaitGroup
defer wg.Wait()
Member:

why would you want to wait for all routines to terminate? unless you cancel them each when you get one response

Member:

This is actually needed: a test hack (using replicas.Wait, which is available only in tests) that avoided data races was removed by nugaon. The current code is race free in production as well.

Canceling is done when the chunk is found, by calling cancel() on line 92.

If we do not care about the other goroutines, we can revert the changes in getter.go, getter_test.go and export_test.go so that tests do not report a data race. Should we do that?

underlays := make([]ma.Multiaddr, n)

-	for i := 0; i < n; i++ {
+	for i := range n {
Member:

why here, why now?

Member:

Reverted.

@janos (Member) left a comment:

> try to keep PRs focussed and clean please.

Will do.

> More comments on the concurrency and cancellation needed.

Added.

> How is this used in feeds?

SocGetter is used in the API handlers feedGetHandler, socGetHandler and serveReference. This is left as it is from the commits in nugaon's PR. What else is required to be added?

@janos requested a review from zelig November 21, 2025 16:07