From 99e6cf90a4768b24d27e5339b205c18825b53322 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Viktor=20Levente=20T=C3=B3th?= Date: Thu, 4 Jan 2024 13:57:47 +0100 Subject: [PATCH] let's roll --- SWIPs/swip-draft_graffiti-soc.md | 173 +++++++++++++++++++++++++++++++ 1 file changed, 173 insertions(+) create mode 100644 SWIPs/swip-draft_graffiti-soc.md diff --git a/SWIPs/swip-draft_graffiti-soc.md b/SWIPs/swip-draft_graffiti-soc.md new file mode 100644 index 0000000..74d82f3 --- /dev/null +++ b/SWIPs/swip-draft_graffiti-soc.md @@ -0,0 +1,173 @@ +--- +SWIP: +title: GSOC +author: Viktor Levente Tóth (@nugaon) +discussions-to: +status: Draft +type: Standards Track +category (*only required for Standard Track): Core +created: 2024-01-04 +--- + + + +## Simple Summary + + +GSOC (Graffiti Several Owner Chunk) is the next evolutionary step of [Graffiti Feed](https://github.com/fairDataSociety/FIPs/blob/master/text/0062-graffiti-feed.md), hence it is +> a serverless off-chain information signaling on the Kademlia network. + +## Abstract + + +The architecture facilitates finding unknown resources and actors in relation to any particular data by using SOC derived from the data itself. +It can be useful for many use-cases such as forum, content labeling, initializing connection with others and else in a web3 fashion. +It does not require any running backround service to keep up a dynamic workflow and it also can +act as an _inbox_ or an _open port_ of the Bee node. + +Parties aware of the Graffiti consensus are able to read and write under a SOC address which may have multiple payloads. +These payloads can be retrieved one-by-one by a new backward compatible chunk request +or subscribed to if the GSOC address falls within the kademlia neighborhood of the node operator. + +## Motivation + + +### Improvemets on Graffiti Feed + +GSOC offers significantly better properties for aggregation, spam protection, concurrent uploads and pub/sub behaviour over its predecessor. + +### Defining the behavior of SOC with multiple payloads + +This proposal provides a solution for the unhandled situation as well when there are different payloads under one SOC address. +Currently, the retrieval of such a SOC is undeterministic because of nodes cache different versions, +there is no possibility to retrieve all payloads from nodes that store multiple versions +and syncing is possbile within a neighborhood only so this undefined behaviour cannot be eliminated. +One can argue that this is an incompleteness in the system since these unreachable data in question are payed for storing and storers get compensated for that. +This feature is also needed to keep the serviceless fashion of the Graffiti concept, meaning, +the dynamic workflow is maintained without any 3rd party operating and that is accessible for the whole network. + +### Concentrating related data into one neighborhood + +The SOC address can be seem as an _address space_ for Content Addressed Chunks +which allows to channel content into a specific neighborhood. +Thereby, +- the aggregation of related data is easier because a service node can react immediately on chunks under the specified SOC address, +- the postage batch prevents flooding the GSOC with trash content because all data belongs to one neighborhood, +- GSOC can substitute PSS messaging with lower resource requirement._*_ + +_*it does not need to mine CAC address (which requires to build up 7 levels of Binary Merkle Tree each time), only a SOC address for a random assymetric key once. The subscription is also significantly lighter because it does not try to cast all incoming chunk to message only those that have an exact SOC address. With the new SOC chunk request, there is no need to run node for accepting messages, therefore the messaging feature can be available on gateways as well._ + +## Specification + + +### Graffiti Consensus + +GSOC is technically a simple SOC for which an assymmetric keypair must be created and an identifier must be constructed. + +The proposal describes how to generate the keypair and construct the SOC address for data signaling along with its additional requirements. + +#### GSOC assymmetric keypair + +The GSOC private key (consensual private key = `cpk`) is solely derived from the `data` +that can be any binary data to which graffities relate. + +If the data is available on the Swarm network, the CPK equals to its Swarm Address `cpk = swarmAddress(data)`, (which is the topmost segment of the BMT/Chunk Tree) +else the CPK simply equals to its keccak256 hash `cpk = keccak256(data)`. + +The corresponding public key (along with the Ethereum Address `etha`) can be generated by using the ECDSA. + +#### GSOC identifier + +The identifier (GSOC Identifier = `gi`) is nothing else than a sequence of arbitrary 32 bytes which is one of the components used in the SOC address calculation (`socAddress = keccak256(gi, etha)`). + +For that, a hashed string should be used which mines the SOC address as well to the desired neighborhood: +`gi = nonce ? keccak256("AppName:v1") : keccak256("AppName:v1:" + nonce)`, were the _AppName_ is the name of the used consensus and _v1_ is the version of it. +This convention allows to define different consensus on the same resource and quickly mine SOC address to the neighborhood where the service node is active. + +#### Write and read + +GSOC payloads must satisfy a message format defined by the consensus or they will be filtered out as invalid data after retrieval. + +such a message format can be expressed in an interface like this: +```ts +interface GraffitySocRecord { + iaasIdentifier: EthAddress, // 40 chars longs string without 0x prefix +} +``` +on which an assertion function can check whether the message satisfies the conditions. + +The storage client must allow to upload chunk if the wrapped address of the SOC was never uploaded before. +Optionally, it can prevent to use up an additional postage stamp if the wrapped address belonging to the SOC address in question was stamped already. + +Anyone can write a GSOC who is aware of the consensus, nevertheless, the message payload can be signed by the key of the uploader (additionally to SOC signing) +or the message can contain a reference to a storage area that only the sender can write. +If the storage area points to a SOC or Feed of the sender, the identifier or topic should be `psi = keccak256(cpk + gi)` within their personal storage. + +### Change in the Chunk retrieval protocol + +The format of the current chunk request has only one parameter: +``` +ChunkRequest(chunkAddress bytes32) +``` +The _constrained SOC request_ format introduces three optional parameters, namely the `optionType`, `wrappedAddress` and `prefixLength`: +``` +ChunkRequest(chunkAddress bytes32, optionType uint8, wrappedAddress bytes32, prefixLength uint8) +``` + +These express the followings: +- `optionType`: defines what type of option parameters defined after the `chunkAddress`. It is `0` in case of constrained SOC request +- `wrappedAddress`: the address with which the address of the wrapped chunk (signed chunk address in the SOC) must match, +- `prefixLength`: defines how many prefix bits must match of the `wrappedAddress` with the actual wrapped chunk address. + +Nodes must reject the request even if the `chunkAddress` has related chunk but that does not satisfy the other conditions. + +### SOC subscription + +Defining new API endpoint: + +`GET soc/subscribe/{socAddress}` + +that opens a new websocket channel relaying every new incoming SOC payload. + +The same feature is elaborated in GitHub issue [#4333](https://github.com/ethersphere/bee/issues/4333) +though that approaches it from a different use-case. Here, the `socAddress` is obviously the address of the GSOC + +### Postage Stamp Indexing + +Since nodes can store different SOC payloads, new Postage Stamp restriction should be applied in order to allow multiple valid SOC payloads. +This must be introduced because of the incentivisation of the storer nodes; they have to prove that they store different payloads of a common and single SOC address. +The current stamping validation checks whether the stamp owner signed against the SOC address for which any content is valid that has valid SOC singature from the uploader. If it remains like this with GSOC, only the bandwith incentive active for the GSOC content upload that allows infinite content store. +In order to prevent this, the `timeStamp` PS attribute can be utilized to carry the prefix of the wrapped content address of the GSOC. +It has to be applied for the storage incentive SOC checks as well which requires smart contract modification. + +## Rationale + + +The new request format allows to bypass caching and make paid out SOC payloads queryable. +Several other chunk request formats could also satisfy this feature but these points were preferred: +- facilitating the serialization of the arguments and avoiding from overflow edge-cases (fixed lengthed parameters), +- let place for other request modifiers (`optionType`). + +Graffiti Feeds do not allow concurrency even if the users act honestly because those may write on each other's index, making the Graffiti record ambiguous on retrieval. +Moreover, indices in Graffiti Feed cannot keep any canonical order of updates since uploaders use Postage Stamps with different lifespans. +On the other hand, building the idea on handling multiple payloads seems to graduate the Graffiti concept meanwhile defining an undefined behavior in the system. + +As mentioned, storing records in a single neighborhood facilitates aggregation which is essential for popular topics and only possible if record addresses share the same prefix until the storage depth of the network. +This could be done by mining but that is slow, computational heavy and that does not indicate correlation between records. + +## Backwards Compatibility + +On incoming chunk request, clients must check the byte length of its arguments which may differ from the default 32 bytes. +If so, then `optionType` on the 33rd byte must be checked and according to that the request should be handled. +This workflow keeps the original chunk request intact. + +## Test Cases + +_TBD_ + +## Implementation + +_TBD_ + +## Copyright +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).