refactor(dataset): Redirect multipart upload through File Service #4130
Conversation
Why I prefer backing multipart uploads with a DB

I am proposing to store multipart upload state in a DB instead of keeping everything fully stateless, for several reasons:

@carloea2 Please schedule a meeting to discuss the details, then report the results here.
aicam left a comment:
Overall, LGTM
```scala
val raw =
  s"$Version|${payload.uploadId}|${payload.did}|${payload.uid}|$filePathB64|$physicalB64"
```
I think there are more structured ways to apply encryption than manually concatenating fields with `|`.
The concatenation with `|` is just for convenience. Would you prefer a JSON approach? The encryption would still operate on the raw characters of the JSON.
I have changed it to JSON.
```scala
 * - malformed structure
 * - unsupported version
 */
def decode(token: String): UploadTokenPayload = {
```
Same here: the decode function does a lot of manual work that I believe a library should already provide high-level functions for.
Even with JSON there will be some manual work to do. How would you modularize this?
The new version uses JSON.
```scala
val cipherText = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8))

val combined = new Array[Byte](iv.length + cipherText.length)
```
Please add comments explaining which part of the algorithm each line performs and why it is needed.
Got it, thank you.
Done
```scala
final case class MultipartUploadInfo(key: String, uploadId: String)

/** Minimal info about a completed part in an upload. */
final case class PartInfo(partNumber: Int, eTag: String)
```
Are these case classes used as type definitions?
Yes, I keep only the info that is needed and discard the rest.
Thank you for your comments, Ali. I will take a look at it.
According to our discussion, we will go back to a DB approach: #4136
What changes were proposed in this PR?

Backend (`DatasetResource`)

Partially new multipart upload API:

- `POST /dataset/multipart-upload?type=init` → creates a LakeFS multipart upload session, encodes `uploadId|did|uid|filePathB64|physicalAddress` into a server-side encrypted `uploadToken`, and returns it.
- `POST /dataset/multipart-upload/part?token=...&partNumber=...` → streams a single part.
- `POST /dataset/multipart-upload?type=finish|abort` → completes or aborts the LakeFS multipart upload.

Existing access control and dataset permissions remain enforced on all changed endpoints.
Frontend service (`dataset.service.ts`)

Main changes in `multipartUpload(...)`:

- calls `init` to get the encrypted `uploadToken`
- uploads parts to `/multipart-upload/part`, streaming them with concurrency

Frontend component (`dataset-detail.component.ts`)

- keeps the `uploadToken` to cancel/abort

Any related issues, documentation, discussions?
Closes #4110
How was this PR tested?
Manually uploaded large files via the dataset detail page (single and multiple) and checked that:

- the upload status reaches `COMPLETED`
- the LakeFS object is present
- dataset version creation works
- use of the files in a workflow is correct

Was this PR authored or co-authored using generative AI tooling?
Partial use of GPT.