shelf: use files capped to certain size #19
Conversation
cc @karalabe
Codecov Report

```
@@            Coverage Diff             @@
##             main      #19      +/-   ##
==========================================
+ Coverage   87.08%   87.37%   +0.28%
==========================================
  Files           5        6       +1
  Lines         395      483      +88
==========================================
+ Hits          344      422      +78
- Misses         36       42       +6
- Partials       15       19       +4
```
Force-pushed from 55f08b2 to 1207366
capped_file.go (outdated)
```go
	f   *os.File
	err error
)
if i == 0 {
```
Should this perhaps be `if nFiles == 1`? That way for single files I won't have the strange suffix, but for capped files, I will have a proper `.cap.0` suffix for the first file too.
I guess the catch is if someone sets blobpool sizes below 2 GB and then above, then they will flip-flop between the 2 naming schemes.
Any particular reason for supporting the "old" scheme? Can't we just always have "basename.%d"?
My reasoning was that in most cases, there's only ever going to be one file used. And in that case, the one and only file (the zeroth file) could be just the basename.
Only if it does go above would we start adding suffixes. So in the 'normal case', there would be zero difference between a system using a capped file and one not using capped files.
capped_file.go (outdated)
```go
	(offset+uint64(len(data)))/cf.cap >= uint64(len(cf.files)) {
	return 0, ErrBadIndex
}
for ; written < totalLength; fileNum++ {
```
This is strange because from one aspect, you are implementing the File interface, so this is necessary to support. From another perspective, if the cap is a multiple of the shelf size, then it should be impossible to overflow file boundaries, right?
The capped_file, from a library perspective, handles overflowing file boundaries fine.
Billy uses it in a way which doesn't overflow file boundaries, though, which also means we don't have to worry about concurrency.
But those are two separate things, IMO
I could bleed some of the shelf-size logic into capped_file, then the looping wouldn't be needed, I guess
shelf.go (outdated)
```go
// Max 5 files @ 2GB each.
// We also want to ensure that the cap is a multiple of the slot size, so no
// slot crosses a file boundary.
cap := uint64(2*1024*1024 - 2*1024*1024%slotSize)
```
This counter does not take into account the billy magic and shelf header metadata. That would push all these "calculated" offsets to be wrong since the first file would have some extra junk that would shift everything (and would need a different cap for the first and subsequent data files).
Ouch! Good catch!
I guess one way to handle it differently, without tying capped file too deeply to shelves, is to have the following rule:

- On write: if `offset + len(b) > cap`, then write to the 'next' file.
- On read: similarly, if `offset + len(b) > cap`, then read from the 'next' file.

In other words: let the max byte determine the position of the read/write. This has the following characteristics:

- Reading `1` byte may read from a different location than reading `n` bytes and picking out the first.
- The `cap` will never be exceeded.
- As long as reads and writes are always done chunkwise (and always with the same `offset` + lengths), we would never spread a write over several files (assuming the size of the object is smaller than the cap), and everything works fine.

Non-chunked reads/writes will behave as undefined.
Ah, scrap that, won't work. The other, even simpler, alternative is to allow chunks to overflow the file cap, and not spread the read/write out across several files. Then the limit is more of a recommendation, which is fine in our case (we use 2GB but the 'hard' limit is 4GB, so we've got space to spare).
Changed the behaviour now, but there's something not right still
fixed now
@karalabe I think the current implementation is the "most sane". Want to have a review chat about this at some point?
This PR adds 'cappedFile' support. A cappedFile behaves like a regular `os.File`, but it actually maps to a set of files, each capped to a max size. By swapping out the regular files for cappedFiles as backing for the shelves, billy can be made to respect filesystem max file sizes (e.g. 4GB on FAT32).

The cappedFile is not concurrency-safe for spread-out reads/writes. That is, if the data to be read crosses file boundaries, then a simultaneous read and write may corrupt the data.

However, this can easily be avoided in the upper layer: the shelf can just ensure that the cappedFile limit is a multiple of the shelf size. So instead of using `2 * 1024 * 1024` = 2097152 for shelf-size 10, it could use 2097150. Then the write offsets (..., 2097140, 2097150, 2097160, ...) all fall on shelf-size boundaries, so no write crosses a file boundary.