
Conversation

@holiman (Owner) commented Jul 13, 2023

This PR adds 'cappedFile' support. A cappedFile behaves like a regular os.File, but it actually maps to a set of files, each capped to a max size.
By swapping out the regular files for cappedFiles as backing for the shelves, billy can be made to respect the maximum file size of the filesystem (e.g. 4GB on FAT32).

The cappedFile is not concurrency-safe for spread-out reads/writes. That is, if the data to be read crosses a file boundary, then a simultaneous read and write may corrupt the data.

However, this can easily be avoided in the upper layer: the shelf can just ensure that the cappedFile limit is a multiple of the slot size. So instead of using 2 * 1024 * 1024 = 2097152 with slot size 10, it could use 2097150. Then writes occur at offsets ...2097140, 2097150, 2097160..., and none of them crosses a file boundary.
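
A minimal sketch of that alignment rule (the function name and parameters are illustrative, not from this PR):

// alignedCap rounds a per-file limit down to a multiple of the slot
// size, so that no slot can straddle a file boundary.
func alignedCap(fileLimit, slotSize uint64) uint64 {
	return fileLimit - fileLimit%slotSize
}

With fileLimit = 2097152 and slotSize = 10, this yields 2097150, matching the example above.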

@holiman (Owner, Author) commented Jul 13, 2023

cc @karalabe

@codecov bot commented Jul 13, 2023

Codecov Report

Merging #19 (a920bfe) into main (1c7e68d) will increase coverage by 0.28%.
The diff coverage is 89.89%.

@@            Coverage Diff             @@
##             main      #19      +/-   ##
==========================================
+ Coverage   87.08%   87.37%   +0.28%     
==========================================
  Files           5        6       +1     
  Lines         395      483      +88     
==========================================
+ Hits          344      422      +78     
- Misses         36       42       +6     
- Partials       15       19       +4     

@holiman changed the title from "wip: use files capped to certain size" to "shelf: use files capped to certain size" on Jul 13, 2023
@holiman marked this pull request as ready for review on July 13, 2023 11:37
@holiman force-pushed the capped_files branch 2 times, most recently from 55f08b2 to 1207366 on July 15, 2023 18:15
capped_file.go Outdated
f *os.File
err error
)
if i == 0 {
Contributor:
Should this perhaps be if nFiles == 1? That way, for single files I won't have the strange suffix, but for capped files I will have a proper .cap.0 suffix for the first file too.

I guess the catch is that if someone sets blobpool sizes below 2 GB and then above, then they will flip-flop between the two naming schemes.

Any particular reason for supporting the "old" scheme? Can't we just always have "basename.%d"?

@holiman (Owner, Author):
My reasoning was that in most cases, there's only ever going to be one file used. And in that case, the one and only file (the zeroth file) could be just the basename.
Only if it grows above the cap would we start adding suffixes. So in the 'normal case', there would be zero difference between a system using a capped file and one not using capped files.
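
A minimal sketch of that naming rule (the helper is hypothetical, not from the diff):

import "fmt"

// fileName keeps the plain basename for the zeroth file, so the common
// single-file case looks identical to a regular, non-capped file; only
// subsequent files get a numeric suffix.
func fileName(basename string, i int) string {
	if i == 0 {
		return basename
	}
	return fmt.Sprintf("%s.%d", basename, i)
}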

capped_file.go Outdated
(offset+uint64(len(data)))/cf.cap >= uint64(len(cf.files)) {
return 0, ErrBadIndex
}
for ; written < totalLength; fileNum++ {
Contributor:
This is strange because, from one perspective, you are implementing the File interface, so this is necessary to support. From another perspective, if the cap is a multiple of the shelf size, then it should be impossible to overflow file boundaries, right?

@holiman (Owner, Author):
The capped_file, from a library perspective, handles overflowing file boundaries fine.

Billy uses it in a way which doesn't overflow file boundaries, though, which also means we don't have to worry about concurrency.

But those are two separate things, IMO
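
To illustrate that library-level behaviour, a sketch of a write that splits a chunk across the capped backing files (the cappedFile fields are assumptions, and error handling is elided):

// writeAt splits data across backing files whenever the chunk would
// cross a file boundary at cf.cap.
func (cf *cappedFile) writeAt(data []byte, offset uint64) {
	for written := uint64(0); written < uint64(len(data)); {
		fileNum := (offset + written) / cf.cap
		inFile := (offset + written) % cf.cap
		// Write at most up to the end of the current backing file.
		chunk := min(cf.cap-inFile, uint64(len(data))-written)
		cf.files[fileNum].WriteAt(data[written:written+chunk], int64(inFile))
		written += chunk
	}
}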

@holiman (Owner, Author):
I could bleed some of the shelf-size logic into capped_file, then the looping wouldn't be needed, I guess

shelf.go Outdated
// Max 5 files @ 2GB each.
// We also want to ensure that the cap is a multiple of the slot size, so no
// slot crosses a file boundary.
cap := uint64(2*1024*1024*1024 - 2*1024*1024*1024%slotSize)
Contributor:
This counter does not take into account the billy magic and shelf header metadata. That would push all these "calculated" offsets to be wrong since the first file would have some extra junk that would shift everything (and would need a different cap for the first and subsequent data files).
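
A sketch of the arithmetic, assuming a hypothetical hdrSize standing in for the magic plus shelf header (not billy's actual layout): the first file holds only fileCap-hdrSize bytes of slot data, later files hold the full fileCap, so the file index can no longer be a plain offset/fileCap.

// fileFor maps a logical data offset to a backing-file index when the
// first file is shortened by a header of hdrSize bytes.
func fileFor(offset, fileCap, hdrSize uint64) uint64 {
	if offset < fileCap-hdrSize {
		return 0
	}
	return 1 + (offset-(fileCap-hdrSize))/fileCap
}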

@holiman (Owner, Author):
Ouch! Good catch!

@holiman (Owner, Author) commented Jul 18, 2023:
I guess one way to handle it differently, without tying capped file too deeply to shelves, is to have the following rule:

  • On write: if offset + len(b) > cap, then write to the 'next' file.
  • On read, similarly: if offset + len(b) > cap, then read from the 'next' file.

In other words: let the max byte determine the position of the read/write. This has the following characteristics:

  • Reading 1 byte may read from a different location than reading n bytes and picking out the first.
  • The cap will never be exceeded.
  • A write is never spread over several files (assuming the size of the object is smaller than the cap).

As long as reads and writes are always done chunkwise (and always with the same offset and length), everything works fine. Non-chunked reads/writes have undefined behaviour.

@holiman (Owner, Author):
Ah, scrap that, it won't work. The other, even simpler alternative is to allow chunks to overflow the file cap, and not spread the read/write out across several files. Then the limit is more of a recommendation, which is fine in our case (we use 2GB but the 'hard' limit is 4GB, so we've got space to spare).
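
A minimal sketch of that simpler rule (names are illustrative): the start offset alone picks the backing file, and the chunk is then written there in full, even if it runs past the cap:

// writeAt writes the whole chunk into the file selected by the start
// offset. A chunk near the end may overflow cf.cap by up to one chunk
// length, which is acceptable while the cap leaves headroom below the
// filesystem's hard 4GB limit. Error handling elided.
func (cf *cappedFile) writeAt(data []byte, offset uint64) {
	fileNum := offset / cf.cap
	inFile := offset % cf.cap
	cf.files[fileNum].WriteAt(data, int64(inFile))
}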

@holiman (Owner, Author) commented Jul 19, 2023

Changed the behaviour now, but there's something not right still

[user@work billyfuzz]$ go run . 
Opened ./
1 ops, 2 keys active
Reopening db, ops 1386, keys 703
Opened ./
Reopening db, ops 2288, keys 1019
Opened ./
2289 ops, 1018 keys active
panic: bad index: slot 245, slotsize 1048576, EOF

goroutine 1 [running]:
main.doFuzz(0xc00011c780?)
        /home/user/go/src/github.com/holiman/billy/cmd/billyfuzz/main.go:133 +0xc5b
github.com/urfave/cli/v2.(*Command).Run(0xc00011c780, 0xc00002e380, {0xc000014050, 0x1, 0x1})
        /home/user/go/pkg/mod/github.com/urfave/cli/v2@v2.24.1/command.go:271 +0x9eb
github.com/urfave/cli/v2.(*App).RunContext(0xc00017c000, {0x5ffbd8?, 0xc000018110}, {0xc000014050, 0x1, 0x1})
        /home/user/go/pkg/mod/github.com/urfave/cli/v2@v2.24.1/app.go:333 +0x665
github.com/urfave/cli/v2.(*App).Run(...)
        /home/user/go/pkg/mod/github.com/urfave/cli/v2@v2.24.1/app.go:310
main.main()
        /home/user/go/src/github.com/holiman/billy/cmd/billyfuzz/main.go:51 +0x1c5

@holiman (Owner, Author) commented Jul 19, 2023

but there's something not right still

fixed now

@holiman (Owner, Author) commented Sep 5, 2023

@karalabe I think the current implementation is the "most sane". Want to have a review chat about this at some point?

