-
Notifications
You must be signed in to change notification settings - Fork 340
Fix pad_using bug producing items when it shouldn't
#1079
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Fix pad_using bug producing items when it shouldn't
#1079
Conversation
…next is correct. Also fix pad_using bug where calling next_back after next would still produce items when it shouldn't
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1079 +/- ##
==========================================
- Coverage 94.38% 93.72% -0.67%
==========================================
Files 48 50 +2
Lines 6665 6340 -325
==========================================
- Hits 6291 5942 -349
- Misses 374 398 +24 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Good catch! This isn't a breaking change since the added fields are private. In theory, we might have users depending on the current unintended behavior. Fortunately, we already having a breaking change on |
phimuemue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi there, thanks for this. Can we simplify your proposal?
In particular, saturating_[add|sub] in next_back makes me nervous because it introduces implicit special cases that are hard to reason about.
Do my suggestions make sense?
src/pad_tail.rs
Outdated
| if self.min == 0 { | ||
| self.iter.next_back() | ||
| } else if self.iter.len() >= self.min { | ||
| self.min -= 1; | ||
| self.iter.next_back() | ||
| let current_iter_len = self.iter.len(); | ||
| let original_iter_len = current_iter_len.saturating_add(self.pos); | ||
| if self.total_len < original_iter_len { | ||
| self.total_len = original_iter_len; | ||
| } | ||
|
|
||
| if self.pos + self.back >= self.total_len { | ||
| return None; | ||
| } | ||
|
|
||
| let padding_count = self.total_len.saturating_sub(current_iter_len + self.pos); | ||
|
|
||
| if self.back < padding_count { | ||
| let idx = self.total_len - self.back - 1; | ||
| self.back += 1; | ||
| Some((self.filler)(idx)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect next_back to look sth like this:
if elements_from_next + elements_from_next_back < elements_required { // Note: Same condition as in `fn next`
let elements_remaining = elements_required - (elements_from_next + elements_from_next_back);
elements_from_next_back += 1;
if iter.len() < elements_remaining {
Some((self.filler)(elements_required - elements_from_next_back)) // TODO Is the index computed correctly?
} else {
let e = iter.next_back();
assert!(e.is_some); // If this triggers, incoming ExactSizeIterator is implemented incorrectly. I'd
e
}
} else {
iter.next_back()
}
This seems simpler to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is close but it still fails tests cause it doesn't compute the index properly.
I fixed it up and tested it like this
if self.elements_from_next + self.elements_from_next_back < self.elements_required {
// Note: Same condition as in `fn next`
let elements_remaining =
self.elements_required - (self.elements_from_next + self.elements_from_next_back);
self.elements_from_next_back += 1;
if self.iter.len() < elements_remaining {
Some((self.filler)(
self.elements_required - self.elements_from_next_back,
))
} else {
let e = self.iter.next_back();
assert!(e.is_some()); // If this triggers, incoming ExactSizeIterator is implemented incorrectly. I'd
e
}
} else {
None
}I rewrote your code like this, but it still does have the saturating sub. Tests will fail without it so I'm not so sure what to do there.
let total_consumed = self.elements_from_next + self.elements_from_next_back;
if self.iter.len() == 0 && total_consumed >= self.elements_required {
return None;
}
let elements_remaining = self.elements_required.saturating_sub(total_consumed);
self.elements_from_next_back += 1;
if self.iter.len() < elements_remaining {
Some((self.filler)(
self.elements_required - self.elements_from_next_back,
))
} else {
let e = self.iter.next_back();
assert!(e.is_some());
e
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I'd also like to mention: there are sources of saturating_* in the codebase outside of pad_tail.
I think it would also be an improvement to have tests that can trigger cases that check for saturating ops saturating when they shouldn't if it's a concern for code reviews, since it would help the dev cycle.
Trying to write this pad_using properly, there's no way to do this without really good tests, this is really difficult (at least for me).
$ rg "saturating_"
tests/test_std.rs
501: let mut v: Vec<_> = (n..n.saturating_add(m as _)).collect();
537: let mut v: Vec<_> = (n..n.saturating_add(m as _)).collect();
1241: let len = binomial(usize::saturating_sub(n + k, 1), k);
tests/quick.rs
63: org_lower.saturating_sub(self.underestimate),
1985: let result = &v[v.len().saturating_sub(n)..];
src/pad_tail.rs
77: .saturating_add(self.elements_from_next_back);
80: .saturating_add(self.elements_from_next)
82: let lower_bound = total_lower.saturating_sub(consumed);
86: .saturating_add(self.elements_from_next)
88: total_upper.saturating_sub(consumed)
119: let elements_remaining = self.elements_required.saturating_sub(total_consumed);
145: let original_iter_len = iter_len.saturating_add(elements_from_next);
src/size_hint.rs
11: let min = a.0.saturating_add(b.0);
24: low = low.saturating_add(x);
33: low = low.saturating_sub(x);
34: hi = hi.map(|elt| elt.saturating_sub(x));
41: let low = a.0.saturating_mul(b.0);
54: low = low.saturating_mul(x);
src/iter_index.rs
59: (self.end() + 1).saturating_sub(*self.start())
src/peek_nth.rs
72: let unbuffered_items = (n + 1).saturating_sub(self.buf.len());
113: let unbuffered_items = (n + 1).saturating_sub(self.buf.len());
src/sources.rs
25:/// let next = x1.saturating_add(*x2);
src/lib.rs
3673: let mut iter = self.fuse().skip(low.saturating_sub(n));
src/combinations_with_replacement.rs
177: k.saturating_sub(1)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I'd also like to mention: there are sources of
saturating_*in the codebase outside ofpad_tail.
True, but we should strive to avoid them if not strictly necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to write this
pad_usingproperly [...], this is really difficult (at least for me).
For me, too. But we must not patch the code here and there until all of our tests succeed.
Instead, we should look for a simple explanation how pad_using works. My explanation looks like this:
iter.pad_using(n, ...)will generate at leastnelements.- Iff
iteritself generatesnor more elements,fillerwill not supply elements.
=> If our iterator already generated more thannelements, we know that we must not callfilleranymore (because more thannelements means that all of them come fromiter). - As you correctly observed in your original PR, counting
nexted andnext_backed elements makes things easier.
I tried to execute this idea in code, and fwiw, fn next_back (as in my edited comment) successfully passes the tests on my machine:
if self.elements_from_next + self.elements_from_next_back < self.elements_required {
// Note: Same condition as in `fn next`
let elements_remaining =
self.elements_required - (self.elements_from_next + self.elements_from_next_back);
self.elements_from_next_back += 1;
if self.iter.len() < elements_remaining {
Some((self.filler)(
self.elements_required - self.elements_from_next_back,
))
} else {
let e = self.iter.next_back();
assert!(e.is_some()); // If this triggers, incoming ExactSizeIterator is implemented incorrectly. I'd
e
}
} else {
self.iter.next_back()
}
Of course I'm biased, but to me my next_back looks easier than your rewritten function. 12
Thinking of it, we may even apply similar logic (i.e. check if we so far generated less than elements_required elements and behave accordingly) for fn next.
Footnotes
-
Mine has no
saturating_sub.saturating_subbegs the question when exactly the operation will actually saturate and when not. ↩ -
Why exactly does yours test
self.iter.len() == 0 && total_consumed >= self.elements_required? In particular, islen==0redundant (because in this case just callingnext_backwould returnNoneanyway)? ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speaking of which: When we establish the elements_from_next + elements_from_next_back < elements_required thing, couldn't we also use it to straightforwardly implement other operations?
fn size_hint(self) {
if (elements_from_next + elements_from_next_back < elements_required) {
let elements_remaining = ...;
let (lo, hi) = iter.size_hint();
if (lo >= elements_remaining) {
// iterator itself will supply all remaining elements
(lo, hi)
} else {
// both iterator and filler may supply remaining elements
... // TODO
}
} else {
// won't touch filler anymore
iter.size_hint()
}
}
I know that explicitly listing relevant cases may look ugly, but this way you can keep the simple cases really simple, and constrain the complexity to the other cases. I consider this better than relying on several calls to saturating_[add|sub] that still have case distinctions - but implicit ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see the edit, the else branch was supposed to be next_back instead of None. That makes sense. With that, my approach looks like this:
fn next_back(&mut self) -> Option<Self::Item> {
let total_consumed = self.elements_from_next + self.elements_from_next_back;
if total_consumed >= self.elements_required {
return self.iter.next_back();
}
let elements_remaining = self.elements_required - total_consumed;
self.elements_from_next_back += 1;
if self.iter.len() < elements_remaining {
Some((self.filler)(
self.elements_required - self.elements_from_next_back,
))
} else {
self.iter.next_back()
}
}It's the same implementation except I prefer guard clauses at the top, so I put the else branch at the top to short circuit, and there are no more saturating ops.
For next, I couldn't do exactly the same pattern cause len is only for ExactSizedIterator so it's slightly different but the same idea.
fn next(&mut self) -> Option<Self::Item> {
let total_consumed = self.elements_from_next + self.elements_from_next_back;
if total_consumed >= self.elements_required {
self.iter.next()
} else if let Some(e) = self.iter.next() {
self.elements_from_next += 1;
Some(e)
} else {
let e = (self.filler)(self.elements_from_next);
self.elements_from_next += 1;
Some(e)
}
}For size_hint, I swapped the saturating ops with max. No saturation required because there's no risk of overflow.
fn size_hint(&self) -> (usize, Option<usize>) {
let total_consumed = self.elements_from_next + self.elements_from_next_back;
if total_consumed >= self.elements_required {
return self.iter.size_hint();
}
let elements_remaining = self.elements_required - total_consumed;
let (low, high) = self.iter.size_hint();
let lower_bound = low.max(elements_remaining);
let upper_bound = high.map(|h| h.max(elements_remaining));
(lower_bound, upper_bound)
}
src/pad_tail.rs
Outdated
| f(acc, item) | ||
| }); | ||
| (pos..self.min).map(self.filler).fold(init, f) | ||
| (pos..self.elements_required).map(self.filler).fold(init, f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If next_back has already been called, should we really go until elements_required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I forgot to edit this like I did for rfold. I see rfold seems to have testing whereas fold does not, so I'm going to add some test cases for it in specializations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a block that tests for fold just before the rfold block in test_double_ended_specializations. With the previous (incorrect) implementation:
let mut pos = self.elements_from_next;
init = self.iter.fold(init, |acc, item| {
pos += 1;
f(acc, item)
});
(pos..self.elements_required).map(self.filler).fold(init, f)All the tests passed (so I missed this cause I just find and replaced self.min with self.elements_from_next).
The test I added in test_double_ended_specializations will catch this bug:
check_specialized!(it, |i| {
let mut parameters_from_fold = vec![];
let fold_result = i.fold(vec![], |mut acc, v: I::Item| {
parameters_from_fold.push((acc.clone(), v.clone()));
acc.push(v);
acc
});
(parameters_from_fold, fold_result)
});With the fixed version,
init = self.iter.fold(init, |acc, item| {
start += 1;
f(acc, item)
});
let end = self.elements_required - self.elements_from_next_back;
(start..end).map(self.filler).fold(init, f)The tests all pass again, since accounts for calls to next_back as well as next for fold just like for rfold.
| let mut fwd = it.clone(); | ||
| let mut bwd = it.clone(); | ||
|
|
||
| for _ in fwd.by_ref() {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plase make this while fwd.next_back.is_some (just as with bwd).
| if total_consumed >= self.elements_required { | ||
| self.iter.next() | ||
| } else if let Some(e) = self.iter.next() { | ||
| self.elements_from_next += 1; | ||
| Some(e) | ||
| } else { | ||
| let e = (self.filler)(self.elements_from_next); | ||
| self.elements_from_next += 1; | ||
| Some(e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very nice imo. It tells
total_consumeddoes not overcountelements_required, which (presumably) avoids overflows.
phimuemue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi there, thanks for pulling this through. I like the idea of not overcounting elements_required.
Please either rewrite rfold so that it follows the other methods' logic or remove it altogether (it's better to be (explainably) correct than to be fast).
We're not in a rush, so in case of doubt, please take time and make sure all methods follow a coherent pattern (or state why they can't).
| if total_consumed >= self.elements_required { | ||
| return self.iter.size_hint(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice and simple. Don't overcount elements_required, similar to next.
| if total_consumed >= self.elements_required { | ||
| return self.iter.next_back(); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, nice. Similar to next.
| } else { | ||
| self.min -= 1; | ||
| Some((self.filler)(self.min)) | ||
| self.iter.next_back() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please assert that the result of next_back is_some. Otherwise incrementing elements_from_next_back does not make sense and we'd possibly overcount elements_required.
You could even think about incrementing in each branch explicitly (just as you did in next). (Or adjust next to match next_back: Aligned logic in next and next_back might be easier to understand.)
| let start = self.iter.len() + self.elements_from_next; | ||
| let remaining = self.elements_required.max(start); | ||
| let end = remaining - self.elements_from_next_back; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this. I could not find counterexamples, but the logic here seems to differ from everything else you wrote.
Especially: Why is remaining = elements_required.max(start)? Trying to untangle the - implicit - case distinction incurred by max:
let start = iter.len() + elements_from_next;
if elements_required < start {
let end = start - elements_from_next_back;
init = (start..end).rev().map(self.filler).fold(init, &mut f);
} else {
let end = elements_required - elements_from_next_back;
init = (start..end).rev().map(self.filler).fold(init, &mut f);
}
If elements_required < start, doesn't this mean that the iter itself will be able to generate more elements than elements_required? If so, we don't even need to touch filler in that case, and we can simplify to:
let start = iter.len() + elements_from_next;
if start <= elements_required {
let end = elements_required - elements_from_next_back;
init = (start..end).rev().map(self.filler).fold(init, &mut f);
}
There's another implicit case distinction: start..end is empty if start>=end. Making this case explicit:
let start = iter.len() + elements_from_next;
if start <= elements_required {
let end = elements_required - elements_from_next_back;
if (start < end) {
init = (start..end).rev().map(self.filler).fold(init, &mut f);
}
}
Expanding and reformulating the condition start < end gives this:
start < enditer.len() + elements_from_next < elements_required - elements_from_next_backiter.len() + elements_from_next + elements_from_next_back < elements_required
Now this brings us to known territory - much closer to what you already have in next[_back]. Also note that the last line alone is already a stronger condition than the initial start <= elements_required, so start <= elements_required is not needed.
| let mut start = self.elements_from_next; | ||
| init = self.iter.fold(init, |acc, item| { | ||
| pos += 1; | ||
| start += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Would this overflow if iter itself generated more than usize::MAX elements? And if so, would we end up mistakenly calling filler at the end of fold?
I'm not sure there's a straightforward formulation that would avoid the problem1. I'd be willing to accept it.
Footnotes
-
Apart from messing with
checked_add, which is sth we do not do often. ↩
I added a test to the double ended iterator specialization that catches this bug in
pad_using, where if exhaust the iterator from either callingnextornext_backand then call the other pair, it would still produce elements.If you just add that test, you'll see
pad_usingfails. Next, I fixedpad_usingto keep track of how many items from the back it's taken, so we can use that to correctly exhaust the iterator when needed.Question: since this adds fields + debug formatted fields to a public struct, this is a breaking change, right?Apparently not.Fixes: #1065