Conversation


@bdeggleston bdeggleston requested a review from aweisberg December 2, 2025 18:06
@aweisberg aweisberg left a comment:

Still reviewing, just noticed this.

@aweisberg aweisberg self-requested a review December 2, 2025 22:14
@aweisberg aweisberg left a comment:

Finished review of the implementation code. Still need to finish reviewing the test code.

For range reads you 100% need to adapt InteropTokenRangeTest to test this splitting code. There are a bunch of nasty corner cases when dealing with range reads and what ends up being generated as the bounds of the read command, and that test covers most of them.

I also think that property-based tests would be better for the things that are unit testable, and maybe even dtests. It would be nice to create a partitioner that has a token range of, say, -5 to +5 but is otherwise identical to a real partitioner and hashes integer values to themselves. Then property tests could enumerate the entire state space very quickly.
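A rough, purely illustrative sketch of that idea (this is not Cassandra's IPartitioner, and all names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a toy partitioner over tokens -5..+5 that "hashes" an
// integer key to itself, so a property-style test can enumerate every
// migrated-range / query-range combination exhaustively.
public class TinyPartitionerSketch
{
    static final int MIN_TOKEN = -5;
    static final int MAX_TOKEN = 5;

    // Identity "hash": test data maps directly onto tokens.
    static int token(int key)
    {
        assert key >= MIN_TOKEN && key <= MAX_TOKEN : "toy token space is -5..+5";
        return key;
    }

    // Every (left, right] pair in the toy token space.
    static List<int[]> allRanges()
    {
        List<int[]> ranges = new ArrayList<>();
        for (int left = MIN_TOKEN; left <= MAX_TOKEN; left++)
            for (int right = MIN_TOKEN; right <= MAX_TOKEN; right++)
                ranges.add(new int[]{ left, right });
        return ranges;
    }

    public static void main(String[] args)
    {
        // A property test would feed each migrated-range / query-range pair to the
        // splitting code and assert the resulting commands cover exactly the queried
        // tokens, with no overlap between the tracked and untracked parts.
        for (int[] migrated : allRanges())
            for (int[] query : allRanges())
                System.out.printf("migrated=(%d,%d] query=(%d,%d]%n",
                                  migrated[0], migrated[1], query[0], query[1]);
    }
}
```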

Review thread on this hunk (old line replaced by new):

    public Future<?> applyFuture(Mutation mutation, boolean writeCommitLog, boolean updateIndexes)
    {
    -   return getMetadata().useMutationTracking()
    +   return MigrationRouter.isFullyTracked(mutation)
@aweisberg (Contributor):

If it's partially tracked and we're migrating to mutation tracking, and we don't apply the partially tracked portion as tracked, then it forces background reconciliation to reconcile all the mutations? I don't get how it makes sense to bypass the mutation tracking plumbing during migration.

I thought that writes would flow through the mutation tracking system, background reconciliation would occur, and it would be fine even if it does extra/redundant work with read repair.

Basically I think for writes there isn't a range-based form of "is it tracked"; that is only relevant for reads, which can't be correct without relying on range-based repairs. Just throw it into the mutation tracking system and call it good?

@bdeggleston (Member Author):

We don't bypass mutation tracking plumbing during migration. The use of MigrationRouter.isFullyTracked(mutation) here is making the assumption that the mutation routing logic did its job properly higher up the stack and has already produced mutations that are either fully tracked or fully untracked. Probably the right thing to do here is confirm that the mutation is fully one or the other and select the correct path accordingly.
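A minimal sketch of that confirmation, assuming a hypothetical MigrationRouter.isFullyUntracked helper next to the isFullyTracked call already in the diff (names and wiring are illustrative, not the actual change):

```java
// Sketch of the replica-side apply path; isFullyUntracked is a hypothetical
// helper, and the other calls mirror the shapes visible elsewhere in this diff.
public Future<?> applyFuture(Mutation mutation, boolean writeCommitLog, boolean updateIndexes)
{
    if (MigrationRouter.isFullyTracked(mutation))
        return applyInternalTracked(mutation, new AsyncPromise<>());

    if (MigrationRouter.isFullyUntracked(mutation)) // hypothetical helper
        return applyInternal(mutation, writeCommitLog, updateIndexes, true, true, new AsyncPromise<>());

    // Routing higher up the stack should never hand a replica a mixed mutation;
    // fail loudly rather than silently picking one path.
    throw new IllegalStateException("Mutation spans tracked and untracked ranges: " + mutation);
}
```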

@aweisberg (Contributor):

OK, so the splitting is supposed to happen at the coordinator, but that is racy since things could change when we get here? Then we surface an error to the client?

StorageProxy.mutateWithTracking is also an unused method now.

Review thread on this hunk:

        .map(table -> table.id)
        .collect(Collectors.toList());

    migrationState = migrationState.withKeyspaceMigrating(diff.after.name, tableIds, nextEpoch);
@aweisberg (Contributor):

A little sneaky that this is a reversal on whatever is already there.

@aweisberg (Contributor):

Maybe a comment or a name change to reflect the fact that it does a reversal?

@aweisberg aweisberg left a comment:

Also noticed that the check to see if the repair is in an epoch that can affect the ranges seems to be missing in MutationTrackingMigrationState.

@aweisberg aweisberg left a comment:

I don't see a happens-before edge for repair such that, when repair starts and snapshots tables, it is guaranteed there are no in-flight non-tracked mutations that might not be included in the repair.

The way to ensure this would be for nodes to refuse to apply mutations if they think the write should have been tracked, and for repair to then wait out in-flight writes after it has updated cluster metadata at each node as part of prepare.

WDYT?

@bdeggleston (Member Author):

> I don't see a happens-before edge for repair such that, when repair starts and snapshots tables, it is guaranteed there are no in-flight non-tracked mutations that might not be included in the repair.
>
> The way to ensure this would be for nodes to refuse to apply mutations if they think the write should have been tracked, and for repair to then wait out in-flight writes after it has updated cluster metadata at each node as part of prepare.
>
> WDYT?

I added a "wait for epoch" in the prepare phase handler for repair, which should serve that purpose. A repair starts, it waits for the epoch, and then no more untracked writes will be coordinated by it by the time it replies.

@bdeggleston (Member Author):

Can you go into more detail about what you’d expect to see for the interop token range test and the property tests?

In the case of the interop token range tests, there’s not really an analog for interop reads. Reads are either tracked or untracked. Untracked reads are well tested, and tracked testing / hardening is a separate issue that’s already being worked on. The interop tests don’t check reads in partially migrated clusters afaict.

In the case of property tests, we’re mainly checking a handful of state changes driven by schema changes and repair here. We’re not dealing with the sort of multiple intersecting state changes and parameter spaces that property tests are good at uncovering bugs in. I could see value in property based simulator testing of reads and writes and state changes, but I don’t think that’s possible without a lot of simulator work.

Review thread on this hunk:

                return applyInternal(mutation, writeCommitLog, updateIndexes, true, true, new AsyncPromise<>());
            case TRACKED:
                return applyInternalTracked(mutation, new AsyncPromise<>());
            case MIXED:
@aweisberg (Contributor):

This looks quite problematic because it's going to happen when migrating away from mutation tracking, and then your unlogged batches stop working?

@bdeggleston (Member Author):

Once we've gotten here we expect the logic higher up the call stack to have properly separated the mutation into tracked and untracked. That said, there is a race where things may have changed during the processing of the mutation. However, not doing this creates a possible (albeit unlikely) race where we apply an untracked mutation to a tracked keyspace, which will create a read monotonicity correctness issue.


Review thread on this hunk:

    metadata = ClusterMetadata.current();
    KeyspaceMigrationInfo migrationInfo = metadata.mutationTrackingMigrationState.getKeyspaceInfo(ksName);
    assertNotNull(migrationInfo);
@aweisberg (Contributor):

This should really assert it is as expected (tables, ranges, epoch, etc.) to make sure nothing is mangled from AlterSchema. I know a lot of this is already tested in KeyspaceMigrationInfoTest, but this does the end-to-end part with AlterSchema.
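For example, something along these lines, where tables(), ranges(), and migrationEpoch() are hypothetical accessor names standing in for whatever KeyspaceMigrationInfo actually exposes:

```java
// Illustrative assertions only; the accessor names are placeholders.
metadata = ClusterMetadata.current();
KeyspaceMigrationInfo migrationInfo = metadata.mutationTrackingMigrationState.getKeyspaceInfo(ksName);
assertNotNull(migrationInfo);
assertEquals(expectedTableIds, migrationInfo.tables());      // every table in the keyspace is covered
assertEquals(expectedRanges, migrationInfo.ranges());        // ranges not mangled by AlterSchema
assertEquals(expectedEpoch, migrationInfo.migrationEpoch()); // epoch recorded at the right point
```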

@aweisberg aweisberg commented Dec 11, 2025:

RE: InteropTokenRangeTest and testing splitting of range reads
The takeaway from InteropTokenRangeTest is that you test queries with those token bounds and GT/GTE/LT/LTE to make sure that your migration code correctly splits range reads across tracked/untracked boundaries. There are just a ton of edge cases there. The thing you are testing here is the splitting, and that the commands it outputs are valid and actually return the data that is supposed to be in the read.

InteropTokenRangeTest doesn't demonstrate different permutations of migration, but there are only a small number of tokens, so you can enumerate and test all range migration combinations.

> I added a "wait for epoch" in the prepare phase handler for repair, which should serve that purpose. A repair starts, it waits for the epoch, and then no more untracked writes will be coordinated by it by the time it replies.

That tells you that the epoch has been received, but it doesn't tell you anything about in-flight writes that haven't completed yet? For that you would do the op order thing, where you have the node update to the epoch and then create an op order to make sure all in-flight writes that don't use that epoch are retired.
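A rough sketch of that pattern using Cassandra's OpOrder; the write-path and prepare-handler wiring here is illustrative, not the actual patch:

```java
import org.apache.cassandra.utils.concurrent.OpOrder;

// Illustrative only: shows the barrier pattern described above, not the real
// write path or repair prepare handler.
public class EpochWriteBarrierSketch
{
    private final OpOrder writeOrder = new OpOrder();

    // Write path: every mutation application runs inside an OpOrder group.
    public void applySketch(Runnable applyMutation)
    {
        OpOrder.Group op = writeOrder.start();
        try
        {
            applyMutation.run();
        }
        finally
        {
            op.close();
        }
    }

    // Repair prepare: after the node has caught up to the new epoch, issue a
    // barrier and wait for every write started before it to retire. Writes
    // started afterwards already see the new epoch and get routed as tracked.
    public void awaitInFlightWrites()
    {
        OpOrder.Barrier barrier = writeOrder.newBarrier();
        barrier.issue();
        barrier.await();
    }
}
```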

There is still the separate issue that this can cause writes to fail and the errors to get surfaced to the client, which is not ideal.

RE: Hints
Right now hints will get stuck for all keyspaces if you enable mutation tracking and there are hints for a keyspace with mutation tracking. That seems problematic. Hints are host-oriented and contain a log of hints for multiple keyspaces. There doesn't seem to be a graceful cleanup of hints for keyspaces that have mutation tracking enabled so that hint delivery for other keyspaces can continue.
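A heavily hedged sketch of one possible shape for that cleanup; the dispatch-time hook and the assumption that fully tracked keyspaces no longer want hinted handoff are not established by this PR:

```java
// Hypothetical filter applied while dispatching a host's hint log: hints whose
// mutation now targets a fully tracked keyspace are skipped (background
// reconciliation is assumed to cover them), so hints for other keyspaces in
// the same log keep delivering instead of getting stuck behind them.
private boolean shouldSkipDuringDispatch(Hint hint)
{
    return MigrationRouter.isFullyTracked(hint.mutation);
}
```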
