Skip to content

Conversation

@chandra-siri
Copy link
Collaborator

chore: Add Sys test large obj

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: storage Issues related to the googleapis/python-storage API. labels Dec 18, 2025
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @chandra-siri, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on improving the robustness and integrity of asynchronous storage operations, particularly for large objects. It introduces a standardized mechanism to enforce the use of the C-accelerated google-crc32c library, crucial for efficient data integrity checks. Furthermore, it enhances the AsyncAppendableObjectWriter to proactively include checksums during uploads and significantly expands system tests to validate the handling of large objects and segmented writes, ensuring reliable performance and data consistency.

Highlights

  • CRC32C C-Extension Enforcement: Introduced a new utility function raise_if_no_fast_crc32c to ensure the C-accelerated google-crc32c implementation is available, preventing silent performance degradation.
  • Checksum Inclusion in Writes: The AsyncAppendableObjectWriter now includes CRC32C checksums with data chunks during object writes, enhancing data integrity.
  • Expanded System Tests: Added new system tests in test_zonal.py to cover large object uploads and downloads, including scenarios where objects are written in multiple slices, and verifying their checksums.
  • Refactored CRC32C Checks: Consolidated the CRC32C implementation check into the new utility function, which is now called by both AsyncAppendableObjectWriter and AsyncMultiRangeDownloader.
  • Updated Unit Tests: Modified unit tests for AsyncAppendableObjectWriter and AsyncMultiRangeDownloader to reflect the new FailedPrecondition exception type when the C-accelerated CRC32C is not found.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@chandra-siri chandra-siri changed the base branch from main to bidi_writes_checksum December 18, 2025 14:16
@chandra-siri
Copy link
Collaborator Author

/gcbrun(2b1d6ce)

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a check for the C-accelerated google-crc32c implementation to prevent silent performance degradation, refactoring this check into a shared utility function. It also adds CRC32C checksum validation for data chunks in AsyncAppendableObjectWriter. To support these changes, extensive system tests for large object uploads (both single and multi-part) have been added, along with corresponding unit tests. My feedback includes a minor correction to a docstring and a suggestion to reduce code duplication in the new system tests for better maintainability.

I am having trouble creating individual review comments. Click here to see my feedback.

google/cloud/storage/_experimental/asyncio/_utils.py (20-28)

medium

The docstring for raise_if_no_fast_crc32c incorrectly states that the function returns a boolean value. This function either raises an exception or returns None. The docstring should be updated to remove the returns and rtype sections. I've also taken the liberty to format the raises section to follow common style guides.

    """Check if the C-accelerated version of google-crc32c is available.

    If not, raise an error to prevent silent performance degradation.

    Raises:
        google.api_core.exceptions.FailedPrecondition: If the C extension is not available.
    """

tests/system/test_zonal.py (131-168)

medium

There is significant code duplication between test_basic_wrd and test_basic_wrd_in_slices. The logic for client instantiation, data generation, download, verification, and cleanup is identical in both tests.

To improve maintainability, you could refactor this common code into a helper function. The helper could accept a callback to perform the specific writing logic for each test case.

Here's an example of how you could structure it:

async def _perform_wrd_test(storage_client, blobs_to_delete, attempt_direct_path, object_size, write_callable):
    # ... common setup code ...
    object_name = f"test_basic_wrd-{str(uuid.uuid4())}"
    object_data = os.urandom(object_size)
    object_checksum = google_crc32c.value(object_data)
    grpc_client = AsyncGrpcClient(attempt_direct_path=attempt_direct_path).grpc_client

    writer = AsyncAppendableObjectWriter(grpc_client, _ZONAL_BUCKET, object_name)
    await writer.open()

    # Call the specific write logic
    await write_callable(writer, object_data)

    object_metadata = await writer.close(finalize_on_close=True)
    assert object_metadata.size == object_size
    # ... common verification and cleanup ...

# In test_basic_wrd:
await _perform_wrd_test(..., write_callable=lambda writer, data: writer.append(data))

# In test_basic_wrd_in_slices:
async def sliced_write(writer, data):
    mark1, mark2 = _get_equal_dist(0, len(data))
    await writer.append(data[0:mark1])
    await writer.append(data[mark1:mark2])
    await writer.append(data[mark2:])

await _perform_wrd_test(..., write_callable=sliced_write)

This approach would reduce code duplication and make the tests easier to read and maintain.

@chandra-siri chandra-siri marked this pull request as ready for review December 18, 2025 14:27
@chandra-siri chandra-siri requested review from a team as code owners December 18, 2025 14:27
Base automatically changed from bidi_writes_checksum to main December 19, 2025 10:51
@chandra-siri chandra-siri added kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Dec 19, 2025
@yoshi-kokoro yoshi-kokoro removed kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Dec 19, 2025
@chandra-siri
Copy link
Collaborator Author

hi @suni72 , samples 3.14 are blocked because of b/470276398 , it's not releated to these changes and you can ignore them.

3.14 support was recently added and it's still hasn't stabilized yet.

@chandra-siri chandra-siri requested a review from suni72 December 19, 2025 13:36
@chandra-siri chandra-siri merged commit a0668ec into main Dec 19, 2025
16 of 17 checks passed
@chandra-siri chandra-siri deleted the sys_test_large_obj branch December 19, 2025 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: storage Issues related to the googleapis/python-storage API. size: m Pull request size is medium.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants