Trying to make csv processing faster #387

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Open

jzbahrai wants to merge 4 commits into main from task/process-csv

Collaborator

jzbahrai commented Dec 30, 2025

Summary | Résumé

TODO: 1-3 sentence description of the changed you're proposing.

Related Issues | Cartes liées

Test instructions | Instructions pour tester la modification

TODO: Fill in test instructions for the reviewer.

Release Instructions | Instructions pour le déploiement

None.

Reviewer checklist | Liste de vérification du réviseur

This PR does not break existing functionality.
This PR does not violate GCNotify's privacy policies.
This PR does not raise new security concerns. Refer to our GC Notify Risk Register document on our Google drive.
This PR does not significantly alter performance.
Additional required documentation resulting of these changes is covered (such as the README, setup instructions, a related ADR or the technical documentation).

⚠ If boxes cannot be checked off before merging the PR, they should be moved to the "Release Instructions" section with appropriate steps required to verify before release. For example, changes to celery code may require tests on staging to verify that performance has not been affected.


          added test

006a091

Copilot AI review requested due to automatic review settings

December 30, 2025 22:07

Copilot AI reviewed

View reviewed changes

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces parallel processing for CSV recipient files to improve performance when handling large SMS datasets. The implementation uses ThreadPoolExecutor to process rows concurrently for SMS files with 1000+ rows, while maintaining sequential processing for smaller datasets and non-SMS templates.

Key Changes:

Added parallel processing with ThreadPoolExecutor for large SMS datasets (≥1000 rows)
Refactored get_rows() to support both sequential and parallel processing paths
Added comprehensive test coverage for parallel processing scenarios including error detection, row ordering, and international numbers

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File	Description
notifications_utils/recipients.py	Implements parallel processing logic with new methods `_process_single_row`, `_get_rows_parallel`, and worker function `_process_row_chunk`
tests/test_recipient_csv.py	Adds 9 new test cases covering parallel processing behavior, error handling, and edge cases

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

notifications_utils/recipients.py Outdated Show resolved Hide resolved

notifications_utils/recipients.py Show resolved Hide resolved

notifications_utils/recipients.py Outdated Show resolved Hide resolved

notifications_utils/recipients.py Outdated Show resolved Hide resolved

notifications_utils/recipients.py Outdated Show resolved Hide resolved

notifications_utils/recipients.py Outdated Show resolved Hide resolved

github-advanced-security bot found potential problems

View reviewed changes

notifications_utils/recipients.py Fixed Show fixed Hide fixed

fix

2c9b329

jzbahrai requested a review from Copilot

December 31, 2025 16:15

Copilot AI reviewed

View reviewed changes

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

notifications_utils/recipients.py

    
                      total_rows = min(len(rows_data), self.max_rows)

                      # Determine optimal chunk size and worker count

                      chunk_size, max_workers = control_chunk_and_worker_size(data_size=total_rows, chunk_size=None, max_workers=None)

Copilot AI Dec 31, 2025

The control_chunk_and_worker_size function is imported but never defined in this repository. This will cause a runtime error when parallel processing is triggered. Ensure this function is implemented in notifications_utils/parallel.py or imported from the correct module.

Copilot uses AI. Check for mistakes.

notifications_utils/recipients.py Outdated

    
                                      error_row = Row(

                                          OrderedDict(),

                                          index=idx,

                                          error_fn=lambda k, v: None,

Copilot AI Dec 31, 2025

The lambda function lambda k, v: None is duplicated in three places (lines 302, 319, and error handling). Consider extracting this into a named function or class constant to improve maintainability and clarify its purpose as a no-op error function.

Copilot uses AI. Check for mistakes.

notifications_utils/recipients.py

    
                                  self.placeholders_as_column_keys,

                                  self.template_type,

                                  self.international_sms,

                                  self.template.content if self.template else None,

Copilot AI Dec 31, 2025

Passing template.content alone is insufficient for message length validation. The validate_sms_message_length function expects template_content_length to be the actual content string (not its length), but this naming suggests confusion about what's being passed. Review the validation logic in _process_row_chunk (lines 813-817, 839-843) to ensure it correctly validates message length with just the template content string.

Copilot uses AI. Check for mistakes.

notifications_utils/recipients.py

Comment on lines +867 to +868

    
                      if has_message_too_long:

                          row_obj.message_too_long = True

Copilot AI Dec 31, 2025

The message_too_long flag is being set directly on the Row object after creation, which bypasses the Row's constructor and error handling logic. This differs from sequential processing where the Row's internal error function handles this validation. Consider whether this approach maintains consistency with the Row class's design.

Copilot uses AI. Check for mistakes.

notifications_utils/recipients.py Outdated

Comment on lines 837 to 843

    
                          # Check if this column has a message too long error

                          formatted_key = Columns.make_key(column_name)

                          if template_content and formatted_key in placeholders_keys and template_type == "sms":

                              try:

                                  validate_sms_message_length(column_value or "", template_content)

                              except ValueError:

                                  has_message_too_long = True

Copilot AI Dec 31, 2025

This message length validation logic is duplicated within the same function (also appears in lines 813-817 within chunk_error_fn). The duplication occurs because validation is needed both for error detection in chunk_error_fn and for setting the has_message_too_long flag. Consider refactoring to avoid this duplication.

Copilot uses AI. Check for mistakes.

notifications_utils/recipients.py

    
                              except Exception as e:

                                  # Log error if Flask app context is available

                                  try:

                                      current_app.logger.error(f"Error processing chunk starting at {chunk_start}: {e}")

Copilot AI Dec 31, 2025

The current_app from Flask is used but never imported in this file. Add from flask import current_app to the imports, or verify if Flask context is guaranteed to be available during parallel processing.

Copilot uses AI. Check for mistakes.

jzbahrai added 2 commits

January 6, 2026 20:34

fix

c1a85dc

fix

71cb7d0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet