-
Notifications
You must be signed in to change notification settings - Fork 8
Trying to make csv processing faster #387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces parallel processing for CSV recipient files to improve performance when handling large SMS datasets. The implementation uses ThreadPoolExecutor to process rows concurrently for SMS files with 1000+ rows, while maintaining sequential processing for smaller datasets and non-SMS templates.
Key Changes:
- Added parallel processing with ThreadPoolExecutor for large SMS datasets (≥1000 rows)
- Refactored
get_rows()to support both sequential and parallel processing paths - Added comprehensive test coverage for parallel processing scenarios including error detection, row ordering, and international numbers
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| notifications_utils/recipients.py | Implements parallel processing logic with new methods _process_single_row, _get_rows_parallel, and worker function _process_row_chunk |
| tests/test_recipient_csv.py | Adds 9 new test cases covering parallel processing behavior, error handling, and edge cases |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| total_rows = min(len(rows_data), self.max_rows) | ||
|
|
||
| # Determine optimal chunk size and worker count | ||
| chunk_size, max_workers = control_chunk_and_worker_size(data_size=total_rows, chunk_size=None, max_workers=None) |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The control_chunk_and_worker_size function is imported but never defined in this repository. This will cause a runtime error when parallel processing is triggered. Ensure this function is implemented in notifications_utils/parallel.py or imported from the correct module.
notifications_utils/recipients.py
Outdated
| error_row = Row( | ||
| OrderedDict(), | ||
| index=idx, | ||
| error_fn=lambda k, v: None, |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lambda function lambda k, v: None is duplicated in three places (lines 302, 319, and error handling). Consider extracting this into a named function or class constant to improve maintainability and clarify its purpose as a no-op error function.
| self.placeholders_as_column_keys, | ||
| self.template_type, | ||
| self.international_sms, | ||
| self.template.content if self.template else None, |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing template.content alone is insufficient for message length validation. The validate_sms_message_length function expects template_content_length to be the actual content string (not its length), but this naming suggests confusion about what's being passed. Review the validation logic in _process_row_chunk (lines 813-817, 839-843) to ensure it correctly validates message length with just the template content string.
| if has_message_too_long: | ||
| row_obj.message_too_long = True |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message_too_long flag is being set directly on the Row object after creation, which bypasses the Row's constructor and error handling logic. This differs from sequential processing where the Row's internal error function handles this validation. Consider whether this approach maintains consistency with the Row class's design.
notifications_utils/recipients.py
Outdated
| # Check if this column has a message too long error | ||
| formatted_key = Columns.make_key(column_name) | ||
| if template_content and formatted_key in placeholders_keys and template_type == "sms": | ||
| try: | ||
| validate_sms_message_length(column_value or "", template_content) | ||
| except ValueError: | ||
| has_message_too_long = True |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message length validation logic is duplicated within the same function (also appears in lines 813-817 within chunk_error_fn). The duplication occurs because validation is needed both for error detection in chunk_error_fn and for setting the has_message_too_long flag. Consider refactoring to avoid this duplication.
| except Exception as e: | ||
| # Log error if Flask app context is available | ||
| try: | ||
| current_app.logger.error(f"Error processing chunk starting at {chunk_start}: {e}") |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current_app from Flask is used but never imported in this file. Add from flask import current_app to the imports, or verify if Flask context is guaranteed to be available during parallel processing.
Summary | Résumé
TODO: 1-3 sentence description of the changed you're proposing.
Related Issues | Cartes liées
Test instructions | Instructions pour tester la modification
TODO: Fill in test instructions for the reviewer.
Release Instructions | Instructions pour le déploiement
None.
Reviewer checklist | Liste de vérification du réviseur