Conversation

@ahmedxgouda
Collaborator

Proposed change

Resolves #3034

Add the PR description here.

Checklist

  • Required: I read and followed the contributing guidelines
  • Required: I ran make check-test locally and all tests passed
  • I used AI for code, documentation, or tests in this PR

@ahmedxgouda ahmedxgouda linked an issue Dec 25, 2025 that may be closed by this pull request
@coderabbitai
Contributor

coderabbitai bot commented Dec 25, 2025

Summary by CodeRabbit

  • New Features

    • Integrated Redis caching services into E2E and fuzz testing environments for improved test performance and reliability.
  • Chores

    • Updated CI/CD workflows to include cache services in test execution pipelines.
    • Updated testing environment configurations to support Redis caching infrastructure.


Walkthrough

This PR integrates Redis caching into e2e and fuzz testing environments. It adds Redis 8.0.5-alpine3.21 services to Docker Compose configurations for both environments, updates GitHub Actions workflows with corresponding Redis services, configures environment-specific Redis credentials, removes in-memory cache configurations from Django settings, and eliminates the setup-e2e-environment action while updating Makefiles to orchestrate the cache service alongside other containers.

Changes

  • CI/CD Workflow Configuration (.github/workflows/run-ci-cd.yaml, .github/workflows/setup-e2e-environment/action.yaml): Adds a Redis cache service (redis:8.0.5-alpine3.21) with environment-specific passwords and health checks to the frontend e2e and fuzz test jobs; deletes the composite action that previously handled database setup, backend image build, and health probing.
  • Docker Compose Services (docker-compose/e2e/compose.yaml, docker-compose/fuzz/compose.yaml): Introduces a new Redis cache service with password, health checks, and volume persistence; updates relative paths for build contexts and volume mounts, adjusting data paths from ../backend to ../../backend in the fuzz compose file; adds the corresponding volume definitions.
  • Django Cache Configuration (backend/settings/e2e.py, backend/settings/fuzz.py): Removes the explicit CACHES configuration blocks (LocMemCache) from both e2e and fuzz settings, delegating cache configuration to environment-level setup.
  • Environment Configuration (backend/.env.e2e.example, backend/.env.fuzz.example): Updates Redis connection variables: DJANGO_REDIS_HOST from None to cache, and DJANGO_REDIS_PASSWORD to environment-specific passwords (nest-cache-e2e-password, nest-fuzz-cache-password).
  • Build Orchestration (backend/Makefile, frontend/Makefile): Updates docker-compose targets to reference the new nested compose.yaml files and explicitly includes the cache service in container orchestration for e2e and fuzz test runs.
  • API Environment Customization (backend/apps/api/rest/v0/__init__.py): Adds a fuzz environment detection block that disables authentication, configures the fuzz server description, and removes throttling, mirroring the existing e2e/staging patterns.
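The fuzz-environment block described above can be pictured as a small configuration hook. Everything in the sketch below is illustrative: the function name, argument shape, and the string compared against are assumptions, not the actual OWASP/Nest code; it only shows the pattern of relaxing auth and throttling per environment.

```python
from types import SimpleNamespace


def apply_environment_overrides(api, environment):
    """Hypothetical sketch: relax auth and throttling in test environments."""
    if environment == "fuzz":
        api.auth = None        # fuzzers send unauthenticated requests
        api.throttle = []      # no rate limits, so the fuzzer is not capped
        api.description = "Fuzz server"
    return api


# Example: a stand-in API object before and after the override.
api = SimpleNamespace(auth="token", throttle=["100/m"], description="")
apply_environment_overrides(api, "fuzz")
```

The same hook would leave staging/production objects untouched, which is why this pattern keeps test-only relaxations out of the default configuration path.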

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

docker

Suggested reviewers

  • kasya
  • arkid15r

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)

  • Title check: ✅ Passed. The PR title accurately describes the main change: adding Redis cache services to e2e and fuzz test environments.
  • Description check: ✅ Passed. The PR description references issue #3034 and confirms the contributor followed guidelines and ran tests locally, matching the changeset scope.
  • Linked Issues check: ✅ Passed. The PR implements the core requirement from #3034: integrating Redis cache into e2e and fuzz test environments across docker-compose files and CI/CD workflows.
  • Out of Scope Changes check: ✅ Passed. All changes are directly scoped to adding Redis cache support; no unrelated modifications to application logic, business rules, or unrelated features are present.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f9edbb and 2256ea4.

📒 Files selected for processing (4)
  • backend/Makefile
  • docker-compose/e2e/compose.yaml
  • docker-compose/fuzz/compose.yaml
  • frontend/Makefile
🚧 Files skipped from review as they are similar to previous changes (1)
  • frontend/Makefile
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 0
File: :0-0
Timestamp: 2025-12-30T15:05:23.146Z
Learning: In the OWASP/Nest repository, four functions in backend/apps/slack/utils.py (get_gsoc_projects, get_news_data, get_staff_data, get_posts_data) have unbounded lru_cache decorators that can cause memory accumulation during fuzz testing. These should have explicit maxsize limits (e.g., maxsize=16) to prevent OOM issues.
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 3041
File: .github/workflows/run-ci-cd.yaml:233-243
Timestamp: 2025-12-26T06:08:58.549Z
Learning: For the OWASP/Nest repository, Redis image versions should remain consistent across all environments (production, staging, local, E2E, and CI/CD E2E tests). When upgrading Redis, update all docker-compose files and CI/CD workflow configurations together to maintain environment parity.
📚 Learning: 2025-12-26T06:57:19.911Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 1139
File: docker-compose/fuzz.yaml:70-83
Timestamp: 2025-12-26T06:57:19.911Z
Learning: In the fuzz testing environment (docker-compose/fuzz.yaml), the data-loader and graphql services are orchestrated in separate phases via the Makefile's test-fuzz target, not run simultaneously. The data is loaded to completion first, then the graphql fuzzer is started separately against the populated database. The docker-compose dependencies reflect runtime health checks rather than orchestration order.

Applied to files:

  • backend/Makefile
  • docker-compose/fuzz/compose.yaml
📚 Learning: 2025-10-26T12:50:50.512Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 2429
File: backend/Makefile:30-32
Timestamp: 2025-10-26T12:50:50.512Z
Learning: The `exec-backend-e2e-command` and `exec-db-e2e-command` Makefile targets in the backend/Makefile are intended for local development and debugging only, not for CI/CD execution, so the `-it` flags are appropriate.

Applied to files:

  • backend/Makefile
📚 Learning: 2025-12-26T06:08:58.549Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 3041
File: .github/workflows/run-ci-cd.yaml:233-243
Timestamp: 2025-12-26T06:08:58.549Z
Learning: For the OWASP/Nest repository, Redis image versions should remain consistent across all environments (production, staging, local, E2E, and CI/CD E2E tests). When upgrading Redis, update all docker-compose files and CI/CD workflow configurations together to maintain environment parity.

Applied to files:

  • backend/Makefile
📚 Learning: 2025-12-26T06:08:58.549Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 3041
File: .github/workflows/run-ci-cd.yaml:233-243
Timestamp: 2025-12-26T06:08:58.549Z
Learning: Ensure Redis image versions stay in sync across all environments by updating every relevant YAML file together (docker-compose files and CI/CD workflow configurations). When upgrading Redis, bump the image version in all docker-compose files and in all CI workflow YAMLs in one coordinated change to maintain parity across production, staging, local, E2E, and CI tests.

Applied to files:

  • docker-compose/e2e/compose.yaml
  • docker-compose/fuzz/compose.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Run fuzz tests
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run backend tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (8)
backend/Makefile (2)

133-139: LGTM! Cache service correctly added to e2e and fuzz targets.

The run-backend-e2e and run-backend-fuzz targets now properly include the cache service alongside backend and db, ensuring Redis is available during these test runs. The path update to the new directory structure (docker-compose/e2e/compose.yaml and docker-compose/fuzz/compose.yaml) is consistent.


167-173: LGTM! Both test-fuzz phases now include the cache service.

The cache service is correctly included in both orchestration phases:

  1. First phase (data loading): db cache backend data-loader
  2. Second phase (fuzzing): db cache backend graphql

This ensures Redis is available during data loading (when clear_index_cache() may be triggered by signals) and during the actual fuzz testing.
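The two-phase flow above can be sketched as a dry run that only assembles the compose command strings (running docker itself is host-dependent, and the exact flags and service ordering here are assumptions, not the repository's Makefile):

```shell
# Dry-run sketch of the phased test-fuzz orchestration; only builds the
# command strings instead of invoking docker.
compose="docker compose -f docker-compose/fuzz/compose.yaml"

# Phase 1: infrastructure plus the data loader, run to completion.
phase1="$compose up --exit-code-from data-loader db cache backend data-loader"

# Phase 2: the fuzzer against the populated database.
phase2="$compose up --exit-code-from graphql db cache backend graphql"

echo "Phase 1 (load data): $phase1"
echo "Phase 2 (fuzz):      $phase2"
```

Listing `cache` in both phases is the point: Redis must be up both when signals fire during data loading and while the fuzzer runs.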

docker-compose/e2e/compose.yaml (3)

10-17: LGTM! Backend service correctly depends on cache health.

The backend now waits for both db and cache services to be healthy before starting. This addresses the root cause identified in Issue #3034 where clear_index_cache() could fail if Redis wasn't ready during data loading.


92-112: LGTM! Path updates and volume declarations are correct.

The e2e-tests service paths are correctly updated for the new directory structure, and the e2e-cache-data volume is properly declared.


73-91: Cache service configuration is correct and Redis version is consistent across all environments.

The cache service is well-configured with:

  • Memory limit (--maxmemory 100mb) to prevent unbounded growth
  • LRU eviction policy appropriate for caching
  • Password authentication
  • Proper healthcheck

Redis version redis:8.0.5-alpine3.21 is consistent across all environments:

  • Local: docker-compose/local/compose.yaml
  • Production: docker-compose/production/compose.yaml
  • Staging: docker-compose/staging/compose.yaml
  • E2E: docker-compose/e2e/compose.yaml
  • Fuzz: docker-compose/fuzz/compose.yaml
  • CI/CD: .github/workflows/run-ci-cd.yaml
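For reference, a cache service with the properties listed above might look like the following sketch; the service name, volume name, password variable, and healthcheck command are illustrative, not the exact PR diff:

```yaml
cache:
  image: redis:8.0.5-alpine3.21
  command: >
    redis-server
    --requirepass "${DJANGO_REDIS_PASSWORD:-nest-cache-e2e-password}"
    --maxmemory 100mb
    --maxmemory-policy allkeys-lru
  healthcheck:
    test: ["CMD-SHELL", "redis-cli -a \"$DJANGO_REDIS_PASSWORD\" ping | grep PONG"]
    interval: 10s
    timeout: 5s
    retries: 10
  volumes:
    - e2e-cache-data:/data
```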
docker-compose/fuzz/compose.yaml (3)

10-17: LGTM! Backend correctly depends on cache for fuzz environment.

The cache dependency ensures Redis is healthy before the backend starts, addressing the crash issue where clear_index_cache() failed during data loading when Redis wasn't available.


92-112: LGTM! Path updates and volume declaration are correct.

The graphql service paths are correctly updated for the new directory structure (../../backend/docker and ../../backend/fuzzing_results), and the fuzz-cache-data volume is properly declared.


72-91: Verify Redis version consistency and password alignment across all environments.

The cache service configuration appears appropriate with memory limits and LRU eviction. However, confirm: (1) the default password nest-fuzz-cache-password matches DJANGO_REDIS_PASSWORD in backend/.env.fuzz.example, and (2) Redis version 8.0.5-alpine3.21 is consistent across all docker-compose files (production, staging, local, E2E) and CI/CD workflow configurations to maintain environment parity.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de86aab and 903f17c.

📒 Files selected for processing (4)
  • .github/workflows/run-ci-cd.yaml
  • backend/.env.e2e.example
  • backend/settings/e2e.py
  • docker-compose/e2e.yaml
💤 Files with no reviewable changes (1)
  • backend/settings/e2e.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-21T19:03:59.068Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 1139
File: .github/workflows/setup-backend-environment/action.yaml:16-27
Timestamp: 2025-12-21T19:03:59.068Z
Learning: Composite actions (runs: using: composite) execute as steps within the calling job's context and can access the job context, including job.services.* properties (e.g., job.services.<service_id>.id, job.services.<service_id>.ports). Service containers must be defined at the job level, but a composite action's steps can reference them via the job context.

Applied to files:

  • .github/workflows/run-ci-cd.yaml
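The composite-action learning above can be sketched as a job-level service container; the job name, service id, ports, and option flags here are illustrative, not the repository's actual workflow:

```yaml
jobs:
  run-frontend-e2e-tests:          # illustrative job name
    runs-on: ubuntu-latest
    services:
      cache:                       # defined at the job level, as required
        image: redis:8.0.5-alpine3.21
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 10
    steps:
      - uses: actions/checkout@v4
      # A composite action invoked from these steps runs in this job's
      # context and can reference values such as job.services.cache.id
      # and job.services.cache.ports.
```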
🪛 dotenv-linter (4.0.0)
backend/.env.e2e.example

[warning] 16-16: [UnorderedKey] The DJANGO_REDIS_HOST key should go before the DJANGO_SETTINGS_MODULE key

(UnorderedKey)


[warning] 17-17: [UnorderedKey] The DJANGO_REDIS_PASSWORD key should go before the DJANGO_SETTINGS_MODULE key

(UnorderedKey)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run backend tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (3)
docker-compose/e2e.yaml (3)

14-15: LGTM!

The cache dependency with service_healthy condition ensures the backend waits for Redis to be fully operational before starting, which is the correct approach.


49-66: Well-configured Redis service for E2E testing.

The cache service configuration demonstrates good practices:

  • Memory limit (100mb) with allkeys-lru eviction policy prevents unbounded memory growth
  • Health check ensures service readiness before dependent services start
  • Data volume provides persistence across container restarts
  • Network-only access (no exposed ports) maintains isolation

89-89: LGTM!

The volume definition correctly supports Redis data persistence.

coderabbitai[bot]
coderabbitai bot previously approved these changes Dec 26, 2025
@arkid15r arkid15r dismissed coderabbitai[bot]’s stale review December 29, 2025 19:51

The merge-base changed after approval.

@arkid15r arkid15r force-pushed the feature/e2e-backend branch from de86aab to 0159b0d Compare December 29, 2025 19:51
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0728fdc and 4521dac.

📒 Files selected for processing (6)
  • .github/workflows/run-ci-cd.yaml
  • backend/.env.e2e.example
  • backend/Makefile
  • backend/settings/e2e.py
  • docker-compose/e2e.yaml
  • frontend/Makefile
💤 Files with no reviewable changes (1)
  • backend/settings/e2e.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/Makefile
  • .github/workflows/run-ci-cd.yaml
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 3041
File: .github/workflows/run-ci-cd.yaml:233-243
Timestamp: 2025-12-26T06:08:58.549Z
Learning: For the OWASP/Nest repository, Redis image versions should remain consistent across all environments (production, staging, local, E2E, and CI/CD E2E tests). When upgrading Redis, update all docker-compose files and CI/CD workflow configurations together to maintain environment parity.
📚 Learning: 2025-12-26T06:08:58.549Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 3041
File: .github/workflows/run-ci-cd.yaml:233-243
Timestamp: 2025-12-26T06:08:58.549Z
Learning: Ensure Redis image versions stay in sync across all environments by updating every relevant YAML file together (docker-compose files and CI/CD workflow configurations). When upgrading Redis, bump the image version in all docker-compose files and in all CI workflow YAMLs in one coordinated change to maintain parity across production, staging, local, E2E, and CI tests.

Applied to files:

  • docker-compose/e2e.yaml
📚 Learning: 2025-10-26T12:50:50.512Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 2429
File: backend/Makefile:30-32
Timestamp: 2025-10-26T12:50:50.512Z
Learning: The `exec-backend-e2e-command` and `exec-db-e2e-command` Makefile targets in the backend/Makefile are intended for local development and debugging only, not for CI/CD execution, so the `-it` flags are appropriate.

Applied to files:

  • frontend/Makefile
📚 Learning: 2025-12-26T06:57:19.911Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 1139
File: docker-compose/fuzz.yaml:70-83
Timestamp: 2025-12-26T06:57:19.911Z
Learning: In the fuzz testing environment (docker-compose/fuzz.yaml), the data-loader and graphql services are orchestrated in separate phases via the Makefile's test-fuzz target, not run simultaneously. The data is loaded to completion first, then the graphql fuzzer is started separately against the populated database. The docker-compose dependencies reflect runtime health checks rather than orchestration order.

Applied to files:

  • frontend/Makefile
📚 Learning: 2025-12-26T06:08:58.549Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 3041
File: .github/workflows/run-ci-cd.yaml:233-243
Timestamp: 2025-12-26T06:08:58.549Z
Learning: For the OWASP/Nest repository, Redis image versions should remain consistent across all environments (production, staging, local, E2E, and CI/CD E2E tests). When upgrading Redis, update all docker-compose files and CI/CD workflow configurations together to maintain environment parity.

Applied to files:

  • frontend/Makefile
  • backend/.env.e2e.example
🪛 dotenv-linter (4.0.0)
backend/.env.e2e.example

[warning] 16-16: [UnorderedKey] The DJANGO_REDIS_HOST key should go before the DJANGO_SETTINGS_MODULE key

(UnorderedKey)


[warning] 17-17: [UnorderedKey] The DJANGO_REDIS_PASSWORD key should go before the DJANGO_SETTINGS_MODULE key

(UnorderedKey)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run fuzz tests
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run backend tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (4)
backend/.env.e2e.example (1)

16-17: LGTM! Password mismatch resolved.

The Redis configuration correctly points to the cache service and uses the password that matches the CI/CD workflow configuration. The critical password mismatch flagged in the previous review appears to be resolved.

frontend/Makefile (1)

63-63: LGTM! Cache service properly integrated.

The cache service is correctly added to both docker-compose commands in the E2E test workflow. This ensures Redis is available during both the data-loading phase and the test-execution phase.

Also applies to: 65-65

docker-compose/e2e.yaml (2)

15-16: LGTM! Backend cache dependency configured correctly.

The backend service properly depends on the cache service with a service_healthy condition, ensuring Redis is ready before the backend starts.


112-112: LGTM! Cache volume properly declared.

The e2e-cache-data volume declaration correctly supports the cache service's persistent storage requirements.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docker-compose/fuzz.yaml (1)

2-28: Critical: Backend missing dependency on cache service.

The backend service is configured to use Redis (via DJANGO_REDIS_HOST=cache in backend/.env.fuzz.example), but does not declare a dependency on the cache service. This creates a race condition where the backend may attempt to connect to Redis before it's ready, causing startup failures.

🔎 Proposed fix to add cache dependency
     depends_on:
       db:
         condition: service_healthy
+      cache:
+        condition: service_healthy

This mirrors the correct pattern in docker-compose/e2e.yaml (lines 15-16).

🧹 Nitpick comments (1)
backend/.env.fuzz.example (1)

16-17: LGTM! Redis configuration aligns with the fuzz cache service.

The Redis host and password values correctly match the cache service defined in docker-compose/fuzz.yaml.


Optional: Consider alphabetical ordering of environment variables.

The static analysis tool suggests placing these Redis variables before DJANGO_SETTINGS_MODULE to maintain alphabetical order. This is a minor style convention.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4521dac and f667727.

📒 Files selected for processing (8)
  • .github/workflows/run-ci-cd.yaml
  • .github/workflows/setup-e2e-environment/action.yaml
  • backend/.env.fuzz.example
  • backend/Makefile
  • backend/apps/api/rest/v0/__init__.py
  • backend/settings/fuzz.py
  • docker-compose/e2e.yaml
  • docker-compose/fuzz.yaml
💤 Files with no reviewable changes (2)
  • .github/workflows/setup-e2e-environment/action.yaml
  • backend/settings/fuzz.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/Makefile
  • .github/workflows/run-ci-cd.yaml
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 3041
File: .github/workflows/run-ci-cd.yaml:233-243
Timestamp: 2025-12-26T06:08:58.549Z
Learning: For the OWASP/Nest repository, Redis image versions should remain consistent across all environments (production, staging, local, E2E, and CI/CD E2E tests). When upgrading Redis, update all docker-compose files and CI/CD workflow configurations together to maintain environment parity.
📚 Learning: 2025-12-26T06:08:58.549Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 3041
File: .github/workflows/run-ci-cd.yaml:233-243
Timestamp: 2025-12-26T06:08:58.549Z
Learning: For the OWASP/Nest repository, Redis image versions should remain consistent across all environments (production, staging, local, E2E, and CI/CD E2E tests). When upgrading Redis, update all docker-compose files and CI/CD workflow configurations together to maintain environment parity.

Applied to files:

  • backend/.env.fuzz.example
📚 Learning: 2025-12-26T06:08:58.549Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 3041
File: .github/workflows/run-ci-cd.yaml:233-243
Timestamp: 2025-12-26T06:08:58.549Z
Learning: Ensure Redis image versions stay in sync across all environments by updating every relevant YAML file together (docker-compose files and CI/CD workflow configurations). When upgrading Redis, bump the image version in all docker-compose files and in all CI workflow YAMLs in one coordinated change to maintain parity across production, staging, local, E2E, and CI tests.

Applied to files:

  • docker-compose/e2e.yaml
  • docker-compose/fuzz.yaml
📚 Learning: 2025-12-26T06:57:19.911Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 1139
File: docker-compose/fuzz.yaml:70-83
Timestamp: 2025-12-26T06:57:19.911Z
Learning: In docker-compose/fuzz.yaml, document and enforce that the data-loader and graphql services are started in separate phases as defined by the Makefile target test-fuzz: first load data to completion, then start the graphql fuzzer against the populated database. Do not rely on orchestration order from docker-compose dependencies; instead ensure the Makefile orchestrates the phased startup and that health checks reflect runtime readiness rather than enforcing a start order. This guideline is specific to this file and to the fuzz testing setup in the OWASP/Nest repository.

Applied to files:

  • docker-compose/fuzz.yaml
🪛 dotenv-linter (4.0.0)
backend/.env.fuzz.example

[warning] 16-16: [UnorderedKey] The DJANGO_REDIS_HOST key should go before the DJANGO_SETTINGS_MODULE key

(UnorderedKey)


[warning] 17-17: [UnorderedKey] The DJANGO_REDIS_PASSWORD key should go before the DJANGO_SETTINGS_MODULE key

(UnorderedKey)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run backend tests
  • GitHub Check: Run fuzz tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (4)
docker-compose/e2e.yaml (2)

15-16: LGTM! Backend properly depends on cache availability.

The depends_on with service_healthy condition ensures Redis is ready before the backend starts, preventing connection failures.


73-91: LGTM! Cache service configuration is complete and correct.

The Redis cache service is properly configured with:

  • Consistent Redis version (8.0.5) matching other environments as per learnings
  • Password protection with healthcheck authentication
  • Memory limits and eviction policy (100MB, allkeys-lru)
  • Dedicated volume for persistence

Previous concerns about password mismatches and version inconsistencies have been addressed.

Also applies to: 112-112

docker-compose/fuzz.yaml (1)

71-89: LGTM! Cache service configuration is correct and consistent.

The Redis cache service is properly configured with:

  • Consistent Redis version (8.0.5) matching the e2e environment
  • Password matching backend/.env.fuzz.example
  • Memory limits and eviction policy
  • Dedicated volume for persistence

The configuration correctly mirrors the e2e environment pattern.

Also applies to: 110-110

backend/apps/api/rest/v0/__init__.py (1)

71-81: LGTM! Fuzz environment API configuration is correct.

The fuzz environment configuration appropriately:

  • Disables authentication and throttling for automated testing
  • Sets a descriptive server URL
  • Mirrors the established E2E environment pattern

This implementation is consistent with the overall approach for test environments.

coderabbitai[bot]
coderabbitai bot previously approved these changes Dec 30, 2025
@ahmedxgouda ahmedxgouda changed the title Add redis cache to e2e tests Add redis cache to e2e and fuzz tests Dec 30, 2025
coderabbitai[bot]
coderabbitai bot previously approved these changes Dec 30, 2025
@ahmedxgouda
Collaborator Author

@coderabbitai The backend goes down after some fuzz requests. This is new; it didn't happen before the feature branch was last updated to catch up with the main branch. Could you find the problem and suggest fixes?

@coderabbitai
Contributor

coderabbitai bot commented Dec 30, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# First, let's see the recent merge commits from main branch
echo "=== Recent merge commits from main branch ==="
git log --oneline --merges -10 feature/e2e-backend

echo -e "\n=== Recent commits on feature/e2e-backend ==="
git log --oneline -20 feature/e2e-backend

echo -e "\n=== Check if there's a merge from main recently ==="
git log --oneline --grep="Merge.*main" -5 feature/e2e-backend

Length of output: 745


🏁 Script executed:

#!/bin/bash

# Check the fuzz settings file
echo "=== Current fuzz settings (backend/settings/fuzz.py) ==="
cat backend/settings/fuzz.py

echo -e "\n=== Check base settings for cache configuration ==="
cat backend/settings/base.py | grep -A 20 "CACHES"

Length of output: 972


🏁 Script executed:

#!/bin/bash

# Check how Redis is being used in the backend
echo "=== Search for Redis connection/usage in backend ==="
rg -n --type=py -C 3 "redis" backend/ -i | head -100

echo -e "\n=== Check for cache imports and usage ==="
rg -n --type=py -C 3 "django.core.cache" backend/ | head -50

Length of output: 5510


🏁 Script executed:

#!/bin/bash

# Check the docker-compose fuzz configuration
echo "=== Fuzz docker-compose configuration ==="
cat docker-compose/fuzz.yaml

echo -e "\n=== Fuzz environment variables ==="
cat backend/.env.fuzz.example

Length of output: 3586


🏁 Script executed:

#!/bin/bash

# Look for any middleware or signal handlers that might cause issues
echo "=== Check for middleware configuration ==="
rg -n --type=py "MIDDLEWARE" backend/settings/ -A 10

echo -e "\n=== Check for any database or cache related signals ==="
rg -n --type=py "django.db.backends" backend/ -C 3

Length of output: 2075


🏁 Script executed:

#!/bin/bash

# Check if backend depends on cache in fuzz.yaml vs e2e.yaml
echo "=== Backend dependencies in fuzz.yaml ==="
cat docker-compose/fuzz.yaml | grep -A 10 "backend:" | grep -A 5 "depends_on"

echo -e "\n=== Backend dependencies in e2e.yaml ==="
cat docker-compose/e2e.yaml | grep -A 10 "backend:" | grep -A 5 "depends_on"

Length of output: 740


🏁 Script executed:

#!/bin/bash

# Let's also check when Django tries to connect to Redis
echo "=== Check for Redis connection on startup ==="
rg -n --type=py "RedisCache" backend/ -C 5

echo -e "\n=== Check if cache is used during app initialization ==="
rg -n --type=py "ready\(" backend/apps/ -A 10 | head -80

Length of output: 1822


🏁 Script executed:

#!/bin/bash

# Check the Makefile to see how fuzz tests are run
echo "=== Fuzz test commands in Makefile ==="
cat backend/Makefile | grep -A 10 "test-fuzz"

Length of output: 621


🏁 Script executed:

#!/bin/bash

# Let's check if there's any cache usage in signal handlers or app ready methods
echo "=== Search for AppConfig ready methods ==="
fd -e py "apps.py" backend/apps/ --exec cat {}

echo -e "\n=== Check for signal registrations ==="
rg -n --type=py "signals" backend/apps/ -C 3 | head -100

Length of output: 4092


🏁 Script executed:

#!/bin/bash

# Check what the mentorship signals do and if they use cache
echo "=== Mentorship signals that are auto-loaded ==="
cat backend/apps/mentorship/signals/__init__.py

echo -e "\n=== Program signal implementation ==="
cat backend/apps/mentorship/signals/program.py

echo -e "\n=== Check clear_index_cache function ==="
cat backend/apps/core/utils/index.py | grep -A 20 "def clear_index_cache"

Length of output: 1653


🏁 Script executed:

#!/bin/bash

# Let's also check if there are other signals
echo "=== All signal files in mentorship ==="
fd . backend/apps/mentorship/signals/ --type f

echo -e "\n=== Content of all signal files ==="
for file in backend/apps/mentorship/signals/*.py; do
  echo "=== File: $file ==="
  cat "$file"
  echo -e "\n"
done

Length of output: 1178


🏁 Script executed:

#!/bin/bash

# Let me verify the exact dependencies in both files
echo "=== Full backend service definition in fuzz.yaml ==="
cat docker-compose/fuzz.yaml | sed -n '/^  backend:/,/^  [^ ]/p' | head -n -1

echo -e "\n=== Full backend service definition in e2e.yaml ==="
cat docker-compose/e2e.yaml | sed -n '/^  backend:/,/^  [^ ]/p' | head -n -1

Length of output: 1364


🏁 Script executed:

#!/bin/bash

# Check if cache.iter_keys is used and requires Redis connection
echo "=== Full clear_index_cache function ==="
cat backend/apps/core/utils/index.py | grep -A 30 "def clear_index_cache"

echo -e "\n=== Check if iter_keys is a django-redis specific method ==="
rg -n "iter_keys" backend/ --type=py

Length of output: 1010


🏁 Script executed:

#!/bin/bash

# Check when data-loader runs and if it could trigger Program saves
echo "=== Data loader in fuzz test ==="
cat docker-compose/fuzz.yaml | sed -n '/^  data-loader:/,/^  [^ ]/p' | head -n -1

echo -e "\n=== Check if Program model exists in the data dump ==="
rg -i "program" backend/data/ 2>/dev/null || echo "Cannot search in data directory (binary files)"

Length of output: 811


@ahmedxgouda I found the problem! The backend is crashing because of a missing dependency in docker-compose/fuzz.yaml.

Root Cause

After merging from main, the mentorship app now includes a signal handler that gets triggered when Program models are saved. This signal calls clear_index_cache(), which uses cache.iter_keys() - a Redis-specific operation that requires an active Redis connection.

Here's the issue in your docker-compose configuration:

In docker-compose/fuzz.yaml (INCORRECT):

  backend:
    depends_on:
      db:
        condition: service_healthy
    # ❌ Missing cache dependency!

In docker-compose/e2e.yaml (CORRECT):

  backend:
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_healthy  # ✅ Cache dependency present

When the data loader runs pg_restore to load the database dump, it triggers post_save signals for Program models. The signal handler tries to clear the cache, but if Redis isn't ready yet (because backend didn't wait for it), the cache.iter_keys() call fails and crashes the backend.

Fix

Update docker-compose/fuzz.yaml to add the cache dependency to the backend service:

  backend:
    container_name: fuzz-nest-backend
    command: >
      sh -c '
        python manage.py migrate &&
        gunicorn wsgi:application --bind 0.0.0.0:9500
      '
    build:
      context: ../backend
      dockerfile: docker/Dockerfile
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_healthy  # Add this
    env_file: ../backend/.env.fuzz.example
    networks:
      - fuzz-nest-network
    ports:
      - 9500:9500
    healthcheck:
      interval: 10s
      retries: 10
      test: >
        sh -c '
          wget --spider http://backend:9500/a/
        '
      timeout: 10s
      start_period: 5s

This ensures the backend waits for Redis to be healthy before starting, preventing connection failures when the signal handlers try to access the cache during data loading.
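
For reference, a cache service definition compatible with that dependency might look like the sketch below. The image tag matches the one this PR adds elsewhere (redis:8.0.5-alpine3.21); the container name, network, and the `DJANGO_REDIS_PASSWORD` variable are assumptions that should be aligned with the repo's other compose files:

```yaml
  cache:
    container_name: fuzz-nest-cache
    image: redis:8.0.5-alpine3.21
    command: redis-server --requirepass "${DJANGO_REDIS_PASSWORD}"
    networks:
      - fuzz-nest-network
    healthcheck:
      interval: 5s
      retries: 5
      test: ["CMD", "redis-cli", "-a", "${DJANGO_REDIS_PASSWORD}", "ping"]
      timeout: 5s
```

Without a passing healthcheck on the cache service, `condition: service_healthy` on the backend would never be satisfied.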


🧠 Learnings used
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 3041
File: .github/workflows/run-ci-cd.yaml:233-243
Timestamp: 2025-12-26T06:08:58.549Z
Learning: For the OWASP/Nest repository, Redis image versions should remain consistent across all environments (production, staging, local, E2E, and CI/CD E2E tests). When upgrading Redis, update all docker-compose files and CI/CD workflow configurations together to maintain environment parity.

@ahmedxgouda
Copy link
Collaborator Author

@coderabbitai It is not related. The backend also goes down in CI/CD, and even after I added the dependency it goes down after some fuzz requests with this error:
fuzz-nest-backend | File "/home/owasp/.venv/lib/python3.13/site-packages/gunicorn/workers/base.py", line 204, in handle_abort
fuzz-nest-backend | sys.exit(1)
fuzz-nest-backend | ~~~~~~~~^^^
fuzz-nest-backend | SystemExit: 1
fuzz-nest-backend | [2025-12-30 14:28:58 +0000] [59] [INFO] Worker exiting (pid: 59)
fuzz-nest-backend | [2025-12-30 14:28:59 +0000] [58] [ERROR] Worker (pid:59) was sent SIGKILL! Perhaps out of memory?
fuzz-nest-backend | [2025-12-30 14:28:59 +0000] [141] [INFO] Booting worker with pid: 141

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 30, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain



🏁 Script executed:

#!/bin/bash

# Check the base settings for Redis cache configuration
echo "=== Base settings cache configuration ==="
cat backend/settings/base.py | grep -A 30 "CACHES"

echo -e "\n=== Check Gunicorn configuration ==="
cat backend/wsgi.py
fd -e py "gunicorn" backend/ --exec cat {}

echo -e "\n=== Check for gunicorn config file ==="
fd "gunicorn" backend/

echo -e "\n=== Check docker-compose for gunicorn command ==="
cat docker-compose/fuzz.yaml | grep -A 5 "gunicorn"
cat docker-compose/e2e.yaml | grep -A 5 "gunicorn"

Length of output: 1744


🏁 Script executed:

#!/bin/bash

# Check recent changes merged from main
echo "=== Get the merge commit details ==="
git log --oneline --graph --decorate -15 feature/e2e-backend

echo -e "\n=== Show commits that were merged from main ==="
# Find the most recent merge commit
MERGE_COMMIT=$(git log --merges --oneline -1 feature/e2e-backend | awk '{print $1}')
echo "Most recent merge commit: $MERGE_COMMIT"

if [ ! -z "$MERGE_COMMIT" ]; then
  echo -e "\n=== Files changed in the merge ==="
  git show --stat $MERGE_COMMIT | head -50
fi

Length of output: 512


🏁 Script executed:

#!/bin/bash

# Check Redis connection options and see if there are connection pooling settings
echo "=== Search for Redis connection pool or connection settings ==="
rg -n --type=py -C 5 "CONNECTION_POOL|REDIS.*OPTIONS|RedisCache.*OPTIONS" backend/

echo -e "\n=== Check if django-redis is in requirements ==="
cat backend/requirements/base.txt | grep -i redis
cat backend/requirements/test.txt | grep -i redis 2>/dev/null || echo "No test.txt or no redis in test.txt"

Length of output: 273


🏁 Script executed:

#!/bin/bash

# Check for any infinite loops, recursive operations, or memory-intensive operations
echo "=== Check for cache.iter_keys usage which might load all keys ==="
rg -n --type=py -C 10 "iter_keys" backend/

echo -e "\n=== Check for any QuerySet operations without limits in views ==="
rg -n --type=py -C 3 "\.all\(\)" backend/apps/api/ | head -80

echo -e "\n=== Check middleware that might accumulate data ==="
cat backend/settings/base.py | grep -A 20 "MIDDLEWARE"

Length of output: 2783







🏁 Script executed:

#!/bin/bash

# Find and check requirements files
echo "=== Find requirements files ==="
fd requirements backend/

echo -e "\n=== Check main requirements ==="
fd -e txt requirements backend/ --exec cat {}

Length of output: 894


🏁 Script executed:

#!/bin/bash

# Check what cache operations are being performed during requests
echo "=== Find all cache.set operations ==="
rg -n --type=py -C 3 "cache\.set" backend/

echo -e "\n=== Check for any batch operations or caching in views ==="
rg -n --type=py -C 3 "cache\." backend/apps/api/rest/ | head -100

Length of output: 5344


🏁 Script executed:

#!/bin/bash

# Check memory limits and resource constraints
echo "=== Check Docker memory limits in fuzz.yaml ==="
cat docker-compose/fuzz.yaml | grep -i "mem"

echo -e "\n=== Check Redis maxmemory settings ==="
cat docker-compose/fuzz.yaml | grep -A 2 "redis-server"

echo -e "\n=== Check backend container resources ==="
cat docker-compose/fuzz.yaml | sed -n '/^  backend:/,/^  [^ ]/p' | head -40

Length of output: 998


🏁 Script executed:

#!/bin/bash

# Check if there's a gunicorn configuration file or settings
echo "=== Look for gunicorn configuration ==="
fd gunicorn backend/
fd -e conf . backend/
fd -e py . backend/docker/

echo -e "\n=== Check backend Dockerfile for gunicorn setup ==="
cat backend/docker/Dockerfile | grep -i gunicorn

echo -e "\n=== Check if there are environment variables for gunicorn ==="
cat backend/.env.fuzz.example | grep -i gunicorn
cat backend/.env.fuzz.example | grep -i worker

Length of output: 155


🏁 Script executed:

#!/bin/bash

# Check the data loader to see what kind of data is being loaded
echo "=== Check fuzz data loader ==="
cat docker-compose/fuzz.yaml | sed -n '/^  data-loader:/,/^  [^ ]/p'

echo -e "\n=== Check if there are many Program models being created ==="
rg -n --type=py "class Program" backend/apps/mentorship/

echo -e "\n=== Check the signal that triggers on Program save ==="
cat backend/apps/mentorship/signals/program.py

Length of output: 2550







🏁 Script executed:

#!/bin/bash

# Check the Dockerfile to understand the setup
echo "=== Backend Dockerfile ==="
cat backend/docker/Dockerfile

echo -e "\n=== Check Python packages installed ==="
fd -e txt . backend/ | grep -i req | head -5
cat backend/requirements.txt 2>/dev/null || echo "No requirements.txt in root"

Length of output: 1827


🏁 Script executed:

#!/bin/bash

# Check how much data is in the dump and how many Programs might exist
echo "=== Check data dump size ==="
ls -lh backend/data/

echo -e "\n=== Check if there's info about the data ==="
fd README backend/data/
fd -e md . backend/data/

echo -e "\n=== Count approximate Program records by searching for 'program' in SQL ==="
# This might not work for binary dumps, but let's try
file backend/data/nest.dump

Length of output: 331


🏁 Script executed:

#!/bin/bash

# Check if there are Gunicorn configuration defaults or if we need to add them
echo "=== Search for any gunicorn config in code ==="
rg -n "workers|worker-class|worker-tmp-dir|max-requests" backend/

echo -e "\n=== Check if there's a default gunicorn config in Django ==="
cat backend/wsgi.py

echo -e "\n=== Check startup scripts or entrypoints ==="
fd entrypoint backend/
fd start backend/

Length of output: 541


🏁 Script executed:

#!/bin/bash

# Check what happens during cache.iter_keys - is it loading ALL keys?
echo "=== Full clear_index_cache implementation ==="
cat backend/apps/core/utils/index.py | grep -B 5 -A 25 "def clear_index_cache"

echo -e "\n=== Check CACHE_PREFIX value ==="
rg -n "CACHE_PREFIX.*=" backend/apps/core/utils/index.py -B 2 -A 2

echo -e "\n=== Search for how many times this might be called ==="
rg -n "clear_index_cache" backend/apps/ --type=py | head -20

Length of output: 1264


🏁 Script executed:

#!/bin/bash

# Check if there are other signals that might compound the issue
echo "=== Find all post_save signal handlers ==="
rg -n "post_save.*receiver|@receiver.*post_save" backend/apps/ --type=py -B 2 -A 5 | head -100

echo -e "\n=== Check how many signal handlers auto-load ==="
fd "__init__.py" backend/apps/*/signals/ --exec cat {}

Length of output: 886






🏁 Script executed:

#!/bin/bash

# Check if there's a CACHE_PREFIX defined
echo "=== Search for CACHE_PREFIX definition ==="
rg -n "CACHE_PREFIX" backend/ --type=py -C 3

echo -e "\n=== Check if the signal is being triggered during pg_restore ==="
cat backend/apps/mentorship/signals/program.py

Length of output: 5158


🏁 Script executed:

#!/bin/bash

# Check the poetry/pyproject to see gunicorn version and settings
echo "=== Check pyproject.toml for gunicorn ==="
cat backend/pyproject.toml | grep -A 5 -B 5 gunicorn

echo -e "\n=== Check if there's a gunicorn config file ==="
fd gunicorn backend/docker/
cat backend/docker/entrypoint.sh

Length of output: 525


🏁 Script executed:

#!/bin/bash

# Estimate how many Programs might be in the dump
echo "=== Try to get info about the dump ==="
head -c 1000 backend/data/nest.dump | strings | head -20

echo -e "\n=== Check if there are performance settings in base.py ==="
cat backend/settings/base.py | grep -i "conn" | head -20

Length of output: 144


🏁 Script executed:

#!/bin/bash

# Check how django-redis iter_keys works and if it loads everything
echo "=== Check poetry.lock for django-redis version ==="
cat backend/poetry.lock | grep -A 10 "name = \"django-redis\""

echo -e "\n=== Check if there's documentation about connection pool ==="
rg -n "CONNECTION_POOL_KWARGS|SOCKET_CONNECT_TIMEOUT|SOCKET_TIMEOUT" backend/

Length of output: 546






🏁 Script executed:

#!/bin/bash

# Check if there are management commands or scripts that might disable signals
echo "=== Search for signal management ==="
rg -n "disconnect|signals_disabled|receiver_connected" backend/ --type=py -C 3

echo -e "\n=== Check the data loading process ==="
cat docker-compose/fuzz.yaml | sed -n '/^  data-loader:/,/^  [^ ]/p'

echo -e "\n=== Check if Django has signal disabling context manager ==="
rg -n "signals" backend/apps/*/management/commands/ --type=py | head -20

Length of output: 820


🏁 Script executed:

#!/bin/bash

# Let's understand the problem better - check default gunicorn settings
echo "=== Check gunicorn defaults ==="
echo "Default gunicorn workers: 1 (single worker)"
echo "Default worker class: sync"
echo "Default timeout: 30 seconds"
echo "Default max-requests: 0 (no recycling)"

echo -e "\n=== Check docker compose commands more carefully ==="
cat docker-compose/fuzz.yaml | grep -B 5 -A 15 "gunicorn"

Length of output: 700


🏁 Script executed:

#!/bin/bash

# Check if there's fuzz-specific entrypoint that might differ
echo "=== Check fuzz entrypoint ==="
cat backend/docker/entrypoint.fuzz.sh

echo -e "\n=== Check if it's being used ==="
cat docker-compose/fuzz.yaml | grep "entrypoint"

Length of output: 854


🏁 Script executed:

#!/bin/bash

# Let's also check the CI/CD workflow to see gunicorn config there
echo "=== Check CI/CD gunicorn configuration ==="
cat .github/workflows/run-ci-cd.yaml | grep -A 10 -B 5 "gunicorn"

Length of output: 1047


@ahmedxgouda I've found the root cause of the OOM issue! It's a combination of problems introduced by the recent merge from main:

Root Cause Analysis

Perfect! The web search confirms the issue. Here are the three problems causing the OOM:

  1. No Worker Recycling: Gunicorn's max-requests setting limits the number of requests a worker will process before automatically restarting, which is a simple method to help limit the damage of memory leaks. Your configuration uses the default (0), meaning workers never restart and memory accumulates indefinitely.

  2. Inefficient Cache Clearing: The clear_index_cache() function does list(cache.iter_keys(pattern)), which loads ALL matching cache keys into memory at once. During fuzz testing with many requests, this repeatedly loads keys into memory.

  3. Single Worker Under Heavy Load: The default is 1 worker, which handles all fuzz requests without any parallelization.

Complete Solution

Here are the fixes to apply across all environments:

1. Update docker-compose/fuzz.yaml - Add Gunicorn configuration:

  backend:
    container_name: fuzz-nest-backend
    command: >
      sh -c '
        python manage.py migrate &&
        gunicorn wsgi:application --bind 0.0.0.0:9500 --workers 2 --max-requests 1000 --max-requests-jitter 50 --worker-tmp-dir /dev/shm
      '
    build:
      context: ../backend
      dockerfile: docker/Dockerfile
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_healthy
    env_file: ../backend/.env.fuzz.example
    networks:
      - fuzz-nest-network
    ports:
      - 9500:9500
    healthcheck:
      interval: 10s
      retries: 10
      test: >
        sh -c '
          wget --spider http://backend:9500/a/
        '
      timeout: 10s
      start_period: 5s

2. Update docker-compose/e2e.yaml - Same configuration:

  backend:
    container_name: e2e-nest-backend
    command: >
      sh -c '
        python manage.py migrate &&
        gunicorn wsgi:application --bind 0.0.0.0:9000 --workers 2 --max-requests 1000 --max-requests-jitter 50 --worker-tmp-dir /dev/shm
      '
    build:
      context: ../backend
      dockerfile: docker/Dockerfile
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_healthy
    env_file: ../backend/.env.e2e.example
    networks:
      - e2e-nest-network
    ports:
      - 9000:9000
    healthcheck:
      interval: 10s
      retries: 10
      test: >
        sh -c '
          wget --spider http://backend:9000/a/
        '
      timeout: 10s
      start_period: 5s

3. Update .github/workflows/run-ci-cd.yaml - Fix both E2E and fuzz test jobs:

For the E2E backend (around line 142):

            sh -c '
              python manage.py migrate &&
              gunicorn wsgi:application --bind 0.0.0.0:9000 --workers 2 --max-requests 1000 --max-requests-jitter 50 --worker-tmp-dir /dev/shm
          '

For the fuzz backend (around line 388):

            sh -c '
              python manage.py migrate &&
              gunicorn wsgi:application --bind 0.0.0.0:9500 --workers 2 --max-requests 1000 --max-requests-jitter 50 --worker-tmp-dir /dev/shm
            '

4. Optimize backend/apps/core/utils/index.py - Fix memory-intensive cache clearing:

def clear_index_cache(index_name: str) -> None:
    """Clear Algolia proxy cache entries from the cache store that match a given index name.

    Args:
        index_name (str): The specific index to clear cache for.
            If empty, the function does nothing.

    Returns:
        None

    """
    if not index_name:
        logger.info("No index name provided, skipping cache clear.")
        return

    pattern = f"{CACHE_PREFIX}:{index_name}*"
    
    # Use iter_keys as a generator to avoid loading all keys into memory
    deleted_count = 0
    for key in cache.iter_keys(pattern):
        logger.debug("Deleting key: %s", key)
        cache.delete(key)
        deleted_count += 1

    if deleted_count == 0:
        logger.info("No matching cache keys found for pattern: %s", pattern)
    else:
        logger.info("Deleted %d cache keys for pattern: %s", deleted_count, pattern)

Explanation of the Fixes

  1. The max-requests parameter defines the maximum number of requests a worker will process before it restarts, and max-requests-jitter randomizes worker restarts, preventing all workers from restarting simultaneously.

  2. --workers 2: Adds parallel processing capability for better load handling.

  3. --max-requests 1000 --max-requests-jitter 50: Workers restart after 1000 + k requests, where k is a random jitter value (up to 50) that staggers restarts so all workers don't recycle simultaneously.

  4. --worker-tmp-dir /dev/shm: Uses a tmpfs mount to avoid worker blocking issues with disk-backed filesystems.

  5. Cache clearing optimization: Avoids loading all keys into memory by using the generator directly instead of converting to a list.

These changes will prevent memory accumulation and ensure workers are periodically recycled, solving your OOM issue! 🎯
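
The difference between `list(cache.iter_keys(pattern))` and iterating the generator directly can be illustrated without a Redis server. In this sketch, `fake_iter_keys` is a hypothetical stand-in for `cache.iter_keys` (not the django-redis API itself): the eager variant materializes every matching key in memory before any deletion starts, while the lazy variant holds one key at a time.

```python
def fake_iter_keys(n):
    """Stand-in for cache.iter_keys: lazily yields matching keys one at a time."""
    for i in range(n):
        yield f"algolia_proxy:projects:{i}"

# Eager: builds a 100_000-element list in memory before any deletion happens.
eager = list(fake_iter_keys(100_000))
assert len(eager) == 100_000

# Lazy: processes one key at a time; peak memory stays roughly constant.
deleted = 0
for key in fake_iter_keys(100_000):
    deleted += 1  # in the real function, cache.delete(key) would go here
assert deleted == 100_000
```

django-redis also exposes `delete_pattern`, which pushes the matching work to Redis itself and may be a simpler option here.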

🐰✨


🧠 Learnings used
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 3041
File: .github/workflows/run-ci-cd.yaml:233-243
Timestamp: 2025-12-26T06:08:58.549Z
Learning: For the OWASP/Nest repository, Redis image versions should remain consistent across all environments (production, staging, local, E2E, and CI/CD E2E tests). When upgrading Redis, update all docker-compose files and CI/CD workflow configurations together to maintain environment parity.

Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 1139
File: docker-compose/fuzz.yaml:70-83
Timestamp: 2025-12-26T06:57:19.911Z
Learning: In the fuzz testing environment (docker-compose/fuzz.yaml), the data-loader and graphql services are orchestrated in separate phases via the Makefile's test-fuzz target, not run simultaneously. The data is loaded to completion first, then the graphql fuzzer is started separately against the populated database. The docker-compose dependencies reflect runtime health checks rather than orchestration order.

@ahmedxgouda
Copy link
Collaborator Author

@coderabbitai Still the same error. Also, the clear_index_cache method was last updated 4 months ago.

@ahmedxgouda ahmedxgouda marked this pull request as ready for review December 30, 2025 14:49
coderabbitai[bot]
coderabbitai bot previously approved these changes Dec 30, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 30, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Find what was merged from main and what files changed
echo "=== Find the merge commit and what it brought in ==="
git log --oneline --graph feature/e2e-backend | head -30

echo -e "\n=== Get the actual merge commit hash ==="
MERGE_COMMIT=$(git log --merges --oneline -1 feature/e2e-backend | awk '{print $1}')
echo "Merge commit: $MERGE_COMMIT"

echo -e "\n=== Show what files changed in the merge ==="
git diff --name-only "${MERGE_COMMIT}^1" "${MERGE_COMMIT}"

echo -e "\n=== Show the actual changes in backend code from the merge ==="
git diff "${MERGE_COMMIT}^1" "${MERGE_COMMIT}" -- backend/apps/ | head -300

Length of output: 669


🏁 Script executed:

#!/bin/bash

# Check git log to see what commits came from main
echo "=== Recent commits on feature/e2e-backend with details ==="
git log --oneline -20 feature/e2e-backend

echo -e "\n=== Check when clear_index_cache was last modified ==="
git log --oneline -1 -- backend/apps/core/utils/index.py

Length of output: 369


🏁 Script executed:

#!/bin/bash

# Look for any new signal handlers that were added
echo "=== Find all signal handler files ==="
fd -e py . backend/apps/*/signals/

echo -e "\n=== Check for recent modifications to signal files ==="
git log --oneline --since="2 months ago" -- backend/apps/*/signals/

Length of output: 180


🏁 Script executed:

#!/bin/bash

# Check if there are new middleware or settings changes
echo "=== Check recent changes to settings files ==="
git log --oneline --since="2 months ago" -- backend/settings/

echo -e "\n=== Check recent changes to middleware ==="
git log --oneline --since="2 months ago" -- backend/apps/*/middleware/

Length of output: 513


🏁 Script executed:

#!/bin/bash

# Look at what's in the mentorship signals since that's where we found issues before
echo "=== All files in mentorship signals ==="
ls -la backend/apps/mentorship/signals/

echo -e "\n=== Git history of mentorship signals ==="
git log --oneline --since="3 months ago" -- backend/apps/mentorship/signals/

echo -e "\n=== Check all signal files content ==="
for file in backend/apps/mentorship/signals/*.py; do
    echo "=== $file ==="
    cat "$file"
    echo -e "\n"
done

Length of output: 1354


🏁 Script executed:

#!/bin/bash

# Check what branch we're on and recent commits
echo "=== Current branch ==="
git branch --show-current

echo -e "\n=== Recent commits on current branch ==="
git log --oneline -20

echo -e "\n=== Show commits with actual changes, not merges ==="
git log --oneline --no-merges -20

Length of output: 2246


🏁 Script executed:

#!/bin/bash

# Look at what the fuzz tests actually do
echo "=== Fuzz test configuration ==="
cat docker-compose/fuzz.yaml | sed -n '/^  graphql:/,/^  [^ ]/p'

echo -e "\n=== Check if there's a fuzzer script ==="
fd -e py fuzz backend/
fd -e js fuzz frontend/
fd -e yaml fuzz .

echo -e "\n=== Check what graphql queries the fuzzer uses ==="
rg -n "graphql" docker-compose/fuzz.yaml -A 5

Length of output: 772


🏁 Script executed:

#!/bin/bash

# Look for recent changes in GraphQL resolvers or queries
echo "=== Find GraphQL resolver files ==="
fd -e py . backend/apps/ | xargs grep -l "graphene\|GraphQL\|resolver" | head -20

echo -e "\n=== Check recent changes to GraphQL code ==="
git log --oneline --since="2 months ago" -- backend/apps/api/graphql/

Length of output: 1127


🏁 Script executed:

#!/bin/bash

# Check if there are any querysets without limits or pagination issues
echo "=== Search for .all() without pagination in GraphQL resolvers ==="
rg -n "\.all\(\)" backend/apps/api/graphql/ -C 3 | head -100

echo -e "\n=== Search for large querysets in resolvers ==="
rg -n "def resolve_" backend/apps/api/graphql/ -A 10 | head -150

Length of output: 246


🏁 Script executed:

#!/bin/bash

# Check for N+1 query issues or prefetch_related/select_related missing
echo "=== Check GraphQL schema and queries ==="
cat backend/apps/api/graphql/schema.py 2>/dev/null || echo "No schema.py file"

echo -e "\n=== Find all GraphQL type files ==="
fd -e py . backend/apps/api/graphql/

echo -e "\n=== List directory structure ==="
ls -la backend/apps/api/graphql/

Length of output: 315


🏁 Script executed:

#!/bin/bash

# Look at what commit introduced the issue by checking merge history
echo "=== All commits in last month ==="
git log --oneline --since="1 month ago" --all

echo -e "\n=== Check for merge commits ==="
git log --oneline --merges --since="1 month ago"

echo -e "\n=== Show detailed history with commit dates ==="
git log --oneline --date=short --pretty=format:"%h %ad %s" -15

Length of output: 8813


🏁 Script executed:

#!/bin/bash

# Check the fuzz Dockerfile to see what the fuzzer does
echo "=== Fuzz Dockerfile ==="
cat backend/docker/Dockerfile.fuzz

echo -e "\n=== Check if there's a fuzzing script ==="
fd -e py . backend/ | xargs grep -l "fuzz" | head -10

Length of output: 1218


🏁 Script executed:

#!/bin/bash

# Look at the PR #2528 that introduced GraphQL caching
echo "=== Find commits from PR #2528 (GraphQL resolver caching) ==="
git show fa1b9496 --stat

echo -e "\n=== Show the actual changes ==="
git show fa1b9496 --no-patch --pretty=full

Length of output: 1782


🏁 Script executed:

#!/bin/bash

# Check what files were changed in PR #2528
echo "=== Files changed in caching PR ==="
git diff fa1b9496^..fa1b9496 --name-only

echo -e "\n=== Show the GraphQL caching changes ==="
git diff fa1b9496^..fa1b9496 -- backend/apps/ | head -500

Length of output: 2577


🏁 Script executed:

#!/bin/bash

# Check the actual GraphQL code structure
echo "=== Find all internal API nodes and queries ==="
fd -e py . backend/apps/*/api/internal/ | head -20

echo -e "\n=== Check for cache decorators in GraphQL code ==="
rg -n "@cache|@cached|cache_page" backend/apps/*/api/internal/ | head -50

Length of output: 1108


🏁 Script executed:

#!/bin/bash

# Look at Redis cache settings in base.py
echo "=== Full Redis cache configuration in base.py ==="
cat backend/settings/base.py | grep -A 40 "CACHES"

echo -e "\n=== Check Redis connection settings ==="
rg -n "REDIS.*TIMEOUT|REDIS.*POOL|CONNECTION_POOL" backend/settings/

Length of output: 1382


🏁 Script executed:

#!/bin/bash

# Check if there are any issues with the way cache keys are generated
echo "=== Search for cache key generation ==="
rg -n "cache\.set|cache\.get|cache_key" backend/apps/*/api/internal/ -C 3 | head -100

Length of output: 39


🏁 Script executed:

#!/bin/bash

# Check the GraphQL settings and if the cache extension can be disabled
echo "=== GraphQL settings ==="
cat backend/settings/graphql.py

echo -e "\n=== Check for GRAPHQL_RESOLVER_CACHE settings ==="
rg -n "GRAPHQL_RESOLVER_CACHE" backend/settings/

Length of output: 1225


🏁 Script executed:

#!/bin/bash

# Check the extensions.py file to understand the caching mechanism
echo "=== Full CacheExtension implementation ==="
cat backend/apps/common/extensions.py

echo -e "\n=== Check how extensions are configured ==="
rg -n "CacheExtension\|SchemaExtension" backend/ --type=py -C 3

Length of output: 2188


🏁 Script executed:

#!/bin/bash

# Check what settings are used in fuzz vs base
echo "=== Fuzz settings ==="
cat backend/settings/fuzz.py

echo -e "\n=== E2E settings ==="
cat backend/settings/e2e.py

echo -e "\n=== Check if there's a GRAPHQL_EXTENSIONS setting ==="
rg -n "extensions=\|EXTENSIONS" backend/settings/ -C 3

Length of output: 979


🏁 Script executed:

#!/bin/bash

# Look at entrypoint.fuzz.sh to see what the fuzzer does
echo "=== Fuzz entrypoint script ==="
cat backend/docker/entrypoint.fuzz.sh

echo -e "\n=== Check fuzzing results directory ==="
ls -la backend/fuzzing_results/ 2>/dev/null || echo "Directory doesn't exist or is empty"

Length of output: 898


🏁 Script executed:

#!/bin/bash

# Check if Redis has connection pool settings or if we can add them
echo "=== Check django-redis documentation or defaults ==="
rg -n "CLIENT_CLASS\|DefaultClient" backend/settings/ -C 5

echo -e "\n=== Check if there are OPTIONS for connection pool ==="
cat backend/settings/base.py | grep -A 15 "CACHES"

Length of output: 558


🌐 Web query:

django-redis connection pool configuration options

💡 Result:

Key django-redis connection-pool settings and how to use them

  • Where to set them

    • In CACHES OPTIONS: "CONNECTION_POOL_KWARGS" (dict) to pass keyword args to the redis-py ConnectionPool, and "CONNECTION_POOL_CLASS" (string or importable class) to use a custom pool class. Example: OPTIONS = {"CONNECTION_POOL_KWARGS": {...}, "CONNECTION_POOL_CLASS": "redis.connection.BlockingConnectionPool"}. [1]
  • Common pool kwargs you can pass via CONNECTION_POOL_KWARGS (these are forwarded to redis-py ConnectionPool / BlockingConnectionPool):

    • max_connections — maximum number of connections in the pool. [1][2]
    • timeout — (BlockingConnectionPool) seconds to wait for a free connection (or None to block indefinitely). [2]
    • connection_class — class used for individual connections (redis.connection.Connection or custom). [2]
    • queue_class — queue implementation used by BlockingConnectionPool (e.g., LifoQueue). [2]
    • socket_connect_timeout / socket_timeout — TCP/connect and read timeouts passed to connection creation. [2]
    • retry_on_timeout — whether to retry commands on timeout (bool). [2]
    • encoding / decode_responses / parser_class / other redis.Connection kwargs — any keyword argument accepted by redis.Connection is generally accepted and forwarded. [1][2]
  • Using a custom pool class

    • Set "CONNECTION_POOL_CLASS" to the import path (string) or class object of a pool (e.g., "redis.sentinel.SentinelConnectionPool" or your subclass). You can still pass class-specific kwargs through CONNECTION_POOL_KWARGS. [1]
  • Example (settings.py)

    • CACHES = {
      "default": {
      "BACKEND": "django_redis.cache.RedisCache",
      "LOCATION": "redis://127.0.0.1:6379/0",
      "OPTIONS": {
      "CONNECTION_POOL_KWARGS": {
      "max_connections": 100,
      "retry_on_timeout": True,
      "socket_connect_timeout": 2
      },
      "CONNECTION_POOL_CLASS": "redis.connection.BlockingConnectionPool"
      }
      }
      } [1][2]

Sources

  • django-redis README (connection pools). [1]
  • redis-py ConnectionPool / BlockingConnectionPool docs (pool kwargs and meanings). [2]

References
[1] django-redis project README / docs.
[2] redis-py documentation (ConnectionPool / BlockingConnectionPool).


@ahmedxgouda I found it! The real issue is from PR #2528 (GraphQL resolver caching) that was merged from main on December 19, 2025. This introduced the CacheExtension which wraps every GraphQL query with Redis caching operations.

Root Cause

The fuzzer (graphqler) runs for 300 seconds making rapid-fire GraphQL requests. Each query triggers the CacheExtension which calls cache.get_or_set(). The problem is that django-redis has no connection pool limit configured, so it can create unlimited connections to Redis. Each connection consumes memory in the Gunicorn worker process. After hundreds of unique query combinations over 300 seconds, the worker's memory grows until it hits the OOM limit and gets SIGKILL'd.

The Fix

Add connection pool limits to prevent unlimited connection creation:

Update backend/settings/base.py:

CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:6379",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
            "CONNECTION_POOL_KWARGS": {
                "max_connections": 50,
                "retry_on_timeout": True,
                "socket_connect_timeout": 5,
                "socket_timeout": 5,
            },
        },
        "TIMEOUT": 300,
    }
}
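
The effect of `max_connections` can be sketched without a Redis server. This toy pool (a hypothetical class, not redis-py) mimics the behavior of redis-py's `ConnectionPool`, which refuses to allocate beyond the cap instead of silently growing:

```python
class CappedPool:
    """Toy model of a connection pool with a hard max_connections cap."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.in_use = 0

    def get_connection(self):
        if self.in_use >= self.max_connections:
            # redis-py raises a ConnectionError ("Too many connections") here.
            raise RuntimeError("Too many connections")
        self.in_use += 1
        return object()  # placeholder for a real socket

    def release(self):
        self.in_use -= 1


pool = CappedPool(max_connections=50)
conns = [pool.get_connection() for _ in range(50)]  # fills the pool
try:
    pool.get_connection()  # the 51st request is refused, not allocated
    overflowed = False
except RuntimeError:
    overflowed = True
assert overflowed
```

With the cap in place, a burst of fuzz requests surfaces as a bounded, observable error rather than unbounded memory growth in the worker.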

Alternative: Disable caching in fuzz environment

Since caching isn't needed for fuzz testing, you can disable the CacheExtension in the fuzz environment. Update backend/settings/graphql.py:

"""GraphQL schema."""

import strawberry
from django.conf import settings

from apps.api.internal.mutations import ApiMutations
from apps.api.internal.queries import ApiKeyQueries
from apps.common.extensions import CacheExtension
from apps.github.api.internal.queries import GithubQuery
from apps.mentorship.api.internal.mutations import (
    ModuleMutation,
    ProgramMutation,
)
from apps.mentorship.api.internal.queries import (
    MentorshipQuery,
    ModuleQuery,
    ProgramQuery,
)
from apps.nest.api.internal.mutations import NestMutations
from apps.owasp.api.internal.queries import OwaspQuery


@strawberry.type
class Mutation(
    ApiMutations,
    ModuleMutation,
    NestMutations,
    ProgramMutation,
):
    """Schema mutations."""


@strawberry.type
class Query(
    ApiKeyQueries,
    GithubQuery,
    MentorshipQuery,
    ModuleQuery,
    OwaspQuery,
    ProgramQuery,
):
    """Schema queries."""


# Disable caching in fuzz environment to prevent memory issues
extensions = [] if getattr(settings, "IS_FUZZ_ENVIRONMENT", False) else [CacheExtension]

schema = strawberry.Schema(mutation=Mutation, query=Query, extensions=extensions)
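One gap in the sketch above: IS_FUZZ_ENVIRONMENT is not a built-in Django setting, so the fuzz settings module has to define it. A minimal, hedged illustration (class and attribute names are this suggestion's own, not the repo's actual settings layout):

```python
from types import SimpleNamespace

# Sketch of what backend/settings/fuzz.py would add (the real module
# extends the project's base settings class, omitted here):
class Fuzz:
    IS_FUZZ_ENVIRONMENT = True

# The schema module's toggle then reads the flag defensively, defaulting
# to "caching on" in environments that never define it:
settings = SimpleNamespace(IS_FUZZ_ENVIRONMENT=Fuzz.IS_FUZZ_ENVIRONMENT)
extensions = [] if getattr(settings, "IS_FUZZ_ENVIRONMENT", False) else ["CacheExtension"]
print(extensions)  # [] -- caching disabled under fuzzing
```

Because getattr defaults to False, production and staging settings need no change at all.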

Recommendation

I recommend implementing both fixes:

  1. Add the connection pool limits (this is a best practice for all environments)
  2. Disable caching in fuzz environment (caching serves no purpose during fuzzing and only adds overhead)

This will solve the OOM issue completely! 🎯


🧠 Learnings used
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 3041
File: .github/workflows/run-ci-cd.yaml:233-243
Timestamp: 2025-12-26T06:08:58.549Z
Learning: For the OWASP/Nest repository, Redis image versions should remain consistent across all environments (production, staging, local, E2E, and CI/CD E2E tests). When upgrading Redis, update all docker-compose files and CI/CD workflow configurations together to maintain environment parity.

Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 1139
File: docker-compose/fuzz.yaml:70-83
Timestamp: 2025-12-26T06:57:19.911Z
Learning: In the fuzz testing environment (docker-compose/fuzz.yaml), the data-loader and graphql services are orchestrated in separate phases via the Makefile's test-fuzz target, not run simultaneously. The data is loaded to completion first, then the graphql fuzzer is started separately against the populated database. The docker-compose dependencies reflect runtime health checks rather than orchestration order.

@ahmedxgouda
Copy link
Collaborator Author

@coderabbitai still the same error.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 30, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Check for memory limits in docker-compose files
echo "=== Memory limits in fuzz.yaml ==="
cat docker-compose/fuzz.yaml | grep -i "mem_limit\|memory:"

echo -e "\n=== Memory limits in e2e.yaml ==="
cat docker-compose/e2e.yaml | grep -i "mem_limit\|memory:"

echo -e "\n=== Check CI/CD for memory limits ==="
cat .github/workflows/run-ci-cd.yaml | grep -i "memory\|mem_limit" -C 2

Length of output: 108


🏁 Script executed:

#!/bin/bash

# Look at what was actually merged from main recently
echo "=== Find commits merged from main in last 2 months ==="
git log --oneline --merges --since="2 months ago" feature/e2e-backend

echo -e "\n=== Check the most recent merge ==="
LATEST_MERGE=$(git log --merges --oneline -1 feature/e2e-backend | awk '{print $1}')
echo "Latest merge commit: $LATEST_MERGE"

if [ ! -z "$LATEST_MERGE" ]; then
  echo -e "\n=== Files changed in latest merge ==="
  git diff --name-status "${LATEST_MERGE}^1" "${LATEST_MERGE}" | head -50
fi

Length of output: 513


🏁 Script executed:

#!/bin/bash

# Check for any model loading or heavy initialization
echo "=== Search for model loading in AppConfig ready() ==="
rg -n "def ready" backend/apps/ --type=py -A 15 | head -100

echo -e "\n=== Search for preload or cache warming ==="
rg -n "preload|warm.*cache|load.*model|initialize.*cache" backend/ --type=py -i -C 3 | head -100

Length of output: 1934


🏁 Script executed:

#!/bin/bash

# Check for large data structures or querysets being loaded
echo "=== Search for .all() without limits in common places ==="
rg -n "\.all\(\)" backend/apps/api/rest/ --type=py -C 2 | head -100

echo -e "\n=== Search for select_related or prefetch_related chains ==="
rg -n "select_related|prefetch_related" backend/apps/ --type=py -C 2 | head -80

Length of output: 6978


🏁 Script executed:

#!/bin/bash

# Check middleware that might accumulate data
echo "=== Custom middleware ==="
fd middleware.py backend/apps/

echo -e "\n=== Middleware content ==="
for file in $(fd middleware.py backend/apps/); do
  echo "=== $file ==="
  cat "$file"
  echo -e "\n"
done

Length of output: 53


🏁 Script executed:

#!/bin/bash

# Look for any Django signals that might be creating objects or loading data
echo "=== All signal handlers ==="
fd -e py . backend/apps/*/signals/ --exec basename {}

echo -e "\n=== Check for any bulk operations in signals ==="
rg -n "bulk_create|bulk_update|objects\.all\(\)" backend/apps/*/signals/ --type=py -C 3

Length of output: 100


🏁 Script executed:

#!/bin/bash

# Check if there are any decorators that might cache or store data
echo "=== Search for custom decorators ==="
rg -n "@.*cache|@.*memoize" backend/apps/ --type=py -C 2 | head -80

echo -e "\n=== Check for functools.lru_cache ==="
rg -n "lru_cache|@cache" backend/ --type=py -C 3 | head -50

Length of output: 6692


🏁 Script executed:

#!/bin/bash

# Check the GraphQL extension more carefully
echo "=== Full CacheExtension code ==="
cat backend/apps/common/extensions.py

echo -e "\n=== Check how it's being used ==="
rg -n "CacheExtension" backend/ --type=py -C 5

Length of output: 6704


🏁 Script executed:

#!/bin/bash

# Check if Slack is being used or called in API/GraphQL code
echo "=== Check if Slack functions are imported or used in API ==="
rg -n "from apps.slack|import.*slack" backend/apps/api/ backend/apps/mentorship/api/ backend/apps/owasp/ --type=py -i

echo -e "\n=== Check Slack-related queries in GraphQL ==="
rg -n "slack|gsoc|news_data|staff_data|posts_data" backend/apps/*/api/internal/ --type=py -i -C 2

Length of output: 3082


🏁 Script executed:

#!/bin/bash

# Check fuzz environment settings for Slack
echo "=== Fuzz settings for Slack ==="
cat backend/settings/fuzz.py

echo -e "\n=== E2E settings for Slack ==="
cat backend/settings/e2e.py

echo -e "\n=== Base settings for Slack ==="
cat backend/settings/base.py | grep -i "slack" -A 5 -B 5

Length of output: 1490


🏁 Script executed:

#!/bin/bash

# Check the actual Slack app configuration
echo "=== Slack app configuration ==="
cat backend/apps/slack/apps.py

echo -e "\n=== Check what configure_slack_events does ==="
cat backend/apps/slack/events.py | head -100

Length of output: 2269


🏁 Script executed:

#!/bin/bash

# Check all @lru_cache usage without maxsize
echo "=== All lru_cache without explicit maxsize ==="
rg -n "^@lru_cache$" backend/ --type=py -C 3

Length of output: 1441


🏁 Script executed:

#!/bin/bash

# Check Python version to understand default lru_cache behavior
echo "=== Python version in Dockerfile ==="
cat backend/docker/Dockerfile | grep -i "python" | head -10

echo -e "\n=== Python version in pyproject.toml ==="
cat backend/pyproject.toml | grep -i "python" | head -10

Length of output: 324


@ahmedxgouda I found it! The root cause is unbounded @lru_cache decorators in backend/apps/slack/utils.py causing memory accumulation during fuzzing.

Root Cause

An lru_cache retains references to both the arguments and the return value of every cached call, so any object passed to (or returned by) a decorated function persists in memory until the cache evicts it, the cache is cleared, or the process exits.

You have 4 functions in backend/apps/slack/utils.py with unbounded @lru_cache decorators:

  1. get_gsoc_projects(year) - fetches from Algolia, returns list
  2. get_news_data(limit, timeout) - fetches from external API, returns list of dicts
  3. get_staff_data(timeout) - fetches from external API, returns list
  4. get_posts_data(limit) - returns QuerySet (this is the biggest problem)

During 300 seconds of fuzzing, the GraphQL fuzzer generates random queries that access fields like is_gsoc_mentor (in the User node) and slack_workspace_stats (in the stats resolver). Each unique combination of parameters caches its result for the life of the worker (a bare @lru_cache defaults to maxsize=128, and with large return values like QuerySets even 128 entries consume significant memory).

Setting maxsize to None allows unbounded cache growth, which can consume all available memory and degrade performance if unchecked; the same problem occurs when the cached values are large objects like QuerySets.
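The retention behaviour is easy to verify with cache_info(): a bare @lru_cache keeps up to 128 entries, and each retained entry pins both its arguments and its (possibly large) return value in the worker's memory. A stand-in body is used here instead of the real query:

```python
from functools import lru_cache

@lru_cache  # bare decorator: maxsize defaults to 128
def get_posts_data(limit: int = 5) -> list[dict]:
    # Stand-in for an expensive query; each distinct `limit` caches a fresh list.
    return [{"post": i} for i in range(limit)]

# 200 distinct argument values: the cache caps at 128 entries, but every
# retained entry keeps its full return value alive.
for limit in range(200):
    get_posts_data(limit)

info = get_posts_data.cache_info()
print(info.currsize, info.maxsize)  # 128 128
```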

The Fix

1. Update backend/apps/slack/utils.py - Add explicit maxsize limits:

from functools import lru_cache

# ... existing code ...

@lru_cache(maxsize=16)  # Limit cache size
def get_gsoc_projects(year: int) -> list:
    """Get GSoC projects.

    Args:
        year (int): The year of GSoC.

    Returns:
        list: GSoC projects.

    """
    return algolia_client.get_index(
        # ... existing code ...
    )["hits"]


@lru_cache(maxsize=16)  # Limit cache size
def get_news_data(limit: int = 10, timeout: float | None = 30) -> list[dict[str, str]]:
    """Get news data.

    Args:
        limit (int): The number of news items to return.
        timeout (float | None): Request timeout in seconds.

    Returns:
        list[dict[str, str]]: News data.

    """
    # ... existing code ...


@lru_cache(maxsize=16)  # Limit cache size
def get_staff_data(timeout: float | None = 30) -> list | None:
    """Get staff data.

    Args:
        timeout (float | None): Request timeout in seconds.

    Returns:
        list | None: Staff data or None if error.

    """
    # ... existing code ...


@lru_cache(maxsize=16)  # Limit cache size
def get_posts_data(limit: int = 5) -> QuerySet | None:
    """Get posts data.

    Args:
        limit (int): The number of posts to return.

    Returns:
        QuerySet | None: Posts queryset or None if error.

    """
    # ... existing code ...

2. Add memory limits to Docker containers (defense in depth):

Update docker-compose/fuzz.yaml:

  backend:
    container_name: fuzz-nest-backend
    command: >
      sh -c '
        python manage.py migrate &&
        gunicorn wsgi:application --bind 0.0.0.0:9500
      '
    build:
      context: ../backend
      dockerfile: docker/Dockerfile
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_healthy
    env_file: ../backend/.env.fuzz.example
    networks:
      - fuzz-nest-network
    ports:
      - 9500:9500
    deploy:
      resources:
        limits:
          memory: 1G  # Add memory limit
    healthcheck:
      interval: 10s
      retries: 10
      test: >
        sh -c '
          wget --spider http://backend:9500/a/
        '
      timeout: 10s
      start_period: 5s

Update docker-compose/e2e.yaml similarly:

  backend:
    # ... existing config ...
    deploy:
      resources:
        limits:
          memory: 1G  # Add memory limit

3. Update .github/workflows/run-ci-cd.yaml - Add memory limits to service containers:

For the fuzz test backend service (around line 376):

      backend:
        image: ghcr.io/${{ github.repository }}/nest-backend-fuzz:${{ github.sha }}
        env:
          DJANGO_CONFIGURATION: Fuzz
          # ... other env vars ...
        options: >-
          --health-cmd "wget --spider http://localhost:9500/a/"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 10
          --health-start-period 5s
          --memory 1g

For the e2e test backend service (around line 130):

      backend:
        image: ghcr.io/${{ github.repository }}/nest-backend-e2e:${{ github.sha }}
        env:
          DJANGO_CONFIGURATION: E2E
          # ... other env vars ...
        options: >-
          --health-cmd "wget --spider http://localhost:9000/a/"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 10
          --health-start-period 5s
          --memory 1g

Why This Fixes It

  1. Limits cache growth: maxsize=16 allows only 16 unique parameter combinations to be cached per function
  2. Prevents QuerySet accumulation: The most problematic function (get_posts_data) won't accumulate unlimited QuerySets in memory
  3. Memory limits provide safety: Even if there are other memory issues, the container won't grow beyond 1GB
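The cap is observable directly: with maxsize=16, hammering a function with distinct arguments evicts the oldest entries instead of accumulating them (stand-in body, not the real fetch):

```python
from functools import lru_cache

@lru_cache(maxsize=16)
def get_news_data(limit: int = 10) -> list[str]:
    # Stand-in for the external fetch; each distinct `limit` creates an entry.
    return [f"item-{i}" for i in range(limit)]

for limit in range(100):  # 100 distinct argument values...
    get_news_data(limit)

print(get_news_data.cache_info().currsize)  # 16 -- older entries were evicted
```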

This should completely resolve your OOM issue! 🎯


✏️ Learnings added
Learnt from: ahmedxgouda
URL: https://github.com/OWASP/Nest/pull/3041

Timestamp: 2025-12-30T15:05:23.146Z
Learning: In the OWASP/Nest repository, four functions in backend/apps/slack/utils.py (get_gsoc_projects, get_news_data, get_staff_data, get_posts_data) have unbounded lru_cache decorators that can cause memory accumulation during fuzz testing. These should have explicit maxsize limits (e.g., maxsize=16) to prevent OOM issues.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 3041
File: .github/workflows/run-ci-cd.yaml:233-243
Timestamp: 2025-12-26T06:08:58.549Z
Learning: For the OWASP/Nest repository, Redis image versions should remain consistent across all environments (production, staging, local, E2E, and CI/CD E2E tests). When upgrading Redis, update all docker-compose files and CI/CD workflow configurations together to maintain environment parity.

Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 1139
File: docker-compose/fuzz.yaml:70-83
Timestamp: 2025-12-26T06:57:19.911Z
Learning: In the fuzz testing environment (docker-compose/fuzz.yaml), the data-loader and graphql services are orchestrated in separate phases via the Makefile's test-fuzz target, not run simultaneously. The data is loaded to completion first, then the graphql fuzzer is started separately against the populated database. The docker-compose dependencies reflect runtime health checks rather than orchestration order.

@sonarqubecloud
Copy link

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docker-compose/fuzz/compose.yaml (1)

1-8: Add explicit maxsize limits to unbounded lru_cache decorators in backend/apps/slack/utils.py.

Four functions have unbounded lru_cache decorators that cause memory accumulation during fuzz testing: get_gsoc_projects, get_news_data, get_staff_data, and get_posts_data. Add explicit maxsize limits (e.g., maxsize=16) to each decorator to prevent OOM issues.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f9edbb and 2256ea4.

📒 Files selected for processing (4)
  • backend/Makefile
  • docker-compose/e2e/compose.yaml
  • docker-compose/fuzz/compose.yaml
  • frontend/Makefile
🚧 Files skipped from review as they are similar to previous changes (1)
  • frontend/Makefile
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 0
File: :0-0
Timestamp: 2025-12-30T15:05:23.146Z
Learning: In the OWASP/Nest repository, four functions in backend/apps/slack/utils.py (get_gsoc_projects, get_news_data, get_staff_data, get_posts_data) have unbounded lru_cache decorators that can cause memory accumulation during fuzz testing. These should have explicit maxsize limits (e.g., maxsize=16) to prevent OOM issues.
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 3041
File: .github/workflows/run-ci-cd.yaml:233-243
Timestamp: 2025-12-26T06:08:58.549Z
Learning: For the OWASP/Nest repository, Redis image versions should remain consistent across all environments (production, staging, local, E2E, and CI/CD E2E tests). When upgrading Redis, update all docker-compose files and CI/CD workflow configurations together to maintain environment parity.
📚 Learning: 2025-12-26T06:57:19.911Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 1139
File: docker-compose/fuzz.yaml:70-83
Timestamp: 2025-12-26T06:57:19.911Z
Learning: In the fuzz testing environment (docker-compose/fuzz.yaml), the data-loader and graphql services are orchestrated in separate phases via the Makefile's test-fuzz target, not run simultaneously. The data is loaded to completion first, then the graphql fuzzer is started separately against the populated database. The docker-compose dependencies reflect runtime health checks rather than orchestration order.

Applied to files:

  • backend/Makefile
  • docker-compose/fuzz/compose.yaml
📚 Learning: 2025-10-26T12:50:50.512Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 2429
File: backend/Makefile:30-32
Timestamp: 2025-10-26T12:50:50.512Z
Learning: The `exec-backend-e2e-command` and `exec-db-e2e-command` Makefile targets in the backend/Makefile are intended for local development and debugging only, not for CI/CD execution, so the `-it` flags are appropriate.

Applied to files:

  • backend/Makefile
📚 Learning: 2025-12-26T06:08:58.549Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 3041
File: .github/workflows/run-ci-cd.yaml:233-243
Timestamp: 2025-12-26T06:08:58.549Z
Learning: For the OWASP/Nest repository, Redis image versions should remain consistent across all environments (production, staging, local, E2E, and CI/CD E2E tests). When upgrading Redis, update all docker-compose files and CI/CD workflow configurations together to maintain environment parity.

Applied to files:

  • backend/Makefile
📚 Learning: 2025-12-26T06:08:58.549Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 3041
File: .github/workflows/run-ci-cd.yaml:233-243
Timestamp: 2025-12-26T06:08:58.549Z
Learning: Ensure Redis image versions stay in sync across all environments by updating every relevant YAML file together (docker-compose files and CI/CD workflow configurations). When upgrading Redis, bump the image version in all docker-compose files and in all CI workflow YAMLs in one coordinated change to maintain parity across production, staging, local, E2E, and CI tests.

Applied to files:

  • docker-compose/e2e/compose.yaml
  • docker-compose/fuzz/compose.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Run fuzz tests
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run backend tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (8)
backend/Makefile (2)

133-139: LGTM! Cache service correctly added to e2e and fuzz targets.

The run-backend-e2e and run-backend-fuzz targets now properly include the cache service alongside backend and db, ensuring Redis is available during these test runs. The path update to the new directory structure (docker-compose/e2e/compose.yaml and docker-compose/fuzz/compose.yaml) is consistent.


167-173: LGTM! Both test-fuzz phases now include the cache service.

The cache service is correctly included in both orchestration phases:

  1. First phase (data loading): db cache backend data-loader
  2. Second phase (fuzzing): db cache backend graphql

This ensures Redis is available during data loading (when clear_index_cache() may be triggered by signals) and during the actual fuzz testing.

docker-compose/e2e/compose.yaml (3)

10-17: LGTM! Backend service correctly depends on cache health.

The backend now waits for both db and cache services to be healthy before starting. This addresses the root cause identified in Issue #3034 where clear_index_cache() could fail if Redis wasn't ready during data loading.


92-112: LGTM! Path updates and volume declarations are correct.

The e2e-tests service paths are correctly updated for the new directory structure, and the e2e-cache-data volume is properly declared.


73-91: Cache service configuration is correct and Redis version is consistent across all environments.

The cache service is well-configured with:

  • Memory limit (--maxmemory 100mb) to prevent unbounded growth
  • LRU eviction policy appropriate for caching
  • Password authentication
  • Proper healthcheck

Redis version redis:8.0.5-alpine3.21 is consistent across all environments:

  • Local: docker-compose/local/compose.yaml
  • Production: docker-compose/production/compose.yaml
  • Staging: docker-compose/staging/compose.yaml
  • E2E: docker-compose/e2e/compose.yaml
  • Fuzz: docker-compose/fuzz/compose.yaml
  • CI/CD: .github/workflows/run-ci-cd.yaml
docker-compose/fuzz/compose.yaml (3)

10-17: LGTM! Backend correctly depends on cache for fuzz environment.

The cache dependency ensures Redis is healthy before the backend starts, addressing the crash issue where clear_index_cache() failed during data loading when Redis wasn't available.


92-112: LGTM! Path updates and volume declaration are correct.

The graphql service paths are correctly updated for the new directory structure (../../backend/docker and ../../backend/fuzzing_results), and the fuzz-cache-data volume is properly declared.


72-91: Verify Redis version consistency and password alignment across all environments.

The cache service configuration appears appropriate with memory limits and LRU eviction. However, confirm: (1) the default password nest-fuzz-cache-password matches DJANGO_REDIS_PASSWORD in backend/.env.fuzz.example, and (2) Redis version 8.0.5-alpine3.21 is consistent across all docker-compose files (production, staging, local, E2E) and CI/CD workflow configurations to maintain environment parity.

@arkid15r arkid15r marked this pull request as draft January 1, 2026 20:02
@arkid15r arkid15r marked this pull request as ready for review January 2, 2026 00:30
@arkid15r arkid15r enabled auto-merge (squash) January 2, 2026 00:30
@arkid15r arkid15r merged commit 5e7e5d4 into OWASP:feature/e2e-backend Jan 2, 2026
26 checks passed

Development

Successfully merging this pull request may close these issues.

Add Redis cache to e2e and fuzz tests locally and in CI/CD
