Skip to content

Conversation

@are-ces
Copy link
Contributor

@are-ces are-ces commented Dec 16, 2025

Description

We want to run CI e2e tests on each PR, however this is not needed for all inference providers.
Running e2e test for each provider can be very costly. Our assumption is that testing them once a day would be sufficient .

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

NA

Related Tickets & Documents

  • Related Issue # 1076
  • Closes # 1076

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Chores
    • Simplified E2E testing workflow by consolidating environment configurations and reducing credential handling overhead.
    • Added new scheduled testing workflow for inference providers across multiple deployment modes.

✏️ Tip: You can customize this high-level summary in your review settings.

@are-ces are-ces requested review from radofuchs and tisnik December 16, 2025 15:36
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 16, 2025

Walkthrough

The PR reorganizes end-to-end testing workflows by removing provider-specific logic (Azure, VertexAI) from the main e2e_tests.yaml workflow and creating a new dedicated e2e_tests_providers.yaml workflow to handle multi-cloud provider testing with separate credential and configuration management.

Changes

Cohort / File(s) Summary
E2E Workflow Consolidation
.github/workflows/e2e_tests.yaml
Reduced matrix environment from ["ci", "azure", "vertexai"] to ["ci"]. Removed environment-specific credential steps (Azure token acquisition, VertexAI service account key setup). Eliminated public cloud credential variables (AZURE_API_KEY, VERTEX_AI_LOCATION, VERTEX_AI_PROJECT, GOOGLE_APPLICATION_CREDENTIALS, GCP_KEYS_PATH) and secret injections (CLIENT_ID, CLIENT_SECRET, TENANT_ID) from server and library execution steps.
New Provider-Specific E2E Workflow
.github/workflows/e2e_tests_providers.yaml
New workflow file added for provider-specific E2E testing with daily schedule and manual trigger. Defines matrix across mode (server, library) and environment (azure, vertexai). Includes credential setup for Azure (token acquisition) and VertexAI (service account key generation), environment-specific configuration loading, docker-compose orchestration for service startup, connectivity validation, and test execution with detailed error reporting.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Key areas requiring attention:
    • Verify Azure credential handling in new workflow (CLIENT_ID, CLIENT_SECRET, TENANT_ID secret usage)
    • Validate VertexAI service account key file generation and cleanup logic (GOOGLE_APPLICATION_CREDENTIALS path handling)
    • Confirm matrix configuration correctly iterates across [server, library] and [azure, vertexai] combinations
    • Ensure main e2e_tests.yaml remains functional for CI-only testing after removals
    • Check that environment variables are properly scoped and not leaking between matrix runs

Possibly related PRs

Suggested reviewers

  • tisnik
  • radofuchs

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: moving e2e inference provider tests from every PR to daily scheduled runs, matching the workflow additions in the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
.github/workflows/e2e_tests.yaml (1)

119-124: Condition is now redundant after matrix simplification.

Since matrix.environment is now always "ci", the condition matrix.environment != 'vertexai' will always be true. Consider simplifying by removing the condition entirely, or removing the step if the dummy GCP directory is no longer needed for CI-only runs.

-      - name: Create dummy GCP keys directory
-        if: matrix.environment != 'vertexai'
-        run: |
-          echo "Creating dummy GCP keys directory for non-VertexAI environment..."
-          mkdir -p ./tmp/.gcp-keys-dummy
-          echo "✅ Dummy directory created."
+      - name: Create dummy GCP keys directory
+        run: |
+          echo "Creating dummy GCP keys directory for non-VertexAI environment..."
+          mkdir -p ./tmp/.gcp-keys-dummy
+          echo "✅ Dummy directory created."
.github/workflows/e2e_tests_providers.yaml (2)

29-40: Checkout comments reference PR_TARGET, but workflow doesn't trigger on pull_request_target.

The workflow only triggers on schedule and workflow_dispatch, so the comments about PR_TARGET behavior are misleading. The fallback expressions will work correctly (using github.repository and github.sha), but the comments suggest functionality that doesn't apply to this workflow.

Consider simplifying the checkout step since this workflow only runs on schedule/dispatch:

       - uses: actions/checkout@v4
-        with:
-          # On PR_TARGET → the fork (or same repo) that opened the PR.
-          # On push      → falls back to the current repository.
-          repository: ${{ github.event.pull_request.head.repo.full_name || github.repository }}
-
-          # On PR_TARGET → the PR head *commit* (reproducible).
-          # On push      → the pushed commit that triggered the workflow.
-          ref: ${{ github.event.pull_request.head.ref || github.sha }}
-
-          # Don't keep credentials when running untrusted PR code under PR_TARGET.
-          persist-credentials: ${{ github.event_name != 'pull_request_target' }}

244-247: Consider replacing fixed sleep with health check polling.

A fixed 20-second sleep may be too short for slower startups or wasteful for quick ones. Consider polling the health endpoint with retries.

       - name: Wait for services
         run: |
-          echo "Waiting for services to be healthy..."
-          sleep 20
+          echo "Waiting for services to be healthy..."
+          for i in {1..30}; do
+            if curl -sf http://localhost:8080/v1/models > /dev/null 2>&1; then
+              echo "✅ Services are healthy after $((i * 2)) seconds"
+              exit 0
+            fi
+            echo "Waiting... ($i/30)"
+            sleep 2
+          done
+          echo "❌ Services did not become healthy within 60 seconds"
+          exit 1
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bace7c4 and f49ef75.

📒 Files selected for processing (2)
  • .github/workflows/e2e_tests.yaml (1 hunks)
  • .github/workflows/e2e_tests_providers.yaml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: E2E: server mode / vertexai
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E: library mode / vertexai
  • GitHub Check: E2E: library mode / azure
  • GitHub Check: E2E: server mode / azure
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: build-pr
🔇 Additional comments (5)
.github/workflows/e2e_tests.yaml (1)

13-13: LGTM! Matrix simplified to CI-only environment.

The simplification aligns with the PR objective of moving provider-specific tests to a separate daily workflow.

.github/workflows/e2e_tests_providers.yaml (4)

70-94: LGTM! Azure token acquisition is well-implemented.

The step correctly uses the OAuth 2.0 client credentials flow, validates the response, and securely passes the token via GITHUB_ENV.


96-133: LGTM! VertexAI credentials setup is comprehensive.

Good implementation with base64 decoding, JSON validation, and appropriate permissions for container access.


197-221: LGTM! Service startup steps correctly pass provider-specific credentials.

Both server and library mode steps properly include Azure and VertexAI environment variables, with appropriate error handling for failed startups.

Also applies to: 223-242


190-195: No action needed. The GCP keys paths are correctly configured with proper fallback handling. Docker-compose uses ${GCP_KEYS_PATH:-./tmp/.gcp-keys-dummy} as its mount source, which allows the environment variable to override the default. For VertexAI, GCP_KEYS_PATH=./tmp/.gcp-keys is explicitly set; for non-VertexAI, the variable is unset and docker-compose uses the default ./tmp/.gcp-keys-dummy, which matches what the workflow creates. The dummy directory name is intentional, not a misconfiguration.

Likely an incorrect or invalid review comment.

Comment on lines +1 to +2
# .github/workflows/e2e_tests.yml
name: E2E Inference Provider Tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Filename mismatch in comment.

The comment indicates .github/workflows/e2e_tests.yml but the actual filename is .github/workflows/e2e_tests_providers.yaml. This could cause confusion during maintenance.

-# .github/workflows/e2e_tests.yml
+# .github/workflows/e2e_tests_providers.yaml
 name: E2E Inference Provider Tests
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# .github/workflows/e2e_tests.yml
name: E2E Inference Provider Tests
# .github/workflows/e2e_tests_providers.yaml
name: E2E Inference Provider Tests
🤖 Prompt for AI Agents
In .github/workflows/e2e_tests_providers.yaml around lines 1 to 2, the inline
comment references the wrong filename (.github/workflows/e2e_tests.yml) which is
misleading; update that comment to reference the actual file name
(.github/workflows/e2e_tests_providers.yaml) or remove the redundant filename
line so it accurately reflects the workflow file and avoids future confusion.

Comment on lines +169 to +181
- name: Show final configuration
run: |
echo "=== Configuration Summary ==="
echo "Deployment mode: ${{ matrix.mode }}"
echo "Environment: ${{ matrix.environment }}"
echo "Source config: tests/e2e/configs/run-ci.yaml"
echo ""
echo "=== Configuration Preview ==="
echo "Providers: $(grep -c "provider_id:" run.yaml)"
echo "Models: $(grep -c "model_id:" run.yaml)"
echo ""
echo "=== lightspeed-stack.yaml ==="
grep -A 3 "llama_stack:" lightspeed-stack.yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Bug: Hardcoded config path doesn't match actual environment.

Line 174 hardcodes "run-ci.yaml" but this workflow uses azure and vertexai environments. The actual config file used is run-${{ matrix.environment }}.yaml. This causes misleading log output.

       - name: Show final configuration
         run: |
           echo "=== Configuration Summary ==="
           echo "Deployment mode: ${{ matrix.mode }}"
           echo "Environment: ${{ matrix.environment }}"
-          echo "Source config: tests/e2e/configs/run-ci.yaml"
+          echo "Source config: tests/e2e/configs/run-${{ matrix.environment }}.yaml"
           echo ""
           echo "=== Configuration Preview ==="
           echo "Providers: $(grep -c "provider_id:" run.yaml)"
           echo "Models: $(grep -c "model_id:" run.yaml)"
           echo ""
           echo "=== lightspeed-stack.yaml ==="
           grep -A 3 "llama_stack:" lightspeed-stack.yaml
🤖 Prompt for AI Agents
In .github/workflows/e2e_tests_providers.yaml around lines 169 to 181 the
"Source config" echo is hardcoded to "tests/e2e/configs/run-ci.yaml" while the
workflow actually uses run-${{ matrix.environment }}.yaml (e.g.,
azure/vertexai); update the echo to reference the matrix environment variable so
the log shows the real config path (e.g., tests/e2e/configs/run-${{
matrix.environment }}.yaml) and ensure any other log lines referencing run.yaml
are consistent with the chosen environment variable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant