LCORE-1076: Run daily e2e tests for inference providers (e.g., Azure, WatsonX, etc..) #927

are-ces · 2025-12-16T15:36:30Z

Description

We want to run CI e2e tests on each PR, however this is not needed for all inference providers.
Running e2e test for each provider can be very costly. Our assumption is that testing them once a day would be sufficient .

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

NA

Related Tickets & Documents

Related Issue # 1076
Closes # 1076

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Chores
- Simplified E2E testing workflow by consolidating environment configurations and reducing credential handling overhead.
- Added new scheduled testing workflow for inference providers across multiple deployment modes.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-12-16T15:36:45Z

Walkthrough

The PR reorganizes end-to-end testing workflows by removing provider-specific logic (Azure, VertexAI) from the main e2e_tests.yaml workflow and creating a new dedicated e2e_tests_providers.yaml workflow to handle multi-cloud provider testing with separate credential and configuration management.

Changes

Cohort / File(s)	Summary
E2E Workflow Consolidation `.github/workflows/e2e_tests.yaml`	Reduced matrix environment from ["ci", "azure", "vertexai"] to ["ci"]. Removed environment-specific credential steps (Azure token acquisition, VertexAI service account key setup). Eliminated public cloud credential variables (AZURE_API_KEY, VERTEX_AI_LOCATION, VERTEX_AI_PROJECT, GOOGLE_APPLICATION_CREDENTIALS, GCP_KEYS_PATH) and secret injections (CLIENT_ID, CLIENT_SECRET, TENANT_ID) from server and library execution steps.
New Provider-Specific E2E Workflow `.github/workflows/e2e_tests_providers.yaml`	New workflow file added for provider-specific E2E testing with daily schedule and manual trigger. Defines matrix across mode (server, library) and environment (azure, vertexai). Includes credential setup for Azure (token acquisition) and VertexAI (service account key generation), environment-specific configuration loading, docker-compose orchestration for service startup, connectivity validation, and test execution with detailed error reporting.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Key areas requiring attention:
- Verify Azure credential handling in new workflow (CLIENT_ID, CLIENT_SECRET, TENANT_ID secret usage)
- Validate VertexAI service account key file generation and cleanup logic (GOOGLE_APPLICATION_CREDENTIALS path handling)
- Confirm matrix configuration correctly iterates across [server, library] and [azure, vertexai] combinations
- Ensure main e2e_tests.yaml remains functional for CI-only testing after removals
- Check that environment variables are properly scoped and not leaking between matrix runs

Possibly related PRs

LCORE-306: setup e2e test env #306 — Modifies the main E2E workflow configuration and overall test environment structure
LCORE-792: Lightspeed core needs to fully support VertexAI LLM provider #924 — Adjusts E2E workflows and VertexAI provider support matrix handling and credential management

Suggested reviewers

tisnik
radofuchs

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: moving e2e inference provider tests from every PR to daily scheduled runs, matching the workflow additions in the changeset.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (3)

.github/workflows/e2e_tests.yaml (1)

119-124: Condition is now redundant after matrix simplification.

Since matrix.environment is now always "ci", the condition matrix.environment != 'vertexai' will always be true. Consider simplifying by removing the condition entirely, or removing the step if the dummy GCP directory is no longer needed for CI-only runs.

-      - name: Create dummy GCP keys directory
-        if: matrix.environment != 'vertexai'
-        run: |
-          echo "Creating dummy GCP keys directory for non-VertexAI environment..."
-          mkdir -p ./tmp/.gcp-keys-dummy
-          echo "✅ Dummy directory created."
+      - name: Create dummy GCP keys directory
+        run: |
+          echo "Creating dummy GCP keys directory for non-VertexAI environment..."
+          mkdir -p ./tmp/.gcp-keys-dummy
+          echo "✅ Dummy directory created."

.github/workflows/e2e_tests_providers.yaml (2)

29-40: Checkout comments reference PR_TARGET, but workflow doesn't trigger on pull_request_target.

The workflow only triggers on schedule and workflow_dispatch, so the comments about PR_TARGET behavior are misleading. The fallback expressions will work correctly (using github.repository and github.sha), but the comments suggest functionality that doesn't apply to this workflow.

Consider simplifying the checkout step since this workflow only runs on schedule/dispatch:

       - uses: actions/checkout@v4
-        with:
-          # On PR_TARGET → the fork (or same repo) that opened the PR.
-          # On push      → falls back to the current repository.
-          repository: ${{ github.event.pull_request.head.repo.full_name || github.repository }}
-
-          # On PR_TARGET → the PR head *commit* (reproducible).
-          # On push      → the pushed commit that triggered the workflow.
-          ref: ${{ github.event.pull_request.head.ref || github.sha }}
-
-          # Don't keep credentials when running untrusted PR code under PR_TARGET.
-          persist-credentials: ${{ github.event_name != 'pull_request_target' }}

244-247: Consider replacing fixed sleep with health check polling.

A fixed 20-second sleep may be too short for slower startups or wasteful for quick ones. Consider polling the health endpoint with retries.

       - name: Wait for services
         run: |
-          echo "Waiting for services to be healthy..."
-          sleep 20
+          echo "Waiting for services to be healthy..."
+          for i in {1..30}; do
+            if curl -sf http://localhost:8080/v1/models > /dev/null 2>&1; then
+              echo "✅ Services are healthy after $((i * 2)) seconds"
+              exit 0
+            fi
+            echo "Waiting... ($i/30)"
+            sleep 2
+          done
+          echo "❌ Services did not become healthy within 60 seconds"
+          exit 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bace7c4 and f49ef75.

📒 Files selected for processing (2)

.github/workflows/e2e_tests.yaml (1 hunks)
.github/workflows/e2e_tests_providers.yaml (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)

GitHub Check: E2E: server mode / vertexai
GitHub Check: E2E: server mode / ci
GitHub Check: E2E: library mode / vertexai
GitHub Check: E2E: library mode / azure
GitHub Check: E2E: server mode / azure
GitHub Check: E2E: library mode / ci
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
GitHub Check: build-pr

🔇 Additional comments (5)

.github/workflows/e2e_tests.yaml (1)

13-13: LGTM! Matrix simplified to CI-only environment.

The simplification aligns with the PR objective of moving provider-specific tests to a separate daily workflow.

.github/workflows/e2e_tests_providers.yaml (4)

70-94: LGTM! Azure token acquisition is well-implemented.

The step correctly uses the OAuth 2.0 client credentials flow, validates the response, and securely passes the token via GITHUB_ENV.

96-133: LGTM! VertexAI credentials setup is comprehensive.

Good implementation with base64 decoding, JSON validation, and appropriate permissions for container access.

197-221: LGTM! Service startup steps correctly pass provider-specific credentials.

Both server and library mode steps properly include Azure and VertexAI environment variables, with appropriate error handling for failed startups.

Also applies to: 223-242

190-195: No action needed. The GCP keys paths are correctly configured with proper fallback handling. Docker-compose uses ${GCP_KEYS_PATH:-./tmp/.gcp-keys-dummy} as its mount source, which allows the environment variable to override the default. For VertexAI, GCP_KEYS_PATH=./tmp/.gcp-keys is explicitly set; for non-VertexAI, the variable is unset and docker-compose uses the default ./tmp/.gcp-keys-dummy, which matches what the workflow creates. The dummy directory name is intentional, not a misconfiguration.

Likely an incorrect or invalid review comment.

coderabbitai · 2025-12-16T15:42:41Z

.github/workflows/e2e_tests_providers.yaml

+# .github/workflows/e2e_tests.yml
+name: E2E Inference Provider Tests


⚠️ Potential issue | 🟡 Minor

Filename mismatch in comment.

The comment indicates .github/workflows/e2e_tests.yml but the actual filename is .github/workflows/e2e_tests_providers.yaml. This could cause confusion during maintenance.

-# .github/workflows/e2e_tests.yml +# .github/workflows/e2e_tests_providers.yaml name: E2E Inference Provider Tests

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# .github/workflows/e2e_tests.yml

name: E2E Inference Provider Tests

# .github/workflows/e2e_tests_providers.yaml

name: E2E Inference Provider Tests

🤖 Prompt for AI Agents

In .github/workflows/e2e_tests_providers.yaml around lines 1 to 2, the inline comment references the wrong filename (.github/workflows/e2e_tests.yml) which is misleading; update that comment to reference the actual file name (.github/workflows/e2e_tests_providers.yaml) or remove the redundant filename line so it accurately reflects the workflow file and avoids future confusion.

coderabbitai · 2025-12-16T15:42:41Z

.github/workflows/e2e_tests_providers.yaml

+      - name: Show final configuration
+        run: |
+          echo "=== Configuration Summary ==="
+          echo "Deployment mode: ${{ matrix.mode }}"
+          echo "Environment: ${{ matrix.environment }}"
+          echo "Source config: tests/e2e/configs/run-ci.yaml"
+          echo ""
+          echo "=== Configuration Preview ==="
+          echo "Providers: $(grep -c "provider_id:" run.yaml)"
+          echo "Models: $(grep -c "model_id:" run.yaml)"
+          echo ""
+          echo "=== lightspeed-stack.yaml ==="
+          grep -A 3 "llama_stack:" lightspeed-stack.yaml


⚠️ Potential issue | 🟡 Minor

Bug: Hardcoded config path doesn't match actual environment.

Line 174 hardcodes "run-ci.yaml" but this workflow uses azure and vertexai environments. The actual config file used is run-${{ matrix.environment }}.yaml. This causes misleading log output.

- name: Show final configuration run: | echo "=== Configuration Summary ===" echo "Deployment mode: ${{ matrix.mode }}" echo "Environment: ${{ matrix.environment }}" - echo "Source config: tests/e2e/configs/run-ci.yaml" + echo "Source config: tests/e2e/configs/run-${{ matrix.environment }}.yaml" echo "" echo "=== Configuration Preview ===" echo "Providers: $(grep -c "provider_id:" run.yaml)" echo "Models: $(grep -c "model_id:" run.yaml)" echo "" echo "=== lightspeed-stack.yaml ===" grep -A 3 "llama_stack:" lightspeed-stack.yaml

🤖 Prompt for AI Agents

In .github/workflows/e2e_tests_providers.yaml around lines 169 to 181 the "Source config" echo is hardcoded to "tests/e2e/configs/run-ci.yaml" while the workflow actually uses run-${{ matrix.environment }}.yaml (e.g., azure/vertexai); update the echo to reference the matrix environment variable so the log shows the real config path (e.g., tests/e2e/configs/run-${{ matrix.environment }}.yaml) and ensure any other log lines referencing run.yaml are consistent with the chosen environment variable.

Divide inference provider e2e tests from ci e2e test

f49ef75

are-ces requested review from radofuchs and tisnik December 16, 2025 15:36

coderabbitai bot reviewed Dec 16, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

LCORE-1076: Run daily e2e tests for inference providers (e.g., Azure, WatsonX, etc..) #927

LCORE-1076: Run daily e2e tests for inference providers (e.g., Azure, WatsonX, etc..) #927

Uh oh!

are-ces commented Dec 16, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Dec 16, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Dec 16, 2025

Uh oh!

coderabbitai bot Dec 16, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		# .github/workflows/e2e_tests.yml
		name: E2E Inference Provider Tests

LCORE-1076: Run daily e2e tests for inference providers (e.g., Azure, WatsonX, etc..) #927

Are you sure you want to change the base?

LCORE-1076: Run daily e2e tests for inference providers (e.g., Azure, WatsonX, etc..) #927

Uh oh!

Conversation

are-ces commented Dec 16, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Tools used to create PR

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

are-ces commented Dec 16, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 16, 2025 •

edited

Loading