This repository was archived by the owner on Dec 10, 2025. It is now read-only.
Closed
94 changes: 94 additions & 0 deletions github-metrics/.drone.yml
@@ -0,0 +1,94 @@
---
kind: pipeline
type: kubernetes
name: github-metrics-test

trigger:
  branch:
    - main
  event:
    - push # Runs when PR is merged to main
    - pull_request # Also runs on PRs for early feedback

steps:
  - name: check-changes
    image: alpine/git
    commands:
      - |
        # Check if any files in github-metrics/ directory changed
        if [ "$DRONE_BUILD_EVENT" = "pull_request" ]; then
          # For PRs, compare against target branch
          git diff --name-only origin/$DRONE_TARGET_BRANCH...HEAD | grep -q "^github-metrics/" && echo "Changes detected" || (echo "No changes in github-metrics/, skipping" && exit 78)
        else
          # For pushes, compare against previous commit
          git diff --name-only $DRONE_COMMIT_BEFORE $DRONE_COMMIT_AFTER | grep -q "^github-metrics/" && echo "Changes detected" || (echo "No changes in github-metrics/, skipping" && exit 78)
        fi

  - name: test
    image: node:20-alpine
    commands:
      - cd github-metrics
      - npm ci
      - node --version
      - npm --version
      - echo "Validating package.json and dependencies..."

---
depends_on: ['github-metrics-test']
kind: pipeline
type: kubernetes
name: github-metrics-build

trigger:
  branch:
    - main
  event:
    - push
    - tag

steps:
  # Builds and publishes Docker image for production
  - name: publish-production
    image: plugins/kaniko-ecr
    settings:
      create_repository: true
      registry: 795250896452.dkr.ecr.us-east-1.amazonaws.com
      repo: docs/github-metrics
      tags:
        - git-${DRONE_COMMIT_SHA:0:7}
        - latest
      access_key:
        from_secret: ecr_access_key
      secret_key:
        from_secret: ecr_secret_key
      context: github-metrics
      dockerfile: github-metrics/Dockerfile

---
depends_on: ['github-metrics-build']
kind: pipeline
type: kubernetes
name: github-metrics-deploy

trigger:
  branch:
    - main
  event:
    - push
    - tag

steps:
  # Deploys cronjob to production using Helm
  - name: deploy-production
    image: quay.io/mongodb/drone-helm:v3
    settings:
      chart: mongodb/cronjobs
      chart_version: 1.21.2
      add_repos: [mongodb=https://10gen.github.io/helm-charts]
      namespace: docs
      release: github-metrics
      values: image.tag=git-${DRONE_COMMIT_SHA:0:7},image.repository=795250896452.dkr.ecr.us-east-1.amazonaws.com/docs/github-metrics
      values_files: ['github-metrics/cronjobs.yml']
      api_server: https://api.prod.corp.mongodb.com
      kubernetes_token:
        from_secret: kubernetes_token
28 changes: 28 additions & 0 deletions github-metrics/Dockerfile
@@ -0,0 +1,28 @@
FROM node:20-alpine

# Set working directory
WORKDIR /app

# Copy package files first (for better Docker layer caching)
COPY package.json package-lock.json ./

# Install production dependencies only (npm ci gives reproducible builds).
# --omit=dev replaces the deprecated --only=production flag in current npm.
RUN npm ci --omit=dev

# Copy the rest of the application files
COPY . .

# Create a non-root user for security best practices
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001 && \
    chown -R nodejs:nodejs /app

# Switch to non-root user
USER nodejs

# Set NODE_ENV to production
ENV NODE_ENV=production

# Command to run the application
# This will be executed by the Kubernetes CronJob
CMD ["node", "index.js"]
186 changes: 186 additions & 0 deletions github-metrics/SCHEDULING.md
@@ -0,0 +1,186 @@
# GitHub Metrics Collection Scheduling

## Overview

The GitHub metrics collection job is designed to run **every 14 days** to collect repository metrics from GitHub. This interval ensures we capture metrics within GitHub's 14-day data retention window while avoiding unnecessary API calls.

## How It Works

### File-Based Tracking

The system uses a file-based approach to track when the job last ran successfully:

1. **Persistent Storage**: A Kubernetes persistent volume is mounted at `/data` to store the last run timestamp
2. **Last Run File**: The file `/data/last-run.json` contains the timestamp of the last successful run
3. **Automatic Checking**: Each time the cronjob executes, it checks if 14 days have passed since the last run
4. **Skip Logic**: If less than 14 days have passed, the job exits successfully without collecting metrics
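
The check in steps 3 and 4 comes down to a single comparison. The following is a minimal sketch, not the code in `last-run-tracker.js`: the helper name `hasIntervalElapsed`, its parameters, and the use of `Math.floor` on whole days are all assumptions made for illustration.

```javascript
// Hypothetical helper, not taken from last-run-tracker.js.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// lastRunMs: epoch milliseconds of the last successful run, or null if none.
// nowMs: epoch milliseconds of the current execution.
function hasIntervalElapsed(lastRunMs, nowMs, intervalDays = 14) {
  // No recorded run means this is the first run, so collect.
  if (lastRunMs === null) return true;
  const daysSince = Math.floor((nowMs - lastRunMs) / MS_PER_DAY);
  return daysSince >= intervalDays;
}

const lastRun = Date.parse('2025-12-03T08:00:00.000Z');
console.log(hasIntervalElapsed(lastRun, Date.parse('2025-12-10T08:00:00.000Z'))); // false
console.log(hasIntervalElapsed(lastRun, Date.parse('2025-12-17T08:00:00.000Z'))); // true
console.log(hasIntervalElapsed(null, Date.now())); // true
```

One edge case is worth checking against the real module: if the timestamp is written when collection finishes rather than when the job starts, a run that begins exactly two weeks later can measure slightly under 14 days and be skipped. Comparing against a marginally shorter threshold, or recording the job start time, would avoid that.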

### Cronjob Schedule

The Kubernetes cronjob is configured to run **every Sunday at 8am UTC** (`0 8 * * 0`).

- The job runs weekly, but the application logic determines whether to actually collect metrics
- This approach is more reliable than trying to schedule exactly every 14 days with cron syntax
- If a run is missed (e.g., due to maintenance), the next weekly run will catch it

### Example Timeline

```
Week 1, Sunday: Job runs → Collects metrics → Records timestamp
Week 2, Sunday: Job runs → Checks timestamp → Skips (only 7 days)
Week 3, Sunday: Job runs → Checks timestamp → Collects metrics (14+ days) → Records timestamp
Week 4, Sunday: Job runs → Checks timestamp → Skips (only 7 days)
Week 5, Sunday: Job runs → Checks timestamp → Collects metrics (14+ days) → Records timestamp
```
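
The timeline above can be reproduced with a small simulation. This is an illustrative sketch only: `simulateWeeklyTicks` and its `missedWeeks` parameter are hypothetical names, and days are counted as whole numbers rather than read from a timestamp file.

```javascript
// Simulates weekly cron ticks against a 14-day threshold (illustrative only).
function simulateWeeklyTicks(weeks, missedWeeks = [], intervalDays = 14) {
  const decisions = [];
  let lastRunDay = null; // day number of the last collection, null before the first run
  for (let week = 1; week <= weeks; week++) {
    if (missedWeeks.includes(week)) {
      decisions.push(`Week ${week}: missed`);
      continue;
    }
    const today = (week - 1) * 7;
    const due = lastRunDay === null || today - lastRunDay >= intervalDays;
    if (due) lastRunDay = today;
    decisions.push(`Week ${week}: ${due ? 'collect' : 'skip'}`);
  }
  return decisions;
}

// Normal schedule: collect, skip, collect, skip, collect.
console.log(simulateWeeklyTicks(5).join('\n'));

// Week 3 never executes, so week 4 collects instead (21 days after week 1).
console.log(simulateWeeklyTicks(5, [3]).join('\n'));
```

The second call shows the catch-up behavior: a missed run is recovered on the following Sunday, at which point more than 14 days have passed since the previous collection.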

## Implementation Details

### Last Run Tracker Module

The `last-run-tracker.js` module provides three main functions:

#### `shouldRunMetricsCollection()`
Checks if 14 days have passed since the last run.

**Returns:**
```javascript
{
  shouldRun: boolean,           // true if 14+ days have passed
  lastRun: Date|null,           // timestamp of last run
  daysSinceLastRun: number|null // days since last run
}
```

#### `recordSuccessfulRun()`
Records the current timestamp as the last successful run.

#### `getLastRunInfo()`
Gets information about the last run without checking if we should run (useful for debugging).
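
To make the three functions concrete, here is one way they could be written. This is a sketch under assumptions, not the module's actual source: the `filePath` and `now` parameters are added so the example can run against a temporary directory instead of `/data`, and the whole-day rounding with `Math.floor` is a guess.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const MS_PER_DAY = 24 * 60 * 60 * 1000;
const INTERVAL_DAYS = 14;

// Reads the last-run file; returns null when it does not exist yet.
function getLastRunInfo(filePath) {
  if (!fs.existsSync(filePath)) return null;
  return JSON.parse(fs.readFileSync(filePath, 'utf8'));
}

// Returns the documented { shouldRun, lastRun, daysSinceLastRun } shape.
function shouldRunMetricsCollection(filePath, now = new Date()) {
  const info = getLastRunInfo(filePath);
  if (info === null) {
    return { shouldRun: true, lastRun: null, daysSinceLastRun: null };
  }
  const lastRun = new Date(info.lastRun);
  const daysSinceLastRun = Math.floor((now - lastRun) / MS_PER_DAY);
  return { shouldRun: daysSinceLastRun >= INTERVAL_DAYS, lastRun, daysSinceLastRun };
}

// Writes the current time in both ISO and epoch-millisecond form.
function recordSuccessfulRun(filePath, now = new Date()) {
  const body = { lastRun: now.toISOString(), timestamp: now.getTime() };
  fs.writeFileSync(filePath, JSON.stringify(body, null, 2));
}

// Usage against a temporary directory (assumption: the real module uses /data).
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'last-run-'));
const demoFile = path.join(demoDir, 'last-run.json');
console.log(shouldRunMetricsCollection(demoFile).shouldRun); // true (no file yet)
recordSuccessfulRun(demoFile, new Date('2025-12-03T08:00:00.000Z'));
console.log(shouldRunMetricsCollection(demoFile, new Date('2025-12-10T08:00:00.000Z')));
```

Passing the file path and clock in as parameters is only a testing convenience here; the documented functions take no arguments.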

### Last Run File Format

The `/data/last-run.json` file contains:

```json
{
  "lastRun": "2025-12-03T08:00:00.000Z",
  "timestamp": 1764748800000
}
```
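
A reader for this file may want to tolerate a missing or damaged file rather than crash. The sketch below is one possible approach, not the module's behavior: `parseLastRun` is a hypothetical name, and treating an unreadable file as "no previous run" is a design assumption.

```javascript
// Hypothetical parser for the last-run file contents.
// Returns a Date for a valid file, or null if the contents cannot be trusted.
function parseLastRun(text) {
  let data;
  try {
    data = JSON.parse(text);
  } catch {
    return null; // corrupt file: treat as "no previous run"
  }
  if (data === null || typeof data !== 'object') return null;
  const fromIso = Date.parse(data.lastRun);
  if (Number.isNaN(fromIso)) return null;
  // When both fields are present they should describe the same instant.
  if (typeof data.timestamp === 'number' && data.timestamp !== fromIso) return null;
  return new Date(fromIso);
}

const good = '{"lastRun":"2025-12-03T08:00:00.000Z","timestamp":1764748800000}';
console.log(parseLastRun(good).toISOString()); // 2025-12-03T08:00:00.000Z
console.log(parseLastRun('not json')); // null
```

Returning `null` for a bad file makes the job fall back to collecting, which errs on the side of not losing data.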

## Configuration

### Cronjob Configuration (`cronjobs.yml`)

```yaml
persistence:
  enabled: true
  storageClass: "standard"
  accessMode: ReadWriteOnce
  size: 1Gi
  mountPath: /data

cronJobs:
  - name: github-metrics-collection
    schedule: "0 8 * * 0" # Every Sunday at 8am UTC
    command:
      - node
      - index.js
```

### Key Configuration Points

- **Persistent Volume**: Required to maintain state between cronjob executions
- **Mount Path**: `/data` - where the last run file is stored
- **Schedule**: Weekly execution allows the application to decide when to run
- **Exit Code**: The job exits with code 0 (success) even when skipping, so Kubernetes doesn't mark it as failed
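
The exit-code behavior can be sketched as a small entry-point flow. This is a hypothetical outline, and the real `index.js` may be organized differently: the `run` function and its injected `shouldRun`, `collect`, and `record` callbacks are illustrative names, and returning a non-zero code on failure is an assumption.

```javascript
// Hypothetical entry-point flow; the real index.js may be organised differently.
async function run({ shouldRun, collect, record, log = console.log }) {
  if (!shouldRun()) {
    log('Skipping metrics collection.');
    return 0; // a skip is reported as success
  }
  try {
    await collect();
    record(); // only record the run after collection succeeded
    return 0;
  } catch (err) {
    log(`Metrics collection failed: ${err.message}`);
    return 1; // assumption: non-zero so Kubernetes marks the job as failed
  }
}

// Example: a skipped run still ends with exit code 0.
run({
  shouldRun: () => false,
  collect: async () => {},
  record: () => {},
}).then((code) => {
  process.exitCode = code;
});
```

Recording the timestamp only after `collect()` resolves means a failed collection is retried on the next weekly execution instead of being counted as done.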

## Monitoring

### Checking Last Run Status

To check when the job last ran, you can:

1. **View the last-run file** in the persistent volume. `kubectl exec` needs a running container, so use a pod that currently mounts the volume:
```bash
kubectl exec -it <pod-name> -n docs -- cat /data/last-run.json
```

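2. **Check job logs** for skip messages. Pods carry the name of their Job in the `job-name` label, and each Job created by a CronJob gets a generated name, so look up the Job name first:
```bash
kubectl get jobs -n docs
kubectl logs -n docs job/<job-name> --tail=50
```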

### Expected Log Output

**When running:**
```
No previous run detected. This is the first run.
Starting metrics collection...
✓ Metrics collection completed successfully
✓ Recorded successful run at 2025-12-03T08:00:00.000Z
```

**When skipping:**
```
Last run: 2025-12-03T08:00:00.000Z
Days since last run: 7
✗ Only 7 days have passed. Skipping metrics collection.
Next run should occur in approximately 7 days.
Skipping metrics collection - not enough time has passed since last run.
Last run was 7 days ago on 2025-12-03T08:00:00.000Z
```

## Troubleshooting

### Job Never Runs

If the job keeps skipping even though 14+ days have passed:

1. Check the last-run file timestamp:
```bash
kubectl exec -it <pod-name> -n docs -- cat /data/last-run.json
```

2. Manually delete the file to force a run:
```bash
kubectl exec -it <pod-name> -n docs -- rm /data/last-run.json
```

### Persistent Volume Issues

If the persistent volume isn't working:

1. Check if the PVC is bound:
```bash
kubectl get pvc -n docs
```

2. Check pod events for volume mount errors:
```bash
kubectl describe pod <pod-name> -n docs
```

### Force a Run

To force the job to run immediately regardless of the last run time:

1. Delete the last-run file:
```bash
kubectl exec -it <pod-name> -n docs -- rm /data/last-run.json
```

2. Manually trigger the cronjob:
```bash
kubectl create job --from=cronjob/github-metrics-collection manual-run-$(date +%s) -n docs
```

## Benefits of This Approach

1. **Reliable 14-day interval**: Ensures metrics are collected every 14 days without complex cron syntax
2. **Resilient to missed runs**: If a run is missed, the next execution will catch it
3. **Simple to monitor**: Clear log messages indicate whether the job ran or skipped
4. **Easy to override**: Can force a run by deleting the last-run file
5. **Kubernetes-native**: Uses persistent volumes for state management
6. **No external dependencies**: Doesn't require a database or external service to track state

44 changes: 44 additions & 0 deletions github-metrics/cronjobs.yml
@@ -0,0 +1,44 @@
---
# `image` can be skipped if the values are being set in your .drone.yml file
image:
  repository: 795250896452.dkr.ecr.us-east-1.amazonaws.com/docs/code-example-tooling
  tag: latest

# Service account configuration for IRSA (IAM Roles for Service Accounts)
serviceAccount:
  enabled: true
  irsa:
    accountId: "216656347858"
    roleName: devDocsCodeToolingServiceRole

# global secrets are references to k8s Secrets
globalEnvSecrets:
  GITHUB_TOKEN: github-token
  ATLAS_CONNECTION_STRING: atlas-connection-string

# Persistent volume for storing last run timestamp
# This allows the cronjob to track when it last ran successfully
persistence:
  enabled: true
  storageClass: "standard"
  accessMode: ReadWriteOnce
  size: 1Gi
  mountPath: /data

cronJobs:
  - name: github-metrics-collection
    # Run every Sunday at 8am UTC
    # The job will check if 14 days have passed since the last run
    # and skip execution if not enough time has elapsed
    schedule: "0 8 * * 0"
    command:
      - node
      - index.js
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
