feat: sync alert jobs #3059
Conversation
lib/logflare/alerting.ex (outdated)
```elixir
          :ok | {:error, :not_enabled} | {:error, :below_min_cluster_size}
  def run_alert(alert_id, :scheduled) when is_integer(alert_id) do
    # sync the alert job for the next run
    sync_alert_job(alert_id)
```
It would be slightly better to return the alert job (if present) as an `:ok` tuple from the sync job; that would allow us to avoid a 2nd query to re-fetch the alert job.
Which second query do you mean? The `run_alert/2` below seems to operate on `AlertQuery`, not `Quantum.Job`, and the job only knows about `alert_id` now (since we don't really need to put `%AlertQuery{}` in the job anymore if we re-fetch the alert definition each time).
`sync_alert_job` would perform one db query on the scheduler_node, then `get_alert_query_by` at line 285 would perform another, so that would result in 2 db queries being performed.

An alternative is to fetch the `alert_query` first, then pass it to `sync_alert_job`; reversing the order reduces the db queries to 1.
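A minimal sketch of that reversed order, assuming `sync_alert_job/1` can take the already-loaded struct (the function names are the ones used elsewhere in this diff):

```elixir
def run_alert(alert_id, :scheduled) when is_integer(alert_id) do
  # one db query: fetch the alert first, then sync using the struct
  with %AlertQuery{} = alert_query <- get_alert_query_by(id: alert_id) do
    sync_alert_job(alert_query)
    run_alert(alert_query)
  else
    nil -> {:error, :not_found}
  end
end
```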
Do we actually need to call `sync_alert_job`, which "re-adds" the job from `run_query`, or is it enough to unschedule the job for alerts that no longer exist, i.e. 4519e14? Or maybe it can just be a no-op, with sync done periodically (every 60 minutes) instead, but that has another problem: #3059 (comment)

Re-adding the job from `run_query/1` feels off for cron jobs somehow, but in light of #3059 (comment) it might be the only way to reliably re-sync active jobs. One possible problem with that approach: when a rare alert gets modified to become less rare, the update won't take effect until it runs at least once, which might be surprising.
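The unschedule-only variant would presumably be along these lines (a sketch; 4519e14's exact contents aren't shown in this thread, but the deleted-alert branch matches the diff further down):

```elixir
def run_alert(alert_id, :scheduled) when is_integer(alert_id) do
  case get_alert_query_by(id: alert_id) do
    nil ->
      # alert was deleted: drop its Quantum job rather than re-syncing
      AlertsScheduler.delete_job(to_job_name(alert_id))
      {:error, :not_found}

    %AlertQuery{} = alert_query ->
      run_alert(alert_query)
  end
end
```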
I think I'd like to go with no syncing in `run_alert/1`, since it already uses the up-to-date alert query thanks to fetching it by id on each run (and being a no-op if it doesn't exist), and try to update `sync_alert_jobs` to be a bit smarter (instead of a "full" delete followed by a "full" insert) to avoid the problem in #3059 (comment).
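A "smarter" sync could diff the desired jobs against Quantum's current ones instead of wiping and re-inserting everything, so untouched jobs keep their next-run times. A rough sketch with several assumptions: `list_alert_queries/0` and an `alert_job/1` helper that builds the `Quantum.Job`, a `cron` string field on the alert, and `AlertsScheduler.jobs/0` returning `Quantum.Job` structs with a `:name` field:

```elixir
def sync_alert_jobs do
  wanted =
    for alert <- list_alert_queries(), into: %{} do
      {to_job_name(alert.id), alert}
    end

  current =
    for job <- AlertsScheduler.jobs(), into: %{} do
      {job.name, job}
    end

  # drop jobs whose alert no longer exists
  for name <- Map.keys(current), not Map.has_key?(wanted, name) do
    AlertsScheduler.delete_job(name)
  end

  # add missing jobs; replace only those whose schedule changed, so the
  # rest keep their next-run time instead of being postponed by a re-add
  for {name, alert} <- wanted do
    new_schedule = Crontab.CronExpression.Parser.parse!(alert.cron)

    case current[name] do
      nil ->
        AlertsScheduler.add_job(alert_job(alert))

      %Quantum.Job{schedule: ^new_schedule} ->
        :ok

      _changed ->
        AlertsScheduler.delete_job(name)
        AlertsScheduler.add_job(alert_job(alert))
    end
  end

  :ok
end
```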
test/logflare/alerting_test.exs (outdated)
```elixir
      # ensure config allows execution
      old_config = Application.get_env(:logflare, Logflare.Alerting)
      Application.put_env(:logflare, Logflare.Alerting, min_cluster_size: 0, enabled: true)
      on_exit(fn -> Application.put_env(:logflare, Logflare.Alerting, old_config) end)
```
Should put this in a common setup.
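Something like the following shared setup would cover it (the helper name here is hypothetical):

```elixir
# run via `setup :setup_alerting_config` in each describe block
# that exercises alert execution
defp setup_alerting_config(_context) do
  # override config for the test and restore the original afterwards
  old_config = Application.get_env(:logflare, Logflare.Alerting)
  Application.put_env(:logflare, Logflare.Alerting, min_cluster_size: 0, enabled: true)
  on_exit(fn -> Application.put_env(:logflare, Logflare.Alerting, old_config) end)
  :ok
end
```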
Done in 55fc122
```elixir
    end

    test "run_alert/2 unschedules job if alert is missing", %{user: user} do
      {:ok, alert} =
```
This is missing the configuration setup; if config is disabled then this would pass without verifying the logic.
Done in 55fc122
```elixir
        Cluster.Utils.rpc_call(node, func)

      nil ->
        raise "Alerting scheduler node not found"
```
Previously that function could silently be a no-op if the scheduler node wasn't found; I wonder if this raise is a good idea? It would make these cases more noticeable, if they ever happen.
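If the raise turns out to be too aggressive, a logged no-op would keep the old behaviour while still making the miss visible. A sketch, assuming `scheduler_node` and `func` are bound as in the surrounding function:

```elixir
require Logger

case scheduler_node do
  nil ->
    # previously a silent no-op; a warning is non-fatal but shows up in logs
    Logger.warning("Alerting scheduler node not found, skipping alert sync")
    {:error, :no_scheduler_node}

  node ->
    Cluster.Utils.rpc_call(node, func)
end
```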
```elixir
      ],
      alerts_scheduler_sync: [
        run_strategy: Quantum.RunStrategy.Local,
        schedule: "0 * * * *",
```
I wonder if it's possible for an alert schedule to be rarer than the `alerts_scheduler_sync` schedule, i.e. less than once in 60 minutes? That would probably mean those jobs would never run, since in the current `do_sync_alert_jobs` they would keep getting re-added (which in Quantum doesn't seem (?) to execute the job right away, but kind of reschedules and effectively postpones them indefinitely).
Yes, there are jobs that run on a minutely basis, or every hour.
Hmm, that behaviour would not be good. Perhaps separate sync schedule jobs: sync once a day for hourly schedules, and once a minute for jobs that run more often than hourly.
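In Quantum's config syntax that split could look like this (the scheduler module name and the two sync functions are assumptions, mirroring the single `alerts_scheduler_sync` job above):

```elixir
config :logflare, Logflare.AlertsScheduler,
  jobs: [
    alerts_scheduler_sync_frequent: [
      run_strategy: Quantum.RunStrategy.Local,
      # every minute, for alerts scheduled more often than hourly
      schedule: "* * * * *",
      task: {Logflare.Alerting, :sync_frequent_alert_jobs, []}
    ],
    alerts_scheduler_sync_rare: [
      run_strategy: Quantum.RunStrategy.Local,
      # once a day, for hourly-or-rarer alerts
      schedule: "0 0 * * *",
      task: {Logflare.Alerting, :sync_rare_alert_jobs, []}
    ]
  ]
```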
```elixir
      AlertsScheduler.delete_job(job.name)
    end
    AlertsScheduler.delete_job(to_job_name(alert_id))
    {:error, :not_found}
```
It's not an error though, since the sync completed as intended. More like `{:ok, :removed}` or `:ok` or `:removed` maybe?
Use `alert_query_id` instead of the `%AlertQuery{}` struct in the Quantum job.