Consolidate and formalize model report for feature analysis #401
base: master
Conversation
```r
)
# Sample for readability
if (nrow(unmatched_wide) > 10000) {
```
If everything is included but the data hasn't been finalized (e.g. school data not yet uploaded), the report will just break.
Do we want to include this, since we should be running it after issues like that have been resolved? Also, I don't think we will look at more than 10,000 observations.
We also randomize the sample so that we don't get only lower-numbered PINs.
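The sampling step under discussion could be sketched as follows (a sketch only; `unmatched_wide` comes from the diff above, and the seed is a hypothetical addition for reproducibility):

```r
# Cap the displayed table at 10,000 rows for readability,
# sampling at random so low-numbered PINs aren't over-represented
set.seed(42) # hypothetical seed, added for reproducibility
if (nrow(unmatched_wide) > 10000) {
  unmatched_wide <- unmatched_wide[sample(nrow(unmatched_wide), 10000), ]
}
```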
@jeancochrane
The new file is in the same location, called model_features.qmd, reflecting the new folder/file name.
…analysis' of github.com:ccao-data/model-res-avm into 393-consolidate-and-formalize-model-report-for-feature-analysis
jeancochrane left a comment
This is just about ready to go! A few small suggestions below.
One higher-level suggestion that we should tackle before we merge: I realized while messing around with the other reports that we typically use the renv environment in the root of the repo for managing the R dependencies that our reports rely on. I checked and it seems like that root renv environment already has all the dependencies that these new reports need. As such, I think we can delete the following renv and R artifacts from this branch, and just use the root renv environment instead:
- `reports/model_features/renv/*`
- `reports/model_features/.Rprofile`
- `reports/model_features/model_features.Rproj`
- `reports/model_features/renv.lock`
Co-authored-by: Jean Cochrane <jeancochrane@users.noreply.github.com>
I rendered the doc and nothing seems to cause issues, but I'll make sure it finishes fine as well.
jeancochrane left a comment
Just kicked off one final render to make sure the output still looks good after our most recent changes, but in the meantime I identified a couple of minor issues!
```r
library(arrow)
library(data.table)
library(dplyr)
library(DT)
library(ggplot2)
library(glue)
library(kableExtra)
library(knitr)
library(leaflet)
library(noctua)
library(stringr)
library(tidyr)

source("_utils.R")
```
[Suggestion, optional] Sorry for yet again recommending a change to the structure of _baseline_query_data.R and _utils.R, but I noticed while testing document rendering that the library() calls in this script are interfering with the caching implemented in the Quarto docs that source it. In short, Quarto can't properly cache library() calls, and since the results of _baseline_query_data.R are cached in all the Quarto docs that source it, those docs will fail during render when _baseline_query_data.R is cached, because package functions are not available in the document namespace. (Happy to talk this through in more detail if it's not clear.)
There are two ways we could deal with this issue:
- Stop caching `_baseline_query_data.R` in downstream data consumers, because the script already implements a form of caching using the `if (!exists(<variable>))` conditional branches.
  - This would be a very easy change to make, but it's suboptimal in that `_baseline_query_data.R` doesn't implement true caching -- it will always need to run all the queries the first time it is called in the context of a Quarto doc, even if it hasn't changed.
- Keep caching `_baseline_query_data.R` in downstream data consumers, but move all the `library()` calls from `_baseline_query_data.R` to `_utils.R`, and switch things up so that we `source("_utils.R")` in the context of the Quarto docs, not in the context of `_baseline_query_data.R`.
  - This is a slightly more complicated change, but it would preserve true caching in the docs.
I would recommend approach 2, but ultimately I'm fine with either one! I tested approach 2 and it seems to work well so far.
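For reference, approach 2 could look something like this as two chunks in each Quarto doc (the chunk labels are hypothetical, and the `cache-file` option name mirrors the hashing convention already used in these docs rather than a built-in knitr option):

````
```{r setup}
#| cache: false
# Attach packages on every render; library() calls can't be cached
source("_utils.R")
```

```{r baseline-data}
#| cache: true
#| cache-file: !expr rlang::hash_file("_baseline_query_data.R")
# Safe to cache: the script now contains only data queries,
# with no library() side effects
source("_baseline_query_data.R")
```
````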
Does this mean we also remove `#| cache-file-2: !expr rlang::hash_file("_utils.R")` from each file as well?
@jeancochrane
Yes @Damonamajor!
Co-authored-by: Jean Cochrane <jeancochrane@users.noreply.github.com>
This merges two existing EI issues to create a consolidated QC testing framework to run once each year upon the final upload of data. It uses 4 reports: 3 derived from this and 1 from this.
The feature missingness report creates a series of thresholds to test whether values change by a significant amount. For example, all ACS values are expected not to match, but the percentages should not change by more than 10%.
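A threshold check of that kind might be sketched as follows (the column names, input frames, and join key are hypothetical; only the 10% tolerance comes from the description above):

```r
library(dplyr)

threshold <- 0.10 # 10% tolerance from the description above

# Compare a hypothetical ACS percentage column across two data pulls
acs_change <- old_data %>%
  inner_join(new_data, by = "pin", suffix = c("_old", "_new")) %>%
  summarise(
    mean_abs_change = mean(abs(acs_pct_new - acs_pct_old), na.rm = TRUE)
  )

if (acs_change$mean_abs_change > threshold) {
  warning("ACS percentages changed by more than 10% on average")
}
```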
I also refactored the data_changes script from the previous EI issue so it runs in about half the time.
Because the file takes a long time to run and has limited impact once data has been finalized, it should be run manually rather than integrated into the model pipeline.
In total, the run time appears to be about 2 hours for the actual script and, for some reason, approximately 1 additional hour for rendering.
The output is too large to upload, so it is stored here:
O:\CCAODATA\tmp\model_features.html