Conversation

@sam-watttime
Contributor

@sam-watttime sam-watttime commented Jan 22, 2025

This PR introduces a new module, report.py, which pulls together long-standing internal tools that WattTime has used to partially assess the quality of a new MOER model and its constituent forecast model. The goal of this PR is to allow any user to run these tests against our model(s), increasing transparency about the accuracy and trends of our models.

We are using papermill to run a templated Jupyter notebook and save the output as HTML with the "input" cells removed. Eventually we would like to find a place to host these HTML outputs (e.g. GitHub Pages, the watttime.org website?). Whenever a new model is released, generating an HTML analysis report comparing the new model with its predecessor may become part of our model release plan.

The primary entry point for this HTML generation will be a script (report.py).
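
For reference, a minimal sketch of that flow using papermill and nbconvert directly (the notebook path, parameter names, and output filenames are illustrative, not the final report.py interface):

import papermill as pm
from nbconvert import HTMLExporter

# Execute the templated notebook, injecting parameters via papermill.
# "report_template.ipynb" and the parameter names are placeholders.
pm.execute_notebook(
    "report_template.ipynb",
    "report_output.ipynb",
    parameters={"region": "CAISO_NORTH", "model_date": "2024-12-01"},
)

# Render the executed notebook to HTML with the input cells excluded.
exporter = HTMLExporter()
exporter.exclude_input = True
body, _ = exporter.from_filename("report_output.ipynb")
with open("report_output.html", "w") as f:
    f.write(body)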

Additional plots or statistical tests may be added in the future depending on user requests.

TODO in the PR:

  • Create notebook using only publicly available tooling (e.g. no db connection)
  • Create unit tests for compiling data and creating plots
  • Create unit tests for saving the HTML
  • Create unit tests for parsing CLI args and running script
  • Change "impact" plot behavior to consider timezone of BA (e.g. EV-day charging)

Not in this PR:

  • Use the TS Optimizer module in TS MOER Optimizer #31
  • Create a ModelAnalysisFactory to abstract the handling of jobs throughout
  • Make the WattTimeForecast.get_historical_forecast method use a threadpool executor to speed up pulls (may not be desired).

@sam-watttime sam-watttime requested a review from tlarrue January 22, 2025 06:35
@sam-watttime sam-watttime requested a review from xginn8 as a code owner January 22, 2025 06:35
@sam-watttime
Contributor Author

@tlarrue would love your initial feedback when you have a minute. I picked up what you started and sought to abstract out as much as possible (e.g. we want to be able to run for an arbitrary number of model versions). I also added a threadpool to speed up the forecast pulling, plus unit tests.
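
For context, the threadpool approach is along these lines (a sketch rather than the code in the PR; the region list, date range, and exact get_historical_forecast signature here are assumptions):

from concurrent.futures import ThreadPoolExecutor

from watttime.api import WattTimeForecast

# Credentials are read from the environment, per the client's usual setup.
forecast_client = WattTimeForecast()
regions = ["CAISO_NORTH", "PJM_NJ"]  # illustrative

def pull_forecast(region):
    # Each worker issues its own historical-forecast request.
    return forecast_client.get_historical_forecast(
        start="2025-01-01",
        end="2025-01-07",
        region=region,
        signal_type="co2_moer",
    )

# The pulls are I/O bound, so a threadpool overlaps the API round-trips.
with ThreadPoolExecutor(max_workers=4) as pool:
    forecasts = list(pool.map(pull_forecast, regions))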

@tlarrue

tlarrue commented Jan 23, 2025

> @tlarrue would love your initial feedback when you have a minute. I picked up what you started and sought to abstract out as much as possible (e.g. we want to be able to run for an arbitrary number of model versions). I also added a threadpool to speed up the forecast pulling, plus unit tests.

I didn't try to run it, but I think this is looking great so far. The unit tests are a great call. In general, we're going to have to do a lot of testing here, including manual testing, to make sure the plots are robust to different datasets. As for the abstractions, I'm wondering why we're trying to do this for an arbitrary number of model versions. What would be the reason for showing 3+ model versions here? I'm actually thinking we might want to keep it even stricter for the public and do some sort of lookup of the previous model version (if one exists).

Also just noting that there is more work to be done in the rank_compare functions, especially if we want to make the logic transparent and easy to follow for partners. Notably, on top of the time zones, the functions currently don't "re-optimize" within a charging window (they just charge according to the first forecast). We probably don't want to encourage this!
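
To make that distinction concrete, a sketch of the two behaviors (the helper names and data shapes are hypothetical, not the rank_compare code):

import pandas as pd

def plan_from_first_forecast(forecast: pd.Series, intervals_needed: int) -> list:
    # Current behavior: rank the window once against the first forecast
    # (a Series of MOER values indexed by timestamp) and commit to it.
    return forecast.nsmallest(intervals_needed).index.tolist()

def plan_reoptimized(forecasts_by_time: dict, intervals_needed: int) -> list:
    # Re-optimizing behavior: at each interval, re-rank the remaining
    # window with the freshest forecast (a dict of issue time -> forecast
    # Series) and charge now only if "now" is still among the cheapest
    # remaining slots.
    chosen = []
    for now, forecast in forecasts_by_time.items():
        still_needed = intervals_needed - len(chosen)
        if still_needed == 0:
            break
        future = forecast[forecast.index >= now]
        if now in future.nsmallest(still_needed).index:
            chosen.append(now)
    return chosen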

Lastly, just on repo structure: there is an analysis folder. Do you want to place this stuff there and split more of this code into different files, so it's super clear to partners how we're calculating these stats and how they can replicate them?

@@ -1,2 +1,3 @@
from watttime.api import *
from watttime.tcy import TCYCalculator
Contributor

This puts TCYCalculator in this file's namespace, permitting watttime.TCYCalculator. One disadvantage to doing this is that it broadens the surface area of imports anytime there is a watttime namespace import. What is the advantage to re-namespacing it here?
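
Concretely, assuming the re-export lives in watttime/__init__.py, it enables the following (a sketch of the mechanism, not code from the PR):

# With `from watttime.tcy import TCYCalculator` in watttime/__init__.py,
# the class resolves as an attribute of the top-level package:
import watttime

assert watttime.TCYCalculator is not None

# ...and can be imported directly from the package:
from watttime import TCYCalculator

# The cost: every `import watttime`, even one that only needs the api
# module, now also executes watttime.tcy at package-import time.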

from watttime.api import *
from watttime.tcy import TCYCalculator
from watttime.report import ModelAnalysis
Contributor

@jcofield jcofield Mar 8, 2025

My suspicion is that this is a workaround to load this into the Jupyter notebook. If the notebook said from watttime import ModelAnalysis, then this would just be a convenience feature, but instead it says from watttime import report. Is this a workaround? If so, let's discuss a more direct approach; if not, what is the advantage to this re-namespacing?

from watttime import api

# hacky way to allow running this script locally
sys.path.append(str(Path(__file__).parents[1].resolve()))
Contributor

@jcofield jcofield Mar 8, 2025

Which import fails if we exclude this? from watttime import api looks like an absolute reference that should work. One potential issue is a namespace clash between our package and this sub-package.
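
For context, the failure mode this kind of append usually guards against, assuming the script is run directly (e.g. python watttime/report.py) without the package pip-installed:

import sys
from pathlib import Path

# When run as `python watttime/report.py`, sys.path[0] is the script's own
# directory (watttime/), not the repo root, so `from watttime import api`
# raises ModuleNotFoundError unless the package is installed. Appending the
# repo root restores the absolute import:
sys.path.append(str(Path(__file__).parents[1].resolve()))

from watttime import api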


@dataclass
class ModelAnalysis:
ba: str
Contributor

region?

ba: str
model_date: str
signal_type: str
eval_start: str
Contributor

nit-pick: The api module allows both str and datetime.

start: Union[str, datetime],
IMO, we should leave this as a str and later consider reusing:
def _parse_dates(
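
For reference, a helper along those lines might look like this (a sketch only; the actual _parse_dates in the api module may differ):

from datetime import datetime
from typing import Tuple, Union

from dateutil.parser import parse

def _parse_dates(
    start: Union[str, datetime], end: Union[str, datetime]
) -> Tuple[datetime, datetime]:
    # Accept either ISO-format strings or datetime objects.
    if isinstance(start, str):
        start = parse(start)
    if isinstance(end, str):
        end = parse(end)
    return start, end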

"""

# Create subplots
unique_bas = set([j.ba for j in jobs])
Contributor

unique_regions?

line=dict(width=2),
showlegend=(i == 1),
),
row=i,
Contributor

Rename i to ndx? row_ndx? region_ndx? region_index?

y_min = y_max = 0

# Iterate through each BA and create a bar plot
for i, ba_abbrev in enumerate(unique_bas, start=1):
Contributor

This naming is inconsistent with our api:

- region: The abbreviation of the region.

Our implied pattern is to always use region unless we are explicitly using region_full_name. As such, I suggest we drop the use of both ba and abbrev throughout and use region and its variants, for the sake of avoiding name clashes (i.e. region, this_region, region_, etc.).

Parameters:
df (pd.DataFrame): DataFrame containing the rank correlation data.
Columns: ['abbrev', 'name', '24hr', '48hr', '72hr'].
Contributor

region and region_full_name would be more consistent

eval_days=self.eval_days, forecast_sample_days=FORECAST_SAMPLE_DAYS
)

self.jobs = list(
Contributor

@jcofield jcofield Mar 8, 2025

I suggest not using job(s) here, and instead aligning this variable with ModelAnalysis. The natural fit for the current draft would be to name it after what a ModelAnalysis dataset is: self.analysis_data, self.analyses, etc.

I would reserve the word job for (i) uses of a parallelization package or (ii) deeper down in this package (see my comment there).


for i, ba_abbrev in enumerate(unique_bas, start=1):
ba_abbrev = ba_abbrev.upper()
_jobs = [j for j in jobs if j.ba == ba_abbrev]
Contributor

Once you rename jobs to something more like data or analysis_data, you won't need this underscore. Then 'job' is an okay variable name here if you document it well, e.g. a plotting job on each of the analysis datasets.



@dataclass
class ModelAnalysis:
Contributor

@jcofield jcofield Mar 8, 2025

This object and its name are okay as a first pass, but I think there is room to be highly specific about what our object(s) are meant to do. This object is doing these things:

  1. Lazy loading raw api data
  2. Serving as a convenience wrapper around api data
  3. Merging the historical and forecast data to ease analysis
  4. It is not meant to implement analysis (the name may encourage this later!)

Maybe it would be better to separate this into a lazy-loader object and a ForecastDataComparer. Then the lazy loader could be used for historical-only analysis as well as to power the ForecastDataComparer, which itself would not have to be concerned with the api.

As an easier iteration to keep this PR in motion, I would also support a simple renaming to prevent the interpretation of this as an object meant to implement analyses.
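
A rough shape of that split might be as follows (class names, fields, and the merge key are illustrative, not a spec):

from dataclasses import dataclass
from functools import cached_property

import pandas as pd

@dataclass
class RegionDataLoader:
    # Responsibilities 1 and 2: lazily pull and cache raw api data.
    region: str
    eval_start: str
    eval_end: str

    @cached_property
    def historical(self) -> pd.DataFrame:
        raise NotImplementedError("api pull elided in this sketch")

    @cached_property
    def forecast(self) -> pd.DataFrame:
        raise NotImplementedError("api pull elided in this sketch")

@dataclass
class ForecastDataComparer:
    # Responsibility 3: merge historical and forecast data to ease
    # analysis, without knowing anything about the api itself.
    loader: RegionDataLoader

    def merged(self) -> pd.DataFrame:
        return self.loader.forecast.merge(
            self.loader.historical, on="point_time", suffixes=("_forecast", "_actual")
        )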

@sam-watttime sam-watttime mentioned this pull request Mar 12, 2025