Conversation

@sam-watttime
Contributor

@sam-watttime sam-watttime commented Jan 22, 2025

This PR introduces a new module, report.py, which pulls together long-standing internal tools that WattTime has used to partially assess the quality of a new MOER model and its constituent forecast model. The goal of this PR is to allow any user to run these tests against our model(s), increasing transparency about the accuracy and trends of our models.

We are using papermill to run a templated Jupyter notebook and save the output as HTML with the "input" cells removed. Eventually we would like to find a place to host these HTML outputs (e.g. GitHub Pages, the watttime.org website?). Whenever a new model is released, generating an HTML analysis report comparing the new model with its predecessor may become part of our model release plan.

The primary entry point for this HTML generation will be a script (report.py).
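
For reference, a minimal sketch of that flow using papermill and nbconvert directly (the notebook path, parameter names, and output filenames are illustrative, not the final report.py interface):

import papermill as pm
from nbconvert import HTMLExporter

# Execute the templated notebook, injecting parameters via papermill.
# "report_template.ipynb" and the parameter names are placeholders.
pm.execute_notebook(
    "report_template.ipynb",
    "report_output.ipynb",
    parameters={"region": "CAISO_NORTH", "model_date": "2024-12-01"},
)

# Render the executed notebook to HTML with the input cells excluded.
exporter = HTMLExporter()
exporter.exclude_input = True
body, _ = exporter.from_filename("report_output.ipynb")
with open("report_output.html", "w") as f:
    f.write(body)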

Additional plots or statistical tests may be added in the future depending on user requests.

TODO in the PR:

  • Create notebook using only publicly available tooling (e.g. no db connection)
  • Create unit tests for compiling data and creating plots
  • Create unit tests for saving the HTML
  • Create unit tests for parsing CLI args and running script
  • Change "impact" plot behavior to consider timezone of BA (e.g. EV-day charging)

Not in this PR:

  • Use the TS Optimizer module in TS MOER Optimizer #31
  • Create a ModelAnalysisFactory to abstract the handling of jobs throughout
  • Make the WattTimeForecast.get_historical_forecast method use a threadpool executor to speed up pulls (may not be desired).

@sam-watttime sam-watttime requested a review from tlarrue January 22, 2025 06:35
@sam-watttime sam-watttime requested a review from xginn8 as a code owner January 22, 2025 06:35
@sam-watttime
Contributor Author

@tlarrue would love your initial feedback when you have a minute. I picked up what you started and sought to abstract out as much as possible (e.g. we want to be able to run for an arbitrary number of model versions). I also added a threadpool to speed up the forecast pulling, plus unit tests.
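
For context, the threadpool approach is along these lines (a sketch rather than the code in the PR; the region list, date range, and exact get_historical_forecast signature here are assumptions):

from concurrent.futures import ThreadPoolExecutor

from watttime.api import WattTimeForecast

# Credentials are read from the environment, per the client's usual setup.
forecast_client = WattTimeForecast()
regions = ["CAISO_NORTH", "PJM_NJ"]  # illustrative

def pull_forecast(region):
    # Each worker issues its own historical-forecast request.
    return forecast_client.get_historical_forecast(
        start="2025-01-01",
        end="2025-01-07",
        region=region,
        signal_type="co2_moer",
    )

# The pulls are I/O bound, so a threadpool overlaps the API round-trips.
with ThreadPoolExecutor(max_workers=4) as pool:
    forecasts = list(pool.map(pull_forecast, regions))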

@tlarrue

tlarrue commented Jan 23, 2025

> @tlarrue would love your initial feedback when you have a minute. I picked up what you started and sought to abstract out as much as possible (e.g. we want to be able to run for an arbitrary number of model versions). I also added a threadpool to speed up the forecast pulling, plus unit tests.

I didn't try to run it, but I think this is looking great so far. The unit tests are a great call. In general, we're going to have to do a lot of testing here, including manual testing, to make sure the plots are robust to different datasets. As for the abstractions, I'm wondering why we're trying to do this for an arbitrary number of model versions. What would be the reason for showing 3+ model versions here? I'm actually thinking we might want to keep it even stricter for the public and do some sort of lookup of the previous model version (if one exists).

Also just noting that there is more work to be done in the rank_compare functions, especially if we want to make the logic transparent and easy to follow for partners. Notably, on top of the time zones, the functions currently don't "re-optimize" within a charging window (they just charge according to the first forecast). We probably don't want to encourage this!
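
To make that distinction concrete, a sketch of the two behaviors (the helper names and data shapes are hypothetical, not the rank_compare code):

import pandas as pd

def plan_from_first_forecast(forecast: pd.Series, intervals_needed: int) -> list:
    # Current behavior: rank the window once against the first forecast
    # (a Series of MOER values indexed by timestamp) and commit to it.
    return forecast.nsmallest(intervals_needed).index.tolist()

def plan_reoptimized(forecasts_by_time: dict, intervals_needed: int) -> list:
    # Re-optimizing behavior: at each interval, re-rank the remaining
    # window with the freshest forecast (a dict of issue time -> forecast
    # Series) and charge now only if "now" is still among the cheapest
    # remaining slots.
    chosen = []
    for now, forecast in forecasts_by_time.items():
        still_needed = intervals_needed - len(chosen)
        if still_needed == 0:
            break
        future = forecast[forecast.index >= now]
        if now in future.nsmallest(still_needed).index:
            chosen.append(now)
    return chosen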

Lastly, just on repo structure: there is an analysis folder. Do you want to place this stuff there and split more of this code into different files, so it's super clear to partners how we're calculating these stats and how they can replicate them?

@@ -1,2 +1,3 @@
from watttime.api import *
from watttime.tcy import TCYCalculator
Contributor

This puts TCYCalculator in this file's namespace, permitting watttime.TCYCalculator. One disadvantage to doing this is that it broadens the surface area of imports anytime there is a watttime namespace import. What is the advantage to re-namespacing it here?
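
Concretely, assuming the re-export lives in watttime/__init__.py, it enables the following (a sketch of the mechanism, not code from the PR):

# With `from watttime.tcy import TCYCalculator` in watttime/__init__.py,
# the class resolves as an attribute of the top-level package:
import watttime

assert watttime.TCYCalculator is not None

# ...and can be imported directly from the package:
from watttime import TCYCalculator

# The cost: every `import watttime`, even one that only needs the api
# module, now also executes watttime.tcy at package-import time.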

from watttime.api import *
from watttime.tcy import TCYCalculator
from watttime.report import ModelAnalysis
Contributor

@jcofield jcofield Mar 8, 2025

My suspicion is that this is a workaround to load this into the Jupyter notebook. If the notebook said from watttime import ModelAnalysis, then this would just be a convenience feature, but instead it says from watttime import report. Is this a workaround? If so, let's discuss a more direct approach; if not, what is the advantage to this re-namespacing?

from watttime import api

# hacky way to allow running this script locally
sys.path.append(str(Path(__file__).parents[1].resolve()))
Contributor

@jcofield jcofield Mar 8, 2025

Which import fails if we exclude this? from watttime import api looks like an absolute reference that should work. One potential issue is a namespace clash between our package and this sub-package.
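
For context, the failure mode this kind of append usually guards against, assuming the script is run directly (e.g. python watttime/report.py) without the package pip-installed:

import sys
from pathlib import Path

# When run as `python watttime/report.py`, sys.path[0] is the script's own
# directory (watttime/), not the repo root, so `from watttime import api`
# raises ModuleNotFoundError unless the package is installed. Appending the
# repo root restores the absolute import:
sys.path.append(str(Path(__file__).parents[1].resolve()))

from watttime import api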


@dataclass
class ModelAnalysis:
ba: str
Contributor

region?

ba: str
model_date: str
signal_type: str
eval_start: str
Contributor

nit-pick: The api module allows both str and datetime.

start: Union[str, datetime],
IMO, we should leave this as a str and later consider reusing:
def _parse_dates(
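
For reference, a helper along those lines might look like this (a sketch only; the actual _parse_dates in the api module may differ):

from datetime import datetime
from typing import Tuple, Union

from dateutil.parser import parse

def _parse_dates(
    start: Union[str, datetime], end: Union[str, datetime]
) -> Tuple[datetime, datetime]:
    # Accept either ISO-format strings or datetime objects.
    if isinstance(start, str):
        start = parse(start)
    if isinstance(end, str):
        end = parse(end)
    return start, end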

"""

# Create subplots
unique_bas = set([j.ba for j in jobs])
Contributor

unique_regions?

line=dict(width=2),
showlegend=(i == 1),
),
row=i,
Contributor

Rename i to ndx? row_ndx? region_ndx? region_index?

y_min = y_max = 0

# Iterate through each BA and create a bar plot
for i, ba_abbrev in enumerate(unique_bas, start=1):
Contributor

This naming is inconsistent with our api:

- region: The abbreviation of the region.

Our implied pattern is to always use region unless we are explicitly using region_full_name. As such, I suggest we drop the use of both ba and abbrev throughout and use region and its variants, for the sake of avoiding name clashes (i.e. region, this_region, region_, etc.).

Parameters:
df (pd.DataFrame): DataFrame containing the rank correlation data.
Columns: ['abbrev', 'name', '24hr', '48hr', '72hr'].
Contributor

region and region_full_name would be more consistent

eval_days=self.eval_days, forecast_sample_days=FORECAST_SAMPLE_DAYS
)

self.jobs = list(
Contributor

@jcofield jcofield Mar 8, 2025

I suggest not using job(s) here, and instead aligning this variable with ModelAnalysis. The natural fit for the current draft would be to name it after what a ModelAnalysis dataset is: self.analysis_data, self.analyses, etc.

I would reserve the word job for (i) uses of a parallelization package or (ii) deeper down in this package (see my comment there).


for i, ba_abbrev in enumerate(unique_bas, start=1):
ba_abbrev = ba_abbrev.upper()
_jobs = [j for j in jobs if j.ba == ba_abbrev]
Contributor

Once you rename jobs to something more like data or analysis_data, you won't need this underscore. Then 'job' is an okay variable name here if you document it well, e.g. a plotting job on each of the analysis datasets.



@dataclass
class ModelAnalysis:
Contributor

@jcofield jcofield Mar 8, 2025

This object and its name are okay as a first pass, but I think there is room to be highly specific about what our object(s) are meant to do. This object is doing these things:

  1. Lazy loading raw api data
  2. Serving as a convenience wrapper around api data
  3. Merging the historical and forecast data to ease analysis
  4. It is not meant to implement analysis (the name may encourage this later!)

Maybe it would be better to separate this into a lazy-loader object and a ForecastDataComparer. Then the lazy loader could be used for historical-only analysis as well as to power the ForecastDataComparer, which itself would not have to be concerned with the api.

As an easier iteration to keep this PR in motion, I would also support a simple renaming to prevent the interpretation of this as an object meant to implement analyses.
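
A rough shape of that split might be as follows (class names, fields, and the merge key are illustrative, not a spec):

from dataclasses import dataclass
from functools import cached_property

import pandas as pd

@dataclass
class RegionDataLoader:
    # Responsibilities 1 and 2: lazily pull and cache raw api data.
    region: str
    eval_start: str
    eval_end: str

    @cached_property
    def historical(self) -> pd.DataFrame:
        raise NotImplementedError("api pull elided in this sketch")

    @cached_property
    def forecast(self) -> pd.DataFrame:
        raise NotImplementedError("api pull elided in this sketch")

@dataclass
class ForecastDataComparer:
    # Responsibility 3: merge historical and forecast data to ease
    # analysis, without knowing anything about the api itself.
    loader: RegionDataLoader

    def merged(self) -> pd.DataFrame:
        return self.loader.forecast.merge(
            self.loader.historical, on="point_time", suffixes=("_forecast", "_actual")
        )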

@sam-watttime sam-watttime mentioned this pull request Mar 12, 2025