-
Notifications
You must be signed in to change notification settings - Fork 6
Adding Billing Rates to the config.py and rewriting references #159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
39e759b to
b1b589e
Compare
openshift_metrics/config.py
Outdated
| RATE_GPU_H100_SU = os.getenv("RATE_GPU_H100_SU", "6.04") | ||
|
|
||
| # Legacy rate (for backward compatibility) | ||
| GPU_A100_RATE = os.getenv("GPU_A100_RATE", "1.803") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this constant used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find it. A text search across your PR doesn't show any reference to GPU_A100_RATE
| SU_NAMES = ["GPUV100", "GPUA100", "GPUA100SXM4", "GPUH100", "CPU"] | ||
| RESOURCE_NAMES = ["vCPUs", "RAM", "GPUs"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly. Let's not touch these now for now. The whole SU types and their names will require some more thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jimmysway This comment is not addressed. These constants are not accessed anywhere in your PR
openshift_metrics/merge.py
Outdated
| parser.add_argument("--rate-cpu-su", type=Decimal, default=Decimal(RATE_CPU_SU)) | ||
| parser.add_argument( | ||
| "--rate-gpu-v100-su", type=Decimal, default=Decimal(RATE_GPU_V100_SU) | ||
| ) | ||
| parser.add_argument( | ||
| "--rate-gpu-a100sxm4-su", type=Decimal, default=Decimal(RATE_GPU_A100SXM4_SU) | ||
| ) | ||
| parser.add_argument( | ||
| "--rate-gpu-a100-su", type=Decimal, default=Decimal(RATE_GPU_A100_SU) | ||
| ) | ||
| parser.add_argument( | ||
| "--rate-gpu-h100-su", type=Decimal, default=Decimal(RATE_GPU_H100_SU) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the point of this PR is to have configuration done through env vars, I would suggest removing these parser arguments since they are now redundant.
openshift_metrics/config.py
Outdated
| RATE_CPU_SU = os.getenv("RATE_CPU_SU", "0.013") | ||
| RATE_GPU_V100_SU = os.getenv("RATE_GPU_V100_SU", "1.214") | ||
| RATE_GPU_A100SXM4_SU = os.getenv("RATE_GPU_A100SXM4_SU", "2.078") | ||
| RATE_GPU_A100_SU = os.getenv("RATE_GPU_A100_SU", "1.803") | ||
| RATE_GPU_H100_SU = os.getenv("RATE_GPU_H100_SU", "6.04") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@naved001 Do we still want keep the old behavior of having the rates None if not provided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good point. If someone were to forget setting I'd rather the code fail than use some default values for rates.
|
@jimmysway have you had a chance to address the comments in this PR? |
…and rewriting code references. Removed CLI arguments for setting rates
b1b589e to
2967310
Compare
|
Sorry I had them sitting in my local branch. I just merged with upstream |
|
@jimmysway There are merge conflicts in the PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few previous comments were not addressed. I've also added a few new ones
| SU_NAMES = ["GPUV100", "GPUA100", "GPUA100SXM4", "GPUH100", "CPU"] | ||
| RESOURCE_NAMES = ["vCPUs", "RAM", "GPUs"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jimmysway This comment is not addressed. These constants are not accessed anywhere in your PR
| USE_NERC_RATES_ENV = os.getenv("USE_NERC_RATES") | ||
| USE_NERC_RATES = USE_NERC_RATES_ENV.lower() == "true" if USE_NERC_RATES_ENV else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest just having USE_NERC_RATES and removing USE_NERC_RATES_ENV. If the user does not set USE_NERC_RATES, that should be equivalent to setting the env var to false:
| USE_NERC_RATES_ENV = os.getenv("USE_NERC_RATES") | |
| USE_NERC_RATES = USE_NERC_RATES_ENV.lower() == "true" if USE_NERC_RATES_ENV else None | |
| USE_NERC_RATES = os.getenv("USE_NERC_RATES", "false").lower() == "true" |
Do you have a specific reason for why these two env vars needs to be defined seperately?
| RATE_GPU_H100_SU = os.getenv("RATE_GPU_H100_SU") | ||
|
|
||
| # Legacy rate (for backward compatibility) | ||
| GPU_A100_RATE = os.getenv("GPU_A100_RATE") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't see where this constant is used
| if USE_NERC_RATES is None: | ||
| raise ValueError( | ||
| "USE_NERC_RATES environment variable must be set to 'true' or 'false'" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following from my comment above, this if clause should also be removed. I think USE_NERC_RATES not being set should be interpreted as False
| report_start_date_dt = datetime.strptime(report_start_date, "%Y-%m-%d") | ||
| report_end_date_dt = datetime.strptime(report_end_date, "%Y-%m-%d") | ||
| ignore_hours = outage_data.get_outages_during( | ||
| report_start_date, report_end_date, cluster_name | ||
| report_start_date_dt, report_end_date_dt, cluster_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current upstream code runs fine with report_start_date and report_end_date. Is there a reason you're converting them to datetime objects here?
| if RATE_CPU_SU is None: | ||
| raise ValueError("RATE_CPU_SU environment variable must be set") | ||
| if RATE_GPU_V100_SU is None: | ||
| raise ValueError("RATE_GPU_V100_SU environment variable must be set") | ||
| if RATE_GPU_A100SXM4_SU is None: | ||
| raise ValueError("RATE_GPU_A100SXM4_SU environment variable must be set") | ||
| if RATE_GPU_A100_SU is None: | ||
| raise ValueError("RATE_GPU_A100_SU environment variable must be set") | ||
| if RATE_GPU_H100_SU is None: | ||
| raise ValueError("RATE_GPU_H100_SU environment variable must be set") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the upstream version of the code, an error is already raised by the Rates dataclass if the SU rates are None. I suppose these a bit more informative, but I would argue they are not entirely necessary and leads to more maintenance burden down the line.
| report_start_date_dt = datetime.strptime(report_start_date, "%Y-%m-%d").replace( | ||
| tzinfo=UTC | ||
| ) | ||
| report_end_date_dt = datetime.strptime(report_end_date, "%Y-%m-%d").replace( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to indicate that report_start_date and report_end_date are datetime objects, I would suggest using type annotation instead, which I think would be simpler to interpret than having to rename references and assignements.
| report_start_date_dt = datetime.strptime(report_start_date, "%Y-%m-%d").replace( | |
| tzinfo=UTC | |
| ) | |
| report_end_date_dt = datetime.strptime(report_end_date, "%Y-%m-%d").replace( | |
| report_start_date : datetime = datetime.strptime(report_start_date, "%Y-%m-%d").replace( | |
| tzinfo=UTC | |
| ) | |
| report_end_date : datetime = datetime.strptime(report_end_date, "%Y-%m-%d").replace( | |
| tzinfo=UTC | |
| ) |
Is there another reason why you renamed these variables?
| if report_start_date_dt.month != report_end_date_dt.month: | ||
| logger.warning("The report spans multiple months") | ||
| report_month += " to " + datetime.strftime(report_end_date_dt, "%Y-%m") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if clause is no longer needed. It will cause an error with the call to nerc-rates in get_su_definitions
Part 2 of #144