#77 ptd structure data v2 #99
base: TemplateV2
Hi Kathan, this is very comprehensive.
I'll go over the design choices in a meeting, but here is what I'm thinking so far so you can get an idea. Unfortunately, in its current state this is too large for me to get through in full.
So far I've only reached the dataloader part of the training script.
Fairly certain you can remove this.
Fairly certain you can also remove this.
You can definitely remove this.
Remove.
Remove.
```python
test_size: Proportion of data for testing
stratify: Whether to use stratified splitting
random_state: Random state for reproducibility
feature_engineering: Whether to perform automated feature engineering
```
Good that you included the args here!
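The `test_size`, `stratify` and `random_state` arguments follow the usual train/test split convention. As a rough illustration of what they control, here is a stdlib-only sketch built around a hypothetical `split_indices` helper (not part of this PR). In practice scikit-learn's `train_test_split` offers the same three parameters, although its `stratify` takes the label array rather than a boolean.

```python
import random
from collections import defaultdict


def split_indices(labels, test_size=0.2, stratify=True, random_state=None):
    """Return (train_idx, test_idx) as sorted lists of row indices.

    Hypothetical helper for illustration only; not the PR's DataLoader API.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a proportion between 0 and 1")
    rng = random.Random(random_state)  # same seed -> same split every run
    if stratify:
        # Group row indices by class so each class is split separately,
        # which keeps the class proportions in both halves.
        groups = defaultdict(list)
        for i, label in enumerate(labels):
            groups[label].append(i)
        buckets = list(groups.values())
    else:
        # Unstratified: one bucket containing every row.
        buckets = [list(range(len(labels)))]
    train_idx, test_idx = [], []
    for bucket in buckets:
        rng.shuffle(bucket)
        # Python's round() rounds exact halves to the nearest even number.
        n_test = round(len(bucket) * test_size)
        test_idx.extend(bucket[:n_test])
        train_idx.extend(bucket[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = ["a"] * 8 + ["b"] * 2
train, test = split_indices(labels, test_size=0.5, stratify=True, random_state=42)
```

With `stratify=True`, the 80/20 class ratio in `labels` is preserved in both halves (four "a" rows and one "b" row in each), which is the main reason to stratify on imbalanced data.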
```python
self.stratify = stratify
self.random_state = random_state

# Initialize DataLoader
```
At this point, could you find and replace all instances of "Initialize" with "Initialise", please?
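One way to apply this rename across the template is a small regex pass. This is a hypothetical sketch: the `britishise` and `britishise_tree` helpers are not part of the repository. It deliberately matches only the capitalised word forms such as "Initialize" and "Initialized", so lower-case method names like `initialize()`, which may belong to third-party libraries, are left untouched.

```python
import re
from pathlib import Path

# Capitalised whole words only ("Initialize", "Initialized", "Initialization"),
# so identifiers such as `initialize()` or `Initializer` are not renamed.
PATTERN = re.compile(r"\bInitializ(e|ed|es|ing|ation)\b")


def britishise(text: str) -> str:
    """Replace American "Initializ..." word forms with British "Initialis..."."""
    return PATTERN.sub(r"Initialis\1", text)


def britishise_tree(root: str) -> list[str]:
    """Rewrite every .py file under `root` in place; return the changed paths."""
    changed = []
    for path in Path(root).rglob("*.py"):
        original = path.read_text(encoding="utf-8")
        updated = britishise(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(str(path))
    return changed
```

Calling `britishise_tree` on the template directory returns the list of files it rewrote, which keeps the change easy to review before committing.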
```python
self.random_state = random_state

# Initialize DataLoader
self.data_loader = DataLoader(
```
This is very different from our other projects. Is this in place of the data importer we use elsewhere?
```python
# Import utility modules
from .utils.data_utils import FeatureEngineer, FeatureSelector, DataTransformer, DataProfiler
from .utils.visualise import DataVisualizer
```
American spelling
```python
class DataLoader:
    """
    Enhanced class to load, clean, visualize and prepare data for machine learning.
```
"Enhanced" from what?
This docstring is quite vague.
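To make the point concrete, here is one possible shape for a more specific class docstring. It is a hypothetical sketch: the summary line is reworded from the existing docstring, the argument descriptions come from the PR's own Args list, and the default values in `__init__` are placeholders, since the PR's actual signature is not visible in this review.

```python
class DataLoader:
    """Load a tabular dataset, clean it and split it for model training.

    Illustrative docstring only: the defaults below are placeholders,
    not the values used in the PR.

    Args:
        test_size: Proportion of rows held out for testing (between 0 and 1).
        stratify: Whether to preserve class proportions in the train/test split.
        random_state: Seed for shuffling, so repeated runs give the same split.
        feature_engineering: Whether to perform automated feature engineering.
    """

    def __init__(self, test_size=0.2, stratify=True, random_state=None,
                 feature_engineering=False):
        # Store the configuration only; loading and cleaning would happen elsewhere.
        self.test_size = test_size
        self.stratify = stratify
        self.random_state = random_state
        self.feature_engineering = feature_engineering
```

The summary line states what the class does rather than comparing it with an unnamed predecessor, and each argument documents its valid range or effect.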
Linked Issue(s)
Summary of changes
Added the structured data template, with a standalone file for data exploration, on top of the existing template V2.
Reason for changes
Enhancement