Skip to content

Conversation

@Ksonar262
Copy link

Linked Issue(s)

Explicitly tag the issue linked to this pull request, if any.
#77

Summary of changes

Added the structured data template with a stand alone file for data exploration on the exisitng template V2

Reason for changes

Enhancement

Copy link
Collaborator

@mikewoodward94 mikewoodward94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Kathan, this is very comprehensive.

Will go over design choices in a meeting, but just so you can get an idea of what I'm thinking so far. Unfortunately this is too large in it's current state for me to actually get through.

I've only reached up to the dataloader part of the training script.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fairly certain you can remove this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fairly certain you can also remove this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can definitely remove this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove.

test_size: Proportion of data for testing
stratify: Whether to use stratified splitting
random_state: Random state for reproducibility
feature_engineering: Whether to perform automated feature engineering
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good that you included the args here!

self.stratify = stratify
self.random_state = random_state

# Initialize DataLoader
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point can you find and replace all "Initialize" with "Initialise" please

self.random_state = random_state

# Initialize DataLoader
self.data_loader = DataLoader(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very different to our other projects, I think this is in place of the data importer we use elsewhere?


# Import utility modules
from .utils.data_utils import FeatureEngineer, FeatureSelector, DataTransformer, DataProfiler
from .utils.visualise import DataVisualizer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

American spelling


class DataLoader:
"""
Enhanced class to load, clean, visualize and prepare data for machine learning.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Enhanced" from what?

This docstring is quite vague.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants