diff --git a/docs/user_guides/fs/data_source/usage.md b/docs/user_guides/fs/data_source/usage.md
index d17957dbf..df660faa1 100644
--- a/docs/user_guides/fs/data_source/usage.md
+++ b/docs/user_guides/fs/data_source/usage.md
@@ -22,7 +22,7 @@ We retrieve a data source simply by its unique name.
     project = hopsworks.login()
     feature_store = project.get_feature_store()
     # Retrieve data source
-    connector = feature_store.get_storage_connector('data_source_name')
+    ds = feature_store.get_data_source('data_source_name')
     ```

 === "Scala"
@@ -119,17 +119,20 @@ Another important aspect of a data source is its ability to facilitate creation
 The `Connector API` relies on data sources behind the scenes to integrate with external data sources. This enables seamless integration with any data source as long as there is a data source defined.

-To create an external feature group, we use the `create_external_feature_group` API, also known as `Connector API`, and simply pass the data source created before to the `storage_connector` argument.
+
+To create an external feature group, we use the `create_external_feature_group` API, also known as the `Connector API`, and simply pass the data source created earlier to the `data_source` argument.
 Depending on the external source, we should set either the `query` argument for data warehouse based sources, or the `path` and `data_format` arguments for data lake based sources, similar to reading into dataframes as explained in the section above.

-Example for any data warehouse/SQL based external sources, we set the desired SQL to `query` argument, and set the `storage_connector` argument to the data source object of desired data source.
+For example, for data warehouse/SQL based external sources, we set the desired SQL in the `query` attribute of the data source object, and pass that object to the `data_source` argument.
+
 === "PySpark"
     ```python
+    ds.query = "SELECT * FROM TABLE"
+
     fg = feature_store.create_external_feature_group(name="sales",
                     version=1,
                     description="Physical shop sales features",
-                    query="SELECT * FROM TABLE",
-                    storage_connector=connector,
+                    data_source=ds,
                     primary_key=['ss_store_sk'],
                     event_time='sale_date'
                     )
@@ -141,8 +144,8 @@ For more information on `Connector API`, read detailed guide about [external fea

 ## Writing Training Data

-Data Sources are also used while writing training data to external sources.
-While calling the [Feature View](../../../concepts/fs/feature_view/fv_overview.md) API `create_training_data` , we can pass the `storage_connector` argument which is necessary to materialise the data to external sources, as shown below.
+Data Sources are also used while writing training data to external sources.
+While calling the [Feature View](../../../concepts/fs/feature_view/fv_overview.md) API `create_training_data`, we can pass the `data_source` argument, which is necessary to materialise the data to external sources, as shown below.
=== "PySpark" ```python @@ -151,7 +154,7 @@ While calling the [Feature View](../../../concepts/fs/feature_view/fv_overview.m description = 'describe training data', data_format = 'spark_data_format', # e.g., data_format = "parquet" or data_format = "csv" write_options = {"wait_for_job": False}, - storage_connector = connector + data_source = ds ) ``` diff --git a/docs/user_guides/fs/feature_group/create.md b/docs/user_guides/fs/feature_group/create.md index 51c630397..a731d815e 100644 --- a/docs/user_guides/fs/feature_group/create.md +++ b/docs/user_guides/fs/feature_group/create.md @@ -108,7 +108,7 @@ The currently support values are "HUDI", "DELTA", "NONE" (which defaults to Parq ##### Data Source -During the creation of a feature group, it is possible to define the `storage_connector` parameter, this allows for management of offline data in the desired table format outside the Hopsworks cluster. +During the creation of a feature group, it is possible to define the `data_source` parameter, this allows for management of offline data in the desired table format outside the Hopsworks cluster. Currently, [S3](../data_source/creation/s3.md) and [GCS](../data_source/creation/gcs.md) connectors and "DELTA" `time_travel_format` format is supported. ##### Online Table Configuration diff --git a/docs/user_guides/fs/feature_group/create_external.md b/docs/user_guides/fs/feature_group/create_external.md index e7f7d36e4..e0a779a5e 100644 --- a/docs/user_guides/fs/feature_group/create_external.md +++ b/docs/user_guides/fs/feature_group/create_external.md @@ -22,7 +22,7 @@ To create an external feature group using the HSFS APIs you need to provide an e === "Python" ```python - connector = feature_store.get_storage_connector("data_source_name") + ds = feature_store.get_data_source("data_source_name") ``` ### Create an External Feature Group @@ -52,7 +52,7 @@ Once you have defined the metadata, you can version=1, description="Physical shop sales features", query=query, - storage_connector=connector, + data_source=ds, primary_key=['ss_store_sk'], event_time='sale_date' ) @@ -69,7 +69,7 @@ Once you have defined the metadata, you can version=1, description="Physical shop sales features", data_format="parquet", - storage_connector=connector, + data_source=ds, primary_key=['ss_store_sk'], event_time='sale_date' ) @@ -112,7 +112,7 @@ For an external feature group to be available online, during the creation of the version=1, description="Physical shop sales features", query=query, - storage_connector=connector, + data_source=ds, primary_key=['ss_store_sk'], event_time='sale_date', online_enabled=True) diff --git a/docs/user_guides/fs/provenance/provenance.md b/docs/user_guides/fs/provenance/provenance.md index 272cb10bb..a179428bc 100644 --- a/docs/user_guides/fs/provenance/provenance.md +++ b/docs/user_guides/fs/provenance/provenance.md @@ -35,14 +35,14 @@ You can inspect the relationship between data sources and feature groups using t ```python # Retrieve the data source - snowflake_sc = fs.get_storage_connector("snowflake_sc") + ds = fs.get_data_source("snowflake_sc") + ds.query = "SELECT * FROM USER_PROFILES" # Create the user profiles feature group user_profiles_fg = fs.create_external_feature_group( name="user_profiles", version=1, - storage_connector=snowflake_sc, - query="SELECT * FROM USER_PROFILES" + data_source=ds ) user_profiles_fg.save() ``` @@ -50,13 +50,13 @@ You can inspect the relationship between data sources and feature groups using t ### Step 1, Using Python Starting from a feature group 
 Starting from a feature group metadata object, you can traverse the provenance graph upstream to retrieve the metadata objects of the data sources that are part of the feature group.
-To do so, you can use the [`FeatureGroup.get_storage_connector_provenance`][hsfs.feature_group.FeatureGroup.get_storage_connector_provenance] method.
+To do so, you can use the [`FeatureGroup.get_data_source_provenance`][hsfs.feature_group.FeatureGroup.get_data_source_provenance] method.

 === "Python"

     ```python
     # Returns all data sources linked to the provided feature group
-    lineage = user_profiles_fg.get_storage_connector_provenance()
+    lineage = user_profiles_fg.get_data_source_provenance()

     # List all accessible parent data sources
     lineage.accessible
@@ -72,7 +72,7 @@ To do so, you can use the [`FeatureGroup.get_storage_connector_provenance`][hsfs

     ```python
     # Returns an accessible data source linked to the feature group (if it exists)
-    user_profiles_fg.get_storage_connector()
+    user_profiles_fg.get_data_source()
     ```

 To traverse the provenance graph in the opposite direction (i.e., from the data source to the feature group), you can use the [`StorageConnector.get_feature_groups_provenance`][hsfs.storage_connector.StorageConnector.get_feature_groups_provenance] method.
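
The final paragraph above describes the reverse traversal (data source to feature groups) only in prose. The sketch below shows how it might look after this rename, assuming the object returned by `get_data_source` exposes the `StorageConnector.get_feature_groups_provenance` method referenced above, and that its result mirrors the `accessible` attribute of the feature-group lineage object; treat the exact names as assumptions rather than the confirmed API.

```python
# Hypothetical sketch of the reverse provenance traversal.
# Assumes `fs` is the feature store handle used in the provenance examples above.
ds = fs.get_data_source("snowflake_sc")

# Returns all feature groups linked to the provided data source
# (documented as StorageConnector.get_feature_groups_provenance)
lineage = ds.get_feature_groups_provenance()

# List all accessible downstream feature groups
lineage.accessible
```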