Session + Spark example #1022
✅ Deploy Preview for neo4j-graph-data-science-client ready!
e511a66 to 95a787a
Co-authored-by: Florentin Dörre <florentin.dorre@neo4j.com>
Closes #991
"# Create a GDS session!\n",
"gds = sessions.get_or_create(\n",
" # we give it a representative name\n",
" session_name=\"bike_trips\",\n",
I read about bike trips just today in our logs. Nice to learn it's your workload 👀
Mats-SX left a comment
I have many textual changes on the notebook.
I think this is a great step forward! I think Nathan and Stu will be very happy to see it.
        return None

    def aborted(self) -> bool:
        return self.status == "Aborted"
for consistency, let's also do the .lower() on this one
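To illustrate the suggestion, here is a minimal sketch of a case-insensitive status check. The `JobStatus` class is a hypothetical stand-in, not the class from this PR; only the `aborted` method and the `"Aborted"` literal come from the diff above.

```python
from dataclasses import dataclass


@dataclass
class JobStatus:  # hypothetical stand-in for the class under review
    status: str

    def aborted(self) -> bool:
        # Normalise the case first, so "Aborted", "ABORTED" and "aborted"
        # are all treated as the same status.
        return self.status.lower() == "aborted"
```

With this shape, a change in the casing of the reported status would not silently flip the result of the check.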
    "Flight returned internal error, with message: org.apache.arrow.flight.FlightRuntimeException: ", ""
)
improved_message = improved_message.replace(
    "Failed to invoke procedure `gds.arrow.project`: Caused by: org.apache.arrow.flight.FlightRuntimeException: ",
this procedure doesn't really exist anymore -- only in very old Neo4j versions. these days it's called gds.arrow.project.v2 or, more commonly v3
you just moved it, I realise, but I want to note the ineffectiveness here.
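One possible way to make the prefix stripping cover the versioned procedure names is sketched below. It assumes that the error prefix for `gds.arrow.project.v2` and `gds.arrow.project.v3` has the same shape as the one in the diff, which this thread does not confirm; `strip_procedure_prefix` is a hypothetical helper name.

```python
import re

# Assumption: the prefix looks the same for every procedure version and only
# differs in an optional `.v<N>` suffix on the procedure name.
_PROCEDURE_PREFIX = re.compile(
    r"Failed to invoke procedure `gds\.arrow\.project(?:\.v\d+)?`: "
    r"Caused by: org\.apache\.arrow\.flight\.FlightRuntimeException: "
)


def strip_procedure_prefix(message: str) -> str:
    """Remove the procedure-invocation prefix, with or without a version suffix."""
    return _PROCEDURE_PREFIX.sub("", message)
```

Messages that do not contain the prefix pass through unchanged, so the helper would be safe to apply to every error message.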
class CatalogEndpoints(ABC):
    @abstractmethod
    def get(self, graph_name: str) -> GraphV2:
yes, nice
what if it doesn't find the graph? error?
    def get(self, graph_name: str) -> GraphV2:
        if not self.list(graph_name):
            raise ValueError(f"A graph with name '{graph_name}' does not exist in the catalog.")
yes, error
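The behaviour discussed here can be sketched with a small in-memory catalog. `InMemoryCatalog` and its `add` method are hypothetical and exist only for illustration; the `get`/`list` names and the error message mirror the diff above, while the real implementation returns a `GraphV2` rather than a plain dict.

```python
from typing import Dict, List, Optional


class InMemoryCatalog:  # hypothetical, for illustration only
    def __init__(self) -> None:
        self._graphs: Dict[str, dict] = {}

    def add(self, graph_name: str) -> None:
        self._graphs[graph_name] = {"name": graph_name}

    def list(self, graph_name: Optional[str] = None) -> List[dict]:
        # Without a name, return every graph; with a name, return only matches.
        if graph_name is None:
            return [graph for graph in self._graphs.values()]
        return [graph for name, graph in self._graphs.items() if name == graph_name]

    def get(self, graph_name: str) -> dict:
        matches = self.list(graph_name)
        if not matches:
            # An unknown graph name is an error rather than a None result.
            raise ValueError(f"A graph with name '{graph_name}' does not exist in the catalog.")
        return matches[0]
```

Raising instead of returning `None` keeps the return type non-optional, so callers do not need a separate null check.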
from pandas import DataFrame

from graphdatascience.arrow_client.v1.gds_arrow_client import GdsArrowClient
from graphdatascience.arrow_client.v2.gds_arrow_client import GdsArrowClient
one usage of this class was removed with another import. is this import now unused?
ah no it is used in the create factory
"The notebook shows how to use the `graphdatascience` Python library to create, manage, and use a GDS Session.\n",
"\n",
"We consider a graph of bicycle rentals, which we're using as a simple example to show how to project data from Spark to a GDS Session, run algorithms, and eventually retrieve the results back to Spark.\n",
"We will cover all management operations: creation, listing, and deletion."
Suggested change:
- "We will cover all management operations: creation, listing, and deletion."
+ "In this notebook we will focus on the interaction with Apache Spark, and will not cover all possible actions using GDS sessions. We refer to other Tutorials for additional details."
"\n",
"Once the computation is done, we might want to further use the result in Spark.\n",
"We can do this in a similar way to the projection, by streaming batches of data into each of the Spark workers.\n",
"Retrieving the data is a bit more complicated since we need some input data frame in order to trigger computations on the Spark workers.\n",
Suggested change:
- "Retrieving the data is a bit more complicated since we need some input data frame in order to trigger computations on the Spark workers.\n",
+ "Retrieving the data is a bit more complicated since we need some input DataFrame in order to trigger computations on the Spark workers.\n",
"metadata": {},
"outputs": [],
"source": [
"# 1. Start the node property export on the session\n",
Suggested change:
- "# 1. Start the node property export on the session\n",
+ "# 1. Start the node property export on the GDS session\n",
"source": [
"## Cleanup\n",
"\n",
"Now that we have finished our analysis, we can delete the session and stop the spark connection.\n",
Suggested change:
- "Now that we have finished our analysis, we can delete the session and stop the spark connection.\n",
+ "Now that we have finished our analysis, we can delete the GDS session and stop the Spark session.\n",
"\n",
"Now that we have finished our analysis, we can delete the session and stop the spark connection.\n",
"\n",
"Deleting the session will release all resources associated with it, and stop incurring costs."
Suggested change:
- "Deleting the session will release all resources associated with it, and stop incurring costs."
+ "Deleting the GDS session will release all resources associated with it, and stop incurring costs."
ref GDSA-373
ref GDSA-469