[AURON #1404] Support for Spark 4.0.1 Compatibility in Auron. #1405
base: master
|
This PR is still being debugged, but it is already largely compatible with Spark 4. Some fine-tuning is still required.
|
Co-authored-by: cxzl25 <3898450+cxzl25@users.noreply.github.com>
|
@richox @cxzl25 @SteNicholas I believe we should continue moving forward with support for Spark 4.0. Although this version is an initial support and may have some issues, we should keep pushing ahead. I would appreciate hearing your thoughts and suggestions. This PR requires #1399 to be merged first. cc: @guixiaowen |
@slfan1989 LGTM,LGTM,LGTM |
|
Is there a big difference between the APIs of Spark 4.0 and Spark 3.5? Can we use a single shims-spark package for both versions? |
I’ve discussed privately with @richox. In the Auron project, we’ve decided to create a unified shim instead of introducing a separate new module. I’ll continue to follow up on the progress of this PR. |
Which issue does this PR close?
Closes #1404.
Rationale for this change
[AURON#1404] Support for Spark 4.0.1 Compatibility in Auron.
What changes are included in this PR?
To support Spark 4, Auron needs to be adapted accordingly. Currently, Celeborn already supports Spark 4.0, and Iceberg has also supported Spark 4.0 for some time. The Iceberg community has already voted to deprecate support for Spark 3.4, and it will be removed soon.
For this PR, I have made the following changes:
- Created a new module: I created a new module spark-extension-shims-spark4. While I considered making changes to the existing spark-extension-shims-spark3 module, I decided that creating a new module would be a better approach since the current module is targeted specifically for Spark 3.
- Removed support for lower versions of Spark: In the spark-extension-shims-spark4 module, I removed all references and compatibility for Spark versions 3.0 to 3.5, ensuring that the module starts supporting Spark 4.0.
- Three changes encountered during compilation:
  - NativeShuffleExchangeExec#ShuffleWriteProcessor: Due to SPARK-44605 restructuring the write method in the API, I refactored the partition and rdd handling here to retrieve them from dependencies for compatibility with other interfaces. In the future, we should switch to the new interface and make further changes to nativeRssShuffleWrite / nativeShuffleWrite.
  - NativeBroadcastExchangeBase#getBroadcastTimeout: In Spark 4.0, getBroadcastTimeout needs to be fetched from getActiveSession.
  - NativeBroadcastExchangeBase#getRelationFuture: In Spark 4.0, the type of SparkSession has changed to org.apache.spark.sql.classic.SparkSession, so I made the necessary adjustments to the way it is accessed.
Are there any user-facing changes?
No.
How was this patch tested?
CI.
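For reviewers weighing a single shim against a separate Spark 4 module, here is a minimal sketch of how the SparkSession type change noted above could be isolated behind a version check. It is illustrative only: SparkVersionShim, SparkVersion, parse, and sessionClassName are hypothetical names and are not taken from the Auron code base. The sketch has no Spark dependency and only maps a version string to the session class name a shim would expect.

```scala
import scala.util.Try

// Hypothetical sketch: none of these names come from the Auron code base.
object SparkVersionShim {

  /** Major and minor components of a version string such as "4.0.1". */
  final case class SparkVersion(major: Int, minor: Int)

  /** Parses "major.minor[.patch...]"; returns None when the input is malformed. */
  def parse(version: String): Option[SparkVersion] = {
    val parts = version.split("\\.")
    if (parts.length < 2) None
    else
      for {
        major <- Try(parts(0).toInt).toOption
        minor <- Try(parts(1).toInt).toOption
      } yield SparkVersion(major, minor)
  }

  /**
   * Fully qualified name of the concrete session class for a version.
   * Spark 4.0 moved the concrete class to the `classic` package, as the
   * getRelationFuture change in this PR describes.
   */
  def sessionClassName(v: SparkVersion): String =
    if (v.major >= 4) "org.apache.spark.sql.classic.SparkSession"
    else "org.apache.spark.sql.SparkSession"
}
```

In a real shim the version string would most likely come from org.apache.spark.SPARK_VERSION, for example SparkVersionShim.parse(org.apache.spark.SPARK_VERSION).map(SparkVersionShim.sessionClassName). Whether a runtime check like this or a build-time profile suits Auron better is left open here.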