-
Notifications
You must be signed in to change notification settings - Fork 376
feat(sqllogictest): use serde derived structs for schedule parsing #1953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Replace manual TOML parsing with serde-derived structs to improve maintainability and separate parsing from engine instantiation. Changes: - Add ScheduleConfig, EngineConfig, and EngineType structs with serde derives - Use #[serde(flatten)] for forward-compatibility with future config fields - Refactor Schedule::from_file() to use toml::from_str() directly - Add instantiate_engines() to separate parsing from engine creation - Remove manual parse_engines() and parse_steps() functions - Update tests to verify deserialization behavior Closes apache#1952
|
|
||
| impl DataFusionEngine { | ||
| pub async fn new(config: TomlTable) -> Result<Self> { | ||
| pub async fn new(_name: &str, config: &EngineConfig) -> Result<Self> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you help me understand why do we add the name here? I can't think of any cases where name would matter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that if there are multiple instances, this could help debug/log which one fails. As this is actually currently unclear, i'll remove this, we can always re-add it later
| /// Additional configuration fields for extensibility | ||
| /// This allows forward-compatibility with future fields like catalog_type, catalog_properties | ||
| #[serde(flatten)] | ||
| pub extra: HashMap<String, toml::Value>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting direction, but I'm feeling that we will need another serializable config like DatafusionCatalogConfig in the future rather than using toml::Value directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, see the suggestion above.
liurenjie1024
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @AndreaBozzo , this is much better now. I have complete example of engine config, please help to fix it.
|
|
||
| /// Configuration for a single engine instance | ||
| #[derive(Debug, Clone, Deserialize)] | ||
| pub struct EngineConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| pub struct EngineConfig { | |
| pub enum EngineConfig { | |
| DataFusion { | |
| catalog: DatafusionCatalogConfig | |
| } | |
| } | |
| pub struct DatafusionCatalogConfig { | |
| typ: String, | |
| properties: HashMap<String, String> | |
| } | |
| /// Additional configuration fields for extensibility | ||
| /// This allows forward-compatibility with future fields like catalog_type, catalog_properties | ||
| #[serde(flatten)] | ||
| pub extra: HashMap<String, toml::Value>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, see the suggestion above.
crates/sqllogictest/src/schedule.rs
Outdated
| // Test forward-compatibility with extra fields (for PR #1943) | ||
| let input = r#" | ||
| [engines] | ||
| df = { type = "datafusion", catalog_type = "rest", some_future_field = "value" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| df = { type = "datafusion", catalog_type = "rest", some_future_field = "value" } | |
| df = { type = "datafusion", | |
| catalog = { | |
| type = "rest", | |
| props = { | |
| // A hashmap of properties. | |
| url = "http://localhost:9000", | |
| use_x = "true", | |
| } | |
| } | |
| } |
This is a complete example, and then we could use catalog loader to load a catalog.
…Config - Replace EngineConfig struct with tagged enum for better type safety - Add CatalogConfig struct with catalog_type and props fields - Remove unused _name parameter from DataFusionEngine::new() - Prepare structure for future catalog loader integration
e8867c8 to
f09e5e7
Compare
|
Thanks for the detailed feedback @liurenjie1024 and @CTTY! I appreciate a lot your time and patience on this. I've addressed the review comments in commit f09e5e7, the previous one lacked fmt The memory catalog remains the default for now. Full catalog loader integration (REST, Glue, HMS, etc.) would have added MANY new dependencies and i wanted to keep it simple to review/modify eventually. Thanks again. |
This PR refactors the schedule file parsing in the sqllogictest crate to use serde-derived structs instead of manual TOML parsing, as requested in #1952.
Changes
New structs with serde derives:
ScheduleConfig- top-level configuration parsed from TOMLEngineConfig- per-engine configuration with#[serde(flatten)]for extensibilityEngineType- enum of supported engine typesRefactored parsing flow:
Schedule::from_file()now usestoml::from_str()directlyinstantiate_engines()to separate parsing from engine creationparse_engines()andparse_steps()functionsForward-compatibility:
#[serde(flatten)]to capture extra fields inEngineConfig.extracatalog_typeandcatalog_propertiessupportRelation to #1943
This PR was suggested by @liurenjie1024 as a prerequisite to #1943 (dynamic catalog configuration). The
#[serde(flatten)]approach allows #1943 to simply extract the catalog configuration fromEngineConfig.extrawithout modifying the parsing logic.Testing
df_test.tomlpasses unchangedCloses #1952