Is it possible to run try_into_collection on a Chunk instead of an Array? #82

@AlJohri

Starting with the parquet_read_parallel example from arrow2, I am trying to deserialize a Chunk into a Vec of structs.

Using the deserialize_parallel function as defined in the above example, the following code currently works for me:

pub struct Document {
    content: String,
}

...
let chunk = deserialize_parallel(&mut columns)?;
let array = StructArray::new(
    DataType::Struct(fields.clone()),
    chunk.arrays().to_vec(),
    None,
);
let documents: Vec<Document> = array.to_boxed().try_into_collection().unwrap();

Questions:

  1. With the APIs currently exposed by arrow2 and arrow2-convert, is there a better way to convert the Chunk into a Vec of structs? Building an intermediate StructArray from the Chunk and then calling to_boxed on it seems like it may not be the most efficient route.
  2. Would it be possible to expose TryIntoCollection::try_into_collection directly on the Chunk as well?
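
To make question 2 concrete, here is a rough sketch of the shape I have in mind. It deliberately does not use arrow2: Chunk, FromRow, and ChunkTryIntoCollection below are hypothetical stand-ins (a Chunk here is just a set of equal-length String columns), so it only illustrates an extension trait on the chunk type, not how arrow2-convert would or should implement it.

```rust
// Hypothetical stand-in for arrow2's Chunk: equal-length columns of strings.
pub struct Chunk {
    columns: Vec<Vec<String>>,
}

impl Chunk {
    pub fn new(columns: Vec<Vec<String>>) -> Self {
        Chunk { columns }
    }

    // Number of rows, taken from the first column (0 if there are no columns).
    pub fn len(&self) -> usize {
        self.columns.first().map_or(0, |c| c.len())
    }
}

// Hypothetical trait: how a single row (one value per column) becomes a struct.
pub trait FromRow: Sized {
    fn from_row(row: Vec<String>) -> Result<Self, String>;
}

// Hypothetical extension trait putting try_into_collection on the chunk itself.
pub trait ChunkTryIntoCollection {
    fn try_into_collection<T: FromRow>(self) -> Result<Vec<T>, String>;
}

impl ChunkTryIntoCollection for Chunk {
    fn try_into_collection<T: FromRow>(self) -> Result<Vec<T>, String> {
        let rows = self.len();
        // Reject ragged input before walking the columns in lockstep.
        if self.columns.iter().any(|c| c.len() != rows) {
            return Err("columns have different lengths".to_string());
        }
        // One consuming iterator per column; each step yields one row.
        let mut iters: Vec<_> = self.columns.into_iter().map(|c| c.into_iter()).collect();
        (0..rows)
            .map(|_| {
                let row: Vec<String> = iters.iter_mut().filter_map(|it| it.next()).collect();
                T::from_row(row)
            })
            .collect()
    }
}

#[derive(Debug, PartialEq)]
pub struct Document {
    content: String,
}

impl FromRow for Document {
    fn from_row(row: Vec<String>) -> Result<Self, String> {
        let mut fields = row.into_iter();
        // Document has exactly one field, so expect exactly one column.
        match (fields.next(), fields.next()) {
            (Some(content), None) => Ok(Document { content }),
            _ => Err("expected exactly one column".to_string()),
        }
    }
}

// Usage: no intermediate struct array, the chunk converts directly.
pub fn demo() -> Result<Vec<Document>, String> {
    let chunk = Chunk::new(vec![vec!["first".to_string(), "second".to_string()]]);
    chunk.try_into_collection()
}
```

With something along these lines, the call site would shrink to chunk.try_into_collection::<Document>() with no intermediate StructArray. Whether that would actually save work inside arrow2-convert is something I have not verified.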
