Discussion: Custom delimiters for unflatten #454

@jpmckinney

I notice that some CSVs uploaded to the OCDS Data Review Tool use semicolons as the delimiter instead of commas.

With commas:

ocid,id,date,tag,initiationType,tender/id
ocds-1234567-abc,ocds-1234567-abc-1,2000-01-02T00:00:00Z,tender,tender,abc

With semicolons:

ocid;id;date;tag;initiationType;tender/id
ocds-1234567-abc;ocds-1234567-abc-1;2000-01-02T00:00:00Z;tender;tender;abc

Some possible behaviors:

  1. Leave as is. With the above example, the whole header row is treated as one column, and the field is read in as "ocid;id;date;tag;initiationType;tender", which shows up under additional fields.
  2. Allow a dialect to be passed in. This defers all responsibility to the calling code.
  3. Add a sniff boolean argument. If enabled, flatten-tool sniffs the dialect. The sample size and/or possible delimiters could also be passed in.
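To make the three behaviors concrete, here is a minimal sketch using only Python's standard csv module. It does not use flatten-tool itself; read_rows and sniff_dialect are hypothetical helper names, and the sample_size and delimiters defaults are assumptions chosen for illustration.

```python
import csv
import io

SEMICOLON_CSV = (
    "ocid;id;date;tag;initiationType;tender/id\n"
    "ocds-1234567-abc;ocds-1234567-abc-1;2000-01-02T00:00:00Z;tender;tender;abc\n"
)


def read_rows(text, dialect="excel", **fmtparams):
    """Hypothetical helper: parse CSV text with a caller-supplied dialect."""
    return list(csv.reader(io.StringIO(text), dialect=dialect, **fmtparams))


def sniff_dialect(text, sample_size=1024, delimiters=",;"):
    """Hypothetical helper: guess the dialect from the start of the text,
    considering only the given candidate delimiters."""
    return csv.Sniffer().sniff(text[:sample_size], delimiters=delimiters)


# Behavior 1 (leave as is): the default comma dialect sees one field per row.
default_rows = read_rows(SEMICOLON_CSV)

# Behavior 2 (dialect passed in): the calling code decides the delimiter.
explicit_rows = read_rows(SEMICOLON_CSV, delimiter=";")

# Behavior 3 (sniff): detect the delimiter, then parse with that dialect.
sniffed = sniff_dialect(SEMICOLON_CSV)
sniffed_rows = read_rows(SEMICOLON_CSV, dialect=sniffed)
```

Note that csv.Sniffer raises csv.Error when it cannot determine a delimiter, so a sniff option would probably need a fallback.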

For CoVEs, flatten-tool's unflatten is called within lib-cove's convert_spreadsheet, which is called by a CoVE's view. The flattentool_options are derived from arguments to convert_spreadsheet – except for paths, encoding (utf-8-sig, cp1252, latin_1), metatab_vertical_orientation (True), convert_titles (True). So, whatever new arguments are added to unflatten will need to be added to convert_spreadsheet.
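The pass-through might look roughly like the sketch below. Both functions are stand-ins written for this comment: they do not reproduce the real signatures of lib-cove's convert_spreadsheet or flatten-tool's unflatten, and the dialect option is the hypothetical new argument. Only the two fixed option values come from the description above.

```python
import csv
import io


def unflatten_sketch(text, dialect="excel", **options):
    """Stand-in for flatten-tool's unflatten (NOT its real signature).

    Only the hypothetical `dialect` option is used; other options are ignored.
    """
    return list(csv.DictReader(io.StringIO(text), dialect=dialect))


def convert_spreadsheet_sketch(text, dialect="excel"):
    """Stand-in for lib-cove's convert_spreadsheet (NOT its real signature)."""
    flattentool_options = {
        # Fixed values, as described in the issue.
        "metatab_vertical_orientation": True,
        "convert_titles": True,
        # Hypothetical new option, threaded through from the caller.
        "dialect": dialect,
    }
    return unflatten_sketch(text, **flattentool_options)


class Semicolon(csv.excel):
    """A dialect the calling code (e.g. a CoVE view) could define."""
    delimiter = ";"


rows = convert_spreadsheet_sketch(
    "ocid;id\nocds-1234567-abc;ocds-1234567-abc-1\n", dialect=Semicolon
)
```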

I think (2) is best, as it gives the most flexibility to the calling code.
