
Normalisation #9

@k4lipso


Normalising data works well with --format SQL, but a lot of data is still stored multiple times in the database.

The hwloc_pci extractor output is mostly identical on every host, but the "domain" specifier is unique, so it could be factored out and stored separately. The hwloc_pci data string contains about 40,000 characters and was written 63 times in a test with 64 hosts, so about 2,520,000 characters were written to the database. That could be reduced to about 45,000.
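
A minimal sketch of that idea in Python, not the project's actual code: it assumes each extractor result is available as a flat dict and that the host-specific value sits under a top-level "domain" key. The function names (`split_host_specific`, `content_id`, `normalise`) and the record layout are hypothetical. The shared part is hashed in a canonical form so that identical payloads from different hosts collapse into one stored entry, while each host keeps only a reference plus its own values.

```python
import hashlib
import json


def split_host_specific(record, host_keys):
    """Return (shared, host_specific) parts of a flat extractor record."""
    shared = {k: v for k, v in record.items() if k not in host_keys}
    specific = {k: v for k, v in record.items() if k in host_keys}
    return shared, specific


def content_id(shared):
    """Stable ID for the shared part: SHA-256 of its canonical JSON form."""
    canonical = json.dumps(shared, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def normalise(records_by_host, host_keys):
    """Store each distinct shared payload once; hosts keep only a reference."""
    payloads = {}  # content_id -> shared payload (stored once)
    per_host = {}  # host -> (content_id, host-specific values)
    for host, record in records_by_host.items():
        shared, specific = split_host_specific(record, host_keys)
        cid = content_id(shared)
        payloads.setdefault(cid, shared)
        per_host[host] = (cid, specific)
    return payloads, per_host


# Hypothetical test data: 64 hosts that differ only in "domain".
records = {
    f"node{i:02d}": {"domain": f"dom{i}", "devices": ["gpu", "nic"]}
    for i in range(64)
}
payloads, per_host = normalise(records, host_keys={"domain"})
print(len(payloads), "shared payload(s) for", len(per_host), "hosts")
```

The same approach would cover any other field that is unique per host but otherwise irrelevant to the shared data: add its key to `host_keys`, or drop it entirely if the value is already stored elsewhere.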

The Filesystem extractor collects data about mountpoints and partitions, both of which have their own unique EID in the database representation. Most of the values are static and could be normalised, but there is one dynamic value: "available". Because of that, each host creates its own data string even if the mountpoints are exactly the same. To be exact, 62 data entries were created for 64 hosts.
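
One way to handle this would be to split each entry into a static part that can be shared and a dynamic part that stays per host. The following Python sketch illustrates that under stated assumptions: mount entries are flat dicts with hashable values, "available" is the only dynamic key (as described above), and the field names other than "available" as well as the function names are made up for the example.

```python
# "available" is the dynamic value named in the issue; everything else
# is treated as static here, which is an assumption.
DYNAMIC_KEYS = frozenset({"available"})


def split_mount(mount, dynamic_keys=DYNAMIC_KEYS):
    """Split one mount entry into a hashable static part and a dynamic dict."""
    static = tuple(sorted((k, v) for k, v in mount.items() if k not in dynamic_keys))
    dynamic = {k: v for k, v in mount.items() if k in dynamic_keys}
    return static, dynamic


def normalise_mounts(mounts_by_host):
    """Assign one ID per distinct static part; keep dynamic values per host."""
    static_ids = {}  # static part -> small integer ID (stored once)
    rows = []        # (host, static_id, dynamic values)
    for host, mounts in mounts_by_host.items():
        for mount in mounts:
            static, dynamic = split_mount(mount)
            # len(static_ids) is evaluated before insertion, so a new
            # static part receives the next free ID.
            sid = static_ids.setdefault(static, len(static_ids))
            rows.append((host, sid, dynamic))
    return static_ids, rows


# Hypothetical data: identical mountpoints, different free space.
mounts_by_host = {
    "node01": [{"mountpoint": "/", "fstype": "ext4", "size": 100, "available": 40}],
    "node02": [{"mountpoint": "/", "fstype": "ext4", "size": 100, "available": 55}],
}
static_ids, rows = normalise_mounts(mounts_by_host)
print(len(static_ids), "static entry for", len(rows), "host rows")
```

With this split, hosts with identical mountpoints share one static entry, and only the small "available" value is written per host.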

The hwloc_machine extractor creates the Machineinfo entry. The data linked to it really is identical on every host, but the "hostname" specifier is unique. It is not needed in the database representation at all, since the hostname is already saved in the Hosttable. It could be deleted or factored out completely.
