Better source for Swedish data? #20

@PierreMesure

Hi,

Amazing project! I actually found out about it after building one on the exact same principle, using Swedish data, for my own needs. I just published the code here.

I'm both frustrated and happy I found your project (as well as name-dataset) because I couldn't find anything when I first looked and felt like I had to write my own code. But now that I've done it, I'm bummed someone implemented it better and with more data. Oh well... 😊

Anyway, I'm reaching out because I saw that you seem to be using newborn data for Sweden. I've been using a different dataset which I think works better. SCB (Statistics Sweden) has a list of all the names borne by at least two people living in Sweden (first, middle and last names). It can be found on this page (the files called Namnsök 2021 and 2022).

I did the math and this amounts to 98% of the population (i.e. 2% of the population have a unique name and are hence not in this list). So it's far more exhaustive than the lists of newborns, even if you go back a few decades. In total, there are 97386 unique first names, compared with the 1518 in your newborn dataset.
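
For reference, the coverage figure above can be estimated along these lines. This is only a rough sketch with made-up toy numbers, not the actual SCB figures or file format: `coverage`, `toy_counts` and `toy_population` are hypothetical names, and it assumes each person is counted under exactly one name (people with several first names would be counted more than once).

```python
def coverage(name_counts: dict[str, int], population: int) -> float:
    """Return the share of the population whose name appears in the list.

    name_counts maps each listed name to the number of bearers.
    Assumes every person is counted under exactly one name.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return sum(name_counts.values()) / population


# Toy data, purely illustrative (not taken from the Namnsök files).
toy_counts = {"Anna": 5, "Erik": 3, "Liv": 2}
toy_population = 12

share = coverage(toy_counts, toy_population)
print(f"{share:.0%} covered, {1 - share:.0%} not in the list")
```

With the real data, the counts would come from the Namnsök files and the population from SCB's population statistics.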

Would you be interested in a PR to use this dataset instead?
