Better source for Swedish data?

Hi,

Amazing project! I actually found out about it after building my own, based on the exact same principle but using Swedish data, for my own needs. I just published the code [here](https://github.com/civictechsweden/genderify-sweden).

I'm both frustrated and happy I found your project (as well as [name-dataset](https://github.com/philipperemy/name-dataset)) because I couldn't find anything when I first looked and felt like I had to write my own code. But now that I've done it, I'm bummed someone implemented it better and with more data. Oh well... 😊

Anyway, I'm reaching out because it looks like you're using newborn data for Sweden. I've been using a different dataset that I think works better: SCB (Statistics Sweden) publishes a list of all names borne by at least two people living in Sweden (first, middle and last names). The files, called Namnsök 2021 and 2022, can be found on [this page](https://www.scb.se/hitta-statistik/sverige-i-siffror/namnsok/).

I did the math and this amounts to 98% of the population (i.e. 2% of the population have a unique name and are therefore not in this list). So it's far more exhaustive than the lists of newborns, even if you go back a few decades. In total, there are 97386 unique first names, compared with the 1518 in your newborn dataset.
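Roughly, a coverage figure like this can be computed as in the sketch below. It uses made-up toy counts rather than the SCB files, and it assumes each person is counted under exactly one name; `name_counts`, `MIN_BEARERS` and `coverage` are illustrative names, not part of either project.

```python
def coverage(listed_bearers: int, population: int) -> float:
    """Share of the population whose name appears in the published list."""
    return listed_bearers / population


# Hypothetical toy data (name -> number of bearers), NOT real SCB figures.
name_counts = {"Anna": 5, "Erik": 4, "Maja": 1}

# Assumption taken from the description above: only names borne by
# at least two people are published.
MIN_BEARERS = 2

# Names that would make it into the published list.
listed = {name: n for name, n in name_counts.items() if n >= MIN_BEARERS}

# Assumption: every person carries exactly one name in this toy model.
population = sum(name_counts.values())
share = coverage(sum(listed.values()), population)

print(f"{len(listed)} listed names cover {share:.0%} of the toy population")
```

With real data, the counts would come from the Namnsök files and the population total from a separate SCB figure, so the result depends on how people with several first names are counted.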

Would you be interested in a PR to use this dataset instead?

Better source for Swedish data? #20

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Better source for Swedish data? #20

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions