Alphabetical order in testing environment different than in regular R environment? #127

@swhalemwo

I'm developing a package for project-specific data processing. One step is checking whether a number of names are really distinct, or whether similar names refer to the same person. For this I first generate from a database a data.table of pairs that are similar based on string similarity, and compare it to a data.table of pairs that I have manually checked to see whether they refer to the same person. If all similar-sounding names are covered in my manually compiled list, the test passes.

I do this via a negative join with data.table:

dt_redux <- dt_pairs_from_db[!dt_manually_checked_pairs, on = .(name1, name2)]
expect_true(nrow(dt_redux)==0)

This test passed when calling test_all or build_install_test, but failed in R CMD check.

After some searching I tracked it down to the name order in dt_pairs_from_db. The pairs there are generated by a string similarity function, which creates two entries for each pair (name1, name2 and name2, name1). To avoid having to check each pair twice, I only cover the cases where name1 > name2. However, for one pair, "İnan Kıraç" and "Suna Kıraç", the alphabetical order differs between the normal R environment and the testing environment: in the normal R environment, expect_true("İnan Kıraç" > "Suna Kıraç") fails, but in the testing environment (in my test_package.R file), the same expectation passes.
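To make the comparison above easier to reproduce, here is a minimal sketch that evaluates the same expression under an explicitly chosen collation setting. It assumes, without confirmation from anything in this report, that the two environments differ in their LC_COLLATE locale; the helper name compare_under is hypothetical. The names are written with Unicode escapes so the snippet does not depend on the file encoding.

```r
# The two names from the report, written with Unicode escapes.
name_a <- "\u0130nan K\u0131ra\u00e7"  # "İnan Kıraç"
name_b <- "Suna K\u0131ra\u00e7"       # "Suna Kıraç"

# Hypothetical helper: evaluate name_a > name_b under a given
# LC_COLLATE setting, then restore the previous setting.
compare_under <- function(collate) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", collate)
  name_a > name_b
}

# Result under the "C" collation (assumption: this may be what the
# failing environment uses).
in_c_locale <- compare_under("C")

# Result under whatever collation the current session uses.
in_session_locale <- name_a > name_b

print(Sys.getlocale("LC_COLLATE"))
print(c(C = in_c_locale, session = in_session_locale))
```

If the two printed values differ, the collation setting alone is enough to flip which of the two entries survives the name1 > name2 filter.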

This difference in alphabetical order led to dt_pairs_from_db being generated with a pair order that didn't match the order of pairs in my dt_manually_checked_pairs, which caused the test to fail.

I've now fixed it by adding this particular pair in both orders to my dt_manually_checked_pairs, but I'm curious what caused this. Any ideas?
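As an alternative to listing a pair in both orders, the following sketch builds an order-independent key for each pair, so that (name1, name2) and (name2, name1) map to the same value. It is an illustration only, not what the package currently does: the function pair_key and the " | " separator are hypothetical, and it relies on sort(method = "radix"), which is documented to order character vectors as in the C locale regardless of the session's collation setting.

```r
# Hypothetical helper: one canonical key per unordered pair of names.
# sort(method = "radix") does not depend on LC_COLLATE, so the key
# should come out the same in every environment.
pair_key <- function(name1, name2) {
  stopifnot(length(name1) == length(name2))
  vapply(seq_along(name1), function(i) {
    paste(sort(c(name1[i], name2[i]), method = "radix"), collapse = " | ")
  }, character(1))
}

name_a <- "\u0130nan K\u0131ra\u00e7"  # "İnan Kıraç"
name_b <- "Suna K\u0131ra\u00e7"       # "Suna Kıraç"

# Both orderings of the pair give the same key.
key_ab <- pair_key(name_a, name_b)
key_ba <- pair_key(name_b, name_a)
print(identical(key_ab, key_ba))
```

With such a key added as a column to both tables, the negative join could be done on the key alone instead of on name1 and name2.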
