Exploring data partitions for k-means and Fuzzy-c-means clustering:

One of the recurrent, and vexing, problems in ecology is deciding how many groups to use when clustering multivariate data. I have already provided some functions (based on heatmaps and networks) to help with this decision in the EcotoneFinder package.
Here I present another set of analyses that serve the same purpose. They rely on several partition indices and, in particular, on how these indices change as the data are split into an increasing number of groups.
The initial idea for these functions came from a figure in Pavão et al. (2019) [1], which I wanted to reproduce with my own data, and from extending that protocol from k-means clustering to the fuzzy-c-means clustering I was using at the time.

Provided functions:

Three functions are currently in the repository:

cascadeFCM: an extension of vegan::cascadeKM to fuzzy-c-means clustering.
KMeans_indices_test: produces the data needed to draw a plot similar to the one in Pavão et al. (2019), with k-means clustering.
FCM_indices_test: produces the data needed to draw a plot similar to the one in Pavão et al. (2019), with fuzzy-c-means clustering.

All of these functions may eventually be integrated into a future version of the EcotoneFinder package.

Basic examples:

Consider the artificial data presented below and provided in this repository:

The associated heatmap and networks (qgraph, running a spinglass algorithm to determine statistical communities in the network) both highlight three main groups, either as "squares" of more closely related species along the diagonal of the heatmap, or as groups of related nodes. This correctly describes the three main communities in the artificial data.

Running KMeans_indices_test on the same data, however, gives an optimum at $n = 5$ groups instead of $n = 3$. This may reflect the ecotones being detected as groups in their own right (i.e. three communities plus two ecotones).
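The general idea of scanning a range of group numbers and tracking a partition index can be sketched as follows. This is a minimal, language-neutral illustration in plain Python, not the code behind KMeans_indices_test: the toy data (three tight, well-separated blobs with no transition zones), the deterministic farthest-point seeding, and the choice of the Calinski-Harabasz index are all assumptions made for the example. The indices actually computed by the repository's functions are not listed in this README.

```python
# Illustrative sketch only: NOT the implementation of KMeans_indices_test.
from math import dist


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers


def calinski_harabasz(points, labels, centers):
    """Ratio of between-group to within-group dispersion (higher is better)."""
    n, k = len(points), len(centers)
    grand = tuple(sum(x) / n for x in zip(*points))
    wss = sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    bss = sum(labels.count(j) * dist(centers[j], grand) ** 2 for j in range(k))
    return (bss / (k - 1)) / (wss / (n - k))


# Hypothetical toy data: three tight blobs of five points each.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
blobs = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + dx, cy + dy) for cx, cy in blobs for dx, dy in offsets]

# Evaluate the index for an increasing number of groups.
scores = {k: calinski_harabasz(points, *kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On this toy data the index peaks at three groups, because there are no intermediate samples between the blobs. Data with gradual transitions between communities, such as the artificial data above, can shift the optimum towards a larger number of groups.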

Using fuzzy indices (and thus fuzzy clusters) with FCM_indices_test recovers the $n = 3$ optimum, although $n = 5$ still appears to be a reasonably good solution.
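A fuzzy counterpart of such a scan can be sketched in the same way. Again, this is only an illustration in plain Python under stated assumptions, not the code behind FCM_indices_test or cascadeFCM: the toy data, the fuzzifier m = 2, the farthest-point seeding, and the use of the normalised partition coefficient as the index are all choices made for the example.

```python
# Illustrative sketch only: NOT the implementation of FCM_indices_test.
from math import dist


def memberships(points, centers, m):
    """Fuzzy membership of every point in every cluster (rows sum to 1)."""
    power = 1.0 / (m - 1.0)
    rows = []
    for p in points:
        # The floor on the squared distance avoids division by zero
        # when a point coincides with a center.
        inv = [(1.0 / max(dist(p, v) ** 2, 1e-12)) ** power for v in centers]
        total = sum(inv)
        rows.append([x / total for x in inv])
    return rows


def fuzzy_cmeans(points, c, m=2.0, max_iter=200, tol=1e-9):
    """Standard fuzzy-c-means iterations with deterministic seeding."""
    centers = [points[0]]
    while len(centers) < c:
        centers.append(max(points, key=lambda p: min(dist(p, v) for v in centers)))
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]  # weights u_ij^m
            new_centers.append(tuple(
                sum(wi * x for wi, x in zip(w, coord)) / sum(w)
                for coord in zip(*points)))
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return memberships(points, centers, m), centers


def normalised_partition_coefficient(u):
    """(c * PC - 1) / (c - 1), where PC is the mean of the squared memberships."""
    n, c = len(u), len(u[0])
    pc = sum(x * x for row in u for x in row) / n
    return (c * pc - 1.0) / (c - 1.0)


# Hypothetical toy data: three tight blobs of five points each.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
blobs = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + dx, cy + dy) for cx, cy in blobs for dx, dy in offsets]

# Evaluate the fuzzy index for an increasing number of clusters.
pcn = {c: normalised_partition_coefficient(fuzzy_cmeans(points, c)[0])
       for c in range(2, 7)}
best_c = max(pcn, key=pcn.get)
print(best_c, pcn)
```

The normalised coefficient approaches 1 when memberships are nearly crisp and drops when points are shared between clusters, which is why splitting a homogeneous group into two is penalised. On this toy data it is highest for three clusters.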

References:

[1] Pavão DC, Elias RB, Silva L (2019) Comparison of discrete and continuum community models: Insights from numerical ecology and Bayesian methods applied to Azorean plant communities. Ecological Modelling 402:93–106. doi:10.1016/j.ecolmodel.2019.03.021
