# 🌀 Recursive Currents: Entity-Based Global News Clustering
**Recursive Currents** is a data visualization project that reveals the hidden architecture of global news coverage by clustering 50,000+ articles based on shared named entities (e.g., people, places, organizations). By recursively analyzing co-occurrence patterns, the project uncovers how geopolitical narratives, economic threads, and social phenomena converge across regions and topics.
The dataset is available on Kaggle: https://www.kaggle.com/datasets/enowgeorge/kosmopulse-annotated-news-dataset-worldwide2025
---
## 📌 Overview
This Colab Notebook is the computational backend of the project. It performs:
- Cleaning and canonicalizing named entities
- Transforming articles into binary entity vectors
- Constructing a sparse co-occurrence matrix
- Performing hierarchical clustering of articles
- Generating recursive trees of entity-based clusters
- Visualizing the results as interactive treemaps and sunbursts
---
## 📁 Dataset
- **Input**: `kosmopulse_articles_with_entities.csv`
~50,000 global news articles, each with a list of named entities, headline, source, and date.
---
## ⚙️ Dependencies
Install all required Python packages:
```bash
pip install -r requirements.txt
```
Main packages:
- pandas, numpy, scikit-learn, scipy
- plotly, kaleido (for interactive and high-res PDF charts)
---
## 🧠 Notebook Structure
Each notebook section represents a modular step:
Convert stringified entity lists into real Python lists.
Remove junk entities, standardize names (e.g., “us” → “United States of America”), strip punctuation, and remove source-related tokens.
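
A minimal sketch of the parsing and cleaning steps, using only the standard library. The alias table, the junk list, and the function names are hypothetical placeholders; the notebook's actual mappings and rules may differ.

```python
import ast
import re

# Hypothetical alias table and junk list, for illustration only.
ALIASES = {"us": "United States of America", "usa": "United States of America"}
JUNK = {"", "reuters", "ap"}  # e.g. source-related tokens


def parse_entities(raw):
    """Turn a stringified list such as "['US', 'NATO']" into a Python list."""
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        return []  # malformed or missing cell
    return [str(v) for v in value] if isinstance(value, (list, tuple)) else []


def canonicalize(entity):
    """Strip punctuation and surrounding whitespace, then apply known aliases."""
    cleaned = re.sub(r"[^\w\s-]", "", entity).strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def clean_entities(raw):
    """Parse, canonicalize, drop junk, and de-duplicate while keeping order."""
    seen, result = set(), []
    for entity in parse_entities(raw):
        name = canonicalize(entity)
        if name.lower() in JUNK or name in seen:
            continue
        seen.add(name)
        result.append(name)
    return result


cleaned = clean_entities("['U.S.', 'us', 'Reuters', 'NATO,']")
print(cleaned)
```

Using `ast.literal_eval` rather than `eval` keeps the parsing step safe on untrusted CSV content.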
Convert each article into a binary entity vector using scikit-learn's `CountVectorizer`.
Compute a sparse matrix of how often each pair of entities co-occurs across articles.
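
With a binary article-by-entity matrix `X`, a common way to get these counts is the product `X.T @ X`. The sketch below assumes that approach and uses a toy matrix; the notebook may compute co-occurrence differently.

```python
import numpy as np
from scipy import sparse

# Toy binary article-by-entity matrix (3 articles, 3 entities), for illustration.
X = sparse.csr_matrix(np.array([
    [1, 0, 1],
    [1, 1, 0],
    [1, 1, 1],
]))

# Entry (i, j) of X.T @ X counts the articles that mention both entity i and entity j.
cooc = (X.T @ X).tolil()
cooc.setdiag(0)         # the diagonal is plain entity frequency, not co-occurrence
cooc = cooc.tocsr()
cooc.eliminate_zeros()  # keep the matrix sparse after zeroing the diagonal

print(cooc.toarray())
```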
Use `TruncatedSVD` to reduce dimensionality, then SciPy's hierarchical clustering to group articles recursively.
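
A small sketch of this step on toy data. The number of components, the Ward linkage method, and the two-cluster cut are assumptions chosen for the example; the source only names `TruncatedSVD` and SciPy's hierarchical clustering.

```python
import numpy as np
from scipy import sparse
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import TruncatedSVD

# Toy binary article-by-entity matrix with two disjoint groups of articles.
X = sparse.csr_matrix(np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
], dtype=float))

# Reduce the sparse vectors to a dense low-dimensional representation.
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(X)

# Agglomerative clustering; Z encodes the full merge tree (dendrogram).
Z = linkage(reduced, method="ward")

# Cutting the tree into two flat clusters recovers the two article groups.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The linkage matrix `Z` holds every merge, so the same result can be cut at several depths to obtain nested clusters.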
Traverse the clustering tree and label each internal node with the most frequent entity among its children.
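
The traversal could look roughly like the sketch below, which converts a SciPy linkage into a tree with `to_tree` and labels each internal node with the most frequent entity in its subtree. The toy points, the entity lists, and the `label`/`size`/`children` dictionary layout are assumptions made for illustration.

```python
import json
from collections import Counter

import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree

# Toy data: entity lists for four articles and 2-D coordinates that form two groups.
article_entities = [
    ["China", "Trump"],
    ["China", "Ukraine"],
    ["Trump", "Pakistan"],
    ["Trump", "Singapore"],
]
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])

root = to_tree(linkage(points, method="ward"))


def label_tree(node):
    """Return (nested dict, entity Counter) for a SciPy ClusterNode."""
    if node.is_leaf():
        counts = Counter(article_entities[node.id])
        return {"label": f"article {node.id}", "size": 1}, counts
    left, left_counts = label_tree(node.get_left())
    right, right_counts = label_tree(node.get_right())
    counts = left_counts + right_counts
    label = counts.most_common(1)[0][0]  # most frequent entity below this node
    subtree = {"label": label, "size": int(node.get_count()), "children": [left, right]}
    return subtree, counts


tree, _ = label_tree(root)
print(json.dumps(tree, indent=2))
```

On a dendrogram built from tens of thousands of articles, the recursion can get deep, so an iterative traversal may be a safer choice there.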
Use Plotly to render:
- Treemaps: area-based hierarchical clusters of entity groups
- Sunbursts: radial recursive cluster visualization
Both visualizations display the top 3–5 levels of clustering.
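
A sketch of how a nested cluster tree might be flattened and handed to Plotly. The `label`/`size`/`children` dictionary layout and the helper names `flatten` and `make_figures` are hypothetical; only `go.Treemap` and `go.Sunburst` are Plotly's own.

```python
def flatten(tree, max_depth=5):
    """Flatten a nested {'label', 'size', 'children'} tree into parallel lists."""
    rows = {"ids": [], "labels": [], "parents": [], "values": []}

    def walk(node, node_id, parent_id, depth):
        rows["ids"].append(node_id)
        rows["labels"].append(node["label"])
        rows["parents"].append(parent_id)
        rows["values"].append(node["size"])
        if depth < max_depth:  # stop descending once the depth limit is reached
            for i, child in enumerate(node.get("children", [])):
                walk(child, f"{node_id}.{i}", node_id, depth + 1)

    walk(tree, "0", "", 1)  # path-based ids stay unique even when labels repeat
    return rows


def make_figures(rows):
    """Build a treemap and a sunburst from the flattened rows."""
    import plotly.graph_objects as go  # lazy import: flatten() works without Plotly

    common = dict(ids=rows["ids"], labels=rows["labels"],
                  parents=rows["parents"], values=rows["values"],
                  branchvalues="total")
    return go.Figure(go.Treemap(**common)), go.Figure(go.Sunburst(**common))


# Toy tree for illustration.
tree = {"label": "Trump", "size": 4, "children": [
    {"label": "China", "size": 2, "children": [
        {"label": "article 0", "size": 1},
        {"label": "article 1", "size": 1},
    ]},
    {"label": "Trump", "size": 2},
]}
rows = flatten(tree, max_depth=2)
print(rows)
```

Calling `treemap, sunburst = make_figures(rows)` followed by `treemap.show()` would then render the interactive chart.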
---
## 📤 Outputs
- `entity_cluster_tree.json`: JSON-formatted recursive tree
- `treemap_top_5_layers.pdf`: PDF of the treemap visualization
- `sunburst_top_5_layers.pdf`: PDF of the sunburst visualization
---
## 🔍 Interpretation
This isn’t just a popularity chart.

The results reflect how different regions and media ecosystems interpret world events. Clusters often converge on narrative hubs like Trump, the U.S., China, or Ukraine, not always because they are central to the story, but because they act as semantic intermediaries in a web of global discourse.

“Why are Trump, Pakistan, the UK, and Singapore connected?”

Because they often co-occur through diplomatic events, trade agreements, or security stories, even if only indirectly. The recursive structure reveals these flows, which traditional keyword or popularity analysis would miss.
---
## 🖨️ Exporting Figures
Plotly uses kaleido internally to save high-resolution PDF visualizations (e.g., for publication or sharing).
See the relevant notebook cells for exporting:
```python
# `fig` is a Plotly figure built earlier in the notebook.
fig.write_image("treemap_top_5_layers.pdf", format="pdf", width=3000, height=2000)
```
---
## 🙌 Credits
Developed as part of the Kosmopulse project. Entity extraction and article sourcing by [Your Name or Organization].
---
## 📄 License
MIT License. Attribution appreciated if used in publications, visual media, or research.