GitHub - Magken/Recursive-Currents: 🌀 Recursive Currents is a data-driven exploration of global news, clustering 50,000+ articles by shared named entities (people, places, organizations). It reveals how world narratives connect—across borders, topics, and timelines—through the names that keep appearing together. Built with Python, scikit-learn, and Plotly.

# 🌀 Recursive Currents: Entity-Based Global News Clustering

**Recursive Currents** is a data visualization project that reveals the hidden architecture of global news coverage by clustering 50,000+ articles based on shared named entities (e.g., people, places, organizations). By recursively analyzing co-occurrence patterns, the project uncovers how geopolitical narratives, economic threads, and social phenomena converge across regions and topics.

The dataset is available on Kaggle: https://www.kaggle.com/datasets/enowgeorge/kosmopulse-annotated-news-dataset-worldwide2025

---

## 📌 Overview

This Colab Notebook is the computational backend of the project. It performs:

- Cleaning and canonicalizing named entities  
- Transforming articles into binary entity vectors  
- Constructing a sparse co-occurrence matrix  
- Performing hierarchical clustering of articles  
- Generating recursive trees of entity-based clusters  
- Visualizing the results as interactive treemaps and sunbursts  

---

## 📁 Dataset

- **Input**: `kosmopulse_articles_with_entities.csv`  
  ~50,000 global news articles, each with a list of named entities, headline, source, and date.

---

## ⚙️ Dependencies

Install all required Python packages:

```bash
pip install -r requirements.txt
```

### 🔑 Key Packages

- pandas, numpy, scikit-learn, scipy
- plotly and kaleido (interactive charts and high-resolution PDF export)

---

## 🚀 Running the Pipeline

Each notebook section represents a modular step:

### 1. Load & Parse Dataset

Convert stringified entity lists into real Python lists.
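A minimal sketch of this step, assuming each cell holds a Python-style list literal such as `"['Trump', 'China']"`. The helper name `parse_entities` is hypothetical, and returning an empty list for missing or malformed cells is an assumption rather than the notebook's confirmed behavior.

```python
import ast


def parse_entities(raw):
    """Turn a stringified list such as "['Trump', 'China']" into a real list."""
    if not isinstance(raw, str) or not raw.strip():
        return []  # missing or empty cell (assumed fallback)
    try:
        value = ast.literal_eval(raw)  # evaluates literals only, never code
    except (ValueError, SyntaxError):
        return []  # malformed cell (assumed fallback)
    if not isinstance(value, (list, tuple)):
        return []
    return [str(item) for item in value]


print(parse_entities("['Trump', 'China']"))  # ['Trump', 'China']
```

With pandas, the same helper could be applied to a whole column, for example `df["entities"].apply(parse_entities)`, where the column name `entities` is a guess.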

### 2. Clean & Canonicalize Entities

Remove junk entities, standardize names (e.g., “us” → “United States of America”), strip punctuation, and remove source-related tokens.
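The cleaning rules could look roughly like the sketch below. The `ALIASES` and `JUNK` tables are hypothetical stand-ins for the notebook's real lists, and lowercasing every entity before lookup is an assumption.

```python
import string

# Hypothetical lookup tables; the notebook's actual lists are not shown here.
ALIASES = {"us": "United States of America", "usa": "United States of America"}
JUNK = {"", "reuters", "ap", "photo"}

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_entities(entities):
    """Normalize, canonicalize, and de-duplicate one article's entities."""
    cleaned = []
    for entity in entities:
        key = entity.translate(_PUNCT_TABLE).lower()  # strip ASCII punctuation
        key = " ".join(key.split())  # trim and collapse whitespace
        if key in JUNK:
            continue  # drop junk and source-related tokens
        name = ALIASES.get(key, key)  # map known aliases to a canonical name
        if name not in cleaned:
            cleaned.append(name)
    return cleaned


print(clean_entities(["U.S.", "us", "Reuters", "  China! "]))
# ['United States of America', 'china']
```

Stripping all ASCII punctuation also joins hyphenated names (for example "Al-Qaeda" becomes "alqaeda"), so a real pipeline may want a narrower rule.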

### 3. Vectorize Entities

Convert each article into a binary entity-presence vector using scikit-learn's `CountVectorizer`.

### 4. Build Co-Occurrence Matrix

Compute a sparse matrix recording how many articles each pair of entities appears in together.
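To make the counting concrete, the sketch below stores pair counts in a dictionary rather than a sparse matrix. That is a dependency-free stand-in, not the notebook's representation, and the function name `cooccurrence_counts` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(articles):
    """Count, for each unordered entity pair, the articles containing both."""
    counts = Counter()
    for entities in articles:
        unique = sorted(set(entities))  # count each entity once per article
        counts.update(combinations(unique, 2))  # all pairs, in sorted order
    return counts


articles = [
    ["china", "trump"],
    ["trump", "ukraine"],
    ["china", "singapore", "trump"],
]
pairs = cooccurrence_counts(articles)
print(pairs[("china", "trump")])  # 2
```

With a binary article-by-entity matrix `X`, the same numbers appear as the off-diagonal entries of `X.T @ X`, which is the usual way to get them in sparse-matrix form.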

### 5. Cluster Articles

Use TruncatedSVD to reduce dimensionality, then scipy’s hierarchical clustering to group articles recursively.
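A small sketch of this step on toy data. The number of SVD components and the Ward linkage method are assumptions, since the text does not specify which settings the notebook uses.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

# Toy binary article-by-entity matrix: rows 0-2 use only the first two
# entities, rows 3-5 use only the last two.
X = csr_matrix(np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
], dtype=float))

# Project the sparse vectors onto a dense, low-dimensional space.
# Two components suit this toy matrix; real data would likely need more.
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(X)

# Agglomerative clustering over the reduced vectors (Ward is an assumption).
Z = linkage(reduced, method="ward")

# Cut the hierarchy into two flat clusters to inspect the top-level split.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The linkage matrix `Z` encodes the full merge hierarchy, so deeper levels of the tree can be read from it without re-clustering.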

### 6. Generate Recursive Tree

Traverse the clustering tree and label each internal node with the most frequent entity among its children.
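The labeling can be sketched directly over a linkage matrix in the format SciPy's `linkage` returns, where row `k` is `[left, right, distance, size]` and creates cluster id `n + k`. The sketch reads "most frequent entity among its children" as the entity appearing in the most articles under the node, which is an interpretation. The function names, the output keys (`name`, `size`, `children`), the alphabetical tie-break, and the hand-written `Z` are all hypothetical.

```python
import json
from collections import Counter


def top_entity(counts):
    """Most frequent entity; ties are broken alphabetically."""
    return min(counts, key=lambda e: (-counts[e], e)) if counts else "(none)"


def build_tree(Z, article_entities):
    """Fold a linkage matrix into a nested dict, labeling nodes bottom-up."""
    nodes = []  # nodes[i] = (entity counts, output dict) for cluster id i
    for entities in article_entities:  # leaves take ids 0..n-1
        counts = Counter(set(entities))
        nodes.append((counts, {"name": top_entity(counts), "size": 1}))
    for left, right, _dist, _size in Z:  # row k creates cluster id n + k
        left_counts, left_node = nodes[int(left)]
        right_counts, right_node = nodes[int(right)]
        counts = left_counts + right_counts  # articles per entity below node
        nodes.append((counts, {
            "name": top_entity(counts),
            "size": left_node["size"] + right_node["size"],
            "children": [left_node, right_node],
        }))
    return nodes[-1][1]  # the last cluster created is the root


article_entities = [
    ["trump", "china"],
    ["trump", "ukraine"],
    ["china", "trump"],
    ["singapore"],
]
# Hand-written toy linkage: merge 0+2, then 1 with that pair, then 3.
Z = [[0, 2, 0.0, 2], [1, 4, 1.0, 3], [3, 5, 2.0, 4]]
tree = build_tree(Z, article_entities)
print(json.dumps(tree, indent=2))
```

Building the tree bottom-up avoids deep recursion, which matters because a hierarchy over tens of thousands of articles can be very unbalanced.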

### 7. Visualize

Use Plotly to render:

- **Treemaps**: area-based hierarchical clusters of entity groups
- **Sunbursts**: radial recursive cluster visualization

Both visualizations display the top 3–5 levels of clustering.

---

## 📊 Sample Output

- `entity_cluster_tree.json`: JSON-formatted recursive tree
- `treemap_top_5_layers.pdf`: PDF of the treemap visualization
- `sunburst_top_5_layers.pdf`: PDF of the sunburst visualization

---

## 💡 What It Shows

This isn’t just a popularity chart.

The results reflect how different regions and media ecosystems interpret world events — showing that clusters often converge on narrative hubs like Trump, the U.S., China, or Ukraine, not always because they’re central to the story, but because they act as semantic intermediaries in a web of global discourse.

### ✨ Example Insight

“Why are Trump, Pakistan, the UK, and Singapore connected?”

Because they often co-occur through diplomatic events, trade agreements, or security stories — even if indirectly. The recursive structure reveals these flows, which traditional keyword or popularity analysis would miss.

---

## 📅 Export

Plotly uses kaleido internally to save high-resolution PDF visualizations (e.g., for publication or sharing).

See the relevant notebook cells for exporting:

```python
fig.write_image("treemap_top_5_layers.pdf", format="pdf", width=3000, height=2000)
```

---

## 🔧 Credits

Developed as part of the Kosmopulse project. Entity extraction and article sourcing by [Your Name or Organization].

---

## 📜 License

MIT License. Attribution is appreciated if used in publications, visual media, or research.
