Wadalisa/Structured-Based-GP

🧩⚔️ STRUCTURED-BASED GP — MAIN QUEST

“Evolve with strategy. Balance skill and structure.”


🗺️ Quest Overview

Structured Genetic Programming (SGP) is a variant of Genetic Programming (GP) that takes the structure of each individual into account during evolution.

This project compares Regular GP and Structured GP to classify the hepatitis dataset, balancing accuracy with population diversity to find the best classifiers.


📊 DATASET — HEPATITIS QUEST MAP

  • Dataset Name: hepatitis
  • Features: 20 (3 binary, 11 categorical, 6 continuous)
  • Observations: 155
  • Attribute Type: Mixed (categorical, continuous)
  • Missing Values: None after imputation (see Pre-Processing Buffs)
  • Duplicate Rows: None

🛠️ Pre-Processing Buffs

  • Binary columns encoded as 1 (Yes) and 0 (No)
  • Missing binary values imputed with mode
  • Continuous columns normalized via Min-Max Scaling
  • Missing continuous values imputed with median
  • Train/Test Split: 80/20

This dataset is used for both Regular GP and Structured GP.
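
The pre-processing steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual pipeline: the helper names (`impute`, `min_max_scale`, `train_test_split`), the fixed seed, and the toy column values are all assumptions made for the example.

```python
import random
from statistics import median, mode


def impute(values, strategy):
    """Replace None entries with the mode or the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mode(observed) if strategy == "mode" else median(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out the last `test_fraction` of them."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


# Toy columns (hypothetical values, not taken from the hepatitis data).
binary_col = impute([1, 0, None, 1], "mode")
continuous_col = min_max_scale(impute([2.0, None, 4.0, 6.0], "median"))

# 155 row indices stand in for the 155 observations.
train, test = train_test_split(list(range(155)))
```

With 155 observations, an 80/20 split leaves 124 rows for training and 31 for testing.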


🌳 MODEL REPRESENTATION — SYNTAX TREES

GP classifiers are represented as tree structures:

  • Internal Nodes: Logical/arithmetic operators
  • Leaf Nodes: Terminal features/constants

Initial Population: Generated with the grow method, which respects each node's arity so that every generated program is valid.

Expression trees allow adaptable architectures for classification.
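
As a rough sketch of this representation, the snippet below builds random trees with the grow method. The `Node` class, the `grow` and `is_valid` helpers, the `p_terminal` probability, and the five-feature terminal set are assumptions for illustration; the arities follow the usual meaning of the operators in the function set. Type consistency between Boolean and numeric operators is deliberately ignored here.

```python
import random

# Arity of each function, based on the usual meaning of each operator.
ARITY = {"and": 2, "or": 2, "not": 1, ">": 2, "<": 2, "==": 2,
         "if": 3, "+": 2, "-": 2, "*": 2, "/": 2}


class Node:
    """One node of a syntax tree: a label plus an ordered list of children."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

    def depth(self):
        """Depth of the tree rooted here; a single leaf has depth 0."""
        if not self.children:
            return 0
        return 1 + max(child.depth() for child in self.children)


def grow(max_depth, terminals, rng, p_terminal=0.3):
    """Grow method: pick a terminal or a function at random, forcing a terminal at max depth."""
    if max_depth == 0 or rng.random() < p_terminal:
        return Node(rng.choice(terminals))
    func = rng.choice(sorted(ARITY))
    children = [grow(max_depth - 1, terminals, rng, p_terminal)
                for _ in range(ARITY[func])]
    return Node(func, children)


def is_valid(node):
    """A tree is valid when every function has exactly as many children as its arity."""
    if node.label in ARITY:
        return (len(node.children) == ARITY[node.label]
                and all(is_valid(child) for child in node.children))
    return not node.children


rng = random.Random(42)
terminals = [f"x{i}" for i in range(1, 6)] + ["c"]  # hypothetical five features plus a constant
population = [grow(2, terminals, rng) for _ in range(250)]
```

Because the grow method may stop at a terminal before reaching the depth limit, the resulting trees vary in shape and size.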


📐 FITNESS FUNCTION — DAMAGE CALCULATION

Regular GP

  • Uses F1-Score: harmonic mean of precision & recall
  • Ideal for class-imbalanced datasets

Structured GP

  • Combines F1-Score with structural diversity
  • Rewards unique structures to maintain a diverse population
  • Encourages evolution of both behaviorally strong and structurally unique individuals
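
One way to read the two fitness definitions is sketched below. The F1 computation is standard; the weighted sum in `structured_fitness` is only an assumption about how F1 and structural diversity might be combined, with `alpha` and `beta` borrowed from the parameter list and `avg_similarity` standing for a hypothetical similarity score in [0, 1].

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def structured_fitness(f1, avg_similarity, alpha=0.3, beta=0.7):
    """Assumed combination: reward F1 and reward being unlike other individuals."""
    return beta * f1 + alpha * (1.0 - avg_similarity)


# Toy labels (hypothetical): 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
f1 = f1_score(y_true, y_pred)

# Same F1, different structure: the more unusual individual scores higher.
unique_score = structured_fitness(f1, avg_similarity=0.2)
common_score = structured_fitness(f1, avg_similarity=0.9)
```

Under this reading, two classifiers with the same F1-Score are ranked by how different their structure is from other individuals.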

🎯 SELECTION METHOD — PARTY RECRUITMENT

Regular GP

  • Tournament Selection (without replacement)
  • Winners removed to ensure exploration and diverse offspring
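
Tournament selection without replacement could look like the sketch below. The function names, the seed, and the integer "individuals" are assumptions for illustration; `k=4` matches the tournament size in the parameter list.

```python
import random


def tournament_select(pool, fitness, k, rng):
    """Sample k distinct contestants, return the fittest, and remove it from the pool."""
    contestants = rng.sample(range(len(pool)), min(k, len(pool)))
    winner_idx = max(contestants, key=lambda i: fitness(pool[i]))
    return pool.pop(winner_idx)


def select_parents(population, fitness, n_parents, k=4, seed=0):
    """Run repeated tournaments on a copy of the population; no winner is picked twice."""
    rng = random.Random(seed)
    pool = list(population)
    n_parents = min(n_parents, len(pool))
    return [tournament_select(pool, fitness, k, rng) for _ in range(n_parents)]


# Toy population (hypothetical): each individual is just its own fitness value.
population = list(range(10))
parents = select_parents(population, fitness=lambda ind: ind, n_parents=4)
```

Removing each winner from the pool means a single strong individual cannot fill every parent slot. For the structured variant, the same routine could be called with a fitness function that also accounts for similarity.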

Structured GP

  • Tournament selection balances fitness and diversity
  • Individuals are also scored by their average structural similarity to other individuals, so structurally unique solutions are prioritized

🧬 GENETIC OPERATORS — EVOLUTION MECHANICS

Regular GP

  • Elitism: Top 10% preserved for next generation
  • Swap Mutation: Two random nodes swapped to explore the search space
  • Sub-tree Crossover: Random subtrees swapped between parents
  • Mutation and crossover rates control the balance between exploration and exploitation
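
Sub-tree crossover can be sketched on trees stored as nested lists, where the first element is the operator and the rest are its children. The representation and the helper names (`paths`, `get`, `replace`, `subtree_crossover`) are assumptions for this example, and the sketch does not enforce the maximum depth.

```python
import copy
import random


def paths(tree, prefix=()):
    """Yield the index path to every node of a tree like ['and', 'x1', ['not', 'x2']]."""
    yield prefix
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def get(tree, path):
    """Follow an index path down to a subtree."""
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Return the tree with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    get(tree, path[:-1])[path[-1]] = new
    return tree


def subtree_crossover(parent_a, parent_b, rng):
    """Swap one randomly chosen subtree between copies of the two parents."""
    a, b = copy.deepcopy(parent_a), copy.deepcopy(parent_b)
    path_a = rng.choice(list(paths(a)))
    path_b = rng.choice(list(paths(b)))
    sub_a = copy.deepcopy(get(a, path_a))
    sub_b = copy.deepcopy(get(b, path_b))
    return replace(a, path_a, sub_b), replace(b, path_b, sub_a)


def size(tree):
    """Number of nodes in a tree."""
    return sum(1 for _ in paths(tree))


# Toy parents (hypothetical programs).
mum = ["and", "x1", ["not", "x2"]]
dad = ["or", [">", "x3", "c"], "x4"]
child_1, child_2 = subtree_crossover(mum, dad, random.Random(7))
```

The parents are copied before the swap, so they stay intact for elitism or further tournaments, and the total node count across both children equals that of both parents.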

Structured GP

  • Structural Elitism: Preserves top performers and unique structures
  • Diverse Swap Mutation: Mutation rate increases if population similarity > 50%
  • Structured Sub-tree Crossover: Only occurs if parents are below similarity threshold
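
The two similarity-driven rules above can be sketched as follows. The README does not define the similarity measure, so this example assumes a simple one: the Jaccard overlap of (depth, label) pairs on nested-tuple trees. The `boost` factor is also hypothetical; the 50% trigger and the 0.6 threshold come from the text.

```python
def node_set(tree, depth=0):
    """Collect (depth, label) pairs from a tree such as ('and', 'x1', ('not', 'x2'))."""
    if isinstance(tree, tuple):
        label, *children = tree
        pairs = {(depth, label)}
        for child in children:
            pairs |= node_set(child, depth + 1)
        return pairs
    return {(depth, tree)}


def similarity(tree_a, tree_b):
    """Assumed measure: Jaccard overlap of the two trees' (depth, label) sets."""
    set_a, set_b = node_set(tree_a), node_set(tree_b)
    return len(set_a & set_b) / len(set_a | set_b)


def average_similarity(population):
    """Mean pairwise similarity; 0.0 when there are fewer than two individuals."""
    n = len(population)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(similarity(population[i], population[j]) for i, j in pairs) / len(pairs)


def adaptive_mutation_rate(population, base_rate=0.10, boost=2.0, trigger=0.5):
    """Raise the mutation rate when the population has become too uniform."""
    if average_similarity(population) > trigger:
        return min(base_rate * boost, 1.0)
    return base_rate


def may_cross(parent_a, parent_b, threshold=0.6):
    """Allow crossover only for parents that are below the similarity threshold."""
    return similarity(parent_a, parent_b) < threshold


# Toy individuals (hypothetical programs).
a = ("and", "x1", "x2")
b = ("and", "x1", "x3")
c = ("or", "x4", "x5")
```

Here `a` and `b` share two of four distinct (depth, label) pairs, giving a similarity of 0.5, so they may still be crossed, while two copies of `a` may not.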

⏹️ TERMINATION CRITERIA

  • Number of generations
  • Maximum tree depth
  • Population size

⚙️ PARAMETERS — BUILD STATS

Standard GP Parameters

  • num_features: len(dataset.columns)-1
  • Terminal Set: [x1, x2, ..., xn, c]
  • Functional Set: ['and', 'or', 'not', '>', '<', '==','if','+','-','*','/']
  • Max Depth: 2
  • Population Size: 250
  • Generations: 52
  • Mutation Rate: 0.10
  • Crossover Rate: 0.8
  • Elitism Rate: 0.10
  • Tournament Size: 4

Structured GP Parameters

  • num_features: len(dataset.columns)-1
  • Terminal Set: [x1, x2, ..., xn, c]
  • Functional Set: ['and', 'or', 'not', '>', '<', '==','if','+','-','*','/']
  • Max Depth: 2
  • Population Size: 500
  • Generations: 52
  • Mutation Rate: 0.10
  • Crossover Rate: 0.8
  • Elitism Rate: 0.10
  • Beta: 0.7
  • Alpha: 0.3 → weight for structural fitness
  • Similarity Threshold: 0.6 → suppress crossover for overly similar parents

🏁 QUEST STATUS

🧩 Main Quest: Structured GP for hepatitis classification
🚀 Objective: Evolve classifiers that are accurate and structurally diverse
🏆 Reward: Balanced models ready for real-world evaluation


Honours-level project — evolve smart, not just fast.
