“Evolve with strategy. Balance skill and structure.”
Structured Genetic Programming (SGP) is an advanced variant of GP that incorporates structural awareness into evolution.
This project compares Regular GP and Structured GP as classifiers for the hepatitis dataset, balancing accuracy against population diversity to find the best classifiers.
- Dataset Name: hepatitis
- Features: 20 (3 binary, 11 categorical, 6 continuous)
- Observations: 155
- Attribute Type: Mixed (categorical, continuous)
- Missing Values: None
- Duplicate Rows: None
- Binary columns encoded: 1 → Yes, 0 → No
- Missing binary values imputed with mode
- Continuous columns normalized via Min-Max Scaling
- Missing continuous values imputed with median
- Train/Test Split: 80/20
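The preprocessing steps above can be sketched in plain Python. This is only an illustration under assumptions: the helper names (`impute`, `min_max_scale`, `train_test_split`), the toy values, the use of `None` for a missing entry, and the fixed seed are all hypothetical, and the project's actual pipeline may differ.

```python
import random
from statistics import median, mode

def impute(column, strategy):
    """Fill None entries with the mode or median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mode(observed) if strategy == "mode" else median(observed)
    return [fill if v is None else v for v in column]

def min_max_scale(column):
    """Rescale values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy columns (hypothetical values, not taken from the hepatitis dataset).
binary_col = impute([1, 0, None, 1, 1], "mode")
continuous_col = min_max_scale(impute([2.0, None, 4.0, 6.0, 10.0], "median"))
rows = list(zip(binary_col, continuous_col))
train, test = train_test_split(rows)
```

With 155 observations, the same 80/20 split would leave 124 rows for training and 31 for testing.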
This dataset is used for both Regular GP and Structured GP.
GP classifiers are represented as tree structures:
- Internal Nodes: Logical/arithmetic operators
- Leaf Nodes: Terminal features/constants
Initial Population: Generated with the Growth Method, respecting node arity to create valid programs.
Expression trees allow adaptable architectures for classification.
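A minimal sketch of the tree representation and grow-style initialization follows. The `Node` class, the `grow` and `is_valid` helpers, the terminal names, and the `p_terminal` probability are assumptions made for illustration. Depth is counted so that a lone leaf has depth 0, and type consistency between logical and arithmetic operators is not enforced here.

```python
import random

# Arity of each function symbol (a subset of the functional set, for brevity).
ARITY = {"and": 2, "or": 2, "not": 1, ">": 2, "<": 2, "+": 2, "if": 3}
TERMINALS = ["x1", "x2", "x3", "c"]  # hypothetical feature names plus a constant

class Node:
    """A tree node: internal nodes hold an operator, leaves hold a terminal."""
    def __init__(self, symbol, children=()):
        self.symbol = symbol
        self.children = list(children)

    def depth(self):
        if not self.children:
            return 0
        return 1 + max(child.depth() for child in self.children)

    def __repr__(self):
        if not self.children:
            return self.symbol
        return "(" + " ".join([self.symbol] + [repr(c) for c in self.children]) + ")"

def grow(max_depth, rng, p_terminal=0.3):
    """Pick a terminal or an operator at each node; only terminals at the depth limit."""
    if max_depth == 0 or rng.random() < p_terminal:
        return Node(rng.choice(TERMINALS))
    symbol = rng.choice(sorted(ARITY))
    children = [grow(max_depth - 1, rng, p_terminal) for _ in range(ARITY[symbol])]
    return Node(symbol, children)

def is_valid(node):
    """Check that every operator has exactly as many children as its arity."""
    if not node.children:
        return node.symbol in TERMINALS
    return (len(node.children) == ARITY.get(node.symbol)
            and all(is_valid(child) for child in node.children))

rng = random.Random(42)
population = [grow(2, rng) for _ in range(5)]
```

Because each operator receives exactly `ARITY[symbol]` children, every generated tree is a syntactically valid program by construction.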
- Regular GP fitness uses the F1-Score: the harmonic mean of precision and recall
- Well suited to class-imbalanced datasets
- Structured GP fitness combines the F1-Score with structural diversity
- Rewards unique structures to maintain a diverse population
- Encourages evolution of both behaviorally strong and structurally unique individuals
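The two fitness measures described above could look roughly like this. The F1 computation is standard. The combined score is an assumption: it treats Beta as the weight on F1 and Alpha as the weight on structural uniqueness, with uniqueness taken as one minus the individual's average similarity. The function names are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

def structured_fitness(f1, average_similarity, alpha=0.3, beta=0.7):
    """Assumed blend: beta weights behaviour (F1), alpha weights structural uniqueness."""
    return beta * f1 + alpha * (1.0 - average_similarity)

y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
regular_score = f1_score(y_true, y_pred)                  # TP=2, FP=1, FN=1
structured_score = structured_fitness(regular_score, 0.5)
```

Under this blend, two individuals with the same F1-Score are ranked by how different their structure is from the rest of the population.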
- Regular GP: Tournament Selection (without replacement)
- Winners are removed from the pool to encourage exploration and diverse offspring
- Structured GP: tournament selection balances fitness and diversity
- Individuals scored by average similarity, prioritizing unique solutions
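As a rough sketch of the selection ideas above: the first function runs tournaments in which each winner leaves the pool, and the other two compute an average similarity per individual. The similarity measure is not specified in this document, so Jaccard similarity over sets of node symbols is used purely as a stand-in, and all function names are hypothetical.

```python
import random

def tournament_without_replacement(population, fitness, n_winners, k=4, rng=None):
    """Run tournaments of size k; each winner leaves the pool and cannot win again."""
    rng = rng or random.Random()
    pool = list(range(len(population)))
    winners = []
    while pool and len(winners) < n_winners:
        contestants = rng.sample(pool, min(k, len(pool)))
        best = max(contestants, key=lambda i: fitness[i])
        winners.append(population[best])
        pool.remove(best)
    return winners

def jaccard(a, b):
    """Assumed similarity measure: overlap of two symbol sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def average_similarity(index, symbol_sets):
    """Mean similarity of one individual to every other individual."""
    others = [s for j, s in enumerate(symbol_sets) if j != index]
    if not others:
        return 0.0
    return sum(jaccard(symbol_sets[index], s) for s in others) / len(others)

population = ["A", "B", "C", "D", "E", "F"]
fitness = [0.9, 0.4, 0.7, 0.2, 0.8, 0.1]
winners = tournament_without_replacement(population, fitness, 3, k=4, rng=random.Random(1))

symbol_sets = [{"and", "x1", "x2"}, {"and", "x1", "x3"}, {"+", "x4", "c"}]
similarities = [average_similarity(i, symbol_sets) for i in range(len(symbol_sets))]
```

In the toy data, the third individual shares no symbols with the others, so its average similarity is the lowest and it would be favoured by a diversity-aware score.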
- Elitism: Top 10% preserved for next generation
- Swap Mutation: Two random nodes swapped to explore the search space
- Sub-tree Crossover: Random subtrees swapped between parents
- Mutation and crossover rates control the balance between exploration and exploitation
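The two variation operators above might be sketched as follows, with trees stored as nested lists of the form `[operator, child, ...]` and terminals as plain strings. This encoding and the helper names are assumptions. Swap mutation is interpreted here as exchanging two non-overlapping subtrees, which keeps operator arity intact; neither operator enforces the maximum depth or type consistency.

```python
import copy
import random

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node of a nested-list tree."""
    yield prefix
    if isinstance(tree, list):
        for i in range(1, len(tree)):
            yield from paths(tree[i], prefix + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, subtree):
    """Place subtree at path and return the (possibly new) root."""
    if not path:
        return subtree
    get(tree, path[:-1])[path[-1]] = subtree
    return tree

def subtree_crossover(a, b, rng):
    """Swap one randomly chosen subtree of a with one of b; parents stay untouched."""
    a, b = copy.deepcopy(a), copy.deepcopy(b)
    pa, pb = rng.choice(list(paths(a))), rng.choice(list(paths(b)))
    sub_a, sub_b = copy.deepcopy(get(a, pa)), copy.deepcopy(get(b, pb))
    return put(a, pa, sub_b), put(b, pb, sub_a)

def swap_mutation(tree, rng):
    """Swap two randomly chosen non-overlapping subtrees within one tree."""
    tree = copy.deepcopy(tree)
    nodes = [p for p in paths(tree) if p]  # every node except the root
    pairs = [(p, q) for p in nodes for q in nodes
             if p < q and p != q[:len(p)] and q != p[:len(q)]]
    if not pairs:
        return tree  # nothing can be swapped, e.g. a single-leaf tree
    p, q = rng.choice(pairs)
    sub_p, sub_q = get(tree, p), get(tree, q)
    put(tree, p, sub_q)
    put(tree, q, sub_p)
    return tree

def size(tree):
    return sum(1 for _ in paths(tree))

rng = random.Random(7)
parent_a = ["and", [">", "x1", "c"], ["not", "x2"]]
parent_b = ["or", "x3", ["<", "x4", "x5"]]
child_a, child_b = subtree_crossover(parent_a, parent_b, rng)
mutant = swap_mutation(parent_a, rng)
```

Both operators work on copies, so the parents remain available for elitism or further selection.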
- Structural Elitism: Preserves top performers and unique structures
- Diverse Swap Mutation: Mutation rate increases if population similarity > 50%
- Structured Sub-tree Crossover: Only occurs if parents are below similarity threshold
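The two similarity-driven rules above reduce to small checks. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the `boost` factor is invented because this document does not say by how much the mutation rate increases.

```python
def adaptive_mutation_rate(base_rate, population_similarity, trigger=0.5, boost=2.0):
    """Raise the mutation rate once mean population similarity exceeds the trigger."""
    if population_similarity > trigger:
        return min(1.0, base_rate * boost)  # boost is a made-up factor
    return base_rate

def crossover_allowed(parent_similarity, threshold=0.6):
    """Permit crossover only for parents whose similarity is below the threshold."""
    return parent_similarity < threshold

diverse_rate = adaptive_mutation_rate(0.10, 0.35)    # population still diverse
converged_rate = adaptive_mutation_rate(0.10, 0.72)  # population too similar
```

The effect is that a converging population is pushed back toward exploration, while near-identical parents are prevented from producing near-identical offspring.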
- Number of generations
- Maximum tree depth
- Population size
Regular GP was run with the following parameters:
- num_features: len(dataset.columns)-1
- Terminal Set: [x1, x2, ..., xn, c]
- Functional Set: ['and', 'or', 'not', '>', '<', '==', 'if', '+', '-', '*', '/']
- Max Depth: 2
- Population Size: 250
- Generations: 52
- Mutation Rate: 0.10
- Crossover Rate: 0.8
- Elitism Rate: 0.10
- Tournament Size: 4
Structured GP was run with the following parameters:
- num_features: len(dataset.columns)-1
- Terminal Set: [x1, x2, ..., xn, c]
- Functional Set: ['and', 'or', 'not', '>', '<', '==', 'if', '+', '-', '*', '/']
- Max Depth: 2
- Population Size: 500
- Generations: 52
- Mutation Rate: 0.10
- Crossover Rate: 0.8
- Elitism Rate: 0.10
- Beta: 0.7
- Alpha: 0.3 → weight for structural fitness
- Similarity Threshold: 0.6 → suppress crossover for overly similar parents
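One way to keep the two parameter sets side by side is a pair of small dataclasses. The values are copied from the lists above; the class names, field names, and the `elite_count` helper are hypothetical, and the project may store its settings differently.

```python
from dataclasses import dataclass

FUNCTION_SET = ("and", "or", "not", ">", "<", "==", "if", "+", "-", "*", "/")

@dataclass(frozen=True)
class GPConfig:
    """Settings shared by both variants."""
    max_depth: int = 2
    population_size: int = 250
    generations: int = 52
    mutation_rate: float = 0.10
    crossover_rate: float = 0.8
    elitism_rate: float = 0.10

    def elite_count(self):
        """Number of individuals carried over unchanged each generation."""
        return round(self.population_size * self.elitism_rate)

@dataclass(frozen=True)
class RegularGPConfig(GPConfig):
    tournament_size: int = 4

@dataclass(frozen=True)
class StructuredGPConfig(GPConfig):
    population_size: int = 500
    beta: float = 0.7
    alpha: float = 0.3                 # weight for structural fitness
    similarity_threshold: float = 0.6  # crossover suppressed at or above this

regular = RegularGPConfig()
structured = StructuredGPConfig()
```

With these values, a 10% elitism rate preserves 25 individuals per generation in Regular GP and 50 in Structured GP.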
🧩 Main Quest: Structured GP for hepatitis classification
🚀 Objective: Evolve classifiers that are accurate and structurally diverse
🏆 Reward: Balanced models ready for real-world evaluation
Honours-level project — evolve smart, not just fast.