ReHLine-Python: Efficient Solver for ERM with PLQ Loss and Linear Constraints

Fast, scalable, and scikit-learn compatible optimization for machine learning

ReHLine-Python is the official Python implementation of ReHLine, a solver for large-scale empirical risk minimization (ERM) problems with convex piecewise linear-quadratic (PLQ) loss functions and linear constraints. Built on a high-performance C++ core with seamless Python integration, ReHLine delivers exceptional speed while remaining easy to use.

See more details in the ReHLine documentation.
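
Concretely, ReHLine solves problems of the following form, in which any convex PLQ loss has been decomposed into ReLU and ReHU pieces (the composite ReLU-ReHU formulation from the NeurIPS 2023 paper):

\min_{\beta \in \mathbb{R}^d} \sum_{i=1}^{n} \sum_{l=1}^{L} \mathrm{ReLU}\left( u_{li} x_i^\top \beta + v_{li} \right) + \sum_{i=1}^{n} \sum_{h=1}^{H} \mathrm{ReHU}_{\tau_{hi}}\left( s_{hi} x_i^\top \beta + t_{hi} \right) + \frac{1}{2} \|\beta\|_2^2, \quad \text{s.t. } A\beta + b \geq 0,

where ReLU(z) = max(z, 0) and ReHU_tau(z) is its smoothed counterpart: zero for z <= 0, quadratic (z^2/2) on (0, tau], and linear (tau(z - tau/2)) beyond tau.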

✨ Key Features

  • 🚀 Blazing Fast: Linear computational complexity per iteration, scales to millions of samples
  • 🎯 Versatile: Supports any convex PLQ loss (hinge, check, Huber, and more)
  • 🔒 Constrained Optimization: Handles linear equality and inequality constraints
  • 📊 Scikit-Learn Compatible: Drop-in estimators that work with GridSearchCV and Pipeline
  • 🐍 Pythonic API: Both low-level and high-level interfaces for flexibility

📦 Installation

Quick Install

pip install rehline

🚀 Quick Start

Scikit-Learn Style API (Recommended)

ReHLine provides plq_Ridge_Classifier and plq_Ridge_Regressor that work seamlessly with scikit-learn:

from rehline import plq_Ridge_Classifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Generate dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Simple usage
clf = plq_Ridge_Classifier(loss={'name': 'svm'}, C=1.0)
clf.fit(X_train, y_train)
print(f"Accuracy: {clf.score(X_test, y_test):.3f}")

# Use in Pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', plq_Ridge_Classifier(loss={'name': 'svm'}))
])
pipeline.fit(X_train, y_train)

# Hyperparameter tuning with GridSearchCV
param_grid = {
    'C': [0.1, 1.0, 10.0],
    'loss': [{'name': 'svm'}, {'name': 'sSVM'}]
}
grid_search = GridSearchCV(plq_Ridge_Classifier(loss={"name": "svm"}), param_grid, cv=5)
grid_search.fit(X_train, y_train)
print(f"Best params: {grid_search.best_params_}")

See more details in ReHLine with Scikit-Learn.

Low-Level API for Custom Problems

from rehline import ReHLine
import numpy as np

# Toy data: n samples, d features, labels y in {-1, +1}, cost parameter C
n, d, C = 1000, 10, 0.5
rng = np.random.default_rng(42)
X = rng.standard_normal((n, d))
y = np.sign(rng.standard_normal(n))

clf = ReHLine()

# Custom U, V matrices encode the ReLU components of the loss
# (S, T, Tau would likewise encode ReHU components).
# These U, V reproduce the SVM hinge loss C * max(0, 1 - y_i * x_i' beta):
clf.U = -(C * y).reshape(1, -1)
clf.V = (C * np.ones(n)).reshape(1, -1)

# Custom linear constraints A @ beta + b >= 0: here, a fairness-style
# constraint |X_sen @ X @ beta| / n <= tol_sen on a sensitive feature,
# written as two inequalities
X_sen = X[:, 0]
tol_sen = 0.1
clf.A = np.repeat([X_sen @ X], repeats=[2], axis=0) / n
clf.A[1] = -clf.A[1]
clf.b = np.array([tol_sen, tol_sen])

clf.fit(X)
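
To sanity-check that these U and V encode the hinge loss, here is a standalone numeric comparison (a sketch; the data are illustrative):

import numpy as np

# Verify: sum_i ReLU(U[0, i] * (x_i @ beta) + V[0, i])
#      == C * sum_i max(0, 1 - y_i * (x_i @ beta))
rng = np.random.default_rng(0)
n, d, C = 50, 5, 0.5
X = rng.standard_normal((n, d))
y = np.sign(rng.standard_normal(n))
beta = rng.standard_normal(d)

U = -(C * y).reshape(1, -1)
V = (C * np.ones(n)).reshape(1, -1)

lhs = np.maximum(U[0] * (X @ beta) + V[0], 0.0).sum()
rhs = (C * np.maximum(1 - y * (X @ beta), 0.0)).sum()
assert np.isclose(lhs, rhs)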

See more details in Manual ReHLine Formulation.

🎯 Use Cases

ReHLine excels at solving a wide range of machine learning problems:

| Problem | Description | Key Benefits |
|---|---|---|
| Support Vector Machines | Binary and multi-class classification | 100-400× faster than CVXPY solvers |
| Fair Machine Learning | Classification with fairness constraints | Handles demographic parity efficiently |
| Quantile Regression | Robust conditional quantile estimation | 2800× faster than general solvers |
| Huber Regression | Outlier-resistant regression | Superior to specialized solvers |
| Sparse Learning | Feature selection with L1 regularization | Scales to high dimensions |
| Custom Optimization | Any PLQ loss with linear constraints | Flexible framework for research |

⚡ Performance Benchmarks

ReHLine delivers exceptional speed compared to state-of-the-art solvers. Here are speed-up factors on real-world datasets:

Speed Comparison vs. Popular Solvers

| Task | vs. ECOS | vs. MOSEK | vs. SCS | vs. Specialized Solvers |
|---|---|---|---|---|
| SVM | 415× faster | ∞ (failed) | 340× faster | 4.5× vs. LIBLINEAR |
| Fair SVM | 273× faster | 100× faster | 252× faster | ∞ vs. DCCP (failed) |
| Quantile Regression | 2843× faster | ∞ (failed) | ∞ (failed) | — |
| Huber Regression | ∞ (failed) | 452× faster | ∞ (failed) | 2.4× vs. hqreg |
| Smoothed SVM | — | — | — | 1.6-2.3× vs. SAGA/SAG/SDCA/SVRG |

Note: "∞" indicates the competing solver failed to produce a valid solution or exceeded time limits. Results from NeurIPS 2023 paper.

Reproducible Benchmarks (powered by benchopt)

All benchmarks are reproducible via benchopt at our ReHLine-benchmark repository.
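
To reproduce a benchmark locally, the standard benchopt workflow should apply (a sketch; the benchmark_SVM directory name is taken from the ReHLine-benchmark repository layout):

# Install benchopt, fetch the suite, and run one problem
pip install benchopt
git clone https://github.com/softmin/ReHLine-benchmark.git
cd ReHLine-benchmark
benchopt install ./benchmark_SVM
benchopt run ./benchmark_SVM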

| Problem | Benchmark Code | Interactive Results |
|---|---|---|
| SVM | Code | 📊 View |
| Smoothed SVM | Code | 📊 View |
| Fair SVM | Code | 📊 View |
| Quantile Regression | Code | 📊 View |
| Huber Regression | Code | 📊 View |

🤝 Contributing

We welcome contributions! Bug reports, feature requests, and code contributions are all appreciated.

📚 Citation

If you use ReHLine in your research, please cite our NeurIPS 2023 paper:

@inproceedings{dai2023rehline,
  title={ReHLine: Regularized Composite ReLU-ReHU Loss Minimization with Linear Computation and Linear Convergence},
  author={Dai, Ben and Qiu, Yixuan},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023}
}

🔗 ReHLine Ecosystem

🏠 Core Projects

  • ReHLine-python: this repository, the official Python implementation
  • ReHLine-benchmark: reproducible benchopt benchmarks (see above)

📊 Resources

  • ReHLine documentation
  • ReHLine NeurIPS 2023 paper (citation above)