SE for AI

Fair-AutoML: Performance-Aware Fairness Repair

Overview

Bias mitigation algorithms typically work only in specific situations and often repair fairness at the cost of large accuracy drops, making them impractical for critical decision-making software. We treated fairness repair as a performance-aware optimization problem: fix the bias in a buggy model without ruining its accuracy.

Fair-AutoML, built on a state-of-the-art AutoML tool, made this concrete through two changes to standard AutoML: an optimization function that incorporates fairness objectives alongside accuracy, and a fairness-aware search space over candidate model configurations. A search-space pruning method further reduced computational cost and repair time.
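The idea of an optimization function that weighs fairness alongside accuracy can be illustrated with a minimal sketch. This is not Fair-AutoML's actual objective. The function names, the choice of statistical parity difference as the bias measure, and the weight `beta` are all assumptions made for illustration.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def statistical_parity_difference(
    y_pred: Sequence[int], protected: Sequence[int]
) -> float:
    """Absolute gap in positive-prediction rate between group 0 and group 1."""
    group0 = [p for p, g in zip(y_pred, protected) if g == 0]
    group1 = [p for p, g in zip(y_pred, protected) if g == 1]
    rate0 = sum(group0) / len(group0)
    rate1 = sum(group1) / len(group1)
    return abs(rate0 - rate1)


def fairness_aware_cost(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    protected: Sequence[int],
    beta: float = 0.5,  # hypothetical trade-off weight, not from the source
) -> float:
    """Lower is better: weighted sum of classification error and bias."""
    error = 1.0 - accuracy(y_true, y_pred)
    bias = statistical_parity_difference(y_pred, protected)
    return beta * error + (1.0 - beta) * bias
```

An AutoML search could minimize a cost of this shape in place of plain error, so that a candidate configuration is rewarded only when it reduces bias without giving up too much accuracy.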

ML Software Maintenance and Technical Debt

Overview

ML software has distinctive maintenance risks because data, models, pipelines, and code evolve together. Technical debt can infect the data that models are trained on, degrading the functional performance of ML systems in ways traditional debt does not, and the growing inclusion of ML components in modern software introduces new kinds of debt.

We study how this debt appears in ML repositories in the wild. Mining 68,821 self-admitted technical debts (SATDs) from all revisions of 2,686 mature ML repositories on GitHub, we build taxonomies of ML-specific debt, locate the pipeline stages where it accumulates, and track how it is introduced and removed. Developers and researchers can use this evidence to build maintainable ML systems.
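To give a feel for what locating self-admitted technical debt in source code can look like, here is a minimal sketch that scans the comments of a Python file for common debt markers. The keyword list and the function name `find_satd_comments` are assumptions for illustration. The study's actual detection method is not described here and may differ.

```python
import io
import re
import tokenize

# Illustrative keyword list (an assumption, not the study's detector).
SATD_PATTERN = re.compile(r"\b(TODO|FIXME|HACK|XXX|workaround)\b", re.IGNORECASE)


def find_satd_comments(source: str) -> list[tuple[int, str]]:
    """Return (line_number, comment_text) for comments that match the keywords."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and SATD_PATTERN.search(tok.string):
            hits.append((tok.start[0], tok.string))
    return hits
```

Running such a scan over every revision of a file would show when a debt comment first appears and when it disappears, which is the kind of introduction and removal history the study tracks.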

Large-Scale Mining of Data Science Software

Overview

Data science components have become common in software, yet software engineering research on this class of systems lacked the data and tooling it needed. We built an infrastructure to mine data science software from GitHub at scale. We extended the Boa framework to parse Python using ANTLR grammars for Python 2 and 3, transformed the source into ASTs stored in Boa’s Protobuf format, and hosted the result on a Hadoop cluster, where queries written in Boa’s domain-specific language are parallelized automatically. The resulting dataset covered 1,558 mature, top-rated data science projects (about 5 million Python file snapshots across all revisions) and was later extended to parse Jupyter notebooks.
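As a small-scale illustration of AST-based mining, the sketch below parses one Python file and counts the top-level modules it imports. It uses Python's standard `ast` module as a stand-in. It is not Boa's query language or its ANTLR-based parser, and the function name `count_imported_modules` is hypothetical.

```python
import ast
from collections import Counter


def count_imported_modules(source: str) -> Counter:
    """Count top-level module names imported anywhere in the given source."""
    tree = ast.parse(source)
    counts: Counter = Counter()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import numpy.linalg" counts toward "numpy"
            for alias in node.names:
                counts[alias.name.split(".")[0]] += 1
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Relative imports such as "from . import x" have no module name
            counts[node.module.split(".")[0]] += 1
    return counts
```

Applied to each file snapshot and aggregated across projects, a query of this shape could answer questions such as which data science libraries are used most often.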