Projects on reSAID Lab

Coding Agents and Operational Safety

Mon, 16 Nov 2026 00:00:00 +0000

Overview

Autonomous coding agents built on large language models are wired directly into development workflows: they edit files, run commands, configure environments, and fix bugs with growing autonomy. Most safety evaluations of these tools focus on explicitly malicious prompts, but we argue this misses the larger and more common danger: agents that fail during ordinary, goal-directed work through destructive operations, constraint violations, authorization bypasses, and silent errors that surface only after damage is done.

Trustworthy LLMs and VLMs

Tue, 08 Sep 2026 00:00:00 +0000

Overview

Large language and vision-language models are deployed in settings where biased, inconsistent, or manipulated behavior can affect users, yet their internals are often unavailable or hard to inspect. We develop methods that expose and characterize such hidden failures, treating trustworthiness as a property that must be tested for rather than assumed — and connecting each testing method to a concrete path for mitigation or defense.

A recurring theme in our work is that trustworthiness must account for a model’s reasoning process, not only its final answer. Attacks and guardrails that operate on outputs alone tend to leave reasoning traces that are inconsistent or easy to flag, but as models increasingly expose their chain-of-thought, the reasoning itself becomes both a new attack surface and a new opportunity for defense. We study how bias and backdoor threats propagate through model behavior, how to characterize them with principled signals, and how to build safeguards that hold up against adaptive adversaries.

LLM Reasoning and Planning

Mon, 13 Jul 2026 00:00:00 +0000

Overview

Large language models can appear to reason, yet generation is autoregressive: each token is chosen from the immediate context, one step at a time. This local view is powerful, but it also explains familiar failure modes, such as reasoning that drifts, contradicts itself, takes redundant detours, or commits early to a path that later proves wrong. We study how to make model reasoning globally coherent, efficient, and trustworthy by helping a model decide where it is going before it takes the next step.

Long-Term Fairness and ML Safety

Thu, 01 May 2025 00:00:00 +0000

Overview

Many ML-enabled systems operate in dynamic environments: the system’s decisions change the environment, and those changes feed back into its future inputs. Certain self-reinforcing loops can amplify errors, entrench bias, and cause fairness violations in the long term even when immediate outcomes are fair. In predictive policing, for example, a model that flags a neighborhood as high-crime sends more patrols there, producing more recorded arrests, which the model reads as even higher crime. The same pattern appears in loan approvals that affect credit scores and in medical risk scoring that influences treatment access.
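The predictive-policing loop can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a model from the project: two neighborhoods share the same true crime rate, the one with more recorded arrests is flagged as the hot spot and receives a fixed larger share of patrols, and recorded arrests grow in proportion to patrols. The function name, parameters, and numbers are all hypothetical.

```python
def simulate_patrol_feedback(rounds=10, true_rate=0.1, patrols=100,
                             initial_records=(11.0, 10.0), hot_share=0.7):
    """Track neighborhood 0's share of all recorded arrests over time.

    Both neighborhoods have the SAME true crime rate (an assumption of this toy);
    only the patrol allocation differs between them.
    """
    recorded = list(initial_records)
    shares = []
    for _ in range(rounds):
        # The model "flags" whichever neighborhood has more recorded arrests.
        hot = 0 if recorded[0] >= recorded[1] else 1
        allocation = [patrols * (1 - hot_share)] * 2
        allocation[hot] = patrols * hot_share
        # More patrols produce more recorded arrests, at an identical true rate.
        for i in range(2):
            recorded[i] += allocation[i] * true_rate
        shares.append(recorded[0] / sum(recorded))
    return shares


shares = simulate_patrol_feedback()
```

With these hypothetical defaults, neighborhood 0 starts with about 52% of recorded arrests and holds about 67% after ten rounds, even though nothing about the underlying crime rate differs. The small initial gap is amplified purely by the feedback between decisions and future inputs.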

Fair-AutoML: Performance-Aware Fairness Repair

Wed, 15 Nov 2023 00:00:00 +0000

Overview

Bias mitigation algorithms typically work only in specific situations and often repair fairness at the cost of large accuracy drops, making them impractical for critical decision-making software. We treated fairness repair as a performance-aware optimization problem: fix the bias in a buggy model without ruining its accuracy.

Fair-AutoML, built on a state-of-the-art AutoML tool, made this concrete through two changes to standard AutoML: an optimization function that incorporates fairness objectives alongside accuracy, and a fairness-aware search space over candidate model configurations. A search-space pruning method further reduced computational cost and repair time.
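One way to picture an objective that weighs fairness alongside accuracy is a weighted cost over candidate models. The sketch below is an illustration only: the weighting parameter `beta`, the use of statistical parity difference as the bias measure, and all function names are assumptions, not the optimization function Fair-AutoML actually uses.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def statistical_parity_difference(y_pred, protected):
    """Absolute gap in positive-prediction rate between the two groups."""
    def positive_rate(group):
        preds = [p for p, a in zip(y_pred, protected) if a == group]
        return sum(preds) / len(preds)
    return abs(positive_rate(1) - positive_rate(0))


def fairness_aware_cost(y_true, y_pred, protected, beta=0.5):
    """Hypothetical cost: beta weights the error, (1 - beta) weights the bias."""
    error = 1 - accuracy(y_true, y_pred)
    bias = statistical_parity_difference(y_pred, protected)
    return beta * error + (1 - beta) * bias


def select_candidate(candidates, y_true, protected, beta=0.5):
    """Pick the candidate (name -> predictions) with the lowest cost."""
    return min(candidates,
               key=lambda name: fairness_aware_cost(y_true, candidates[name],
                                                    protected, beta))


# Toy data: 1 marks the protected group; labels themselves are imbalanced.
y_true = [1, 0, 0, 0, 1, 1, 1, 0]
protected = [1, 1, 1, 1, 0, 0, 0, 0]
candidates = {
    "accuracy_only": [1, 0, 0, 0, 1, 1, 1, 0],  # perfect accuracy, parity gap 0.5
    "repaired": [1, 1, 0, 0, 1, 1, 0, 0],       # accuracy 0.75, parity gap 0.0
}
```

On this toy data, `select_candidate(candidates, y_true, protected, beta=0.5)` prefers `"repaired"`, while `beta=0.9` prefers `"accuracy_only"`. A real search would explore model configurations rather than a fixed dictionary of predictions.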

Safety Assurance of ML-Based Systems

Wed, 01 Nov 2023 00:00:00 +0000

Overview

ML-based software makes predictions in settings where failures carry real safety consequences. Our motivating case study was the DHS passenger screening challenge, hosted on Kaggle with the largest prize pool in the platform's history ($1.5 million). The stakes in that setting are concrete: TSA screens more than two million passengers daily, high false alarm rates create checkpoint bottlenecks, and false negatives pose severe safety risks. We built abstractions of such ML-enabled systems and inferred preconditions that provide probable guarantees on the safety of their predictions.

Fairify: Fairness Verification of Neural Networks

Mon, 15 May 2023 00:00:00 +0000

Overview

We built Fairify, an SMT-based approach that verifies individual fairness of deep neural networks in production. Individual fairness requires that any two individuals who differ only in protected attributes such as race, sex, or age receive similar predictions; unlike group metrics, it captures worst-case discrimination. The property is hard to verify because it must be checked globally over the input domain and because of the non-linear computation nodes in the network.
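The individual fairness property can be made concrete on a toy classifier. Fairify itself is SMT-based; the sketch below swaps in exhaustive enumeration over a tiny discrete input domain, which only works because the domain is small, and it reads "similar predictions" as "identical labels". The models, attribute names, and domains are hypothetical.

```python
from itertools import product


def find_fairness_violation(predict, domains, protected):
    """Search for two inputs that differ only in protected attributes
    but receive different predictions. Returns the pair, or None."""
    free = [name for name in domains if name not in protected]
    for free_values in product(*(domains[name] for name in free)):
        base = dict(zip(free, free_values))
        first = None  # first (input, prediction) seen for this base
        for prot_values in product(*(domains[name] for name in protected)):
            x = {**base, **dict(zip(protected, prot_values))}
            y = predict(x)
            if first is None:
                first = (x, y)
            elif y != first[1]:
                return first[0], x  # counterexample to individual fairness
    return None


# Hypothetical toy domain and classifiers.
domains = {"income": range(4), "credit": range(4), "sex": (0, 1)}


def fair_model(x):
    return 2 * x["income"] + x["credit"] >= 5


def biased_model(x):
    penalty = 2 if x["sex"] == 0 else 0
    return 2 * x["income"] + x["credit"] - penalty >= 5


fair_result = find_fairness_violation(fair_model, domains, ["sex"])
biased_result = find_fairness_violation(biased_model, domains, ["sex"])
```

Here `fair_result` is `None`, while `biased_result` is a pair of inputs that agree on `income` and `credit` but differ in `sex` and in the predicted label. Enumeration scales exponentially with the number of attributes, which is one reason a solver-based approach is needed for realistic networks.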

Causal Fairness in Machine Learning Pipelines

Mon, 01 May 2023 00:00:00 +0000

Overview

Most fairness research treated a machine learning model as a single black box, measuring bias only from its final predictions. Real pipelines, however, contain an ordered set of components — data filtering, imputation, encoding, feature transformation, training, tuning — and each can affect the fairness of the resulting model. We investigated fairness at the component level: using causal reasoning, we intervened on one stage at a time, constructed an alternative pipeline without that stage, and measured the resulting prediction disparity to attribute unfairness to specific components.
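The stage-by-stage intervention can be sketched on a toy pipeline. This is a simplified illustration, not the project's implementation: the "model" is a threshold at the mean income, disparity is measured as statistical parity difference on the transformed data, and the stage names and data are hypothetical.

```python
def statistical_parity_difference(rows, predict):
    """Absolute gap in positive-prediction rate between group 0 and group 1."""
    rates = []
    for group in (0, 1):
        members = [r for r in rows if r["group"] == group]
        rates.append(sum(predict(r) for r in members) / len(members))
    return abs(rates[0] - rates[1])


def run_pipeline(rows, stages):
    """Apply preprocessing stages in order, 'train' a threshold, return disparity."""
    for _, stage in stages:
        rows = stage(rows)
    threshold = sum(r["income"] for r in rows) / len(rows)  # toy training step
    return statistical_parity_difference(rows, lambda r: r["income"] >= threshold)


def stage_effects(rows, stages):
    """For each stage, compare the full pipeline with one that omits that stage."""
    full = run_pipeline(rows, stages)
    effects = {}
    for i, (name, _) in enumerate(stages):
        without_stage = stages[:i] + stages[i + 1:]
        effects[name] = full - run_pipeline(rows, without_stage)
    return effects


def drop_outliers(rows):
    return [r for r in rows if r["income"] <= 150]


def clip_negative(rows):
    return [{**r, "income": max(0, r["income"])} for r in rows]


rows = ([{"group": 0, "income": v} for v in (40, 60, 80, 200)]
        + [{"group": 1, "income": v} for v in (30, 50, 70, 90)])
stages = [("drop_outliers", drop_outliers), ("clip_negative", clip_negative)]
effects = stage_effects(rows, stages)
```

A negative effect means the stage reduces disparity relative to the pipeline without it, and a positive effect means it adds disparity. On this toy data `drop_outliers` has a negative effect and `clip_negative` has none, since no income is negative.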

ML Software Maintenance and Technical Debt

Mon, 14 Nov 2022 00:00:00 +0000

Overview

ML software has distinctive maintenance risks because data, models, pipelines, and code evolve together. Technical debt can infect the data that models are trained on, degrading the functional performance of ML systems in ways traditional debt does not, and the growing inclusion of ML components in modern software introduces new kinds of debt.

We study how this debt appears in ML repositories in the wild. Mining 68,821 self-admitted technical debts (SATDs) from all revisions of 2,686 mature ML repositories on GitHub, we build taxonomies of ML-specific debt, locate the pipeline stages where it accumulates, and track how it is introduced and removed — evidence developers and researchers can use to build maintainable ML systems.
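Self-admitted technical debt is debt that developers flag in their own comments. A rough sketch of how such comments might be located in Python source is shown below. The keyword list is an assumption chosen for illustration, and the study's actual detection method is not described here; the sketch uses the standard-library `tokenize` module so that matches inside string literals are ignored.

```python
import io
import re
import tokenize

# Hypothetical keyword list; real SATD detectors use richer patterns or classifiers.
SATD_PATTERN = re.compile(r"\b(todo|fixme|hack|xxx|workaround)\b", re.IGNORECASE)


def find_satd_comments(source):
    """Return (line_number, comment_text) for comments that admit debt."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and SATD_PATTERN.search(tok.string):
            hits.append((tok.start[0], tok.string.lstrip("# ").strip()))
    return hits


SAMPLE = '''\
import numpy as np
# TODO: tune the learning rate instead of hard-coding it
lr = 0.01
data = np.nan_to_num(raw)  # hack: hides missing labels upstream
msg = "TODO is not a comment here"
'''

satd_hits = find_satd_comments(SAMPLE)
```

Running the same function over every revision of a file, and comparing the hits between consecutive revisions, is one simple way to see when a debt comment is introduced and when it is removed.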

Large-Scale Mining of Data Science Software

Sun, 01 May 2022 00:00:00 +0000

Overview

Data science components have become common in software, yet software engineering research on this class of systems needed data and tooling that did not exist. We built an infrastructure to mine data science software from GitHub at scale: we extended the Boa framework to parse Python using ANTLR grammars for Python 2 and 3, transformed the source into ASTs stored in Boa’s Protobuf format, and hosted the result on a Hadoop cluster where Boa’s domain-specific language runs automatically parallelized queries. The resulting dataset covered 1,558 mature, top-rated data science projects — about 5 million Python file snapshots across all revisions — and was later extended to parse Jupyter notebooks.
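The kind of question such an infrastructure answers, for example which APIs are called across many file snapshots, can be sketched with Python's built-in `ast` module. This is a stand-in for illustration only: Boa queries are written in Boa's own domain-specific language over its Protobuf AST, and the built-in `ast` module parses Python 3 only. The function name and sample snapshots are hypothetical.

```python
import ast
from collections import Counter


def count_called_names(source):
    """Count call sites by the called name (function name or method attribute)."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):   # e.g. pd.read_csv(...)
                counts[func.attr] += 1
            elif isinstance(func, ast.Name):      # e.g. LogisticRegression()
                counts[func.id] += 1
    return counts


# Two hypothetical file snapshots; the code is parsed, never executed.
snapshots = [
    "import pandas as pd\ndf = pd.read_csv('a.csv')\ndf = df.dropna()\n",
    "from sklearn.linear_model import LogisticRegression\n"
    "m = LogisticRegression()\nm.fit(X, y)\nm.fit(X2, y2)\n",
]

api_usage = sum((count_called_names(s) for s in snapshots), Counter())
```

Each snapshot is analyzed independently and the per-file counters are then merged, which is the shape of computation that parallelizes naturally across a cluster.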