The reSAID Lab builds the methods, tools, and empirical understanding needed to engineer AI-enabled software that is reliable, fair, safe, and maintainable. Our work spans the full lifecycle of AI software — from how large language models reason and generate code, to how ML components are specified, verified, tested, and maintained in real systems. The themes below describe the questions we focus on; the projects that follow show how we pursue them, and each links to the publications it produced.

Research Themes

We work at the intersection of software engineering and artificial intelligence, with an emphasis on building responsible, trustworthy AI systems.

Responsible AI Engineering

Designing, building, and evaluating AI systems that are fair, safe, reliable, accountable, and aligned with human values. We develop methods for verifying, testing, and auditing AI software throughout the engineering lifecycle, with work spanning long-term fairness under feedback loops, compositional fairness in pipelines, and trustworthy system design.

LLMs and Coding Agents

Studying how large language models reason, plan, and generate code. We investigate LLM reasoning capabilities, coding agent reliability, bias in LLM outputs, and the technical debt that arises when LLMs are used in software development.

Formal Verification and Program Analysis

Applying formal methods, static and dynamic program analysis, and verification techniques to reason about correctness, safety, and fairness properties of software and ML-enabled systems. We build tools that provide provable guarantees and detect defects in AI software.

Software Engineering for AI

Applying software engineering principles—architecture, testing, maintenance, and quality—to AI and ML systems. We study data science pipelines, ML system architecture, and engineering practices for AI-enabled software.

Projects

Active and recent projects, organized by topic. Each project links to the papers behind it.

We study how autonomous coding agents fail during ordinary development work, and we design safeguards for deploying them responsibly, ranging from constraint enforcement to failure transparency and safe-halt behaviors.

Testing and analysis for hidden failure modes in large language and vision-language models, from social bias under black-box access to reasoning-level backdoors.

Methods that make language-model reasoning more deliberate, structured, and inspectable, separating high-level planning from low-level action generation.

We developed safety-assurance methods for ML-enabled systems that infer preconditions over pipeline abstractions and identify safety risks arising from feedback loops between a system and its environment.
