<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>LLMs on reSAID Lab</title><link>https://resaid-lab.github.io/categories/llms/</link><description>Recent content in LLMs on reSAID Lab</description><generator>Hugo</generator><language>en-US</language><lastBuildDate>Tue, 08 Sep 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://resaid-lab.github.io/categories/llms/index.xml" rel="self" type="application/rss+xml"/><item><title>Trustworthy LLMs and VLMs</title><link>https://resaid-lab.github.io/projects/llm-bias-testing/</link><pubDate>Tue, 08 Sep 2026 00:00:00 +0000</pubDate><guid>https://resaid-lab.github.io/projects/llm-bias-testing/</guid><description>&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;Large language and vision-language models are deployed in settings where biased,
inconsistent, or manipulated behavior can affect users, yet their internals are
often unavailable or hard to inspect. We develop methods that expose and
characterize such hidden failures, treating trustworthiness as a property that
must be tested for rather than assumed, and connecting each testing method to a
concrete path for mitigation or defense.&lt;/p&gt;
&lt;p&gt;A recurring theme in our work is that trustworthiness must account for a model&amp;rsquo;s
reasoning process, not only its final answer. Attacks and guardrails that operate
on outputs alone tend to leave reasoning traces that are inconsistent or easy to
flag. As models increasingly expose their chain-of-thought, however, the reasoning
itself becomes both a new attack surface and a new opportunity for defense. We
study how bias and backdoor threats propagate through model behavior, how to
characterize them with principled signals, and how to build safeguards that hold
up against adaptive adversaries.&lt;/p&gt;</description></item><item><title>LLM Reasoning and Planning</title><link>https://resaid-lab.github.io/projects/plan-then-action/</link><pubDate>Mon, 13 Jul 2026 00:00:00 +0000</pubDate><guid>https://resaid-lab.github.io/projects/plan-then-action/</guid><description>&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;Large language models can appear to reason, yet generation is autoregressive: each token is chosen given the preceding context, one step at a time. This local view is powerful, but it also explains familiar failure modes, such as reasoning that drifts, contradicts itself, takes redundant detours, or commits early to a path that later proves wrong. We study how to make model reasoning globally coherent, efficient, and trustworthy by helping a model decide where it is going before it takes the next step.&lt;/p&gt;</description></item></channel></rss>