Adaptive Stress Testing Black-Box LLM Planners
Foundation Model Evaluation
arXiv 2025 Pre-Print
Large language models (LLMs) have recently demonstrated success in generalizing across decision-making tasks including planning, control, and prediction, but their tendency to hallucinate unsafe and undesired outputs poses risks. We argue that detecting such failures is necessary, especially in safety-critical scenarios. We propose a novel method for efficiently searching the space of prompt perturbations using adaptive stress testing (AST) with Monte-Carlo tree search (MCTS). Our AST formulation enables discovery of scenarios and prompts that cause language models to act with high uncertainty or even crash.
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
While researchers have shown promising results in deploying foundation models to decision-making tasks, these models are known to hallucinate and generate decisions that may sound reasonable, but are in fact poor. We argue there is a need to step back and simultaneously design systems that can quantify the certainty of a model's decision, and detect when it may be hallucinating. In this work, we discuss the current use cases of foundation models for decision-making tasks, provide a general definition for hallucinations with examples, discuss existing approaches to hallucination detection and mitigation with a focus on decision problems, present guidelines, and explore areas for further research in this exciting field.
Field Robotics
Towards Real-Time Generation of Delay-Compensated Video Feeds for Outdoor Mobile Robot Teleoperation
We propose a modular learning-based vision pipeline to generate delay-compensated images in real-time for supervisors. Our extensive offline evaluations demonstrate that our method generates more accurate images compared to state-of-the-art approaches in our setting. Additionally, ours is one of the few works to evaluate a delay-compensation method in outdoor field environments with complex terrain on data from a real robot in real-time.
Autonomous Driving
An Expert Ensemble for Detecting Anomalous Scenes, Interactions, and Behaviors in Autonomous Driving
The ability to detect anomalous situations outside of the operational design domain is a key component in self-driving cars, enabling us to mitigate the impact of abnormal ego behaviors and to realize trustworthy driving systems. On-road anomaly detection in egocentric videos remains a challenging problem due to the difficulties introduced by complex and interactive scenarios. We conduct a holistic analysis of common on-road anomaly patterns, from which we propose three unsupervised anomaly detection experts: a scene expert that focuses on frame-level appearances to detect abnormal scenes and unexpected scene motions; an interaction expert that models normal relative motions between two road participants and raises alarms whenever anomalous interactions emerge; and a behavior expert which monitors abnormal behaviors of individual objects by future trajectory prediction.
Structural Attention-Based Recurrent Variational Autoencoder for Highway Vehicle Anomaly Detection
In autonomous driving, detection of abnormal driving behaviors is essential to ensure the safety of vehicle controllers. Prior works in vehicle anomaly detection have shown that modeling interactions between agents improves detection accuracy, but certain abnormal behaviors where structured road information is paramount are poorly identified, such as wrong-way and off-road driving. We propose a novel unsupervised framework for highway anomaly detection which explicitly uses the structure of the environment to aid anomaly identification.
Learning to Navigate Intersections with Unsupervised Driver Trait Inference
Navigation through uncontrolled intersections is one of the key challenges for autonomous vehicles. Identifying the subtle differences in hidden traits of other drivers can bring significant benefits when navigating in such environments. We use a variational autoencoder with recurrent neural networks to learn a latent representation of traits without any ground truth trait labels. Then, we use this trait representation to learn a policy for an autonomous vehicle to navigate through a T-intersection with deep reinforcement learning.
Instruction-Following Agents
A Data-Efficient Visual-Audio Representation with Intuitive Fine-tuning for Voice-Controlled Robots
A command-following robot that serves people in everyday life must continually improve itself in deployment domains with minimal help from its end users, instead of engineers. Previous methods are either difficult to continuously improve after the deployment or require a large number of new labels during fine-tuning. Motivated by (self-)supervised contrastive learning, we propose a novel representation that generates an intrinsic reward function for command-following robot tasks by associating images with sound commands. After the robot is deployed in a new domain, the representation can be updated intuitively and data-efficiently by non-experts without any hand-crafted reward functions.
BEAST: Building an Embodied Action-Prediction System with Trajectory Data
Competed in the 2023 Amazon SimBot Competition
We introduce our system BEAST (Building an Embodied Action-prediction System with Trajectory data) for interactive instruction-following within the Alexa Arena Platform. Our system leverages the abstraction of navigation provided by the Arena to decouple the language and vision predictions. This allows for greater simplicity within the system, and for rapid augmentation of the trajectory data-set and training for our text-only action prediction model. By creating a framework with a focus towards user experience our system is more robust to errors in predictions, and informative to the user.
Robot Crowd Navigation
HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments
We study the problem of robot navigation in dense and interactive crowds with environmental constraints such as corridors and furniture. Previous methods fail to consider all types of interactions among agents and obstacles, leading to unsafe and inefficient robot paths. In this article, we leverage a graph-based representation of crowded and constrained scenarios and propose a structured framework to learn robot navigation policies with deep reinforcement learning.
Intention Aware Robot Crowd Navigation with Attention-Based Interaction Graph
We study the problem of safe and intention-aware robot navigation in dense and interactive crowds. Most previous reinforcement learning (RL) based methods fail to consider different types of interactions among all agents or ignore the intentions of people, which results in performance degradation. In this paper, we propose a novel recurrent graph neural network with attention mechanisms to capture heterogeneous interactions among agents through space and time.
Decentralized Structural-RNN for Robot Crowd Navigation with Deep Reinforcement Learning
Safe and efficient navigation through human crowds is an essential capability for mobile robots. Previous work on robot crowd navigation assumes that the dynamics of all agents are known and well-defined. In addition, the performance of previous methods deteriorates in partially observable environments and environments with dense crowds. To tackle these problems, we propose a novel network that reasons about spatial and temporal relationships for robot decision making in crowd navigation.
Traffic Congestion Mitigation
Lessons in Cooperation: A Qualitative Analysis of Driver Sentiments towards Real-Time Advisory Systems from a Driving Simulator User Study
Real-time Advisory (RTA) systems, such as navigational and eco-driving assistants, are becoming increasingly ubiquitous in vehicles due to their benefits for users and society. Until autonomous vehicles mature, such advisory systems will continue to expand their ability to cooperate with drivers, enabling safer and more eco-friendly driving practices while improving user experience. However, the interactions between these systems and drivers have not been studied extensively. To this end, we conduct a driving simulator study (N=16) to capture driver reactions to a Cooperative RTA system.
Cooperative Advisory Residual Policies for Congestion Mitigation
Fleets of autonomous vehicles can mitigate traffic congestion through simple actions, thus improving many socioeconomic factors such as commute time and gas costs. However, these approaches are limited in practice as they assume precise control over autonomous vehicle fleets, incur extensive installation costs for a centralized sensor ecosystem, and also fail to account for uncertainty in driver behavior. To this end, we develop a class of learned residual policies that can be used in cooperative advisory systems and only require the use of a single vehicle with a human driver. Our policies advise drivers to behave in ways that mitigate traffic congestion while accounting for diverse driver behaviors, particularly drivers' reactions to instructions, to provide an improved user experience.
PeRP: Personalized Residual Policies For Congestion Mitigation Through Co-operative Advisory Systems
Intelligent driving systems can be used to mitigate congestion through simple actions, thus improving many socioeconomic factors such as commute time and gas costs. However, these systems assume precise control over autonomous vehicle fleets, and are hence limited in practice as they fail to account for uncertainty in human behavior. Piecewise Constant (PC) Policies address these issues by structurally modeling the likeness of human driving to reduce traffic congestion in dense scenarios to provide action advice to be followed by human drivers. However, PC policies assume that all drivers behave similarly. To this end, we develop a co-operative advisory system based on PC policies with a novel driver trait conditioned Personalized Residual Policy, PeRP. PeRP advises drivers to behave in ways that mitigate traffic congestion.
Towards Co-operative Congestion Mitigation
Accepted to SAPHRI Workshop at ICRA 2022
The effects of traffic congestion are widespread and are an impedance to everyday life. Piecewise constant driving policies have shown promise in helping mitigate traffic congestion in simulation environments. We intend to use the CARLA simulator alongside the Flow framework to conduct user studies to evaluate the affect of piecewise constant driving policies.
Machine Learning
Hierarchical Self-Imitation Learning in Single-Agent Sparse Reward Environments
Undergraduate Thesis in IDEALS
Reinforcement learning problems with sparse and delayed rewards are challenging to solve because the algorithms explore environments to gain experience from high performing rollouts. Classical methods of encouraging exploration during training such as epsilon-greedy and noisebased exploration are not adequate on their own to explore large state spaces. This thesis presents a single agent reinforcement learning algorithm that combines the effects of SIL and HL – Generative Adversarial Self Imitation Learning + Hierarchical Actor-Critic (GASIL+HAC).