Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives

Authors: Qi Heng Ho, Martin S. Feather, Federico Rossi, Zachary N. Sunberg, Morteza Lahijanian

Year: 2024

Source: https://arxiv.org/abs/2406.02871

TLDR:

This paper addresses the Maximal Reachability Probability Problem (MRPP) in undiscounted Partially Observable Markov Decision Processes (POMDPs). It proposes a trial-based heuristic search value iteration algorithm that explores the belief space efficiently and returns policies with two-sided bounds on the optimal reachability probability. The authors prove convergence to an optimal policy from below under certain conditions, and their experiments on a suite of benchmarks show that the algorithm outperforms existing methods in almost all cases in both probability guarantees and computation time.

The paper presents a new algorithm for solving the Maximal Reachability Probability Problem in POMDPs without discounting, leveraging trial-based heuristic search value iteration to efficiently explore the belief space and provide policies with two-sided bounds on optimal reachability probabilities, demonstrating improved performance over existing methods in both probability guarantees and computation time.

Abstract

Partially Observable Markov Decision Processes (POMDPs) are powerful models for sequential decision making under transition and observation uncertainties. This paper studies the challenging yet important problem in POMDPs known as the (indefinite-horizon) Maximal Reachability Probability Problem (MRPP), where the goal is to maximize the probability of reaching some target states. This is also a core problem in model checking with logical specifications and is naturally undiscounted (discount factor is one). Inspired by the success of point-based methods developed for discounted problems, we study their extensions to MRPP. Specifically, we focus on trial-based heuristic search value iteration techniques and present a novel algorithm that leverages the strengths of these techniques for efficient exploration of the belief space (informed search via value bounds) while addressing their drawbacks in handling loops for indefinite-horizon problems. The algorithm produces policies with two-sided bounds on optimal reachability probabilities. We prove convergence to an optimal policy from below under certain conditions. Experimental evaluations on a suite of benchmarks show that our algorithm outperforms existing methods in almost all cases in both probability guarantees and computation time.
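
The "belief space" mentioned in the abstract is the set of probability distributions over states, and point-based methods move through it with the standard Bayesian belief update. The following is a minimal sketch of that update, not code from the paper. The three-state model (start, target, trap), the array layouts `T[a, s, s']` and `O[a, s', o]`, and the function name `belief_update` are all assumptions made for illustration.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes filter: b'(s') is proportional to O[a, s', o] * sum_s T[a, s, s'] * b(s)."""
    predicted = b @ T[a]                    # distribution over next states
    unnormalized = O[a][:, o] * predicted   # weight by observation likelihood
    prob_o = unnormalized.sum()             # Pr(o | b, a)
    if prob_o == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    return unnormalized / prob_o, prob_o

# Hypothetical model: state 0 = start, 1 = target, 2 = trap; one action, two observations.
T = np.array([[[0.0, 0.7, 0.3],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]]])   # shape (actions, s, s')
O = np.array([[[1.0, 0.0],
               [0.8, 0.2],
               [0.2, 0.8]]])        # shape (actions, s', observations)

b0 = np.array([1.0, 0.0, 0.0])      # agent starts in state 0 with certainty
b1, p_obs = belief_update(b0, a=0, o=0, T=T, O=O)
```

Here `b1` is the belief after taking action 0 and seeing observation 0, and `p_obs` is the probability of that observation. A trial-based search would follow such updates along sampled action and observation sequences.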

Method

The authors propose a trial-based heuristic search value iteration algorithm for the Maximal Reachability Probability Problem (MRPP) in undiscounted Partially Observable Markov Decision Processes (POMDPs). The search is informed by value bounds, which lets it explore the belief space efficiently, and the algorithm addresses the difficulty that earlier trial-based techniques have with loops in indefinite-horizon problems. It returns policies with two-sided bounds on the optimal reachability probability.
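
To make "two-sided bounds" concrete, the sketch below computes a deliberately crude lower and upper bound on the optimal reachability probability at a belief. These are not the bounds used in the paper, whose algorithm refines its value bounds during search. The lower bound is the belief mass already on target states (assuming a target occupied at time 0 counts as reached), and the upper bound is the mass on states that have any positive-probability path to a target. The model and the names `can_reach_target` and `trivial_bounds` are hypothetical.

```python
import numpy as np

def can_reach_target(T, targets):
    """Backward graph search: states with a positive-probability path to some target."""
    num_actions, num_states, _ = T.shape
    reach = set(targets)
    changed = True
    while changed:
        changed = False
        for s in range(num_states):
            if s in reach:
                continue
            # s joins the set if some action moves it into the set with positive probability
            if any(T[a, s, t] > 0 for a in range(num_actions) for t in reach):
                reach.add(s)
                changed = True
    return reach

def trivial_bounds(b, T, targets):
    """Return (lower, upper) with lower <= optimal reachability probability <= upper."""
    reach = can_reach_target(T, targets)
    lower = float(sum(b[s] for s in targets))  # mass already in the target set
    upper = float(sum(b[s] for s in reach))    # mass that is not provably stuck
    return lower, upper

# Hypothetical model: state 0 = start, 1 = target, 2 = trap; one action.
T = np.array([[[0.0, 0.7, 0.3],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]]])   # shape (actions, s, s')
b = np.array([0.5, 0.2, 0.3])
lower, upper = trivial_bounds(b, T, targets=[1])
```

In this toy model the true optimal value at `b` is 0.2 + 0.5 * 0.7 = 0.55, which lies between the two bounds. A sound solver of the kind the paper describes aims to shrink such a gap while keeping both sides valid.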

Main Finding

The main result is a new algorithm for the Maximal Reachability Probability Problem (MRPP) in undiscounted Partially Observable Markov Decision Processes (POMDPs). It uses trial-based heuristic search value iteration to explore the belief space and produces policies with two-sided bounds on the optimal reachability probability. The authors prove convergence to an optimal policy from below under certain conditions, and experiments on a suite of benchmarks show that the algorithm outperforms existing methods in almost all cases in both probability guarantees and computation time.

Conclusion

The paper concludes that trial-based heuristic search value iteration, originally developed for discounted problems, can be extended to the undiscounted Maximal Reachability Probability Problem (MRPP) in Partially Observable Markov Decision Processes (POMDPs). The resulting algorithm explores the belief space efficiently and provides policies with two-sided bounds on the optimal reachability probability. Theoretical analysis establishes convergence to an optimal policy from below under certain conditions, and the experimental evaluation shows better probability guarantees and lower computation time than existing methods in almost all benchmark cases.

Keywords

Partially Observable Markov Decision Processes (POMDPs), Maximal Reachability Probability Problem (MRPP), trial-based heuristic search, value iteration, belief space, two-sided bounds, optimal reachability probabilities, convergence, experimental evaluations, benchmark comparisons, decision-making under uncertainty.
