Influence of modeling structure in probabilistic sequential decision problems
RAIRO - Operations Research - Recherche Opérationnelle, Tome 40 (2006) no. 2, pp. 195-234.

Markov Decision Processes (MDPs) are a classical framework for stochastic sequential decision problems, based on an enumerated state space representation. More compact and structured representations have been proposed: factorization techniques use state variables representations, while decomposition techniques are based on a partition of the state space into sub-regions and take advantage of the resulting structure of the state transition graph. We use a family of probabilistic exploration-like planning problems in order to study the influence of the modeling structure on the MDP solution. We first discuss the advantages and drawbacks of a graph based representation of the state space, then present our comparisons of two decomposition techniques, and propose to use a global approach combining both state space factorization and decomposition techniques. On the exploration problem instance, it is proposed to take advantage of the natural topological structure of the navigation space, which is partitioned into regions. A number of local policies are optimized within each region, that become the macro-actions of the global abstract MDP resulting from the decomposition. The regions are the corresponding macro-states in the abstract MDP. The global abstract MDP is obtained in a factored form, combining all the initial MDP state variables and one macro-state “region” variable standing for the different possible macro-states corresponding to the regions. Further research is presently conducted on efficient solution algorithms implementing the same hybrid approach for tackling large size MDPs.

DOI : 10.1051/ro:2006019
Classification : 90C40, 68T37, 68T20, 68R05, 11Y05, 68R10
Mots-clés : probabilistic planning, dynamic programming, Markov decision processes, application to autonomous decision making
@article{RO_2006__40_2_195_0,
     author = {Teichteil-K\"onigsbuch, Florent and Fabiani, Patrick},
     title = {Influence of modeling structure in probabilistic sequential decision problems},
     journal = {RAIRO - Operations Research - Recherche Op\'erationnelle},
     pages = {195--234},
     publisher = {EDP-Sciences},
     volume = {40},
     number = {2},
     year = {2006},
     doi = {10.1051/ro:2006019},
     zbl = {1112.90092},
     language = {en},
     url = {http://www.numdam.org/articles/10.1051/ro:2006019/}
}
TY  - JOUR
AU  - Teichteil-Königsbuch, Florent
AU  - Fabiani, Patrick
TI  - Influence of modeling structure in probabilistic sequential decision problems
JO  - RAIRO - Operations Research - Recherche Opérationnelle
PY  - 2006
SP  - 195
EP  - 234
VL  - 40
IS  - 2
PB  - EDP-Sciences
UR  - http://www.numdam.org/articles/10.1051/ro:2006019/
DO  - 10.1051/ro:2006019
LA  - en
ID  - RO_2006__40_2_195_0
ER  - 
%0 Journal Article
%A Teichteil-Königsbuch, Florent
%A Fabiani, Patrick
%T Influence of modeling structure in probabilistic sequential decision problems
%J RAIRO - Operations Research - Recherche Opérationnelle
%D 2006
%P 195-234
%V 40
%N 2
%I EDP-Sciences
%U http://www.numdam.org/articles/10.1051/ro:2006019/
%R 10.1051/ro:2006019
%G en
%F RO_2006__40_2_195_0
Teichteil-Königsbuch, Florent; Fabiani, Patrick. Influence of modeling structure in probabilistic sequential decision problems. RAIRO - Operations Research - Recherche Opérationnelle, Tome 40 (2006) no. 2, pp. 195-234. doi : 10.1051/ro:2006019. http://www.numdam.org/articles/10.1051/ro:2006019/

[1] D. Aberdeen, S. Thiébaux and L. Zhang, Decision-theoretic military operations planning, in Proceedings of 14th Conf. ICAPS 2004, Whistler, Canada (2004) 402-412.

[2] R. Bellman, Dynamic Programming. Princeton University Press, Princeton, NJ (1957). | MR | Zbl

[3] D. Bertsekas and J. Tsitsiklis, Neuro-dynamic programming: an overview (1995).

[4] B. Bonet and H. Geffner, Labeled rtdp: Improving the convergence of real-time dynamic programming, in Proceedings of 13th Conf. ICAPS 2003, Trento, Italy (2003) 12-21.

[5] C. Boutilier and D. Poole, Computing optimal policies for partially observable decision processes using compact representations. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, Portland, Oregon, USA, AAAI Press / The MIT Press (1996) 1168-1175.

[6] C. Boutilier, Correlated action effects in decision theoretic regression, in Uncertainty in Artificial Intelligence (1997) 30-37.

[7] C. Boutilier, R.I. Brafman and C. Geib, Structured reachability analysis for Markov decision processes, in Uncertainty in Artificial Intelligence (1998) 24-32.

[8] C. Boutilier, T. Dean and S. Hanks, Decision-theoretic planning: Structural assumptions and computational leverage. J. Artificial Intell. Res. 11 (1999) 1-94. | Zbl

[9] C. Boutilier, R. Dearden and M. Goldszmidt, Stochastic dynamic programming with factored representations. Artificial Intell. 121 (2000) 49-107. | Zbl

[10] A.R. Cassandra, Exact and Approximate Algorithms for Partially Observable Markov Decision Processes. Computer science, U. of Illinois, Providence R.I. (1998).

[11] T. Dean and K. Kanazawa, A model for reasoning about persistence and causation. Computational Intelligence 5 (1989) 142-150.

[12] T. Dean and S.-H. Lin, Decomposition techniques for planning in stochastic domains, in Proceedings of the 14th IJCAI 1995, San Francisco, CA (1995) 1121-1129.

[13] R. Dearden and C. Boutilier, Abstraction and approximate decision-theoretic planning. Artificial Intell. 89 (1997) 219-283. | Zbl

[14] T.G. Dietterich, Hierarchical reinforcement learning using the maxq value function decomposition. J. Artificial Intell. Res. 13 (2000) 227-303. | Zbl

[15] I.S. Duff, A survey of sparse matrix research, in Proceedings of the IEEE, Prentice Hall, New York 65 (1977) 500-535.

[16] I.S. Duff, A.M. Erisman and J.K. Reid, Direct Methods for Sparse Matrices. Clarendon Press, Oxford (1986). | MR | Zbl

[17] A. Dutech, Solving pomdp's using selected past events, in Proceedings of the 14th ECAI 2000, Berlin, Germany (July 2000) 281-285.

[18] P. Fabiani and F. Teichteil-Königsbuch, Symbolic heuristic policy iteration algorithms for structured decision-theoretic exploration problems, in Workshop on Reasoning under Uncertainty in Robotics - RUR'2005, Edinburgh, Scotland (2005).

[19] Z. Feng and E. Hansen, Symbolic heuristic search for factored markov decision processes, in Proceedings of 18th Conf. AAAI 2002, Edmonton, Alberta, Canada (2002) 455-460.

[20] Z. Feng, E.A. Hansen and S. Zilberstein, Symbolic generalization for on-line planning, in Proceedings of 19th Conf. UAI 2003, Acapulco, Mexico (2003) 209-216.

[21] C. Guestrin, M. Hauskrecht and B. Kveton, Solving factored mdps with continuous and discrete variables, in Proceedings of 20th Conf. UAI 2004, Banff, Canada (2004).

[22] C. Guestrin, D. Koller, R. Parr and S. Venkataraman, Efficient solution algorithms for factored mdps. J. Artificial Intell. Res. 19 (2003) 399-468. | Zbl

[23] E.A. Hansen and S. Zilberstein, Lao*: A heuristic search algorithm that finds solutions with loops. Artificial Intell. 129 (2001) 35-62. | Zbl

[24] M. Hauskrecht, N. Meuleau, L.P. Kaelbling, T.L. Dean and C. Boutilier, Hierarchical solution of markov decision processes using macro-actions, in Proceedings of 14th Conf. UAI 1998, San Francisco, CA (1998) 220-229.

[25] J. Hoey, R. St-Aubin, A. Hu and C. Boutilier, Spudd: Stochastic planning using decision diagrams, in Proceedings of 15th Conf. UAI 1999, San Francisco, CA (1999) 279-288.

[26] J. Hoey, R. St-Aubin, A. Hu and C. Boutilier, Optimal and approximate stochastic planning using decision diagrams. Technical Report TR-2000-05, University of British Columbia, 10 (2000).

[27] K.-E. Kim and T. Dean, Solving factored mdps using non-homogeneous partitions. Artificial Intell. 147 (2003) 225-251. | Zbl

[28] B. Kveton and M. Hauskrecht, Heuristic refinements of approximate linear programming for factored continuous-state markov decision processes, in Proceedings of 14th Conf. ICAPS 2004, Whistler, Canada (2004) 306-314.

[29] S.M. Lavalle, A Game-Theoretic Framework for Robot Motion Planning. Electrical engineering, University of Illinois, Urbana-Champaign (1995).

[30] W.S. Lovejoy, A survey of algorithmic methods for partially observed markov decision processes. Technical Report 28, Annals of Operation Research (1991). | MR | Zbl

[31] R. Parr, Flexible decomposition algorithms for weakly coupled markov decision problems, in Proceedings of 14th Conf. UAI 1998, San Francisco, CA (1998) 422-430.

[32] J. Pineau, G. Gordon and S. Thrun, Policy-contingent abstraction for robust robot control, in Conference on Uncertainty in Articifical Intelligence (UAI) (2003) 477-484.

[33] M.L. Puterman, Markov Decision Processes. John Wiley & Sons, INC (1994). | MR | Zbl

[34] R.I. Bahar, E.A. Frohm, C.M. Gaona, G.D. Hachtel, E. Macii, A. Pardo and F. Somenzi, Algebraic Decision Diagrams and Their Applications, in IEEE /ACM International Conference on CAD, Santa Clara, California, 1993. IEEE Computer Society Press 188-191.

[35] Y. Saad, Iterative Methods for Sparse Linear Systems. Society of Industrial and Applied Mathematics, second edition (2003). | MR | Zbl

[36] R. Sabbadin, Graph partitioning techniques for markov decision processes decomposition, in Proceedings of the 15th ECAI 2002, Lyon, France (July 2002) 670-674.

[37] R. St-Aubin, J. Hoey and C. Boutilier, APRICODD: Approximate policy construction using decision diagrams, in NIPS (2000) 1089-1095.

[38] R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA (1998).

[39] F. Teichteil-Königsbuch and P. Fabiani, Processus Décisionnel de Markov décomposé et factorisé pour l'optimisation d'une stratégie d'exploration. Revue d'Intelligence Artificielle 20 (2006) 133-179.

[40] F. Teichteil-Königsbuch and P. Fabiani, Symbolic heuristic policy iteration algorithms for structured decision-theoretic exploration problems, in Workshop on Planning under Uncertainty for Autonomous Systems ICAPS'2005, Monterey, CA, USA (2005).

[41] G. Verfaillie, F. Garcia and L. Péret, Deployment and Maintenance of a Constellation of Satellites: a Benchmark, in Proceedings of ICAPS'03 Workshop on Planning under Uncertainty and Incomplete Information, Trento, Italy (June 2003).

Cité par Sources :