A linear programming based approach for composite-action Markov decision processes

Zhang, Zhicong; Li, Shuai; Yan, Xiaohui; Zhang, Liangwei

doi:10.1051/ro/2018081

Zhang, Zhicong ¹ ; Li, Shuai ¹ ; Yan, Xiaohui ¹ ; Zhang, Liangwei ¹

¹

RAIRO - Operations Research - Recherche Opérationnelle, Tome 53 (2019) no. 5, pp. 1749-1761

Suite au passage du modèle économique de la revue en S20, le texte intégral des articles des années 2019 et 2020 est accessible uniquement sur le site de l'éditeur et est réservé aux abonnés.

Résumé

We study a time homogeneous discrete composite-action Markov decision process (CMDP) which needs to make multiple decisions at each state. In this particular Markov decision process, the state variables are divided into two separable sets and a two-dimensional composite action is chosen at each decision epoch. To solve a composite-action Markov decision process, we propose a novel linear programming model (Contracted Linear Programming Model, CLPM). We show that the CLPM model obtains the optimal state values of a CMDP process. We analyze and compare the number of variables and constraints of the CLPM model and the Traditional Linear Programming Model (TLPM). Computational experiments compare running times and memory usage of the two models. The CLPM model outperforms the TLPM model in both time complexity and space complexity by theoretical analysis and computational experiments.

Reçu le : 2017-12-12
Accepté le : 2018-09-19

MR Zbl

DOI : 10.1051/ro/2018081

Classification : 90C40, 90C05
Keywords: Markov decision process, linear programming, optimal state value, action

Affiliations des auteurs :

Zhang, Zhicong ¹ ; Li, Shuai ¹ ; Yan, Xiaohui ¹ ; Zhang, Liangwei ¹

¹

@article{RO_2019__53_5_1749_0,
     author = {Zhang, Zhicong and Li, Shuai and Yan, Xiaohui and Zhang, Liangwei},
     title = {A linear programming based approach for composite-action {Markov} decision processes},
     journal = {RAIRO - Operations Research - Recherche Op\'erationnelle},
     pages = {1749--1761},
     year = {2019},
     publisher = {EDP Sciences},
     volume = {53},
     number = {5},
     doi = {10.1051/ro/2018081},
     mrnumber = {4016528},
     zbl = {1431.90173},
     language = {en},
     url = {https://www.numdam.org/articles/10.1051/ro/2018081/}
}

TY  - JOUR
AU  - Zhang, Zhicong
AU  - Li, Shuai
AU  - Yan, Xiaohui
AU  - Zhang, Liangwei
TI  - A linear programming based approach for composite-action Markov decision processes
JO  - RAIRO - Operations Research - Recherche Opérationnelle
PY  - 2019
SP  - 1749
EP  - 1761
VL  - 53
IS  - 5
PB  - EDP Sciences
UR  - https://www.numdam.org/articles/10.1051/ro/2018081/
DO  - 10.1051/ro/2018081
LA  - en
ID  - RO_2019__53_5_1749_0
ER  -

%0 Journal Article
%A Zhang, Zhicong
%A Li, Shuai
%A Yan, Xiaohui
%A Zhang, Liangwei
%T A linear programming based approach for composite-action Markov decision processes
%J RAIRO - Operations Research - Recherche Opérationnelle
%D 2019
%P 1749-1761
%V 53
%N 5
%I EDP Sciences
%U https://www.numdam.org/articles/10.1051/ro/2018081/
%R 10.1051/ro/2018081
%G en
%F RO_2019__53_5_1749_0

Zhang, Zhicong; Li, Shuai; Yan, Xiaohui; Zhang, Liangwei. A linear programming based approach for composite-action Markov decision processes. RAIRO - Operations Research - Recherche Opérationnelle, Tome 53 (2019) no. 5, pp. 1749-1761. doi: 10.1051/ro/2018081

Bibliographie
Cité par

H.S. Ahn, I. Duenyas, M.E. Lewis, The optimal control of a two-stage tandem queueing system with flexible servers. Prob. Eng. Inf Sci. 16 (2002) 453–469. | MR | Zbl | DOI

H.S. Ahn and R. Righter, Multi-actor Markov decision processes. J. Appl. Prob. 42 (2005) 15–26. | MR | Zbl | DOI

S. Andradóttir and H. Ayhan, Throughput maximization for tandem lines with two stations and flexible servers. Oper. Res. 53 (2005) 516–531. | Zbl | DOI

J.J. Hasenbein and B. Kim, Throughput maximization for two station tandem systems: a proof of the Andradóttir-Ayhan conjecture. Queue. Syst. 67 (2011) 365–386. | MR | Zbl | DOI

D.L. Kaufman, H.S. Ahn and M.E. Lewis, On the introduction of an agile, temporary workforce into a tandem queueing system. Queue. Syst. 51 (2005) 135–171. | MR | Zbl | DOI

C. Kim and S. Dudin, Priority tandem queueing model with admission control. Comput. Ind. Eng. 61 (2011) 131–140. | DOI

E. Krkzlar, S. Andradóttir and H. Ayhan, Robustness of efficient server assignment policies to service time distributions in finite-buffered lines. Nav. Res. Logist. 57 (2010) 563–582. | DOI | MR | Zbl

E. Krkzlar, S. Andradóttir and H. Ayhan. Flexible servers in understaffed tandem lines, Prod. Oper. Manage. 21 (2012) 761–777. | DOI

D.G. Pandelis, Optimal control of flexible servers in two tandem queues with operating costs. Prob. Eng. Inf. Sci. 22 (2008) 107–131. | MR | Zbl | DOI

D.G. Pandelis, Optimal stochastic scheduling of two interconnected queues with varying service rates. Oper. Res. Lett. 36 (2008) 492–495. | MR | Zbl | DOI

D.G. Pandelis, Markov decision processes with multidimensional action spaces. Eur. J. Oper. Res. 200 (2010) 625–628. | MR | Zbl | DOI

M.L. Puterman, Markov Decision Processes: Discrete Dynamic Stochastic Programming. John Wiley & Sons Inc, New York, NY (1994). | MR | Zbl | DOI

C.H. Wu, D.G. Down and M.E. Lewis, Heuristics for allocation of reconfigurable resources in a serial line with reliability considerations. IIE Trans. 40 (2008) 595–611. | DOI

J. Yang, Nested Markov decision framework for coordinating pavement improvement with capacity expansion. J. Transp. Eng. 138 (2012) 387–394. | DOI

Cité par Sources :