Dynamic programming for armed bandits is used to maximize the randomly depreciated gains of a store that has a fixed, finite number of sellers and an unknown number of clients, modeled as a random variable with finite support. The sellers' skills are also random: each is represented by a probability distribution that is itself random. The armed bandit framework is therefore considered with a horizon that is a random variable with finite support, a setting that, as far as the authors know, has not yet been discussed. Numerical examples illustrate the versatility and practical implementation of the approach in two general contexts, determined by the number of available products. With a single product, the problem coincides with maximizing the number of sales. With more than one product, the amount of sales is not necessarily governed by a Bernoulli distribution.
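As a rough illustration of the single-product setting, the following Python sketch runs backward induction for a Bernoulli bandit whose horizon is random with finite support. It is not the authors' algorithm; it is a minimal sketch under stated assumptions: each seller is read as an arm and each client as one pull, skills have independent Beta priors, the number of clients N is independent of the sales outcomes and is not observed in advance, and the random depreciation of gains is left out. Under those assumptions the expected number of sales equals the sum over t of P(N >= t) times the expected sale at client t, so the random horizon acts as a deterministic weight sequence. The names `survival_weights`, `optimal_value`, `horizon_pmf` and `priors` are hypothetical.

```python
from functools import lru_cache


def survival_weights(horizon_pmf):
    """Return [P(N >= 1), ..., P(N >= t_max)] for a horizon pmf {n: P(N = n)}."""
    t_max = max(horizon_pmf)
    return [
        sum(p for n, p in horizon_pmf.items() if n >= t)
        for t in range(1, t_max + 1)
    ]


def optimal_value(horizon_pmf, priors):
    """Maximal expected number of sales (illustrative model, no discounting).

    horizon_pmf: dict mapping a horizon n >= 1 to its probability.
    priors: list of (alpha, beta) Beta parameters, one pair per seller.
    """
    weights = survival_weights(horizon_pmf)
    t_max = len(weights)

    @lru_cache(maxsize=None)
    def value(t, counts):
        # counts[a] = (sales, failures) observed so far for seller a.
        if t > t_max:
            return 0.0
        best = 0.0
        for a, (s, f) in enumerate(counts):
            alpha, beta = priors[a]
            # Posterior mean of seller a's probability of making a sale.
            p = (alpha + s) / (alpha + beta + s + f)
            win = counts[:a] + ((s + 1, f),) + counts[a + 1:]
            lose = counts[:a] + ((s, f + 1),) + counts[a + 1:]
            # A sale at client t only counts if that client exists: weight P(N >= t).
            total = p * (weights[t - 1] + value(t + 1, win)) \
                + (1 - p) * value(t + 1, lose)
            best = max(best, total)
        return best

    start = tuple((0, 0) for _ in priors)
    return value(1, start)


if __name__ == "__main__":
    # Two sellers with uniform priors; one or two clients, equally likely.
    print(optimal_value({1: 0.5, 2: 0.5}, [(1, 1), (1, 1)]))
```

The state grows with the number of sellers and the largest possible horizon, so this direct recursion is only practical for small instances; it is meant to show the structure of the recursion, not to scale.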
DOI : 10.1051/ro/2017015
Keywords: Armed bandit model, dynamic programming, assignment of personnel, random horizon, Markov decision processes
@article{RO_2017__51_4_1119_0,
  author = {V\'azquez-Guevara, V{\'\i}ctor Hugo and Cruz-Su\'arez, Hugo and Velasco-Luna, Fernando},
  title = {Optimal assignment of sellers in a store with a random number of clients via the {Armed} {Bandit} model},
  journal = {RAIRO - Operations Research - Recherche Op\'erationnelle},
  pages = {1119--1132},
  publisher = {EDP-Sciences},
  volume = {51},
  number = {4},
  year = {2017},
  doi = {10.1051/ro/2017015},
  mrnumber = {3783937},
  zbl = {1396.49020},
  language = {en},
  url = {http://www.numdam.org/articles/10.1051/ro/2017015/}
}
Vázquez-Guevara, Víctor Hugo; Cruz-Suárez, Hugo; Velasco-Luna, Fernando. Optimal assignment of sellers in a store with a random number of clients via the Armed Bandit model. RAIRO - Operations Research - Recherche Opérationnelle, Volume 51 (2017) no. 4, pp. 1119-1132. doi: 10.1051/ro/2017015. http://www.numdam.org/articles/10.1051/ro/2017015/