- Internal use
Time: 10:00 AM on Dec. 21 (Mon.), 2015
Place: E204 @ EB1
Title: On the Whittle Index for Restless Multi-armed Hidden Markov Bandits
We consider a restless multi-armed bandit in which each arm can be in one two states. When an arm is sampled, the state of the arm is not available to the sampler. Instead, a binary signal with a known randomness that depends on the state of the arm is made available. No signal is displayed if the arm is not sampled. An arm-dependent reward is accrued from each sampling. In each time step, each arm changes state according to known transition probabilities which in turn depend on whether the arm is sampled or not sampled. Since the state of the arm is never visible and has to be inferred from the current belief and a possible binary signal, we call this the hidden Markov bandit. Our interest is in a policy to select the arm(s) in each time step that maximises the infinite horizon discounted reward. Specifically, we seek the use of Whittle’s index in selecting the arms.
We first analyze the single-armed bandit and show that it admits an approximate threshold-type optimal policy when the ‘no-sample’ action is subsidised. Next, we show that this also satisfies an approximate-indexability property. Numerical examples support the analytical results.
D. Manjunath received his BE from Mysore University, MS from IIT Madras and PhD from Rensselaer Polytechnic Institute in 1986, 1989 and 1993 respectively. He has been with the Electrical Engineering Department of IIT Bombay since July 1998 where he is now an Institute Chair Professor. He has previously worked in the Corporate R & D Center of General Electric in Scehenectady NY (1990), Computer and Information Sciences Department of the University of Delaware (1992–93), Computer Science Department, University of Toronto (1993–94) and the Department of Electrical Engineering of IIT Kanpur (1994–98). At IIT Bombay, he has been the Head of the Computer Centre since 2011. His research interests are in the general areas of communication networks and performance analysis. His recent research has concentrated on random networks with applications in wireless and sensor networks, network pricing and queue control. He is a recipient of the best paper award at ACM SIGMETRICS 2010. He has been an associate editor of “IEEE Transactions on Networking” (2011-2015), “Queueing Systems: Theory and Applications,” and of Sadhana: The Proceedings of the Indian Academy of Sciences. He was TPC chair for COMSNETS 2011 and NCC 2015 and general chair for ACM MobiHoc 2013 and COMSNETS 2015. He is a coauthor of two textbooks, “Communication Networking: An Analytical Approach” (May 2004) and “Wireless Networking” (Apr 2008), both of which are published by Morgan-Kaufman Publishers.