Online Caching Policy with User Preferences and Time-Dependent Requests: A Reinforcement Learning Approach

Content caching is a promising approach to reducing data traffic on the backhaul links. We consider a system where multiple users request items from a cache-enabled base station that is connected to a cloud. The users request items according to their preferences in a time-dependent fashion, i.e., a user is likely to request the next chunk (item) of the file requested in a previous time slot. Whenever a requested item is not in the cache, the base station downloads it from the cloud and forwards it to the user. Meanwhile, the base station decides whether to replace an item in the cache with the fetched item or to discard it. We model the problem as a Markov decision process (MDP) and propose a novel state space that exploits the dynamics of the users' requests. We use reinforcement learning and propose a Q-learning algorithm that finds an optimal cache replacement policy maximizing the cache hit ratio without knowing the popularity profile distribution, the probability distribution of the items, or the user preference model. Simulation results show that the proposed algorithm improves the cache hit ratio compared to baseline policies.
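
To make the setting concrete, the following is a minimal Python sketch of tabular Q-learning applied to cache replacement. It is illustrative only: the state encoding (cache contents plus current request), the reward (1 for a hit, 0 otherwise), the cache size, and the 0.7 probability of requesting the "next chunk" are simplified assumptions for demonstration, not the state space or formulation proposed in the paper.

    import random
    from collections import defaultdict

    CACHE_SIZE, NUM_ITEMS = 4, 20
    ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1   # learning rate, discount, exploration
    ACTIONS = range(CACHE_SIZE + 1)     # 0..C-1: replace that slot; C: discard

    Q = defaultdict(float)              # Q[(state, action)] -> value

    def encode(cache, request):
        # Hypothetical state representation: cache contents + current request.
        return (tuple(cache), request)

    def next_request(prev):
        if random.random() < 0.7:
            return (prev + 1) % NUM_ITEMS   # next chunk of the previous item
        return random.randrange(NUM_ITEMS)  # otherwise a random item

    cache = list(range(CACHE_SIZE))
    req = random.randrange(NUM_ITEMS)
    hits = 0
    STEPS = 200_000

    for t in range(STEPS):
        s = encode(cache, req)
        if random.random() < EPS:
            a = random.choice(ACTIONS)                 # explore
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])  # exploit
        reward = 1.0 if req in cache else 0.0          # reward = cache hit
        hits += reward
        if req not in cache and a < CACHE_SIZE:
            cache[a] = req                             # replace the chosen slot
        req = next_request(req)
        s2 = encode(cache, req)
        best = max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += ALPHA * (reward + GAMMA * best - Q[(s, a)])

    print("empirical hit ratio:", hits / STEPS)

Because requests in this toy model tend to advance sequentially through a file, a learned policy can anticipate the next chunk, which is the intuition behind exploiting the request dynamics in the state space.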

Mohammad Hatami, Markus Leinonen, Marian Codreanu

A4 Article in conference proceedings

53rd Annual Asilomar Conference on Signals, Systems, and Computers 2019. Pacific Grove, CA, USA, Nov 3-6, 2019

M. Hatami, M. Leinonen and M. Codreanu, "Online Caching Policy with User Preferences and Time-Dependent Requests: A Reinforcement Learning Approach," 2019 53rd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 2019, pp. 1384-1388.

https://doi.org/10.1109/IEEECONF44664.2019.9048832
http://urn.fi/urn:nbn:fi-fe202003319802