Joint Caching and Computing Service Placement for Edge-Enabled IoT Based on Deep Reinforcement Learning

By placing edge service functions in proximity to IoT facilities, edge computing can satisfy the resource and latency requirements of various IoT applications. Sensing-data-driven IoT applications are prevalent in IoT systems, and their task processing relies on sensing data from sensors. Therefore, to ensure the quality of service (QoS) of such applications in an edge-enabled IoT system, dedicated caching functions (CFs) are required to cache the necessary sensing data. This paper considers an edge-enabled IoT system and investigates the joint caching and computing service placement (JCCSP) problem for sensing-data-driven IoT applications. To solve this problem, deep reinforcement learning (DRL) is exploited, since it can adapt to a heterogeneous system with limited prior knowledge. In the proposed DRL-based approaches, a policy network based on the encoder-decoder model is constructed to handle the varying sizes of JCCSP states and actions, which arise because different applications involve different numbers of CFs. An on-policy REINFORCE-based method is first adopted to train the policy network. An off-policy training method based on the twin-delayed (TD) deep deterministic policy gradient (DDPG) is then proposed to enhance training efficiency and experience utilization. In the proposed DDPG-based method, a weight-averaged twin-Q-delayed (WATQD) algorithm is introduced to reduce the bias of Q-value estimation. Simulation results show that the proposed DRL-based JCCSP approaches converge to a performance that is significantly superior to that of the benchmarks. Moreover, compared with the original TD method, the proposed WATQD method significantly improves training stability.
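
To make the Q-value bias discussion concrete, the following minimal sketch contrasts the standard twin-Q target used in twin-delayed DDPG, which bootstraps from the smaller of two critic estimates, with a weighted combination of the two estimates. The abstract does not define how WATQD weights the critics, so the second function, its name, and its `weight` parameter are illustrative assumptions and not the paper's algorithm.

```python
def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Standard twin-delayed DDPG target: bootstrap from the smaller critic estimate."""
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)


def weighted_twin_q_target(reward, done, q1_next, q2_next, gamma=0.99, weight=0.75):
    """Hypothetical weighted target (an assumption, NOT the paper's WATQD definition).

    Blends the smaller and larger critic estimates. weight=1.0 recovers the
    standard minimum-based target; smaller weights lean toward the larger estimate.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    low = min(q1_next, q2_next)
    high = max(q1_next, q2_next)
    blended = weight * low + (1.0 - weight) * high
    return reward + gamma * (1.0 - done) * blended
```

Taking the minimum counters overestimation but can introduce underestimation; a blend of the two estimates is one simple way to trade off between the two kinds of bias, under the assumptions stated above.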