Meta multi-objective reinforcement learning for communication load balancing
With the rapid adoption of fifth-generation (5G) communication systems and the increasing data demand per device, communication traffic is expected to reach unprecedented levels in the coming years. Moreover, this traffic is often unevenly distributed across frequency bands and base stations, degrading network throughput and user experience. Load balancing has therefore become a key technique for adjusting the traffic load by offloading users from overloaded cells to less crowded neighboring ones. In this thesis, we study multi-objective reinforcement learning (MORL) and meta reinforcement learning (meta-RL) for load balancing, with the goal of learning highly customized policies for different trade-offs between network performance metrics. We begin with a thorough review of the existing load balancing literature to motivate the need for better algorithms that can further improve network performance and users' quality of service (QoS). In particular, we emphasize the importance of a multi-objective approach to the load balancing problem, since network providers aim to simultaneously optimize multiple conflicting objectives by adjusting the load balancing parameters. Using MORL, we formulate communication load balancing as a multi-objective control problem in which the agent seeks optimal policies for the possible trade-offs between network performance indicators. Motivated by the dynamic nature of wireless networks, we propose a practical algorithm based on meta-RL concepts to compute a general load balancing policy capable of rapidly adjusting to new trade-offs. Indeed, the learned meta-policy can be fine-tuned for given preferences using fewer samples and gradient steps. We further enhance the generalization and adaptation capabilities of our proposed meta-RL solution using policy distillation techniques. To showcase the effectiveness of our framework, we conduct experiments based on real-world traffic scenarios.
Our results show that our load balancing framework can: i) significantly outperform existing rule-based load balancing methods; ii) achieve better performance than single-objective solutions; iii) compute better Pareto front approximations than several MORL baselines; and iv) quickly adapt to new, unseen objectives. To conclude, we analyze the limitations of the proposed solutions and discuss several promising future directions for multi-objective load balancing.
Keywords: Downlink Communication, Load Balancing, Multi-Objective Reinforcement Learning, Meta-Reinforcement Learning, Policy Distillation