Reinforcement Learning Based Resource Allocation for Energy-Harvesting-Aided D2D Communications in IoT Networks
Date
2023-03-10
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
ORCID
0000-0002-3961-6836
Type
Thesis
Degree Level
Masters
Abstract
It is anticipated that mobile data traffic and the demand for higher data rates will increase dramatically as a result of the explosion of wireless devices, such as Internet of Things (IoT) and machine-to-machine devices. Numerous location-based peer-to-peer services now allow mobile users to communicate directly with one another, which can help offload traffic from congested cellular networks. Device-to-Device (D2D) communication has therefore been introduced in cellular networks to exploit direct links between devices instead of relaying all traffic through the Base Station (BS).
However, it is critical to note that D2D and IoT communications are heavily hindered by the high energy consumption of mobile and IoT devices, whose battery capacity is restricted. Energy-constrained wireless devices may extend their lifespan by drawing upon renewable external energy sources such as solar, wind, vibration, thermoelectric, and radio frequency (RF) energy in order to overcome the limited-battery problem. Such approaches are commonly referred to as Energy Harvesting (EH). A promising EH approach is Simultaneous Wireless Information and Power Transfer (SWIPT).
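To make the two SWIPT receiver architectures used later in this work concrete, the following is a minimal illustrative sketch (not taken from the thesis) of the energy harvested under time switching (TS) and power splitting (PS), assuming a simple linear EH model; the conversion efficiency `eta` and all numeric values are hypothetical.

```python
def ts_harvested_energy(p_rx, alpha, duration, eta=0.6):
    """Time switching (TS): a fraction `alpha` of the slot is devoted to
    harvesting, the rest to information decoding."""
    return eta * p_rx * alpha * duration

def ps_harvested_energy(p_rx, rho, duration, eta=0.6):
    """Power splitting (PS): a fraction `rho` of the received power is routed
    to the harvester for the whole slot."""
    return eta * rho * p_rx * duration

# Example: 10 mW received power over a 1 s slot (illustrative numbers).
e_ts = ts_harvested_energy(p_rx=0.01, alpha=0.3, duration=1.0)
e_ps = ps_harvested_energy(p_rx=0.01, rho=0.5, duration=1.0)
```

Under this linear model the two schemes differ only in which dimension (time vs. power) is split; the nonlinear EH model considered in the thesis would replace the constant `eta` with a saturating conversion curve.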
Because the number of wireless users is rising, resource allocation techniques are essential in modern wireless networks. They coordinate how users share limited resources, such as time and frequency bands.
Besides ensuring an adequate supply of energy for reliable and efficient communication, resource allocation also provides a roadmap for each individual user to consume the right amount of energy. In D2D networks with time, frequency, and power constraints, a joint resource management design generally requires significant computing power. The purpose of this study is therefore to develop a spectrum-sharing-based resource allocation scheme with low computational cost for EH-assisted D2D and IoT communication.
Until now, no study has examined resource allocation design for EH-enabled IoT networks with SWIPT-enabled D2D schemes that combines learning techniques and convex optimization. Most prior works use optimization and iterative approaches with a high level of computational complexity, which is not feasible in many IoT applications. To overcome these obstacles, a learning-based resource allocation mechanism built on the SWIPT scheme in IoT networks is proposed, in which users are able to harvest energy from different sources. The system model consists of multiple IoT users, one BS, and multiple D2D pairs in EH-based IoT networks. To develop an energy-efficient system, we consider a SWIPT scheme in which D2D pairs employ the time switching (TS) method to capture energy from the environment, whereas IoT users employ the power splitting (PS) method to harvest energy from the BS. A mixed-integer nonlinear programming (MINLP) formulation is presented for the Energy Efficiency (EE) problem, jointly optimizing subchannel allocation, the power-splitting factor, power, and time. As part of the optimization approach, the original EE optimization problem is decomposed into three subproblems: (a) subchannel assignment and power-splitting factor, (b) power allocation, and (c) time allocation. The subchannel assignment subproblem, which involves discrete variables, is solved with a Q-learning approach.
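The use of Q-learning for the discrete subchannel-assignment subproblem can be sketched as follows. This is a deliberately simplified, stateless (bandit-style) toy, not the thesis's actual design: the reward table `EE` is a hypothetical stand-in for the measured energy efficiency of assigning a D2D pair to a subchannel, and the state/action structure and hyperparameters are assumptions.

```python
import random

random.seed(0)
N_PAIRS, N_SUBCH = 3, 4
# Hypothetical EE reward for assigning D2D pair i to subchannel j.
EE = [[1.0, 2.0, 0.5, 1.5],
      [0.2, 0.9, 2.5, 1.1],
      [1.8, 0.4, 0.7, 2.2]]

Q = [[0.0] * N_SUBCH for _ in range(N_PAIRS)]
alpha, eps = 0.1, 0.2  # learning rate, exploration probability

for episode in range(2000):
    for pair in range(N_PAIRS):
        if random.random() < eps:                     # epsilon-greedy: explore
            a = random.randrange(N_SUBCH)
        else:                                         # ...or exploit current Q
            a = max(range(N_SUBCH), key=lambda j: Q[pair][j])
        r = EE[pair][a]                               # observed EE reward
        Q[pair][a] += alpha * (r - Q[pair][a])        # Q-learning update

# Greedy assignment after learning: best subchannel per pair.
assignment = [max(range(N_SUBCH), key=lambda j: Q[p][j]) for p in range(N_PAIRS)]
```

After enough episodes the greedy policy recovers the per-pair EE-maximizing subchannel; the full problem would additionally enforce exclusivity constraints across pairs and couple the reward to the power and time allocations solved in the other subproblems.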
Due to the large size of the overall problem and the continuous nature of certain variables, it is impractical to optimize all variables with the learning technique. Instead, for the continuous-variable problems, namely power and time allocation, the original non-convex problem is first transformed into a convex one, and then the Majorization-Minimization (MM) approach is applied together with the Dinkelbach method.
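The Dinkelbach method handles the fractional form of the EE objective (rate divided by consumed power) by solving a sequence of parametric subtractive problems. The following is a minimal single-link sketch under assumed parameters (channel gain `g`, circuit power `p_circ`, power budget `p_max`); the thesis's actual problem involves many users and additional constraints.

```python
import math

def dinkelbach_ee(g=10.0, p_max=1.0, p_circ=0.1, tol=1e-9):
    """Maximize EE(p) = log2(1 + g*p) / (p_circ + p) over p in [0, p_max]."""
    lam = 0.0  # current EE estimate (Dinkelbach parameter)
    for _ in range(100):
        # Inner concave problem: max_p log2(1 + g*p) - lam * (p_circ + p).
        # Its stationary point has a water-filling form, clipped to [0, p_max].
        if lam > 0:
            p = min(max(1.0 / (lam * math.log(2)) - 1.0 / g, 0.0), p_max)
        else:
            p = p_max
        f = math.log2(1 + g * p) - lam * (p_circ + p)
        lam_new = math.log2(1 + g * p) / (p_circ + p)  # updated EE estimate
        if abs(f) < tol:       # F(lam) = 0 certifies optimality of lam
            break
        lam = lam_new
    return p, lam
```

At convergence `lam` equals the maximum achievable EE and `p` the corresponding transmit power; in the thesis, MM would additionally be used to convexify the interference-coupled rate terms before each such parametric step.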
The performance of the proposed joint Q-learning and optimization algorithm has been evaluated in detail. In particular, the solution was compared with a linear EH model as well as two heuristic algorithms, namely the constrained allocation algorithm and the random allocation algorithm. The results indicate that the proposed technique is superior to conventional approaches. For example, at a distance of $d = 10$ m, the proposed algorithm improves EE over the prematching, constrained allocation, and random allocation methods by about 5.26\%, 110.52\%, and 143.90\%, respectively.
Considering the simulation results, the proposed algorithm is superior to other methods in the literature. Combining spectrum sharing with energy harvesting at the D2D and IoT devices achieves impressive EE gains, both in terms of the average and sum EE and in comparison with the other baseline schemes.
Description
Keywords
Device-to-device (D2D) communications, energy harvesting, Internet of Things (IoT) networks, majorization–minimization (MM), mixed-integer nonlinear problem (MINLP), reinforcement learning (RL), resource management, spectrum sharing.
Citation
Degree
Master of Science (M.Sc.)
Department
Electrical and Computer Engineering
Program
Electrical Engineering