Reinforcement Learning of Beam Codebooks in Millimeter Wave and Terahertz MIMO Systems

Yu Zhang, Muhammad Alrabeiah, Ahmed Alkhateeb

Research output: Contribution to journal / Article / peer-review

19 Scopus citations


Millimeter wave (mmWave) and terahertz (THz) MIMO systems rely on pre-defined beamforming codebooks for both initial access and data transmission. These pre-defined codebooks, however, are commonly not optimized for specific environments, user distributions, and/or possible hardware impairments. This leads to large codebook sizes with high beam training overhead, which makes it hard for these systems to support highly mobile applications. To overcome these limitations, this paper develops a deep reinforcement learning framework that learns how to optimize the codebook beam patterns relying only on receive power measurements. The developed model learns how to adapt the beam patterns to the surrounding environment, user distribution, hardware impairments, and array geometry. Further, this approach does not require any knowledge of the channel, RF hardware, or user positions. To reduce the learning time, the proposed model adopts a novel Wolpertinger-variant architecture that can efficiently search the large discrete action space. The proposed learning framework respects RF hardware constraints such as the constant-modulus and quantized phase shifter constraints. Simulation results confirm the ability of the developed framework to learn near-optimal beam patterns for line-of-sight (LOS), non-LOS (NLOS), and mixed LOS/NLOS scenarios, and for arrays with hardware impairments, without requiring any channel knowledge.
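The abstract names three ingredients that can be made concrete: a constant-modulus beam built from quantized phase shifters, a scalar receive-power measurement used as the only feedback, and a Wolpertinger-style step that maps a continuous proto-action to nearby discrete actions. The Python sketch below illustrates them under stated assumptions: a uniform linear array with half-wavelength spacing, a single-path LOS channel, 3-bit phase shifters, and a hand-picked proto-action standing in for the actor network's output. All helper names (`phase_set`, `candidate_actions`, `select_action`, and so on) are hypothetical, and candidates are scored with the measured receive power directly where a Wolpertinger agent would query its learned critic. This is a generic illustration, not the authors' variant.

```python
import cmath
import math

# Hypothetical helpers for illustration only; assumes a uniform linear array
# (ULA) with half-wavelength spacing and b-bit phase shifters.

def phase_set(bits):
    """The 2**bits phase levels a b-bit shifter can realize, in [-pi, pi)."""
    n = 2 ** bits
    return [-math.pi + 2 * math.pi * k / n for k in range(n)]

def beam_from_phases(phases):
    """Constant-modulus beamformer: every weight has magnitude 1/sqrt(M)."""
    m = len(phases)
    return [cmath.exp(1j * p) / math.sqrt(m) for p in phases]

def receive_power(w, h):
    """Beamforming gain |w^H h|^2, the only feedback the agent observes."""
    return abs(sum(wi.conjugate() * hi for wi, hi in zip(w, h))) ** 2

def circ_dist(a, b):
    """Angular distance between two phases, accounting for 2*pi wrap-around."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def nearest_levels(p, levels, n):
    """The n quantized levels closest to the continuous phase p."""
    return sorted(levels, key=lambda q: circ_dist(p, q))[:n]

def candidate_actions(proto, levels):
    """Nearest discrete beam plus M single-antenna variants that move one
    element to its second-nearest level: a cheap stand-in for a k-NN search."""
    rounded = [nearest_levels(p, levels, 1)[0] for p in proto]
    cands = [rounded]
    for i, p in enumerate(proto):
        alt = list(rounded)
        alt[i] = nearest_levels(p, levels, 2)[1]
        cands.append(alt)
    return cands

def select_action(proto, levels, score):
    """Wolpertinger-style refinement: evaluate the candidates, keep the best.
    Here `score` is measured receive power, standing in for a learned critic."""
    return max(candidate_actions(proto, levels), key=score)

# Usage: single-path LOS channel at 20 degrees, 8 antennas, 3-bit shifters.
M, theta = 8, math.radians(20)
h = [cmath.exp(1j * math.pi * m * math.sin(theta)) for m in range(M)]
# Ideal continuous phases, playing the role of the actor's proto-action.
proto = [math.pi * m * math.sin(theta) for m in range(M)]
levels = phase_set(3)

def score(phases):
    return receive_power(beam_from_phases(phases), h)

chosen = select_action(proto, levels, score)
print(f"receive power {score(chosen):.3f} (upper bound {M})")
```

Because the plain rounded beam is always in the candidate set, the selected beam scores no worse than rounding under this scoring. With b-bit shifters each element's phase error after rounding is at most π/2^b, so for a single-path channel the rounded beam's receive power stays above M·cos²(π/2^b) of the ideal value M.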

Original language: English (US)
Pages (from-to): 904-919
Number of pages: 16
Journal: IEEE Transactions on Communications
Issue number: 2
State: Published - Feb 1 2022


Keywords

  • Beamforming codebook
  • Millimeter wave (mmWave)
  • Reinforcement learning
  • Site-specific
  • Terahertz (THz)

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

