The Water-Filling Algorithm: In-depth Explanation

The Water-Filling Algorithm: In-depth Explanation
Photo by Tucker Monticelli / Unsplash

What is the capacity optimal way of allocating power over multiple simultaneous transmissions? This is a common question that arises time and time again in communications. In general, it is not an easy question to answer. In many cases, the problem turns out to be NP-hard, and we need to rely on approximate solutions.

There are cases, however, when the capacity optimal power allocation can be achieved and there even exists an efficient algorithm to solve the problem. In this post, we will have an in-depth look at the water-filling algorithm.

Water-filling is a generic method for allocating transmission power over (equalizing) communications channels in the absence of inter-channel interference (ICI). The channels are assumed to be, in a technical sense, orthogonal. In simple terms, this means simultaneous parallel transmission without any interference in-between. So that we can increase the transmission power in one transmission without compromising the rate in the others. This can mean OFDMA, TDMA, ZF-MIMO, or any other means of removing the ICI by design.

Parallel simultaneous transmissions

We had a look at capacity-achieving power allocation in our previous post, where we solved the problem using a generic convex optimization algorithm from CVXPY. Here, we derive a custom algorithm for solving the problem more commonly known as water-filling.

The system model

Before going too deep into the water-filling algorithm, we need to define the context in which we are going to apply it. This is called the system model. As mentioned above, we consider \((N)\) simultaneous parallel transmission. Each transmission occurs over a channel with corresponding channel gain \(g_n\). Similarly, each transmission is allocated with transmission power \(p_n \geq 0\).

In this post, we consider there to be a single transmitter, for instance, a cell tower. This transmits to \(N\) users. Let's also consider the OFDMA type of scenario, where each user is allocated a separable chunk of bandwidth, which removes the inter-user interference.

System model illustration

The achievable rate of an individual transmission \((n = 1, \ldots, N)\), can be given as

\[ \text{log}_2 (1 + \text{SNR}_n(p)). \]

The signal-to-noise ratio is given as

\[ \text{SNR}_n(p_n) = {g_n p_n \over N_0}, \]

where \(g_n\) denotes the channel gain for the \(n^\text{th}\) channel, \(p_n\) is the corresponding power allocation and \(N_0\) is a fixed noise-level. For the sake of notational simplicity, we consider the noise level to be the same for all transmissions.

The sum-rate (throughput) for all \(N\) orthogonal channels is given by the sum of their individual rates. That is,

\[ \sum_{n=1}^N \text{log}_2 (1 + \text{SNR}_n(p_n)) \]

To get to the capacity of the said system, we need to set some constraints on the SNR. We will use the sum power budget (\(P\)), which limits the available power across the channels as

\[ \sum_{n=1}^N p_n = P \]

Now, we can write the capacity as

\[ C = \sum_{n=1}^N \text{log}_2 (1 + \text{SNR}_n(p^*_n)) \]

where \(p^*_n\) is the sum-rate maximizing power allocation for channel \(n\).  Now, we have the basic premise set up to come up with an algorithm that finds the optimal power allocation, namely, the water-filling algorithm.

Algorithm derivation

To see, where the water filling algorithm actually comes from, we formulate the capacity-achieving power allocation as a throughput maximization problem, where we try to maximize the sum throughput over \(N\) channels with varying channel gains while being limited by a power budget. This problem can be mathematically formulated as

\[ \begin{array}{rl} \underset{p_n}{\text{max.}} & \displaystyle \sum_{n=1}^N \text{log}_2 (1 + {g_n p_n \over N_0}) \\ \text{s.t.} & \displaystyle \sum_{n=1}^N p_n = P \\ & \displaystyle p_n \geq 0 \end{array} \]

This is a convex problem, so any solution satisfying the Karush-Kuhn-Tucker (KKT) conditions can be expected to be optimal.

We start with the stationary condition. The derivative of the Lagrangian w.r.t. all \(p_n\) has to equal zero

\[ \begin{array}{rl} \Delta_{p_n} (-\sum_{n=1}^N \text{log}_2 (1 + {g_n p_n \over N_0}) + \alpha(\sum_{n=1}^N p_n - P) - \sum_{n=1}^N\lambda_n p_n) & \displaystyle = 0 \end{array} \]

Taking the gradient, the expression simplifies into

\[ \begin{array}{rl} -{g_n \over g_n p_n + N_0} + \alpha - \lambda_n & \displaystyle = 0 \\ \Leftrightarrow {g_n \over g_n p_n + N_0} & \displaystyle = \alpha - \lambda_n \\ \Leftrightarrow {g_n \over {\alpha - \lambda_n} } & \displaystyle = (g_n p_n + N_0) \\ \Leftrightarrow {g_n \over {\alpha - \lambda_n} } - N_0 & \displaystyle = g_n p_n \\ \Leftrightarrow {1 \over {\alpha - \lambda_n} } - {N_0 \over g_n} & \displaystyle = p_n \end{array} \]

This gives as the solution for channel specific power allocation

\[ p_n = \Big({1 \over {\alpha - \lambda_n} } - {N_0 \over g_n}\Big)^{+}  \]

We consider only the positive solutions due to the primal constraint \(p_n \geq 0\).

Having the channel-specific dual variables \(\lambda_n\) appearing in the solution makes solving the problem quite complex.  Luckily, there is a way to get rid of them. Taking a look at the complementary slackness KKT condition

\[\lambda_n p_n = 0\]

This states that either \(\lambda_n = 0\) or \(p_n = 0\), which means that if \(p_n > 0\), then \(\lambda_n = 0\) and the power allocation reduces to

\[ p_n = \Big({1 \over {\alpha} } - {N_0 \over g_n}\Big)^{+} \]

We can further simplify the power allocation by assigning a one-to-one mapping

\[ \bar{\alpha} = \frac{1}{\alpha} \]

Then, we get

\[ p_n = \Big(\bar{\alpha} - {N_0 \over g_n}\Big)^{+} \]

Now, we can search for such \(\bar{\alpha}\) that satisfies the power budget allocation, and we get the optimal power allocation.

Bisection search for the power level

There are many ways of computing the optimal power. One of the most efficient methods is known as the bisection search.

You set initial upper (\(U\)) and lower (\(L\)) bounds and always set the current level to be in the middle of bounds.

For every power level, you test whether the power budget is under or over-utilized. If the power budget is over-utilized, you move the upper bound to the current power level (\(U = \bar{\alpha}\)). On the other hand, if the power level is below the power budget, you move the lower bound up (\(L = \bar{\alpha}\)).

Effectively you divide the search space on every iteration, which makes the algorithm converge very fast.

Bisection search for the optimal power level

Randomly choosing a large enough upper bound can be problematic. We can derive our initial guess for the upper bound directly from the power constraints by making an assumption that all the transmission will be active:

\[ P = \sum_{n=1}^N \Big(\bar{\alpha} - {N_0 \over g_n}\Big) = N\bar{\alpha} -  \sum_{n=1}^N  {N_0 \over g_n} \]

This simplifies into the upper bound

\[ U = \frac{P + \sum_{n=1}^N {N_0 \over g_n}}{N} \]

Algorithm outline

Finally, the algorithm can be summarized as

  • Choose desired convergence threshold \(\epsilon > 0\)
  • Initialize bounds \(L = 0\) and \(U = \frac{P + \sum_{n=1}^N {N_0 \over g_n}}{N}\).
  • Iterate until \(P - \sum_{n=1}^N p_n \leq \epsilon\)
    1. \(\bar{\alpha} = \frac{U-L}{2}\)
    2. \(p_n = \bar{\alpha} - \frac{N_0}{g_n}\ \forall n \in [1,N] \)
    3. if \(P > \sum_{n=1}^N p_n\) then \(U = \bar{\alpha}\)
    4. Otherwise \(L = \bar{\alpha}\)


This concludes our in-depth explanation of how the water-filling algorithm works. It is applicable to many situations and can be extended to cover various power constraints and interference conditions.    

For a Python implementation of the algorithm, check our previous post.  

Further reading