OFDM-IDMA Uplink Multi-user System with Scalable Latency for Next Generation WLAN

In this paper, we propose an Interleave-Division Multiple Access (IDMA) based uplink multi-user system for next generation WLAN. By minimizing the latency through accurate detection per iteration, we were able to design a receiver architecture that meets the latency demands of current IEEE 802.11 WLAN. To do this, the proposed system utilizes a novel algorithm for simplified LLR calculation of the soft input soft output demapper needed in the IDMA first stage detection. The proposed system has a maximum of 34.8 bits/s/Hz spectral efficiency for a single spatial stream and can support up to 8 users in a single 20MHz channel. We compare the proposed system to a reference OFDMA system and show its advantages in terms of diversity, flexibility and BER performance.


Introduction
Interleave Division Multiple Access (IDMA) is a special form of Code Division Multiple Access (CDMA) where, instead of unique spreading codes, the receiver differentiates each station (STA) by its unique interleaving pattern. This leads to a low complexity receiver whose complexity grows linearly with the number of parallel STAs supported [1].
IDMA has several other advantages over uplink multiple access schemes such as orthogonal frequency division multiple access (OFDMA) and CDMA. These include higher spectral efficiency and insensitivity to clipping distortion [2]-[4]. In addition, because all users utilize all subcarriers at the same time, there is no need for scheduling, which avoids extra overhead, computational complexity and latency [5]. In the simplest case, the hardware complexity of the IDMA transmitter is very similar to that of a regular OFDMA or multicarrier CDMA transmitter. The IDMA transmitter, however, utilizes multiple interleaver patterns if the system supports multilayer transmission. The receiver, on the other hand, is recursive and requires deep memory. In [6], the author demonstrated the feasibility of implementing IDMA in current LSI technology.
IDMA has been previously proposed for cellular networks as an upgrade to the 3rd generation WCDMA system. In [7], the authors proposed a single carrier multilayer IDMA system for 3GPP long term evolution (LTE) systems. This system features direct enhancement of throughput and reliability over the previous CDMA based system. In [5], the performance of the multicarrier version of IDMA is analyzed in a cellular environment. This paper focuses on the design of an IDMA based uplink multi-user system for IEEE 802.11 Wireless LAN (WLAN). Currently, the standardization of the next generation WLAN, called IEEE 802.11ax, has introduced a long desired uplink multi-user access feature to improve the system efficiency [8].
In [9], a preliminary proposal was presented in the IEEE TGax discussing the feasibility of the IDMA approach in IEEE 802.11 systems. The main problem that needs to be addressed in designing an IDMA based system in a random access network with bursty transmission is the latency. The 802.11 standard defines various interframe spaces (IFS) that need to be met by all STAs to prevent collisions and maintain smooth operation. With IDMA, however, each iteration consists of an interleaving and a deinterleaving process, causing latencies much higher than the defined IFS.
In this paper, we detail a latency adaptive receive algorithm, first reported in [10], that needs only a few iterations to produce highly reliable bit estimates. This algorithm utilizes a max-log soft input soft output detector for the initial stage and a simple despreading operation in the second stage. Owing to the high accuracy of the first stage, not only can the receiver be stopped after any number of iterations, but the algorithm also works very well with very high order QAM modulation such as 64QAM and 256QAM without any requirement on the minimum spreading factor or the number of parallel users for convergence. High spectral efficiency operation, however, needs to be supported by diversity mechanisms such as maximal ratio combining of multiple receive antenna signals.

*Corresponding author. 680-4 Kawazu, Iizuka-shi, Fukuoka City, Japan
In order to minimize the hardware complexity, we also re-use many existing IEEE 802.11 blocks such as the channel coding and the constellation mapper/soft output demapper pair. While outside the scope of this paper, block re-use opens up the possibility of an OFDMA-IDMA hybrid multiple access system, which adds more flexibility with regard to resource scheduling.
To keep the paper concise given the enormous number of combinations of operating modes, we only consider a single spatial stream system operating in one 20MHz channel. The extension to both multi-stream and higher bandwidth operation should be straightforward following the concepts discussed in this paper.
The rest of the paper proceeds as follows. In section 2, we discuss the current 802.11 architecture including a straightforward extension to OFDMA. Section 3 then describes the proposed IDMA architecture in detail. In section 4, we derive the multi-user detection algorithm used in the proposed system. Numerical simulation results are then shown in section 5. Lastly, we conclude this paper in section 6.

IEEE 802.11 Architecture
In this section, we describe the 802.11 WLAN architecture. While there are technically a number of physical layer (PHY) options defined in the standard, only the OFDM PHY has remained relevant in recent years. Up to the latest standard, including the 802.11ac amendment, the 802.11 PHY consists of an OFDM system with 312.5kHz subcarrier spacing (i.e. 64 subcarriers for 20MHz bandwidth) and a guard interval of either 800ns or 400ns duration.

Transmitter
The block diagram of the 802.11 transmitter is shown in Fig. 1. The binary input signal is first scrambled to avoid long sequences of 1's or 0's, which could degrade the performance of the system. The scrambled signal is then encoded using a convolutional encoder. Note that, as an alternative, IEEE 802.11 devices can use a low density parity check (LDPC) code, in which case the subsequent interleaver is no longer needed. After interleaving, the bits are mapped according to the chosen modulation order. After this mapping, OFDM modulation is performed using the inverse discrete Fourier transform (IDFT) and a guard interval inserter. Finally, windowing and TX filtering are done in order to reduce the signal's spectral sidebands, which may cause interference to other systems.
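The scrambling step can be illustrated with a minimal sketch. It assumes the 7-bit frame-synchronous scrambler of the 802.11 OFDM PHY with generator S(x) = x^7 + x^4 + 1; the function name `scramble`, the default seed value and the bit ordering of the seed are illustrative assumptions rather than details taken from this paper.

```python
def scramble(bits, seed=0b1011101):
    """XOR the input bits with the LFSR sequence of S(x) = x^7 + x^4 + 1.

    The seed must be non-zero. Its value and bit ordering are assumptions
    made for illustration only.
    """
    # state[0] holds the x^1 register, state[6] holds the x^7 register
    state = [(seed >> i) & 1 for i in range(7)]
    out = []
    for b in bits:
        fb = state[3] ^ state[6]      # feedback from taps x^4 and x^7
        out.append(b ^ fb)            # scrambled output bit
        state = [fb] + state[:6]      # shift the feedback bit into the register
    return out


# Descrambling is the same operation started from the same seed.
data = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
restored = scramble(scramble(data))
```

Because the scrambling sequence does not depend on the data, applying the same function twice with the same seed returns the original bits, which is how the receiver-side descrambler works.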

Receiver
Aside from the time and frequency synchronization, the receiver is the reverse of the transmitter, as shown in Fig. 2. The receive filter reduces the effect of noise and interference outside the receive bandwidth. Its output is then used for automatic gain control as well as for carrier frequency offset compensation. These processes are done in the time domain and can be implemented either sequentially or jointly. After this, the receiver is ready to perform frequency domain processing. It first removes the guard interval and then applies the discrete Fourier transform (DFT) to obtain the frequency domain symbols. When receiving the long training symbols, the channel estimation block computes the channel coefficients so that, by the time the receiver reaches the actual data symbols, the equalizer can use the previously computed channel estimates. These channel estimates are used by the equalizer throughout the duration of the packet. After equalization, the receiver performs demapping and deinterleaving before FEC decoding. The FEC decoder is usually a Viterbi decoder when a convolutional encoder is employed in the transmitter. Finally, descrambling is done to obtain the originally transmitted data bits.

Reference OFDMA system
As of March 2015, the 802.11 task group ax (TGax) has adopted the use of OFDMA for the next generation of the WLAN standard. While still at an early stage, the latest development requires reducing the subcarrier spacing from 312.5kHz to 78.125kHz, which quadruples both the symbol duration and the number of subcarriers [8]. Using this, we designed a reference uplink OFDMA system against which to compare our proposed IDMA system. Aside from the frequency design, the transceiver architecture of 802.11ax devices will differ from current 802.11ac devices in the use of the subcarriers. In 802.11ax devices, each STA will have the capability to utilize only a portion of the subcarrier set. For this reason, the number of parallel users transmitting at one time is a major design parameter.
We follow a straightforward resource block allocation as described in Fig. 3. This figure shows the case where the set of subcarriers is divided into 8 resource blocks, with each resource block occupying 29 data subcarriers and 1 pilot subcarrier. In this setting, the resource block bandwidth is approximately 2.5MHz. Note that while finer resource blocks result in increased MU diversity, they also result in much higher implementation complexity as well as more complex user scheduling. In section 5, we also consider the case where the resource block bandwidth is 10MHz for comparison. There are a total of 16 null subcarriers, consisting of 13 guard band subcarriers and 3 null DC subcarriers. For simplicity, the pilot subcarrier is placed at the center of the resource block. Table 1 shows the rest of the system parameters of the reference OFDMA PHY architecture. Unless otherwise specified, the remainder of the paper assumes the parameters in Table 1.
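The subcarrier budget described above can be checked with a few lines of arithmetic. This is only a consistency sketch: the constant names are illustrative, and the FFT size of 256 is inferred from the quadrupled subcarrier count in a 20MHz channel.

```python
BANDWIDTH_HZ = 20e6      # one 20MHz channel
FFT_SIZE = 256           # 4 x 64 subcarriers (inferred from the text)
N_RESOURCE_BLOCKS = 8
DATA_PER_RB = 29
PILOT_PER_RB = 1
GUARD_SUBCARRIERS = 13
DC_SUBCARRIERS = 3

spacing_hz = BANDWIDTH_HZ / FFT_SIZE                        # subcarrier spacing
per_rb = DATA_PER_RB + PILOT_PER_RB                         # subcarriers per block
used = N_RESOURCE_BLOCKS * per_rb                           # data + pilot tones
nulls = GUARD_SUBCARRIERS + DC_SUBCARRIERS                  # unused tones
rb_bandwidth_hz = per_rb * spacing_hz                       # occupied width of a block

print(spacing_hz, used + nulls, rb_bandwidth_hz)
```

The 240 used subcarriers and 16 null subcarriers add up to the full 256-point grid, and each 30-subcarrier block occupies about 2.34MHz, which the text rounds to 2.5MHz (one eighth of the channel).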
For the interleaver, we follow the same block interleaver design as in the 802.11 standard but with modified parameters applied independently to each resource block. Depending on the resource scheduling strategy of the access point (AP), it can allot multiple resource blocks to one STA at a time.

Design criteria
The proposed IDMA WLAN system specification is shown in Table 2. This specification is very similar to that of the reference OFDMA system except for the interleaver type and the maximum throughput. Because each STA can use all the available subcarriers at once, the theoretical maximum throughput is much higher than that of OFDMA. However, in order to support this high throughput at realistic receive signal strengths, diversity techniques such as space time block coding (STBC) or receive maximal ratio combining (MRC) must be employed.
As mentioned in the Introduction, the main problem in implementing IDMA in the current 802.11 system is the latency. While there are many interval constraints defined in the standard, we concentrate on the short IFS (SIFS), which is defined as the amount of time required for the receiver of a frame to process the received frame and to respond with a response frame such as the acknowledge (ACK) frame. For the 802.11 OFDM PHY, this is set at 16μs. Hence, the receiver must be able to finish receive processing within 16μs at a realistic operating clock frequency. Considering additional delays from the transmit path as well as the MAC processing delay, a good rule of thumb is to aim for a receive processing delay of around 10μs. As detailed in section 4, the latency of the proposed system fits well within this constraint.

IDMA transmitter
The proposed IDMA transmitter differs from the 802.11 transmitter in the addition of spreading and in the patterns used in the interleaver. The spreading can be thought of as part of the FEC encoder in a general system, but to be more specific, we define the FEC encoder as the current 802.11 FEC encoder, while the spreader is a repetition coder which can be modified depending on the total number of bits sent. In the IDMA and turbo coding literature, the convolutional encoder is usually chosen to be of the recursive type, because this performs better in iterative decoding when the a posteriori probability (APP) decoder is inside the iteration loop. Since this would incur a very high latency, we opt for a simpler iteration loop in which only the repetition decoder is placed inside the loop. Another advantage of this simplification is that we can re-use the Viterbi decoder already present in legacy 802.11 systems.
In this paper, we do not make any optimization with regard to the interleaver except that it is generated randomly. The set of 8 interleaver patterns used by the participating 8 STAs are pre-generated and stored in both AP and STAs. The specific interleaver used by one client depends on its index assigned by the AP during association.
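A minimal sketch of how such pre-generated random interleavers could be produced and applied is given below. The paper does not specify the generator, so the use of a shared seed, the function names and the pattern length of 2048 (FFT size 256 times spreading factor 8) are assumptions made for illustration.

```python
import random


def make_interleavers(n_users, length, seed=2015):
    """Generate one random permutation per user from a shared seed.

    The seed stands in for whatever mechanism lets the AP and the STAs
    store identical patterns (hypothetical; not specified in the paper).
    """
    rng = random.Random(seed)
    patterns = []
    for _ in range(n_users):
        perm = list(range(length))
        rng.shuffle(perm)
        patterns.append(perm)
    return patterns


def interleave(seq, perm):
    """Output position i takes the input element at index perm[i]."""
    return [seq[p] for p in perm]


def deinterleave(seq, perm):
    """Invert interleave(): put element i back at index perm[i]."""
    out = [None] * len(perm)
    for i, p in enumerate(perm):
        out[p] = seq[i]
    return out


patterns = make_interleavers(n_users=8, length=2048)
chips = list(range(2048))
sta_index = 3                      # index assigned by the AP at association
tx = interleave(chips, patterns[sta_index])
rx = deinterleave(tx, patterns[sta_index])
```

Because the patterns depend only on the seed and the user index, both sides can regenerate them instead of exchanging them over the air.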

IDMA receiver
Often, the receiver architecture for a standardized system need not be described in detail, in order to give the designer as much freedom as possible. However, in order to prove the feasibility of the proposed system in a WLAN environment, we show a specific receiver implementation with good performance, complexity and latency. The receiver architecture is shown in Fig. 5. Note that the blocks are very similar to those of a regular 802.11 receiver up to the multi-user detection (MUD), which marks the start of the IDMA processing.

Multi-user Detection
Let the receive signal after OFDM demodulation be
$$y(j) = \sum_{n=1}^{N} h_n x_n(j) + a(j) \qquad (1)$$
where x_n(j) represents the transmit symbol sent by STA n in the jth subcarrier, h_n is the channel coefficient from STA n to the AP, and a(j) is the zero mean complex Gaussian noise sample with variance σ² in the receiver. The quadrature modulated symbol x comes from the set A containing all the possible symbols of a specific quadrature amplitude modulation (QAM) constellation. The goal of the MUD is to detect x_n for n = 1,...,N given y and noisy estimates of h.
In [1], the authors defined a series of elementary operations to obtain a rough estimate of x by assuming that the sum of the signals of a number of users results in a Gaussian-like signal whose pdf is given in (2), with a new mean and variance for the noise that include the interference from other users. With random interleaving and a sufficiently high number of users, the central limit theorem makes this assumption very accurate. Obtaining the likelihood ratio from (2) is straightforward using [11], where the same low complexity soft demapper employed in many 802.11 systems was first proposed.
In (3), the variable E[xn(j)] refers to the soft symbol estimate of xn(j). For BPSK signals, the soft symbol estimate can be obtained from the extrinsic information ϵ(xn(j)) using the expression E[xn(j)] = tanh(ϵ(xn(j))/2). Note that the extrinsic information ϵ(xn(j)) is the feedback information of the previous iteration, providing new information about the estimates of the symbols of each user.
In the first iteration, there is no extrinsic information, which causes the estimate of E[xn(j)] to be very inaccurate. Through the decoder, which may consist of a combination of a despreader and an APP decoder, increasingly accurate estimates of E[xn(j)] are produced. Even with little actual noise, the receiver needs more than 4 iterations to obtain an acceptable bit error rate (BER) [2]. Another drawback is that, because of the inaccurate first estimate, the method is limited to low order modulations such as BPSK and QPSK, owing to the non-linearity of the soft demapper operation for higher order QAM.
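For reference, the conventional Gaussian-approximation detector discussed above can be sketched for one received sample as follows. This follows the commonly cited form for real-valued BPSK in the IDMA literature; the function name, the variable names and the restriction to real channel gains are assumptions of this sketch, and it illustrates the baseline method rather than the detector proposed in this paper.

```python
import math


def gaussian_approx_llr(y, h, prior_llr, sigma2):
    """LLR of each user's BPSK symbol, treating the other users as Gaussian noise.

    y         : received real sample
    h         : real channel gains, one per user
    prior_llr : extrinsic LLRs fed back from the previous iteration
    sigma2    : thermal noise variance
    A positive LLR favours the symbol +1.
    """
    mean_x = [math.tanh(e / 2.0) for e in prior_llr]     # soft symbol estimates
    var_x = [1.0 - m * m for m in mean_x]                # their variances
    mean_y = sum(hn * m for hn, m in zip(h, mean_x))
    var_y = sum(hn * hn * v for hn, v in zip(h, var_x)) + sigma2
    llrs = []
    for hn, m, v in zip(h, mean_x, var_x):
        mean_interf = mean_y - hn * m                    # interference-plus-noise mean
        var_interf = var_y - hn * hn * v                 # interference-plus-noise variance
        llrs.append(2.0 * hn * (y - mean_interf) / var_interf)
    return llrs


# First iteration: no feedback yet, so all prior LLRs are zero.
first_pass = gaussian_approx_llr(y=2.0, h=[1.0, 1.0], prior_llr=[0.0, 0.0], sigma2=1.0)
```

With zero priors, every other user contributes its full symbol power to the effective noise variance, which is why the first-iteration estimates in this scheme are coarse.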
In order to lessen the number of iterations and reduce latency, it is necessary to obtain an accurate result right from the first iteration. To do this, we employ joint maximum likelihood estimation of the transmitted bits for all users. Note that this method can be likened to a soft decision version of joint hard detection in an interference channel, and hence has good BER performance even with one iteration given a high enough signal to noise ratio. Further iterations result in performance near AWGN capacity.
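The idea of a joint max-log soft decision can be illustrated with a brute-force sketch. It is not the simplified LLR calculation proposed in this paper: it assumes BPSK symbols, no a priori information, and an exhaustive search over all joint hypotheses, and the function name is hypothetical.

```python
import itertools


def maxlog_joint_llr(y, h, sigma2, symbols=(-1.0, 1.0)):
    """Max-log LLR of each user's BPSK symbol from one received sample.

    Every joint hypothesis of all users is scored by its squared distance
    to y; for each user the best score with symbol -1 is compared with the
    best score with symbol +1. A positive LLR favours +1.
    """
    n_users = len(h)
    # best[user][b]: smallest metric seen so far with that user sending symbols[b]
    best = [[float("inf"), float("inf")] for _ in range(n_users)]
    for hyp in itertools.product(range(2), repeat=n_users):
        s = sum(hn * symbols[b] for hn, b in zip(h, hyp))   # noiseless sum signal
        metric = abs(y - s) ** 2 / sigma2
        for user, b in enumerate(hyp):
            if metric < best[user][b]:
                best[user][b] = metric
    return [m_minus - m_plus for m_minus, m_plus in best]


llrs = maxlog_joint_llr(y=1.5, h=[1.0, 0.5], sigma2=1.0)
```

The search grows exponentially with the number of users and the modulation order, which is why a simplified calculation is needed in a practical receiver.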

Extrinsic LLR
After an initial estimate of the transmitted symbols for all STAs is obtained, each STA's transmit sequence is decoded. For every STA n, the receiver performs the deinterleaving expressed in (14), which is followed by the channel decoder. As seen in Fig. 5, the feedback loop of the IDMA receiver does not include the actual channel decoder (i.e. the Viterbi decoder). The first reason for this is to reduce hardware complexity, because it avoids the need for a soft output channel decoder such as an APP decoder or the soft output Viterbi algorithm. The second reason is that it allows us to reduce the latency of the system by implementing a parallel interleaver instead of a serial one.
Finally, the extrinsic LLR information is used to compute the feedback variable P(x-n) according to (18).
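The despreading and extrinsic LLR steps inside the loop can be sketched for a plain repetition code as follows. The sketch assumes that the spreading factor consecutive LLRs of a deinterleaved sequence belong to the same coded bit and that no sign-alternating spreading sequence is applied; the function names are illustrative.

```python
def despread(llrs, sf):
    """Accumulate the LLRs of the sf repeated chips of each coded bit."""
    return [sum(llrs[i:i + sf]) for i in range(0, len(llrs), sf)]


def extrinsic(llrs, sf):
    """Extrinsic LLR of each chip: the despread total minus the chip's own LLR."""
    totals = despread(llrs, sf)
    return [totals[i // sf] - llr for i, llr in enumerate(llrs)]


chip_llrs = [1, 2, 3, 4]            # deinterleaved chip LLRs of one STA
bit_llrs = despread(chip_llrs, sf=2)
feedback = extrinsic(chip_llrs, sf=2)
```

Subtracting each chip's own contribution ensures that the information fed back to the detector comes only from the other repetitions of the same bit.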

Summary and latency analysis
Each iteration of the proposed MUD involves the following processes:
1) Soft demapper
2) Deinterleaver
3) Despreader
4) Extrinsic LLR computation
5) Interleaver
6) Feedback variable update
From the receive signal y, the first process computes a first estimate of each STA's data bits using (11)-(13) to obtain λ(xn,k(j)). This process consists of many parallel arithmetic computations, and its latency is only due to pipelining. The next step is the deinterleaver shown in (14) which, because of the memory operations involved, needs a maximum of 2048 cycles for an FFT size of 256 and a spreading factor of 8. The next step is the despreader expressed in (15), an accumulator operation with negligible latency. The computation of the extrinsic LLR shown in (17) requires another interleaver operation, which again needs 2048 cycles at the highest supported spreading factor. Lastly, the feedback variable update in (10) and (18), when implemented using a lookup table, also has negligible latency.
As is evident from the explanation above, the main contributor to the latency is the interleaver and deinterleaver pair performed in every iteration. Using a nominal operating frequency of 640MHz, we plot the latency vs. the number of iterations in Fig. 6. In this figure, it is readily seen that the latency increases linearly with the number of iterations and with the spreading factor (SF). For the specified maximum SF of 8, the proposed system can run only 2 iterations while still meeting the target deadline. While the latency results may look very pessimistic, we note that the increase in latency per iteration can be reduced directly by applying parallel interleavers, which trade hardware complexity of the interleaver for lower latency.
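The dominant latency term can be estimated with a short calculation. This sketch counts only the interleaver and deinterleaver passes at one cycle per chip and ignores pipelining and all other blocks, so it is not expected to reproduce Fig. 6 exactly; the function name and the option to skip the final feedback interleaver are assumptions.

```python
def mud_latency_us(iterations, sf, fft_size=256, clock_hz=640e6,
                   skip_last_interleave=False):
    """Approximate MUD latency in microseconds from memory passes alone.

    Each pass (interleave or deinterleave) is assumed to take
    fft_size * sf clock cycles, i.e. 2048 cycles for SF = 8.
    """
    cycles_per_pass = fft_size * sf
    passes = 2 * iterations                  # one deinterleave + one interleave per iteration
    if skip_last_interleave and iterations > 0:
        passes -= 1                          # no feedback is needed after the last iteration
    return passes * cycles_per_pass / clock_hz * 1e6


for n_iter in (1, 2, 3):
    print(n_iter, mud_latency_us(n_iter, sf=8),
          mud_latency_us(n_iter, sf=8, skip_last_interleave=True))
```

Under these assumptions, two iterations at SF = 8 take about 12.8μs when both passes are counted in every iteration and about 9.6μs when the final interleaver pass is skipped; which of these is closer to the actual design depends on implementation details not modelled here.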

Simulation Results
In order to show the performance of the proposed system, as well as to confirm the soundness of the chosen design architecture, we perform simulations comparing our reference OFDMA architecture with the proposed IDMA architecture. The default simulation parameters are listed in Table 3. Figures 7 and 8 show the performance of the proposed system compared with OFDMA uplink transmission. In this simulation, the total bitrate is fixed to 7.3Mbps by setting the modulation and coding scheme (MCS) of all STAs to 0. The MCS definitions follow the 802.11ac standard and are all listed in Table 4.
For OFDMA, the subcarriers are divided equally among the STAs such that, for N STAs, each STA has a data rate of 7.3/N Mbps. For IDMA, because all the subcarriers are used by all STAs at the same time, we use a spreading factor equal to N to match the data rate of the reference OFDMA system. In the following simulation results, there are two versions of the OFDMA system: one without frequency resource scheduling and one with ideal frequency resource scheduling, denoted in the figures as OFDMA-ideal. Ideal frequency resource scheduling requires perfect channel state information (CSI) at the transmitter prior to the transmission of the scheduling frame and the actual uplink MU transmission.
In Fig. 7, we simulate the performance of IDMA and OFDMA when there are only two resource blocks, each with a bandwidth of 10MHz. This configuration is the easiest to implement but suffers from poor diversity gain. In this simulation, there are a total of 4 active STAs competing for the two available resource blocks. In the ideal OFDMA case, the AP allots each resource block to the STA whose channel has the highest energy on that resource block. On the other hand, the regular OFDMA case as well as the IDMA case allots the resource blocks to a random active STA. As seen in the figure, the proposed system has a clear advantage over OFDMA with random frequency allocation but performs worse than OFDMA with perfect scheduling by about 3dB.
In Fig. 8, we perform the same simulation with a resource block bandwidth of 2.5MHz, for a total of 8 resource blocks that can be allotted to 8 STAs in parallel. In this case, the total number of active STAs is 16, which is again twice the number of resource blocks available. As the bandwidth of the resource blocks decreases, the diversity gain of OFDMA with perfect scheduling increases. On the other hand, with random scheduling, the flat fading experienced by each STA is not compensated by any multi-user diversity gain, which makes the overall performance degradation worse. The performance of IDMA is almost unchanged regardless of the number of STAs present in the system.
From the above results, the benefit of the proposed system is clearly due to the lack of scheduling overhead. This effect is substantial considering that ideal scheduling would need the AP to poll all STAs one at a time.
In Fig. 9, we examine the effect of the number of IDMA iterations on the performance of the proposed system. In this simulation, we consider a system that can accommodate 8 parallel STA transmissions. Again, we fix the data rate to 7.3Mbps by adjusting the spreading factor accordingly. The figure shows that the proposed system needs only 2 iterations to obtain good BER performance.
Lastly, we simulate the performance of the proposed system for various MCSs. An advantage of the proposed system is that, aside from the option to control the system performance by changing the MCS, it can also adjust the spreading factor and the number of streams per user based on any available STA information such as CSI or long term PER statistics. In Fig. 10, the packet error rate (PER) of the proposed system across all MCSs for N = 2 STAs is shown. We employed 2 IDMA iterations and a spreading factor of 8. As the graphs show, the proposed system can support the maximum MCS of 9 for 2 users without any scheduling at about 21dB of SNR for a 10% PER.

Conclusions
In this paper, the performance of an OFDM-IDMA system for next generation uplink multi-user WLAN was presented. This system is highly compatible with the current 802.11ac system and with a reference system that is a straightforward extension of it to OFDMA. This makes it possible to operate IDMA on top of OFDMA provided some conditions are met. These conditions include the ability to allot resource blocks to multiple users and the ability of the AP to assign a specific interleaver pattern to each associated STA. The proposed system utilizes almost all of the existing IEEE 802.11ac blocks which, while not optimal, reduces the additional complexity of implementing OFDM-IDMA. Simulation results reveal that it needs only around 2 iterations to provide good BER performance in both high and low scattering channel environments. Finally, we presented simulation results showing the ability of the system to support the maximum MCS of the 802.11ac system.