Troubleshoot for High BLER / Poor PHY throughput
The first goal for almost every communication test (regardless of whether it is a protocol test or a performance test) should be to reach a condition with zero (or low enough) BLER/CRC error for both downlink and uplink traffic. My practical recommendation for every test is: "Set up a condition that gives you zero (or low enough) BLER/CRC error for the test that you want to perform". If you are testing protocol issues, you may not need zero BLER, but it should still be low enough. If you want to achieve the max throughput, you should make sure that you get zero BLER/CRC error at the specific condition (e.g., at the required MCS and number of RBs).
Whenever you have a problem with your testing, regardless of whether it is for protocol or throughput, the first thing I would suggest you do before raising a support ticket is to check the BLER/CRC error and, if it is too high, fix that problem first.
Unfortunately, there is no single-shot solution for reaching zero (or low enough) BLER. A certain degree of trial and error is unavoidable. If you are lucky or experienced with the specific test equipment or DUT, you may fix the problem with only a few trials, but if you are unlucky or less experienced with the test setup/DUT, you may need to go through a lot of trial and error.
The purpose of this tutorial is not to give you a single-shot solution that fits all cases, but to share some common tricks that I (and the support team) use to improve the BLER/CRC error and eventually reach the desired condition for the required test cases.
Table of Contents
- Troubleshoot for High BLER / Poor PHY throughput
- General Guideline
- Potential Culprit for BLER(CRC Error)/ PHY throughput
- Indicator of Poor Radio Link and Troubleshoot Tricks
- Callbox(eNB/gNB)
- t Command
- t spl Command
- WebGUI - RB
- WebGUI - Tx/Ret per Slot
- WebGUI - SNR
- WebGUI - Constellation
- WebGUI - Trace
- UEsim
- Tips on PHY Configuration
- LTE Downlink Max Throughput with Amarisoft eNB + Non Amarisoft UE / Amarisoft UEsim
- LTE Downlink Max Throughput with Non-Amarisoft eNB (e.g, live eNB) + Non Amarisoft UE / Amarisoft UEsim
- LTE Uplink Max Throughput with Amarisoft eNB + Non Amarisoft UE / Amarisoft UEsim
- NR Downlink Max Throughput with Amarisoft gNB + Non Amarisoft UE / Amarisoft UEsim
- NR Downlink Max Throughput with Non-Amarisoft gNB + Amarisoft UEsim
General Guideline
Before you do any test (regardless of whether it is a protocol test or a performance test), I would suggest you set some criteria of your own in terms of BLER/CRC error. My one-line comment is: "You should never ignore the importance of BLER, but don't expect too much and don't spend more time on it than your test requires".
The following are some common cases that I have observed from many users, along with my personal suggestions.
- I am trying a simple attach test, but the attach does not go through : There can be many other reasons for this, such as a SIM parameter mismatch or UE / eNB / gNB capability mismatches, but the first thing to check is whether the radio link quality is good enough to complete the attach procedure. In this case, you don't have to achieve 0 BLER, but you should at least achieve a BLER lower than a certain level. If the problem happens during the initial attach procedure, the 't' command or 't spl' command would not give any useful information. In this case, check the trace log in WebGUI and see if there are too many PDSCH or PUSCH failures.
- I am doing this kind of protocol test but it does not seem to work as expected : Again, there would be some specific reason for it, but the first thing I recommend is to check the radio link quality using the 't' command / 't spl' command and see if you get a good enough BLER level. There is no hard limit on what counts as a good enough BLER level. As my personal rule of thumb, you may go with a BLER of less than 20% for most protocol tests.
- I am doing this kind of protocol test but I see a few PDSCH / PUSCH CRC errors in the trace log : My suggestion is that if it is not critical for those specific PDSCH/PUSCH to be decoded at that specific timing, and the overall CRC failure rate is not that high, just ignore those decoding failures and move on with the test.
- I am doing a throughput test (e.g., iperf, TCP) but I am not getting the throughput that I want to achieve : In this case, it is mandatory to check the radio link quality and BLER level. As long as you have any BLER at the physical layer, you will never achieve the max throughput at the higher layers. It is strongly recommended to go through every single step suggested in this tutorial on your side first, and then ask for further investigation if those trials still do not work. In my experience, this is the fastest way to reach the goal, because troubleshooting a throughput issue requires a lot of back-and-forth and there is no single-shot solution that fits all cases. Reducing the amount of back-and-forth reduces the total time needed to resolve (or at least improve) the issue.
- I am doing a throughput test (max throughput test) over the antenna : To be honest, it would be almost impossible to achieve the max throughput with zero BLER in radiated conditions (i.e., over the antenna) unless you are lucky. It gets even more difficult to get high throughput over the antenna as the number of antennas (number of layers) increases. Some of the typical factors that make it difficult are path loss, unwanted noise from reflected signals, and unwanted correlation or coupling among antennas. So I would suggest you try conducted testing (i.e., RF cable connection) or relax the acceptance criteria for the throughput.
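The criteria above can be written down as a small helper before starting a test campaign. This is only a sketch under assumptions: the function names are hypothetical, the error and block counts are whatever you read from your own logs, and the 20% limit is just the personal rule of thumb mentioned above, not a specification value.

```python
def bler(n_errors: int, n_blocks: int) -> float:
    """Block error rate = failed transport blocks / total transport blocks."""
    if n_blocks <= 0:
        raise ValueError("n_blocks must be positive")
    return n_errors / n_blocks


def link_ok(n_errors: int, n_blocks: int, test_type: str) -> bool:
    """Apply the rule-of-thumb criteria from this tutorial (assumed values)."""
    rate = bler(n_errors, n_blocks)
    if test_type == "throughput":
        # Max throughput tests need zero BLER at the target MCS / RB allocation.
        return rate == 0.0
    if test_type == "protocol":
        # Personal rule of thumb: below 20% is usually good enough.
        return rate < 0.20
    raise ValueError("test_type must be 'throughput' or 'protocol'")


if __name__ == "__main__":
    print(link_ok(5, 100, "protocol"))    # 5% BLER, protocol test
    print(link_ok(5, 100, "throughput"))  # 5% BLER, throughput test
```

Keeping the two criteria separate makes it explicit that a link that is fine for a protocol test may still be unacceptable for a max throughput test.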
Potential Culprit for BLER(CRC Error)/ PHY throughput
If you ask 'what is the single most important factor (even though it is not the only factor) for BLER/CRC error leading to poor PHY throughput', I would say it is 'Radio Link Quality'. Of course, this answer assumes that every component (test equipment, DUT) is at a certain maturity level (even though they are not perfect). The second most common factor (based on my experience) is a configuration issue. The factors that impact BLER (CRC error) can be listed as follows (based on my experience).
- Poor Radio Link Quality (Too Low Power, Too High Power, Frequency Error, Timing Error)
- Too high code rate (mostly caused by PHY Configuration)
- Other PHY parameters setup (e.g, DMRS, SRS, PTRS configuration etc)
- Poor Decoding Performance of the Receivers (UE or eNB/gNB)
- Poor Transmitted Signal Quality (UE or eNB/gNB)
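To illustrate the 'too high code rate' item, the effective code rate of a transport block can be estimated as the number of information bits (plus CRC) divided by the number of coded bits that fit into the allocated resource elements. The sketch below is a simplified estimate under assumptions: it uses a single 24-bit transport block CRC, ignores code block segmentation CRCs, and all the numbers in the example are made up for illustration.

```python
def effective_code_rate(tbs_bits: int, n_re: int, bits_per_symbol: int,
                        n_layers: int = 1, crc_bits: int = 24) -> float:
    """Approximate code rate = (TBS + CRC) / (REs * bits per symbol * layers)."""
    coded_bits = n_re * bits_per_symbol * n_layers
    if coded_bits <= 0:
        raise ValueError("allocation must carry at least one coded bit")
    return (tbs_bits + crc_bits) / coded_bits


if __name__ == "__main__":
    # Illustrative numbers only: same transport block size, 64QAM (6 bits/symbol).
    normal = effective_code_rate(tbs_bits=10000, n_re=2000, bits_per_symbol=6)
    # Fewer usable REs (e.g. some are taken by other signals) -> higher code rate.
    reduced = effective_code_rate(tbs_bits=10000, n_re=1800, bits_per_symbol=6)
    print(round(normal, 3), round(reduced, 3))
```

The point of the example is that the same transport block becomes harder to decode when fewer resource elements are left for data, which is one way a PHY configuration can raise BLER even when the radio link itself is unchanged.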
Indicator of Poor Radio Link and Troubleshoot Tricks
As mentioned above, radio link quality is the most important factor in resolving BLER (CRC error) issues, so I want to talk about this topic first. To investigate and troubleshoot a radio link quality issue, you need to know how to check whether your test has a radio link issue and how to resolve it. There are many different ways to identify radio link issues, and there are some common tricks to improve them. The tricks introduced in this tutorial will not always work, but they are at least the first things to try before any further investigation.
The general logic of radio link troubleshooting does not differ much between the Callbox (eNB/gNB) and the UEsim, so I would suggest you read through both the Callbox and UEsim sections even if you are using only one of them.
Callbox(eNB/gNB)
In this section, I will look into common tricks for checking and troubleshooting high BLER on the Callbox side. The tricks mentioned here are not all the tricks available for handling high BLER issues, but they are the most common steps that we (as Amarisoft tech support) take at the initial troubleshooting stage.
t Command
The first thing to check when you have any issue with the Amarisoft Callbox (gNB, eNB) is to run the 't' command. An example, with the meaning of the important items that indicate possible radio link issues, is shown below.
If you see any non-ideal (undesired) condition in the result (e.g., low CQI, low RI, low snr, high retx, high rxko), it is likely that you have a radio link quality issue. The most common practices to improve the situation can be summarized as follows.
If you see the undesired result for DL,
- try changing tx_gain step by step and see if there is any improvement (for example, increase/decrease tx_gain by 3 dB over several steps). You need to repeat this tweaking over a fairly large range of tx_gain because sometimes the improvement happens only within a specific range.
- try changing rx_gain step by step and see if there is any improvement (for example, increase/decrease rx_gain by 3 dB over several steps). You need to repeat this tweaking over a fairly large range of rx_gain because sometimes the improvement happens only within a specific range.
If you see the undesired result for UL,
- try changing rx_gain step by step and see if there is any improvement (for example, increase/decrease rx_gain by 3 dB over several steps). You need to repeat this tweaking over a fairly large range of rx_gain because sometimes the improvement happens only within a specific range.
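The step-by-step gain tweaking described above can be organized as a simple sweep. The sketch below is not an Amarisoft API: `measure_bler` is a hypothetical callback that you would implement yourself (apply the gain in whatever way your setup allows, run traffic for a while, then return the BLER you read from the 't' output or the log). Only the 3 dB step comes from this tutorial; the range is an assumption you should adapt.

```python
from typing import Callable, List, Tuple


def gain_candidates(start_db: float, span_db: float,
                    step_db: float = 3.0) -> List[float]:
    """Gains from start_db - span_db to start_db + span_db in step_db steps."""
    n = int(span_db // step_db)
    return [start_db + k * step_db for k in range(-n, n + 1)]


def sweep_gain(candidates: List[float],
               measure_bler: Callable[[float], float]) -> Tuple[float, float]:
    """Try every candidate gain and return (best_gain, best_bler)."""
    results = [(measure_bler(gain), gain) for gain in candidates]
    best_bler, best_gain = min(results)  # lowest BLER wins; ties -> lowest gain
    return best_gain, best_bler


if __name__ == "__main__":
    # Fake measurement for demonstration: pretend the link is best at 66 dB.
    fake = lambda gain: abs(gain - 66.0) / 100.0
    print(sweep_gain(gain_candidates(60.0, 9.0), fake))
```

Sweeping the whole range instead of stopping at the first improvement matters because, as noted above, the good region is sometimes only a narrow range of gain values.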
t spl Command
The next thing you should try is to run the 't spl' command. From the result of this command, you can check whether the RX chain of the callbox is saturated by too much power or suffering from too little power.
If you see too high or too low values for RX MAX, or non-zero values for SAT,
- try changing rx_gain step by step and see if there is any improvement (for example, increase/decrease rx_gain by 3 dB over several steps). You need to repeat this tweaking over a fairly large range of rx_gain because sometimes the improvement happens only within a specific range.
WebGUI - RB
If the troubleshooting with the 't' command does not work, check the RB map on the WebGUI and see whether a similar BLER rate appears throughout the test or only during a specific time period. If the problem happens only during a specific time period, run the same test several times and see if the problem happens at the same time period in every run.
WebGUI - Tx/Ret per Slot
Check the ratio of retransmissions (i.e., retransmissions / transmissions) per slot and see if there are any slots that show an outstandingly high retransmission rate. Repeat the same test several times and see if the pattern is reproducible. If the same pattern repeats, it is likely that some PHY configuration factor is behind it (e.g., an increased code rate, for example in the PSS, SSS, PBCH and SRS slots in LTE) or that there is some UE-side issue for those specific slots (check the UE PHY log).
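If you export the per-slot counts, the same check can be done numerically. The following is a minimal sketch with assumed inputs: `tx` and `retx` are plain dictionaries mapping a slot index to a count that you have collected yourself, and the outlier criteria (three times the median, with a 5% floor) are arbitrary illustrative thresholds, not values from the WebGUI.

```python
from statistics import median
from typing import Dict, List


def retx_ratio_per_slot(tx: Dict[int, int], retx: Dict[int, int]) -> Dict[int, float]:
    """Retransmissions / transmissions for every slot that carried traffic."""
    return {slot: retx.get(slot, 0) / n for slot, n in tx.items() if n > 0}


def outlier_slots(ratios: Dict[int, float], factor: float = 3.0,
                  floor: float = 0.05) -> List[int]:
    """Slots whose ratio exceeds both the floor and factor * median ratio."""
    if not ratios:
        return []
    med = median(ratios.values())
    return sorted(s for s, r in ratios.items() if r > floor and r > factor * med)


if __name__ == "__main__":
    tx = {0: 100, 1: 100, 2: 100, 3: 100, 4: 100}
    retx = {0: 2, 1: 1, 2: 40, 3: 2, 4: 3}
    print(outlier_slots(retx_ratio_per_slot(tx, retx)))
```

Running this on several repetitions of the same test and comparing the flagged slots is a quick way to see whether the pattern is reproducible.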
WebGUI - SNR
This is mainly for UL radio link troubleshooting, but UL radio link quality affects both UL throughput and DL throughput. In a desirable condition, you should see high snr across the whole test period. If you see any big drops (dips) during a specific period, repeat the same test several times and see if the same pattern repeats.
WebGUI - Constellation
This is mainly for UL radio link troubleshooting. If you have poor UL throughput and the troubleshooting with the 't' command and 't spl' does not work, check the constellation of the received signal (PUSCH) and see if it is clean enough to be decoded.
WebGUI - Trace
There are some cases where it is hard to identify BLER/CRC issues from the 't' command / 't spl' command or from any of the graphical information shown above. One typical case is when the high BLER happens during the initial attach and the UE fails to reach the connected state. In this case, you need to open the trace log in a text editor or in the WebGUI (personally I would suggest the WebGUI) and check how many decoding failures (CRC errors) happen. If a CRC error happens, it is displayed as a red circle with a backslash, as shown below. If you see multiple consecutive errors for PUSCH or PDSCH, it is not a good sign; resolve this issue first before you continue with any other test.
The first step for this kind of troubleshooting is, again, tweaking tx_gain or rx_gain.
UEsim
NOTE : Special Comments for Connecting Non-Amarisoft Callbox
If you are connecting the Amari UE sim to a non-Amarisoft eNB/gNB, there is an important thing to keep in mind: the maximum RF input power of the SDR card. It is recommended that you do not apply a total power greater than 5 dBm to the RX port of the SDR card. This means that connecting a non-Amarisoft eNB/gNB (e.g., a live eNB/gNB) directly to the UEsim SDR card without proper attenuators may damage the SDR card input.
Another thing to take into account is the max Tx power of the UE sim SDR card. The most common Tx power class for commercial UEs is 23 dBm, and there are other types of UE that can transmit at much higher power, but the max power of the Amarisoft SDR card is not as high as that of a commercial device.
The max total power varies depending on the type of SDR card and the frequency. Refer to the following documents for the specification of the SDR card Tx power and take it into account in your test setup.
- SDR 50 : See this document
- SDR 100 : See this document
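The RX input limit above translates into a simple link budget calculation for choosing attenuators. The sketch below only uses the 5 dBm figure quoted in this note; the safety margin, the cable loss and the 43 dBm base station output in the example are assumptions made for illustration, so replace them with the values of your own setup.

```python
MAX_RX_INPUT_DBM = 5.0  # total power limit at the SDR RX port quoted above


def required_attenuation_db(source_power_dbm: float,
                            cable_loss_db: float = 0.0,
                            margin_db: float = 10.0,
                            max_input_dbm: float = MAX_RX_INPUT_DBM) -> float:
    """Extra attenuation needed to keep the RX port below the limit minus margin."""
    power_at_port = source_power_dbm - cable_loss_db
    target = max_input_dbm - margin_db  # assumed safety margin below the limit
    return max(0.0, power_at_port - target)


if __name__ == "__main__":
    # Hypothetical example: 43 dBm total output, 2 dB cable loss, 10 dB margin.
    print(required_attenuation_db(43.0, cable_loss_db=2.0, margin_db=10.0))
```

Note that `source_power_dbm` should be the total power reaching the port, so if several carriers or ports are combined into one cable, sum them before calling the function.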
t Command
The first thing to check when you have any issue with the Amarisoft UEsim is to run the 't' command. An example, with the meaning of the important items that indicate possible radio link issues, is shown below.
- CFO : There is no clear-cut range for guaranteed performance, but as a rule of thumb it is recommended to keep abs(CFO) <= SCS (in Hz).
- SRO : There is no clear-cut range for guaranteed performance, but as a rule of thumb it is recommended to keep abs(SRO) <= 2 (<= 0.2 for high modulation).
- SINR : Measured for SSB only.
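The two rules of thumb above are easy to encode as a quick sanity check on the values you read from the 't' output. This is a sketch only: the function names are hypothetical, the values are typed in by hand rather than parsed from the tool, and SRO is assumed to be in the same unit in which the 't' command displays it.

```python
def cfo_ok(cfo_hz: float, scs_hz: float) -> bool:
    """Rule of thumb from this tutorial: abs(CFO) <= subcarrier spacing in Hz."""
    return abs(cfo_hz) <= scs_hz


def sro_ok(sro: float, high_modulation: bool = False) -> bool:
    """Rule of thumb from this tutorial: abs(SRO) <= 2, or <= 0.2 for high modulation."""
    limit = 0.2 if high_modulation else 2.0
    return abs(sro) <= limit


if __name__ == "__main__":
    print(cfo_ok(-1200.0, scs_hz=15000.0))
    print(sro_ok(0.5), sro_ok(0.5, high_modulation=True))
```

As stated above, these limits do not guarantee performance; they only tell you when CFO or SRO is far enough off that it deserves attention before looking elsewhere.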
If you see the undesired result for DL,
- try changing rx_gain step by step and see if there is any improvement (for example, increase/decrease rx_gain by 3 dB over several steps). You need to repeat this tweaking over a fairly large range of rx_gain because sometimes the improvement happens only within a specific range.
- I would suggest trying rx_gain -1 (automatic gain control : AGC) first, and then trying specific values if the AGC does not work.
If you see the undesired result for UL,
- try changing tx_gain step by step and see if there is any improvement (for example, increase/decrease tx_gain by 3 dB over several steps). You need to repeat this tweaking over a fairly large range of tx_gain because sometimes the improvement happens only within a specific range.
If you want to improve CFO/SRO, you may try the following.
- Stop UEsim (service lte stop)
- cd /root/trx_sdr
- ./sdr_util -c 0 clock_tune // this gives you the current clock_tune value as shown here. ('0' here indicates the SDR card number. Change this according to your setup.)
- ./sdr_util -c 0 clock_tune 2.3 -s // this sets the clock_tune value to the specified value. ('0' here indicates the SDR card number and 2.3 is the value you want to set. Change these according to your setup. Try first with the current SRO value you see in the 't' result, then tweak up and down until you get the lowest CFO and SRO.)
t spl Command
The next thing you should try is to run the 't spl' command. From the result of this command, you can check whether the RX chain of the UEsim is saturated by too much power or suffering from too little power.
If you see too high or too low values for RX MAX, or non-zero values for SAT,
- try changing rx_gain step by step and see if there is any improvement (for example, increase/decrease rx_gain by 3 dB over several steps). You need to repeat this tweaking over a fairly large range of rx_gain because sometimes the improvement happens only within a specific range.
Web GUI
The features and tips for the Web GUI on the UE sim are almost the same as for the Callbox WebGUI, so I would suggest you refer to the WebGUI items in the Callbox section.
Tips on PHY Configuration
Various configurations may affect BLER, especially in max throughput testing. It is hard to name a single set of parameters that fits every case. The following are some possible test setups and the related PHY configurations that may influence BLER at the max throughput condition (i.e., max transport block size in each subframe).
LTE Downlink Max Throughput with Amarisoft eNB + Non Amarisoft UE / Amarisoft UEsim
In this case, most of the PHY configuration is already set optimally for the antenna configuration given in the configuration file (if you use the default configuration provided by the installation package). You should be able to achieve max throughput just by optimizing the radio link quality.
LTE Downlink Max Throughput with Non-Amarisoft eNB (e.g, live eNB) + Non Amarisoft UE / Amarisoft UEsim
It would be a good idea to check the PHY configuration on the eNB side, such as the PCFICH value (number of symbols for the control channel) and the p_a, p_b values.
LTE Uplink Max Throughput with Amarisoft eNB + Non Amarisoft UE / Amarisoft UEsim
In this case, just tweaking rx_gain and tx_gain and using a conducted connection would work for most cases (if you use the default configuration provided by the installation package), but with very high modulation schemes, removing the SRS configuration in SIB2 would perform better. Note that with the Amarisoft eNB you may get zero BLER and still not get the ideal max throughput, since the Amari eNB always reserves some UL resources for PUCCH reception from multiple UEs, as explained in this tutorial. If you want to achieve the ideal max throughput, you need to set a special configuration as shown in this tutorial.
NR Downlink Max Throughput with Amarisoft gNB + Non Amarisoft UE / Amarisoft UEsim
In this case, most of the PHY configuration is already set optimally for the antenna configuration given in the configuration file (if you use the default configuration provided by the installation package). You should be able to achieve max throughput just by optimizing the radio link quality.
NR Downlink Max Throughput with Non-Amarisoft gNB + Amarisoft UEsim
I would suggest configuring the gNB as follows and then optimizing tx_gain and rx_gain on the UEsim.
- k1, k2 should not be too small. The desired value is 4 or greater, but you may push them down to 2 if the UEsim PC has enough performance.
- dmrs-AdditionalPosition should be pos1 or higher (pos0 is not recommended).
- Configure CSI-RS for CQI, RI, PMI so that the gNB can adjust scheduling based on the CSI report from the UEsim.
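The checklist above can be turned into a small pre-test check. This is a sketch only: the dictionary keys (`k1`, `k2`, `dmrs_additional_position`, `csi_rs_configured`) are hypothetical names for values you would read from your own gNB configuration by hand, k1 and k2 are simplified to single integers, and the function does not talk to any real gNB.

```python
from typing import Dict, List

_DMRS_ORDER = ["pos0", "pos1", "pos2", "pos3"]


def check_nr_dl_config(cfg: Dict[str, object], fast_pc: bool = False) -> List[str]:
    """Return warnings for settings that go against the suggestions above."""
    warnings = []
    # 4 or greater is desired; 2 is acceptable only on a fast enough UEsim PC.
    min_k = 2 if fast_pc else 4
    for name in ("k1", "k2"):
        if cfg[name] < min_k:
            warnings.append(f"{name}={cfg[name]} is below {min_k}")
    if _DMRS_ORDER.index(cfg["dmrs_additional_position"]) < 1:
        warnings.append("dmrs-AdditionalPosition is pos0 (pos1 or higher suggested)")
    if not cfg.get("csi_rs_configured", False):
        warnings.append("CSI-RS for CQI/RI/PMI is not configured")
    return warnings


if __name__ == "__main__":
    cfg = {"k1": 2, "k2": 4, "dmrs_additional_position": "pos0",
           "csi_rs_configured": False}
    for line in check_nr_dl_config(cfg):
        print(line)
```

Returning a list of warnings instead of failing on the first problem lets you fix all configuration issues in one pass before you start tuning tx_gain and rx_gain.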