What Is Multi-User Testing and How Is It Used to Test Audio and Video Software?

Until recently, the main focus of TestDevLab’s audio and video testing department has been to provide network emulation, observe how devices behave in different conditions, and provide an objective evaluation of software quality. Our usual test setup allows us to observe specific device pairs—one sender and one receiver—no matter how many participants are in the call. In such a setup, it’s possible to simulate real-world network conditions for both the sender and the receiver and evaluate the quality experienced by the receiver based on the modifications we make.
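
As a hedged illustration of what this kind of network emulation can look like, here is a minimal sketch that shapes traffic on a Linux test machine with tc and netem. The interface name and values are placeholders, and the actual tooling and parameters used in a laboratory setup vary per scenario.

```python
import subprocess

# Hypothetical example: emulate a constrained, mobile-like link on a Linux
# test machine using tc (netem + tbf). Requires root privileges.
INTERFACE = "eth0"  # placeholder interface name

def apply_network_profile(rate_kbit: int, delay_ms: int, loss_pct: float) -> None:
    """Limit bandwidth and add latency and packet loss on the outgoing interface."""
    # Remove any previous qdisc so profiles do not stack on top of each other.
    subprocess.run(["tc", "qdisc", "del", "dev", INTERFACE, "root"], check=False)
    # netem handles delay and loss...
    subprocess.run([
        "tc", "qdisc", "add", "dev", INTERFACE, "root", "handle", "1:",
        "netem", "delay", f"{delay_ms}ms", "loss", f"{loss_pct}%",
    ], check=True)
    # ...and a token bucket filter (tbf) caps the bandwidth.
    subprocess.run([
        "tc", "qdisc", "add", "dev", INTERFACE, "parent", "1:", "handle", "2:",
        "tbf", "rate", f"{rate_kbit}kbit", "burst", "32kbit", "latency", "400ms",
    ], check=True)

def clear_network_profile() -> None:
    subprocess.run(["tc", "qdisc", "del", "dev", INTERFACE, "root"], check=False)

if __name__ == "__main__":
    # Emulate a 250 Kbps link with 100 ms delay and 1% packet loss.
    apply_network_profile(rate_kbit=250, delay_ms=100, loss_pct=1.0)
```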

However, more recently, our focus has shifted or, to be more precise, has expanded. Namely, due to the increasing popularity of digital communication and developers implementing different architectures for their applications, there has been a growing demand and need to observe more than one participant in a call. It is important to ensure a stable connection and quality experience for everyone taking part in the call. To meet this demand, TestDevLab began working on operational setups and technologies that would allow us to test more complicated scenarios. In brief, we worked to implement complex multi-user testing in an audio and video context.

In this blog article, we will discuss the challenges of multi-user testing for communication applications that need to maintain a stable connection under diverse conditions. We will take a deep dive into multi-user testing and explore how TestDevLab has created a solution that helps monitor the behavior of multiple users within one call. Keep reading to learn about the key challenges and strengths of multi-user testing, explore our unique audio and video testing laboratory, and find out how our testing approach can deliver the best possible results to both companies and their users.

Variable environments require adaptive solutions

One of the biggest challenges in digital communications is the wide variety of network environments available around the world and the vast pool of devices with different capabilities. As the main selling point for any digital communication service is to connect people wherever they are, the product itself has to be adaptable to various network conditions and device specifications. 

The first thing to note is the growing ownership of devices and the diverse range of hardware used to access the internet and communicate (Image 1). Communicating through messaging applications and social media is no longer limited to desktop and mobile devices. Now, smart TVs, game consoles, and smart home devices come into the mix as well.

Devices used to access the internet
Image 1: Internet usage distribution amongst different device types in January 2023 | Image source: DataReportal

According to the Digital 2023 Global Overview Report, the biggest share of internet usage comes from mobile devices, with 92.3% of internet users accessing the internet via mobile. Nevertheless, it is important to keep in mind that mobile devices run on different operating systems (Image 2). This fragmented operating system market share creates another layer of complexity: cross-platform compatibility at the device, operating system, and specification level.

Mobile Operating System Market share in February 2023
Image 2: Mobile Operating System Market Share Worldwide in February 2023 | Image source: Statcounter

Another thing to consider is the network environment and mobile internet connection speeds, which vary from country to country (Image 3). To ensure an uninterrupted connection, communication applications need to be able to adapt to changing environments and network conditions. Specifically, they should be able to adapt not only to different network speeds but also to changes resulting from movement (e.g. being a passenger in a car), the physical environment (e.g. being in the countryside instead of the city), or disruption (e.g. the device switching between mobile network and Wi-Fi).

Mobile network connection speeds
Image 3: Mobile internet speed in different countries in January 2023 | Image source: DataReportal

Video conferencing architectures

To ensure adaptability and resilience in various conditions, communication application vendors have introduced new solutions and specific video conferencing architectures into their products. Let's take a look at a few of them to understand how they operate to ensure a stable connection between multiple participants.

  • Selective Forwarding Unit (SFU). This architecture places the server in the center, where it receives video streams from all participants in the call. The server then forwards a copy of each received stream to every receiving participant. The advantage of this architecture is that it needs fewer server resources; however, it requires more available bandwidth on the participants’ side to receive all of the forwarded streams. Moreover, participants will have a higher CPU load compared to other architectures.
  • Multipoint Control Unit (MCU). This architecture is similar to SFU in that it collects all participant video streams; however, it then processes the videos by combining them into a single stream with the requested layout. The processed video stream is then sent to all recipients. This architecture does not require wide bandwidth, allowing participants to use devices with lower specifications; however, the server itself needs a lot of resources to gather and process the video streams. This complicates the scalability of calls and requires a larger investment from the vendor.
  • Simulcast. With this architecture, each participant in the call sends multiple copies of their video stream to the server, each copy at a different resolution and quality level. The server then chooses the appropriate copy for each recipient based on their hardware, software, and environment (Image 4; a simplified selection sketch follows the image). This type of architecture adds flexibility to the call and helps ensure that low quality from one participant does not ruin the whole call experience for others. However, it demands good network capabilities from the participants to ensure that multiple copies of the video feed can be transmitted continuously.
Simulcast architecture visualization
Image 4: Simulcast architecture visualization
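
To make the simulcast selection step more concrete, here is a small, hypothetical sketch of how a server might pick the highest-quality layer that fits a receiver’s estimated bandwidth. The layer names and bitrates are illustrative and not taken from any specific product.

```python
from dataclasses import dataclass

@dataclass
class SimulcastLayer:
    name: str
    width: int
    height: int
    bitrate_kbps: int

# Illustrative simulcast layers a sender might publish (values are examples only).
LAYERS = [
    SimulcastLayer("low", 320, 180, 150),
    SimulcastLayer("medium", 640, 360, 500),
    SimulcastLayer("high", 1280, 720, 1500),
]

def pick_layer(receiver_bandwidth_kbps: int, layers=LAYERS) -> SimulcastLayer:
    """Choose the highest-quality layer that fits the receiver's estimated bandwidth."""
    affordable = [l for l in layers if l.bitrate_kbps <= receiver_bandwidth_kbps]
    # Fall back to the lowest layer if even that does not fit the estimate.
    return max(affordable, key=lambda l: l.bitrate_kbps) if affordable else layers[0]

# A receiver limited to 250 Kbps would be forwarded the "low" layer,
# while an unconstrained receiver gets "high".
print(pick_layer(250).name)   # -> low
print(pick_layer(5000).name)  # -> high
```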

Video conferencing architectures like Simulcast are among the main reasons why different participants within the same call can have different experiences. Some architectures are more adaptive than others and come with different requirements (Image 5), but all of them are meant to achieve specific results for end users. One of the best ways to fully understand whether the end results match expectations is to look at multiple participants in the call and compare their experiences. This is where objective multi-user testing comes into play.

Comparison of different architectures for 4 user video calls
Image 5: Comparison of different architectures for 4 user video calls | Image source: TrueConf

What is multi-user testing in an audio and video context

Simply put, multi-user testing is any type of testing that observes multiple devices simultaneously. It is particularly important and valuable in audio and video testing because, as mentioned above, a video call typically involves more than one person. We have briefly covered our work and experiments related to this type of testing in our webinar, Audio and Video Quality Testing: The Past, Present and Future, and are now fully prepared to perform such testing. 

By observing multiple devices in one call, we can create various environments and see how they affect the behavior of the call for each recipient. For example, one participant may have unlimited network access at home, while another might be out in the countryside with limited connectivity. In such a scenario, would the experience be the same? Most probably not. To find out for sure, we will look more closely at multi-user testing in communications with an emphasis on audio and video testing principles.

Multi-user testing setups

Let’s start by exploring the different ways we can approach audio and video multi-user testing. While the core principle—to observe multiple devices within the call—is simple, there are several ways to build the setup, and each directly impacts the testing process. Here are the setups we use when performing multi-user testing:

  • One sender with multiple receivers. In this setup, we have a group call where one of the participants is sending an audio and video feed, while the other participants may or may not send video media. For this setup, the main difference is that we have one main speaker and multiple participants who are listening. The focus here is to observe multiple receivers within the call and see how the quality differs between them.
  • Multiple senders with one receiver. A group call where multiple participants are sending audio and video feeds, while one participant receives said media. The main objective of this setup is to observe how sender devices perform under different conditions from the perspective of one receiver. Namely, the focus is to evaluate and compare the performance of multiple senders. The benefit of such a setup is that variables on the receiver’s end remain unchanged for every sender throughout the test.
  • Multiple senders with multiple receivers. This is the most complex setup, where multiple participants in the group call are sending audio and video media and multiple participants are receiving said media. In the most complex scenario, all of the participants can have different network conditions. 

One important thing to note is that these setups differ only in terms of what is observed within the call; they do not limit the number of participants or the size of the call. By adding device farms, we can create large calls on real devices, where multiple participants have non-standard network environments.
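
As a purely illustrative sketch, a multi-user test scenario can be described declaratively before the call is started, capturing who sends, who receives, and which network profile each participant gets. The field names below are our own placeholders, not part of any tool.

```python
from dataclasses import dataclass, field

@dataclass
class Participant:
    device: str              # e.g. "Pixel 7", "iPhone 14", "Windows laptop"
    role: str                # "sender", "receiver", or "both"
    network_profile: dict = field(default_factory=dict)  # emulation parameters

@dataclass
class MultiUserTestPlan:
    scenario: str
    participants: list

# "One sender with multiple receivers" scenario, with per-receiver network limits.
plan = MultiUserTestPlan(
    scenario="one_sender_multiple_receivers",
    participants=[
        Participant("Windows laptop", "sender"),
        Participant("iPhone 14", "receiver", {"rate_kbps": None}),   # unlimited
        Participant("Pixel 7", "receiver", {"rate_kbps": 500}),
        Participant("Galaxy S22", "receiver", {"rate_kbps": 250}),
    ],
)
```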

Challenges and limitations

While the concept of multi-user testing is quite intriguing and promising, it comes with its own set of challenges and limitations. We at TestDevLab have worked to create a setup that can provide as much value and information as possible. Here are some of the main things we have encountered and learned along the way:

  • One solution does not fit all. Even though the core idea is very similar for all types of calls listed above, each one requires specific adjustments to ensure the correct environment. Therefore, modifying the core solution is essential. For example, in cases where there are multiple senders, it’s important to facilitate conversations between all of the senders to ensure that both video and audio data can be evaluated by the receiver. 
  • More devices, more complex scenarios, and more extreme network conditions require more manual attention. The complexity of the setup in some cases requires close attention from the tester to ensure that the call is functional and there are no missed errors in the process. While it is possible to automate simpler cases, for multi-user setups, especially those with real devices, the risk of issues, such as connection problems, device overheating, and battery drain, is higher. Moreover, usability is an important aspect of the calls that should be evaluated both objectively and subjectively, so that the tester can provide more information on the behavior of the participants. Of course, generally, it still depends on factors such as the number of devices, device models, network scenarios, and other variables.
  • Device management and infrastructure needs to be planned beforehand. As multi-user testing is aimed at group calls, it is important to plan the layout of the setup, position the devices, and make hardware and infrastructure preparations before starting the overall testing procedure. If the test is planned to be run more than once, the setup should stay in the same position to ensure comparability between test sets. This applies to all audio and video testing, however, it is still easier to set up a regular call with one observable device than it is for two or more devices.
  • Mobile devices add an extra layer of complexity to the setup. Because desktop devices have higher capabilities, it is easier to create the setup, as some of the processes can be executed on the observed devices without placing a high demand on hardware capacity and resources. In the case of mobile devices, however, recording and collecting data is more challenging, as the mobile devices have to be connected to another desktop testing device. As a result, the overall device count and complexity increase. This is done so that mobile devices do not overheat or waste resources on recording the screen (as it is not a common use case and can require too much hardware capacity), and so that other processes do not interfere with the testing process.

After overcoming the challenges and limitations associated with multi-user testing, you can gain detailed and objective insights into group calls and individual user experiences. 

Multi-user testing laboratory and setups

The general testing procedure in multi-user testing is quite similar to the standard one we use for testing audio and video quality in video conference applications (see Image 6), with the exception of use cases involving multiple senders or multiple receivers. To show how the current solution can be scaled without loss of quality and can evaluate multiple devices within one call, we will look into the differences between the scenarios, their setups, and their testing processes.

Regular audio and video testing setup
Image 6: Regular audio and video testing setup (observation of one device)

One sender with multiple receivers

Starting with the simplest of the three scenarios, let’s explore the setup that involves one sender and multiple recipients in the call. In the example below (Image 7), we can see the visualization of this setup for a four-user call. The baseline of the setup is very similar to the standard one (Image 6), however, the challenge here is to ensure that all three observed devices are within one testing laboratory.

It’s important to note that for each new observed recipient, additional audio playback devices must be added and wired in order to properly connect them to the recipients. This creates a bigger setup that testers need to manage. For manual testing, it is important to structure the setup and devices in a way that ensures they are accessible and do not interfere with each other, meaning that they do not cover each other’s screens and can be operated by hand. The more recipients are observed, the more complexity is added to the physical part of the setup, and the more complicated it becomes for the tester running the test, as they have to start the processes in the right order to ensure the correct collection of data.

However, the main challenge here is the physical setup and its structure. Once it is done, the processing and evaluation procedures are the same as for the regular setup. The only thing to keep in mind is that the effort for such an evaluation is multiplied by the number of recipients, as each has their own set of raw files waiting to be processed into objective metrics.
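
As a hedged sketch of the ordering concern mentioned above: receiver-side recordings need to be running before the sender starts playing the reference media, otherwise the captured raw files cannot be aligned and processed correctly. The commands below are placeholders for whatever capture and playback tooling a given setup uses.

```python
import subprocess
import time

# Placeholder commands; in a real setup these would start screen/audio capture
# on each receiver host and reference media playback on the sender side.
RECEIVER_CAPTURE_CMDS = [
    ["./start_capture.sh", "receiver-1"],
    ["./start_capture.sh", "receiver-2"],
    ["./start_capture.sh", "receiver-3"],
]
SENDER_PLAYBACK_CMD = ["./play_reference_media.sh", "sender-1"]

def run_test_iteration(settle_seconds: float = 2.0) -> None:
    # 1. Start capture on every observed receiver first...
    captures = [subprocess.Popen(cmd) for cmd in RECEIVER_CAPTURE_CMDS]
    time.sleep(settle_seconds)  # ...give the recordings a moment to stabilize,
    # 2. ...then start the reference playback on the sender.
    sender = subprocess.Popen(SENDER_PLAYBACK_CMD)
    sender.wait()
    # 3. Stop the captures once the sender has finished.
    for proc in captures:
        proc.terminate()
```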

Visualization of one sender and multiple receivers setup for 4-user call
Image 7: Visualization of one sender and multiple receivers setup for 4-user call

Multiple senders with one receiver

The next step up the complexity ladder is the setup with multiple senders and one receiver. Here, the main source of complexity is that more than one participant is sending not only a video feed but also an audio feed. That means the setup needs to facilitate communication between the participants in a way that ensures no one talks over anyone else, feels natural, and closely emulates a real video call.

In terms of the physical setup, this scenario uses a simpler infrastructure than the previous one. The structure consists of multiple regular setups (Image 8) that are connected in a single call. On the level of each individual setup, there is no additional difficulty; however, to ensure that the raw files acquired during the tests can be correctly processed, the devices must be positioned consistently across setups, with a focus on visibility on the one observed receiver. From our experience, each application has its own way of handling the video feeds inside the grid, with different sizes and scales, and in some cases the scales can even change during the call. As a result, preparing the setup requires more time, multiplied by the number of competitor applications in the test set, because the setup has to be reevaluated and adjusted for each application.

Additionally, conversation audio samples need to be incorporated according to the setup to ensure that senders are “speaking” sequentially one after the other. The good news is that this just requires a different audio input and does not require additional effort from the testers.
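
One simple way to keep the senders “speaking” sequentially is to derive a time-offset playback schedule from the prepared audio samples, so each sender’s clip starts only after the previous one has finished. The sketch below is illustrative; the sender names and clip durations are placeholders.

```python
# Build a per-sender playback schedule so the "speakers" take turns.
# Durations are illustrative; real samples come from prepared reference audio.
SENDER_SAMPLES = [
    ("sender-A", 8.0),   # seconds of speech in sender A's clip
    ("sender-B", 7.5),
    ("sender-C", 9.0),
]
PAUSE_BETWEEN_SPEAKERS = 1.0  # seconds of silence between turns

def build_schedule(samples, pause):
    """Return (sender, start_offset_seconds) pairs for sequential speaking turns."""
    schedule, offset = [], 0.0
    for sender, duration in samples:
        schedule.append((sender, offset))
        offset += duration + pause
    return schedule

for sender, start in build_schedule(SENDER_SAMPLES, PAUSE_BETWEEN_SPEAKERS):
    print(f"{sender} starts speaking at t = {start:.1f}s")
```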

While the preparation process might not be so smooth, the biggest challenge is in the post-processing of the raw data, as the standard process won't work in this scenario. Namely, instead of searching for one video feed and evaluating one audio feed, the process requires testers to identify multiple audio and video feeds and their sources. To address this challenge, we have adapted our solutions to enable the post-processing of files without adding too much additional effort. Our scripts are adapted to recognize different sender grid points by their assigned color (Image 8), and also to understand the sequence of the audio phrases from each sender. 

Color distinction of different senders
Image 8: Color distinction of different senders
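
As a rough sketch of the color-based identification idea (not our production script), a frame extracted from the receiver recording can be probed for each sender’s assigned marker color to work out which grid tile belongs to whom. The marker colors, tolerance, and file name below are placeholders.

```python
import cv2
import numpy as np

# Placeholder marker colors (BGR) assigned to each sender's video feed.
SENDER_COLORS = {
    "sender-A": (0, 0, 255),    # red marker
    "sender-B": (0, 255, 0),    # green marker
    "sender-C": (255, 0, 0),    # blue marker
}
TOLERANCE = 40  # how far a pixel may deviate from the marker color

def locate_senders(frame: np.ndarray) -> dict:
    """Return an approximate bounding box of each sender's tile in the frame."""
    boxes = {}
    for sender, bgr in SENDER_COLORS.items():
        lower = np.clip(np.array(bgr) - TOLERANCE, 0, 255).astype(np.uint8)
        upper = np.clip(np.array(bgr) + TOLERANCE, 0, 255).astype(np.uint8)
        mask = cv2.inRange(frame, lower, upper)  # pixels close to the marker color
        coords = cv2.findNonZero(mask)
        if coords is not None:
            boxes[sender] = cv2.boundingRect(coords)  # (x, y, w, h)
    return boxes

frame = cv2.imread("receiver_recording_frame.png")  # one extracted frame
print(locate_senders(frame))
```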

Multiple senders with multiple receivers

This is the most complex scenario of the three, as it involves multiple senders and receivers. In theory, this case is a combination of everything discussed previously. It makes the process more efficient by effectively running two scenarios simultaneously, so everything can be observed and compared in one call. However, it also aggregates all of the previously mentioned challenges.

Visualization of the mobile setup with multiple senders and multiple receivers with device farm
Image 9: Visualization of the mobile setup with multiple senders and multiple receivers with device farm

This setup (see Image 9) allows us to benchmark not only differences between multiple senders or receivers, but also cross-platform connection and quality. As mentioned above, the process is similar to a combination of the previous two setups with an added layer of complexity, but let’s explore the nuances, differences and advantages of this setup:

  • Cross-platform comparison. This setup enables us to observe cross-platform connections. For example, rather than looking at just one sender-receiver pair, we can look at iOS to Android, iOS to iOS, Android to Android, and Android to iOS connections within the same call. Using different devices as senders and receivers allows us to observe various connections and evaluate the quality of the same call through the eyes of each participant. Therefore, we can observe not only how different platforms behave in the call, but also how device capabilities impact performance.
  • Multiple setups and independent operation. As the whole setup consists of multiple standard setups, the initial reaction is that it only adds complexity to the whole operation. However, in reality it provides various benefits as well, one of which is the ability for each setup to operate independently with its own tester regulating it. As a result, each pair of devices also gets a subjective evaluation that is needed to ensure immediate action in case of an error or disruption.
  • Can be performed from different locations. Expanding on the previous point, being able to operate smaller parts of the whole setup independently allows us to vary their locations, which can add further complexity to the scenarios.
  • Greater variety of network emulation. Testers can utilize a greater number of observed devices and create more complex and interesting network environments for the call. With this setup, we can create more diverse conditions and emulate a separate network for each participant, while observing all of these different behaviors during a single call.

Now that we have provided you with an overview of the multi-user testing process and the different setups, we can explore examples of the results that might give interesting insights, observations, and possible opportunities on how to get the most out of this type of testing.

Types of quality metrics used in multi-user testing 

What are the overall results that you can expect from multi-user testing and what opportunities does it promise? In terms of quality metrics, multi-user testing uses the same quality metrics as those used in audio and video testing, though it has some additional advantages. Allow us to remind you about the metrics that we use to measure audio and video quality and explain what multi-user testing offers in addition to standard audio and video quality testing.

Audio quality metrics

In terms of audio quality, both POLQA and VISQOL MOS scores can be used. In cases where there is one sender, the process is the same as with a simple setup—the audio from the sender is recorded through each receiver and then evaluated using the appropriate algorithm (Images 10 and 11). After the evaluation, we can gather over-time results for the audio perceived by each receiver and evaluate the impact of the network limitation or the device on audio quality. Since our tooling also performs audio delay evaluation on the same aligned recordings, these results can be gathered simultaneously.

Overtime audio quality evaluation for multiple receivers in the same call
Image 10: Audio quality evaluation for multiple receivers in the same call
Audio delay evaluation for multiple receivers in the same call
Image 11: Audio delay evaluation for multiple receivers in the same call
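
POLQA is a licensed algorithm, but to illustrate the per-receiver evaluation step, the open-source ViSQOL tool can be driven from a small wrapper to obtain a MOS estimate for each capture. The sketch below assumes a built `visqol` binary on the PATH and captures that are already trimmed and aligned to the reference clip; the CLI flags follow ViSQOL’s documentation but may differ between versions, and the file paths are placeholders.

```python
import subprocess

# Placeholder paths: one aligned capture per observed receiver, plus the reference.
RECEIVER_CAPTURES = {
    "receiver_unlimited": "captures/receiver_unlimited.wav",
    "receiver_500kbps": "captures/receiver_500kbps.wav",
    "receiver_250kbps": "captures/receiver_250kbps.wav",
}
REFERENCE_AUDIO = "reference/original_speech.wav"

def visqol_mos(reference: str, degraded: str) -> str:
    result = subprocess.run(
        ["visqol", "--reference_file", reference, "--degraded_file", degraded],
        capture_output=True, text=True, check=True,
    )
    # ViSQOL prints a MOS-LQO estimate on stdout; keep the relevant line
    # (output formatting may vary between tool versions).
    for line in result.stdout.splitlines():
        if "MOS-LQO" in line:
            return line.strip()
    return result.stdout.strip()

for name, path in RECEIVER_CAPTURES.items():
    print(name, visqol_mos(REFERENCE_AUDIO, path))
```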

Whenever multiple senders are introduced, the whole conversation, as received by the receiver device, is recorded. The recording is split based on the sender speaking in the specific sample. Then all samples from the specified sender are evaluated against the original audio sent to calculate the MOS score and the audio delay. See example below (Images 12 and 13).

Average audio quality POLQA score evaluated from different senders in the same call
Image 12: Average audio quality POLQA score evaluated from different senders in the same call
Average audio delay evaluated from different senders in the same call
Image 13: Average audio delay evaluated from different senders in the same call

In both examples, we can see the difference that each specific limitation makes, which shows that the call is not the same for everyone. Usually, the difference is less noticeable in the audio data than in the video data.
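
To illustrate the splitting step for the multiple-sender case, the sketch below cuts the receiver-side conversation recording into per-sender samples using the known speaking schedule. It uses pydub purely as an example, and the file paths and offsets are placeholders that would normally come from the schedule used to drive the senders plus any measured alignment offset.

```python
from pydub import AudioSegment

# Known speaking turns from the playback schedule: (sender, start_s, end_s).
TURNS = [
    ("sender-A", 0.0, 8.0),
    ("sender-B", 9.0, 16.5),
    ("sender-C", 17.5, 26.5),
]

recording = AudioSegment.from_wav("captures/receiver_conversation.wav")

for sender, start_s, end_s in TURNS:
    sample = recording[int(start_s * 1000):int(end_s * 1000)]  # pydub slices in ms
    out_path = f"samples/{sender}.wav"
    sample.export(out_path, format="wav")
    # Each exported sample is then compared against that sender's original
    # reference clip to compute the MOS score and the audio delay.
    print(f"wrote {out_path} ({len(sample) / 1000:.1f}s)")
```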

Video quality metrics 

Based on our experience, video is the first thing to be affected by network limitations. Specifically, either the quality or the frame rate is reduced, or the video is turned off altogether to maintain at least some level of audio communication. To evaluate video quality, we use both no-reference metrics (BRISQUE and VQTDL) and the full-reference metric VMAF.
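
As an example of the full-reference side, VMAF can be computed with FFmpeg’s libvmaf filter once the received recording has been trimmed, scaled, and frame-aligned to the reference clip. The sketch below assumes an FFmpeg build with libvmaf enabled; the file paths are placeholders and the JSON report structure may differ between libvmaf versions.

```python
import json
import subprocess

def compute_vmaf(degraded: str, reference: str, log_path: str = "vmaf.json") -> float:
    """Run FFmpeg's libvmaf filter and return the mean VMAF score."""
    subprocess.run([
        "ffmpeg", "-i", degraded, "-i", reference,
        "-lavfi", f"libvmaf=log_fmt=json:log_path={log_path}",
        "-f", "null", "-",
    ], check=True)
    with open(log_path) as f:
        report = json.load(f)
    # The JSON report contains per-frame scores and pooled metrics.
    return report["pooled_metrics"]["vmaf"]["mean"]

print(compute_vmaf("captures/receiver_250kbps.mp4", "reference/original.mp4"))
```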

In the example below, we can see just how much of an impact the network emulation has on the received video frame rate and how different the video can be for different participants in the same call (Image 14). 

Video frame rate for different recipients in the same call
Image 14: Video frame rate for different recipients in the same call

In the graph above, we can see that the receiver with an unlimited network maintains 25 fps throughout the call, and that its experience is not compromised by other participants having a more limited network. This type of data shows that applications can adapt content to suit the limitations of each participant.
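
A received-side frame rate like the one in Image 14 can be approximated by counting how many visually distinct frames appear per second in the receiver’s screen capture, since low delivered frame rates show up as repeated frames in the recording. The sketch below is a simplified, hypothetical version of this idea using OpenCV; the difference threshold and file path are placeholders.

```python
import cv2
import numpy as np

def effective_fps(capture_path: str, diff_threshold: float = 1.0) -> list:
    """Count visually distinct frames per second in a receiver's screen capture."""
    cap = cv2.VideoCapture(capture_path)
    record_fps = int(cap.get(cv2.CAP_PROP_FPS) or 30)  # capture rate of the recording
    per_second, unique_in_second, prev, frame_idx = [], 0, None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # A frame counts as "new" if it differs noticeably from the previous one.
        if prev is None or np.mean(cv2.absdiff(gray, prev)) > diff_threshold:
            unique_in_second += 1
        prev = gray
        frame_idx += 1
        if frame_idx % record_fps == 0:
            per_second.append(unique_in_second)
            unique_in_second = 0
    cap.release()
    return per_second  # e.g. [25, 25, 24, ...] for an unconstrained receiver

print(effective_fps("captures/receiver_unlimited.mp4"))
```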

Assessing the overall quality of the video call 

The most interesting part of the results gathered via multi-user testing is getting a comprehensive, overall view of the video call quality—from audio quality and network consumption to application adaptation and environments. In the final example below, we would like to show an overall quality evaluation of a video call with four participants (one sender, multiple receivers) and examine how it showcases differences in call quality based on the available network. All data shown was gathered using the multi-user testing approach and represents the average of 5 calls, with all metrics collected within each call. First, let’s take a look at the audio quality and delay.

Audio delay evaluation for a 4-user call with one sender and multiple receivers
Image 15: Audio delay evaluation for a 4-user call with one sender and multiple receivers
Average audio quality POLQA score for a 4-user call with one sender and multiple receivers
Image 16: Average audio quality POLQA score for a 4-user call with one sender and multiple receivers

Across all three receivers, we can observe different behavior that directly correlates with the severity of their network limitation (Images 15 and 16). One main thing to note here is that the application not only adapts to the 250Kbps limitation and maintains good audio quality, but it also does not compromise the experience for the other receivers because of that network condition. This indicates that each receiver gets the level of quality its connection can handle. It appears that the application slightly compromises on audio delay for the 250Kbps receiver, whose delay is twice that of the other receivers, in order to ensure better audio quality. Moving on, we’ll examine the video evaluation data to find out whether it shows similar tendencies.

Average frame rate for each receiver in the call
Image 17: Average frame rate for each receiver in the call
Average VQTDL score for each receiver in the call
Image 18: Average VQTDL score for each receiver in the call
Average video delay for each receiver in the call
Image 19: Average video delay for each receiver in the call

A pattern similar to the one observed in the audio quality evaluation can also be seen in the video data (Images 17, 18 and 19). The most obvious change is visible in the frame rate, which drops significantly depending on the observed receiver and its network limitation. An important thing to note here is that all metrics exhibit the same tendency: receivers on weaker networks do not affect recipients with better network conditions. The quality matches the capabilities of each receiver without compromising the overall call quality for everyone, which is an important resource utilization technique. To ensure that the data is accurate and relevant, it is also important to look at network data and consumption.

Network consumption for each receiver in the call
Image 20: Network consumption for each receiver in the call

Looking at the average network consumption, we can see that the application adapts to the limitations. For the unlimited receiver, the overall consumption averages around 1,167Kbps (roughly 1.17Mbps) and is quite consistent throughout the call (see Image 20). For the limited recipients, consumption drops to within their limits from the first seconds of the call and remains steady until the end.
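
Network consumption figures like these can be collected on each test host by polling interface counters once per second, for example with psutil as in the sketch below. The interface name is a placeholder, and real measurements would typically be filtered to the application’s traffic.

```python
import time
import psutil

INTERFACE = "eth0"  # placeholder; the interface carrying the call traffic

def sample_consumption(duration_s: int = 60) -> list:
    """Return received throughput in Kbps, sampled once per second."""
    samples = []
    prev = psutil.net_io_counters(pernic=True)[INTERFACE].bytes_recv
    for _ in range(duration_s):
        time.sleep(1)
        current = psutil.net_io_counters(pernic=True)[INTERFACE].bytes_recv
        samples.append((current - prev) * 8 / 1000)  # bytes/s -> Kbps
        prev = current
    return samples

kbps = sample_consumption(duration_s=30)
print(f"average consumption: {sum(kbps) / len(kbps):.0f} Kbps")
```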

This type of information from different metrics allows us to understand the impact of limitations within the call and can highlight what improvements might be needed, either to improve quality for less capable environments or to balance quality across all participants. Such information can inform architectural decision-making, the implementation of improvements, and testing strategies that target problematic areas.

There are also opportunities to get additional value out of the multi-user setup. For example, the multi-user infrastructure can be combined with a device farm to create calls with large numbers of physical devices, split into groups so that each group has a clean, controlled network environment that can be limited as needed. Another option is to use our Loadero tool if virtual participants are suitable for the solution under test.

Conclusion

Having thoroughly explored multi-user testing in audio and video communications, we can draw some conclusions that summarize the main idea. For one, multi-user testing can showcase all of our audio and video metrics for multiple participants within a single call, in different configurations and with different numbers of senders and receivers. We at TestDevLab have built the setups and add-ons needed to observe the call from the viewpoint of different participants. The setup is flexible enough to suit different needs and to ensure that problematic areas are addressed and tested, while long-term benchmarking makes it possible to observe tendencies in participant behavior.

Want to measure the audio and video quality of your software solution? Our team of audio and video quality experts can help you perform audio and video quality testing to ensure that your solution is at the top of its game. Contact us and let’s discuss your project. 
