This post was edited / updated on February 19, 2021.
According to Microsoft’s instructions, VMQ should be turned off on 1 Gbps network adapters. There is a known issue with the Broadcom 1Gb VMQ implementation that is presumably fixed in the latest drivers. Make sure all your drivers are up to date.
Back to basics: What is a virtual machine queue (VMQ), why do you need it and why should you deploy it?
A virtual machine queue, or dynamic VMQ, is a mechanism for mapping the physical queues of a physical network card to a virtual network card (vNIC) in the parent partition or to a virtual machine network card (vmNIC) in a guest operating system. This mapping makes the handling of network traffic more efficient. The increased efficiency results in less CPU time spent in the parent partition and lower network latency.
VMQ distributes traffic per vmNIC / vNIC, and each VMQ can use at most one logical CPU on the host. In other words, VMQ spreads traffic evenly among multiple guests (VMs) behind a vSwitch on a single host, with one core per vmNIC / vNIC.
Note: vNIC refers to a host-partition virtual NIC of the virtual switch in the management operating system, and vmNIC is a synthetic network card inside a virtual machine.
VMQ is enabled by default on Windows Server when a vSwitch is created on 10Gig or faster network adapters, and it is useful when hosting multiple virtual machines on the same physical host.
The figure below shows the ingress traffic flow when VMQ is enabled for virtual machines.
[VMQ incoming traffic flow for virtual machines – source Microsoft]
When using 1Gig network adapters, VMQ is disabled by default because Microsoft does not see a performance benefit for VMQ on 1Gig network cards: a single CPU core can keep up with 1Gig of network traffic without problems.
As I mentioned above, when VMQ is disabled, all network traffic to the vmNIC must be handled by a single core / processor, but when VMQ is enabled and configured, network traffic is automatically shared among multiple processors.
What if you run a large number of web server virtual machines on a host with at least two eight-core processors and plenty of memory, but you are limited to physical network cards of only 1Gig?
The answer is…
As I showed in my previous blog posts, Post I and Post II, Windows Server 2012 R2 introduced a new feature called Virtual Receive Side Scaling (vRSS). This feature works with VMQ to spread the CPU load of received network traffic across multiple vCPUs inside a virtual machine, which effectively removes the single-core bottleneck we experienced with a single vmNIC. To take advantage of this feature, both the host and the guest must run Windows Server 2012 R2. As a result, VMQ must be enabled on the physical host and RSS enabled inside the virtual machine. So far, however, Microsoft has not implemented vRSS for host vNICs; it is available for virtual machines only, so each vNIC in the management operating system on the host is stuck with one processor. The good news is that, although the host vNICs are each tied to one processor, they still get VMQs, assuming you have enough queues, and those queues are distributed across different processors.
The requirements for VMQ deployment are as follows:
- Windows Server 2012 R2 (dVMQ + vRSS).
- Physical network adapters must support VMQ.
- Install the latest NIC driver / firmware (very important).
- Enable VMQ for 1Gig network cards in the registry (you can skip this step if you have 10Gig or faster adapters):
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\VMSMP\Parameters\BelowTenGigVmqEnabled = 1
- Restart the host if you set the registry value in step 4.
- Determine the Base and Max CPU values based on your hardware configuration.
- Specify values for Base and Max CPU.
- Enable RSS on virtual machines.
- Turn on VMQ in the Hyper-V settings of each virtual machine (it is already ON by default).
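The registry, RSS and per-VM steps in the list above can also be scripted. What follows is only a minimal sketch, assuming an elevated PowerShell session and that the VMSMP Parameters key already exists on the host. The VM name `VM01` and the guest adapter name `Ethernet` are hypothetical placeholders, not names from this environment.

```powershell
# --- On the Hyper-V host (elevated session) ---
# Path and value name are the ones given in the requirements list.
$path = 'HKLM:\SYSTEM\CurrentControlSet\Services\VMSMP\Parameters'

# Create or overwrite the DWORD that allows VMQ on adapters below 10Gig.
New-ItemProperty -Path $path -Name 'BelowTenGigVmqEnabled' -PropertyType DWord -Value 1 -Force | Out-Null

# Read the value back before restarting the host.
Get-ItemProperty -Path $path -Name 'BelowTenGigVmqEnabled'

# VMQ per virtual machine: a weight of 0 disables VMQ, 1-100 enables it.
# 'VM01' is a hypothetical VM name.
Set-VMNetworkAdapter -VMName 'VM01' -VmqWeight 100

# --- Inside the guest operating system ---
# 'Ethernet' is a hypothetical adapter name; check yours with Get-NetAdapter.
Enable-NetAdapterRss -Name 'Ethernet'
```

The host still needs a restart after the registry change, as the list says.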
What is the Base processor? It is the first processor used to handle incoming traffic for a particular vmNIC.
What is Max processors? It is the maximum number of processors that we allow that network card to use for handling traffic.
OK, with that explained, let's configure VMQ step by step:
We have 8 physical 1Gig network cards and 2 X 8 cores (32 logical processors).
First, determine if HyperThreading is enabled by running the following cmdlet:
PS C:\> Get-WmiObject -Class win32_processor | ft -Property NumberOfCores, NumberOfLogicalProcessors -AutoSize
As you can see, NumberOfLogicalProcessors is twice NumberOfCores, so we know that HT is enabled on the system.
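The same check can be expressed as a small helper. This is a sketch, not part of the original procedure: the function name `Test-HyperThreading` is made up for illustration, and the commented lines assume `Get-CimInstance` is available on the host.

```powershell
# Returns $true when there are more logical processors than cores,
# which is how HyperThreading shows up in the Win32_Processor counts.
function Test-HyperThreading {
    param(
        [Parameter(Mandatory)] [int] $NumberOfCores,
        [Parameter(Mandatory)] [int] $NumberOfLogicalProcessors
    )
    return $NumberOfLogicalProcessors -gt $NumberOfCores
}

# On a real host the counts come from Win32_Processor (one object per socket):
# $cpu     = Get-CimInstance -ClassName Win32_Processor
# $cores   = ($cpu | Measure-Object -Property NumberOfCores -Sum).Sum
# $logical = ($cpu | Measure-Object -Property NumberOfLogicalProcessors -Sum).Sum
# Test-HyperThreading -NumberOfCores $cores -NumberOfLogicalProcessors $logical

# The host in this post: 2 sockets x 8 cores, 32 logical processors.
Test-HyperThreading -NumberOfCores 16 -NumberOfLogicalProcessors 32
```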
Next, we need to look at the NIC teaming mode and the load balancing algorithm:
PS C:\> Get-NetLbfoTeam | ft -Property TeamNics, TeamingMode, LoadBalancingAlgorithm -AutoSize
Once we have determined that HyperThreading is enabled and that the teaming mode is Switch Independent with Dynamic load balancing, we can move on to configuring the Base and Max processors.
Attention: before moving on, there is one important point to consider. If the NIC team is in Switch Independent teaming mode and load balancing is set to Hyper-V Port or Dynamic, the number of queues reported is the sum of all the queues available from the team members (SUM-Queues mode); otherwise the number of queues reported is the smallest number of queues supported by any team member (MIN-Queues mode).
What is SUM-Queues mode and what is MIN-Queues mode?
SUM-Queues mode reports the total number of VMQs across all the physical network cards participating in the team, whereas MIN-Queues mode reports the smallest number of VMQs among the physical network cards participating in the team.
For example, suppose we have two physical network cards with 4 VMQs each. If the teaming mode is Switch Independent with Hyper-V Port, the mode is SUM-Queues and the team has 8 VMQs; if the teaming mode is Switch Dependent with Hyper-V Port, the mode is MIN-Queues and the team has 4 VMQs.
You can refer to the table below to determine the queue mode for each teaming mode and load distribution mode (source: Microsoft):
|Distribution mode → Team mode ↓|Address Hash modes|Hyper-V Port|Dynamic|
|---|---|---|---|
|Switch independent|Min queues|Sum queues|Sum queues|
|Switch dependent|Min queues|Min queues|Min queues|
In my scenario, the NIC team is Switch Independent with Dynamic mode, so we are in SUM-Queues mode.
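To make the SUM-Queues / MIN-Queues rule concrete, here is a small sketch that applies it to a list of per-NIC queue counts. The function name `Get-TeamVmqCount` and the mode labels it accepts are made up for this illustration; they are not the exact values returned by Get-NetLbfoTeam.

```powershell
# Applies the rule described above: Switch Independent with Hyper-V Port or
# Dynamic gives SUM-Queues; every other combination gives MIN-Queues.
function Get-TeamVmqCount {
    param(
        [Parameter(Mandatory)] [ValidateSet('SwitchIndependent', 'SwitchDependent')] [string] $TeamingMode,
        [Parameter(Mandatory)] [ValidateSet('AddressHash', 'HyperVPort', 'Dynamic')] [string] $LoadBalancing,
        [Parameter(Mandatory)] [int[]] $QueuesPerNic
    )
    $stats = $QueuesPerNic | Measure-Object -Sum -Minimum
    $isSum = ($TeamingMode -eq 'SwitchIndependent') -and ($LoadBalancing -in ('HyperVPort', 'Dynamic'))
    if ($isSum) {
        [pscustomobject]@{ Mode = 'Sum-Queues'; Queues = [int]$stats.Sum }
    } else {
        [pscustomobject]@{ Mode = 'Min-Queues'; Queues = [int]$stats.Minimum }
    }
}

# The two-NIC example from the text, 4 VMQs per NIC:
Get-TeamVmqCount -TeamingMode SwitchIndependent -LoadBalancing HyperVPort -QueuesPerNic 4, 4
Get-TeamVmqCount -TeamingMode SwitchDependent -LoadBalancing HyperVPort -QueuesPerNic 4, 4
```

The first call reports 8 queues in Sum-Queues mode and the second reports 4 queues in Min-Queues mode, matching the example.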
If the team is in SUM-Queues mode, the processors assigned to the team members should not overlap, or should overlap as little as possible. For example, on a 4-core host (8 logical processors) with a team of two 10Gbps NICs, you can set the first network adapter to use base processor 2 and 4 cores; the second would then be set to use base processor 6 and 2 cores.
If the team is in MIN-Queues mode, the processor sets used by the team members must be identical: configure every member of the NIC team to use the same cores, that is, give each physical network card the same Base and Max processor settings.
Now let’s first check if VMQ is enabled:
PS C:\> Get-NetAdapterVmq
As you can see, VMQ is enabled (= True) but has not yet been configured.
Here we have two converged network teams, each with 4 physical network cards and 16 queues per card, so the total number of VMQs per team is 64.
I use one converged team for the vmNICs (VMs) and the other for the host vNICs.
We set the Base and Max processors by running the following cmdlets for each of the team adapters under ConvergedNetTeam01:
PS C:\> Set-NetAdapterVmq -Name NIC-b0-f0 -BaseProcessorNumber 2 -MaxProcessors 8
PS C:\> Set-NetAdapterVmq -Name NIC-b0-f1 -BaseProcessorNumber 10 -MaxProcessors 8
PS C:\> Set-NetAdapterVmq -Name NIC-b0-f2 -BaseProcessorNumber 18 -MaxProcessors 8
PS C:\> Set-NetAdapterVmq -Name NIC-b0-f3 -BaseProcessorNumber 26 -MaxProcessors 8
As I mentioned above, in SUM-Queues mode you should configure the Base and Max processors of each physical network card so that they do not overlap. In our lab environment, however, we did not have as many cores as we had queues, so we had to overlap; otherwise we would be wasting queues.
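The base processor numbers used above (2, 10, 18, 26) follow a simple pattern: skip CPU0, then space the adapters evenly across the 32 logical processors. The sketch below reproduces that pattern. The function name `Get-VmqBaseProcessorPlan` and its parameters are hypothetical, and it assumes the stride comes out even when HyperThreading is enabled.

```powershell
# Spreads base processors evenly across the logical processors, starting
# after CPU0. Only builds a plan; it does not change any adapter.
function Get-VmqBaseProcessorPlan {
    param(
        [Parameter(Mandatory)] [string[]] $NicName,
        [Parameter(Mandatory)] [int] $LogicalProcessors,
        [int] $FirstBaseProcessor = 2,
        [int] $MaxProcessors = 8
    )
    # Distance between two adapters' base processors, in logical processors.
    $stride = [int][math]::Floor([double]$LogicalProcessors / $NicName.Count)
    for ($i = 0; $i -lt $NicName.Count; $i++) {
        [pscustomobject]@{
            Name                = $NicName[$i]
            BaseProcessorNumber = $FirstBaseProcessor + ($i * $stride)
            MaxProcessors       = $MaxProcessors
        }
    }
}

$plan = Get-VmqBaseProcessorPlan -NicName 'NIC-b0-f0', 'NIC-b0-f1', 'NIC-b0-f2', 'NIC-b0-f3' -LogicalProcessors 32
$plan | Format-Table -AutoSize

# To apply the plan on the host (elevated session):
# $plan | ForEach-Object {
#     Set-NetAdapterVmq -Name $_.Name -BaseProcessorNumber $_.BaseProcessorNumber -MaxProcessors $_.MaxProcessors
# }
```

With MaxProcessors set to 8 and a stride of 8 logical processors (4 cores under HyperThreading), neighbouring adapters overlap, which is the trade-off described above.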
Let’s run Get-NetAdapterVmq again and see the changes:
As you can see, the Base and Max processors are now set. Next we can run Get-NetAdapterVmqQueue, which shows how the queues are assigned to the vmNICs of all the virtual machines on that host.
Now let’s look at the result before and after the implementation of VMQ + vRSS:
[Screenshots: the guest operating system and the host, before and after enabling VMQ + vRSS]
Last but not least, best practices for configuring VMQ:
- When using NIC teaming, always use Switch Independent with Dynamic mode whenever possible.
- For best performance, make sure that the base processor is never set to zero, as CPU0 handles special functions that no other processor in the system can handle.
- Remember that when you configure the Base / Max CPU and HyperThreading is enabled on your system, only the even-numbered processors are real processors (2, 4, 6, 8, etc.). If HT is not enabled, you can use both even and odd numbers (1, 2, 3, 4, 5, etc.).
- In SUM-Queues mode, try to configure the Base and Max processors of each physical network card with as little overlap as possible. How far you can go depends on how many cores the host hardware has.
- Only assign Max processor values of 1, 2, 4 or 8. It is fine for the maximum processors to extend past the last core, or to exceed the number of VMQs on the physical network card.
- Do not set the Base and Max processors on the Multiplexor (teamed) NIC adapters; leave them at the defaults.
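Some of these rules are easy to check before you call Set-NetAdapterVmq. The following is a rough sketch of such a check; the function name `Test-VmqProcessorSetting` and its messages are made up for illustration, and it covers only the base-processor and Max-processor rules from the list above.

```powershell
# Returns a list of rule violations; an empty list means the values pass.
function Test-VmqProcessorSetting {
    param(
        [Parameter(Mandatory)] [int] $BaseProcessorNumber,
        [Parameter(Mandatory)] [int] $MaxProcessors,
        [switch] $HyperThreading
    )
    $problems = @()
    if ($BaseProcessorNumber -eq 0) {
        $problems += 'Base processor must not be 0 (CPU0).'
    }
    if ($HyperThreading -and (($BaseProcessorNumber % 2) -ne 0)) {
        $problems += 'With HT enabled, use an even-numbered base processor.'
    }
    if ($MaxProcessors -notin (1, 2, 4, 8)) {
        $problems += 'Max processors should be 1, 2, 4 or 8.'
    }
    # The leading comma keeps the result an array even when it is empty.
    return , $problems
}

# The values used for NIC-b0-f0 in this post pass all three checks:
Test-VmqProcessorSetting -BaseProcessorNumber 2 -MaxProcessors 8 -HyperThreading

# An odd base processor and a Max of 3 on an HT host are both flagged:
Test-VmqProcessorSetting -BaseProcessorNumber 1 -MaxProcessors 3 -HyperThreading
```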
Finally, I would prefer to enable VMQ on 1Gig network cards so that I can keep my network traffic distributed to as many processors / cores as possible.
For a deep dive into VMQ and RSS, see the three-part TechNet series VMQ Deep Dive.
Hopefully this will help.
Until then, enjoy the weekend!