QoS for High Utilization

raviece09 · July 21, 2024, 2:27pm

Hi Team,

I need some guidance on QoS. One of our branch sites is seeing high receive utilization; during business hours it exceeds 95%. We have a legacy hub-and-spoke infrastructure, which means all traffic must pass through the DC. The DC link is 1 Gbps, and QoS is applied on the DC link with a shaping average of 200 Mbps toward the branch site. We still see high receive utilization at the branch site. Policing is not an option in our environment. Please suggest how to solve this without a link upgrade.

Thanks,
Ravi.

lagapidis · July 23, 2024, 10:14am

Hello Ravi

First of all, remember that no matter what QoS policies you employ, if there is a high enough volume of traffic, you will eventually reach the configured shaping average of 200 Mbps. So it is not unusual to reach a utilization of 95 to 99% if your shaping average is set to the actual maximum of the Gi1/0/1 interface of your branch router (which is indeed best practice, by the way).

These numbers are not necessarily bad, since they are to be expected based on your QoS setup. The question you must ask is, do you see a degradation in the quality of the network services that the users at that branch are experiencing? Are you experiencing packet loss and dropped frames? If not, then simply seeing 95 to 99% is not necessarily bad. Indeed, it means that your QoS policies are making efficient use of the available bandwidth for the volume of traffic that is being received.
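As a rough illustration of why these percentages are expected, here is a minimal Python sketch that expresses the same traffic rate as a percentage of the shaped rate and of the DC link speed. The 200 Mbps and 1 Gbps figures come from this thread; the observed rate of 190 Mbps is a hypothetical value chosen only for the example.

```python
def utilization_percent(observed_bps: float, capacity_bps: float) -> float:
    """Return the observed rate as a percentage of the given capacity."""
    if capacity_bps <= 0:
        raise ValueError("capacity_bps must be positive")
    return 100.0 * observed_bps / capacity_bps


SHAPED_RATE_BPS = 200_000_000   # shaping average toward the branch (from the thread)
DC_LINK_BPS = 1_000_000_000     # DC link speed (from the thread)
OBSERVED_BPS = 190_000_000      # hypothetical measured receive rate

# The same traffic looks "full" against the shaped rate but light against the DC link.
print(utilization_percent(OBSERVED_BPS, SHAPED_RATE_BPS))
print(utilization_percent(OBSERVED_BPS, DC_LINK_BPS))
```

The point is only that a busy site will sit close to whatever rate the shaper allows, so the percentage by itself says little about user experience.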

Now having said that, if you are experiencing degradation in the actual performance of network services at the site, and if you are seeing dropped frames due to congestion, then this is indeed a problem, and must be dealt with.

Just to confirm, you have configured queuing and shaping in the outbound direction on the Gi1/0/1 interface of your DC1 device for each of the remote sites, correct? And that has been set to an average of 200 Mbps for Site 1.

You have also configured classification and marking in the inbound direction on Gi1/0/2, and queuing and shaping in the outbound direction on Gi1/0/1 on Site 1, correct? These configs should not affect any of the Rx issues you are facing at the site, since these policies act upon Tx traffic from Site 1.

The first thing I would check is the queuing and shaping you have configured on the WAN interface of DC1 for outgoing traffic that is specifically destined for Site 1. Ensure that it is configured so that excess traffic is queued during congestion rather than dropped, so that you can minimize frame drops.
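To show why queue depth matters here, below is a minimal Python sketch of a very simplified shaper: each tick it sends up to the shaped rate, holds the excess in a queue, and tail-drops whatever does not fit. The traffic pattern, the one-second ticks, and the two queue limits are made-up values for illustration only; a real router shapes over much shorter intervals and this is not how any particular platform implements it.

```python
def simulate_shaper(arrivals_mbit, rate_mbps, queue_limit_mbit):
    """Simulate a shaper over one-second ticks.

    arrivals_mbit: traffic arriving in each tick, in Mbit (hypothetical values).
    rate_mbps: shaped rate; at most this many Mbit are sent per tick.
    queue_limit_mbit: backlog the queue can hold before tail-dropping.
    """
    queued = sent_total = dropped_total = max_queued = 0
    for arriving in arrivals_mbit:
        queued += arriving
        sent = min(queued, rate_mbps)      # send up to the shaped rate
        queued -= sent
        sent_total += sent
        if queued > queue_limit_mbit:      # queue overflow: tail drop
            dropped_total += queued - queue_limit_mbit
            queued = queue_limit_mbit
        max_queued = max(max_queued, queued)
    return {
        "sent_mbit": sent_total,
        "dropped_mbit": dropped_total,
        "max_queue_mbit": max_queued,
        # Worst-case time for the backlog to drain at the shaped rate.
        "max_queue_delay_s": max_queued / rate_mbps,
    }


burst = [150, 300, 300, 100, 50]  # hypothetical offered load per second, in Mbit

shallow = simulate_shaper(burst, rate_mbps=200, queue_limit_mbit=50)
deep = simulate_shaper(burst, rate_mbps=200, queue_limit_mbit=250)
print("shallow queue:", shallow)
print("deep queue:   ", deep)
```

In this toy run the shallow queue drops part of the burst, while the deeper queue delivers everything at the cost of extra queuing delay. That is the trade-off to keep in mind when tuning queue limits for latency-sensitive classes.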

Secondly, you could implement a service policy on your Gi1/0/2 interface on Site 1 in the outbound direction, as suggested. Although this would not directly affect the shaping and queuing of incoming traffic to Site 1, it can be used to make the TCP window size smaller for your TCP sessions. How?

Well, by limiting the rate of outbound traffic on the LAN interface of Site 1 using an outbound service policy, you are essentially affecting the rate at which TCP sessions send their traffic. If congestion occurs due to the shaping that you have placed there, data reaches the TCP receivers later, so the ACKs they send back to the sender are delayed as well. The sender interprets delayed or paced ACKs as a sign of congestion and may reduce its window size, thus sending data at a slower rate. This takes advantage of the fact that TCP adjusts its window size based on perceived network congestion, which is influenced by ACK timing.
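A back-of-the-envelope way to see this effect is the usual upper bound on a single TCP session: the sender can have at most one window of data in flight per round trip, so throughput is at most window / RTT. The Python sketch below applies that bound. The 65,535-byte window and the RTT values are hypothetical numbers chosen only to show the trend, and the bound ignores loss, slow start, and window scaling.

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_ms: int) -> float:
    """Upper bound on one TCP session's throughput: one window per round trip."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return window_bytes * 8 * 1000 / rtt_ms


WINDOW_BYTES = 65_535  # hypothetical window size for illustration

# As queuing in the shaper adds delay, the RTT grows and the bound shrinks.
for rtt_ms in (10, 20, 40):
    mbps = max_tcp_throughput_bps(WINDOW_BYTES, rtt_ms) / 1_000_000
    print(f"RTT {rtt_ms} ms -> at most {mbps:.2f} Mbps per session")
```

Doubling the round-trip time halves the achievable rate for the same window, which is why delaying ACKs slows a TCP sender even before any window reduction takes place.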

Take a look at this lesson for more information about this behavior:

Now this will only work on TCP traffic. UDP traffic will remain unaffected.

This information has hopefully given you some insight into your current setup and how you can tweak and adjust it to attempt to get a better result. Let us know how you get along, and whether you need any further help!

I hope this has been helpful!

Laz