Introduction to QoS (Quality of Service)

In the shaping section and in the diagram where Rene says:

Above you can see we have 20 moments where we send for 10 ms. 20 x 10 = 200 ms in total. We have 20 pauses of 40 ms, 20x 40 = 800 ms in total.

CIR is 200 Mbps, and because of VoIP we want to minimize the one-way delay by using a Tc of 10 ms. To achieve this Tc value using the formula, we need Bc = Tc * CIR = 0.010 s * 200,000,000 bps = 2,000,000 bits.
If I have understood correctly, IOS will automatically calculate that in these 20 moments where we send for 10 ms each, we need to send 10 million bits per burst, so that 20 bursts give us 200 Mbits?
Thank you in advance.

Hello Marios

The whole reason shapers work this way is that an interface is capable of transmitting at either 100% of its capacity or 0% of its capacity. There is no in-between. To employ shaping, interfaces are made to transmit for only a certain percentage of the time, resulting in a perceived limitation of throughput over longer periods. If an interface transmits only 50% of the time, then on average it transmits at 50% of its rated throughput.

In this case, we want a 1 Gbps interface to transmit at 200 Mbps. That’s 20% of the rated speed, so we make the interface transmit 20% of the time.

However, in our case, we want the interval of transmission to be 10ms, in order to accommodate VoIP services. So yes, you do use the Bc = Tc * CIR formula. In this case CIR = 200 Mbps or 200 million bits per second. Now the shaper knows that it must send a maximum of 200 million bits every second. So every time interval of 10ms, the shaper must send how many bits?

Well, the IOS calculates this. Since transmission time is 10ms, and we’re sending traffic for only 20% of the time, that means we’re not sending traffic for 80% of the time. If 10 ms is 20% of the time, then another 40ms is the other 80% of the time.

Now over a time period of 1 second, you will have 20 such cycles of 10 ms sending and 40 ms not sending. That’s 20 * (10 + 40) = 1000 ms, or 1 second. How many bits will you send in each of those 10 ms bursts? It’s the speed of the interface multiplied by the time period, so that is 1 Gbps * 0.01 s = 10 million bits, or 10 Mb.

Since the IOS has the time interval, the interface speed, and the CIR, it calculates the rest.
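The arithmetic above can be sketched in a few lines of Python. This is an illustration of the reasoning only, not how IOS actually computes its token-bucket parameters:

```python
# Illustrative sketch of the shaping arithmetic above (not IOS internals):
# a shaper makes a 1 Gbps interface behave like 200 Mbps by letting it
# transmit for only 20% of each cycle.

LINE_RATE = 1_000_000_000   # interface speed: 1 Gbps
CIR = 200_000_000           # shaped rate: 200 Mbps
SEND_MS = 10                # chosen transmission interval: 10 ms

# Pause long enough that the duty cycle equals CIR / LINE_RATE (20%).
pause_ms = SEND_MS * (LINE_RATE - CIR) // CIR      # 10 ms * 80/20 = 40 ms
bits_per_burst = LINE_RATE * SEND_MS // 1000       # 1 Gbps * 0.01 s = 10,000,000 bits
cycles_per_second = 1000 // (SEND_MS + pause_ms)   # 20 cycles of 50 ms each

print(pause_ms, bits_per_burst, cycles_per_second * bits_per_burst)
# 40 ms pause, 10,000,000 bits per burst, 200,000,000 bits/s on average
```

Sending 20 bursts of 10 million bits each per second averages out to exactly the 200 Mbps CIR.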

I hope this has been helpful!



What’s the difference between a hardware queue and a software queue?

Is QoS enabled by default in routers and switches?

Can we modify the queue size of the input and output queues?

What is a transmit ring?

I could not understand the wait time in shaping in Rene’s example…

What’s the way to modify the queue size on an interface?

Is there a true difference between the bandwidth statement and bandwidth when used within a QoS policy? Below I posted two examples from another site because it’s a good explanation of the two. With bandwidth remaining, you have to calculate the max reserved bandwidth, and at first it seems like a class gets less bandwidth allocated during times of congestion, but then it goes on to say that it can use unused bandwidth from other classes. Isn’t this the same behavior as bandwidth percent?

Bandwidth percent example
The mission-critical class gets a 200 kbps bandwidth reservation since it is given a fixed-sum guarantee of 20 percent. 20 percent of 1000 kbps is 200 kbps, so the voice priority class gets a maximum of 200 kbps, mission-critical receives 200 kbps, the interactive class receives 100 kbps, and class-default receives 250 kbps.

policy-map egress
 class voice
  priority percent 20
 class mission-critical
  bandwidth percent 20
 class interactive
  bandwidth percent 10
 class class-default
  bandwidth percent 25
!
interface Serial0/0
 bandwidth 1000
 service-policy output egress

Bandwidth Remaining example
See how bandwidth is calculated when it is always assigned based upon a remaining value. Let’s consider the same example as above, but change it from bandwidth percent to bandwidth remaining percent:

policy-map egress
 class voice
  priority percent 20
 class mission-critical
  bandwidth remaining percent 20
 class interactive
  bandwidth remaining percent 10
 class class-default
  bandwidth remaining percent 70
!
interface Serial0/0
 bandwidth 1000
 service-policy output egress

Notice that the voice class still has a fixed-sum guarantee of 20 percent of the interface’s configured bandwidth: 0.20 * 1000 kbps, which is 200 kbps. But now we have to calculate the max reserved bandwidth, since this must be deducted first before determining the bandwidth remaining. As a reminder, the maximum reserved bandwidth is how much you can ever reserve using the bandwidth or bandwidth percent statements. Cisco defines this formula as

Bandwidth available = Max reserved bandwidth (75% of the interface bandwidth by default) – fixed-sum guarantees

Applying the formula to our example, we have 750 kbps – 200 kbps = 550 kbps. Now the 550 kbps will be divided out based upon the predefined percentages for each class. Therefore, the mission-critical class would receive 0.20 * 550 kbps = 110 kbps, the interactive class would receive 55 kbps, and class-default would receive 385 kbps. Also, if any class doesn’t use its full bandwidth allocation, the leftover will automatically be distributed to the other classes proportionally, based upon the configured percentages.
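The arithmetic in both examples can be reproduced with a short Python sketch. The function names and structure are made up for illustration; this is not a Cisco API:

```python
# Illustrative sketch of the two allocation methods above.
# Function names are invented for this example, not a Cisco API.

def bandwidth_percent(interface_kbps, classes):
    """'bandwidth percent': each class gets a percent of the interface rate."""
    return {cls: interface_kbps * pct // 100 for cls, pct in classes.items()}

def bandwidth_remaining_percent(interface_kbps, priority_pct, classes,
                                max_reserved_pct=75):
    """'bandwidth remaining percent': percentages apply to what is left of
    the max reserved bandwidth after the fixed-sum priority guarantee."""
    priority_kbps = interface_kbps * priority_pct // 100            # 200 kbps
    available = interface_kbps * max_reserved_pct // 100 - priority_kbps  # 550 kbps
    return {cls: available * pct // 100 for cls, pct in classes.items()}

print(bandwidth_percent(1000, {"mission-critical": 20, "interactive": 10,
                               "class-default": 25}))
# {'mission-critical': 200, 'interactive': 100, 'class-default': 250}

print(bandwidth_remaining_percent(1000, 20, {"mission-critical": 20,
                                             "interactive": 10,
                                             "class-default": 70}))
# {'mission-critical': 110, 'interactive': 55, 'class-default': 385}
```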

Hello Vjnetwork

The fundamental difference between the two options is this:

  • The bandwidth percent command will reserve bandwidth based on the absolute total value of bandwidth available.
  • The bandwidth remaining percent command will reserve bandwidth based on the relative available bandwidth.

What’s the difference? The bandwidth percent command always calculates its value based on the absolute value of the bandwidth, that is, the interface bandwidth. The bandwidth remaining percent command calculates its value based on the remaining bandwidth. The remaining bandwidth is a dynamic value, and this is at the very center of the difference between the two methods. That remaining value can be at most 75% of the absolute bandwidth available, but may change dynamically because the remaining bandwidth also includes whatever “free” or unused bandwidth there is from other classes.

For more information on the command, with further explanations of the differences, take a look at this Cisco documentation:

I hope this has been helpful!


I believe the 75% rule has been deprecated.

Hello Vjnetwork

Yes, you are correct, according to the following documentation, the max-reserved-bandwidth command is being phased out.
The following documentation indicates replacement commands that will achieve a similar result:

Now although the specific command has been deprecated, the concept of maintaining a minimum amount of bandwidth for control plane data is still very much valid. As stated in the above documentation:

Effective with Cisco IOS XE Release 3.2S, the max-reserved bandwidth command is replaced by a modular QoS CLI (MQC) command (or sequence of MQC commands).

I hope this has been helpful!


Hello Narad

A software queue is a queue on which you can configure queueing mechanisms. Anything you configure for QoS in a Cisco device uses software queues. These dynamically use areas of RAM to create constructs that will function as configured. All of these functions use the main CPU and RAM to operate.

The hardware queues use dedicated hardware to perform queuing, scheduling, and packet memory management. These queues are generally “hard-wired” and are not configurable. Their arrangement is also platform-specific. Lower-end devices will have a single hardware queue per port. More advanced devices will have multiple hardware queues in different arrangements such as 2Q3T, 8Q3T, 1P1Q3T, or 2P6Q3T. These are different queuing models. Some more advanced platforms will let you choose and configure which of the available arrangements you want to apply. These arrangements are referred to as QoS scheduling. The following document describes QoS scheduling in much detail.

QoS is not enabled by default.

It depends upon the feature being implemented. For example, in QoS LLQ the default queue limit is 64 packets, but this can be modified. Hardware queue sizes cannot be modified.

Transmit ring is a control structure used by Cisco devices to control which buffers are used to receive and transmit packets to the media. More info about these can be found here:

An interface can only transmit traffic at its rated speed. A 1 Gbps interface can transmit either no traffic or 1 Gbps of traffic; it can’t transmit at 200 Mbps. However, to achieve a shaping limit of 200 Mbps, it will operate for 20% of the time. So over one second, it will transmit for 200 ms and stop transmitting for 800 ms. If it transmits at 1 Gbps for 200 ms and stops for 800 ms, then on average over one second the interface has transmitted at 200 Mbps, achieving the shaping limitation.

I hope this has been helpful!


It seems to me that good traffic management could be done by holding/delaying TCP ACKs. This would only work for TCP, but I think it would work well. TCP is self-clocking based on the speed of ACKs: if you slow the ACKs, the clock runs slower (and the stream is slower). Is there any QoS that is done this way? Here is some reference on TCP self-clocking.

Hello Rod

Remember that a TCP connection takes place between hosts. The hosts are responsible for sending ACKs, and any delay that they may introduce in order to limit the traffic of a TCP session can only be controlled by the hosts themselves. Network devices cannot selectively introduce a delay to TCP ACK messages in order to manage traffic.

However, network devices can indirectly affect TCP session traffic by selectively and randomly dropping some TCP segments. This results in missing segments, and the receiver of that TCP session will inform the sender by repeating the ACK number that corresponds to the last successfully received byte. This introduces a delay, but it also causes the TCP session to slow down, because the sender will shrink its congestion window in response to the detected loss.

In summary, by introducing intentional and controlled packet loss, network devices can make TCP sessions adjust their transmission rates, allowing traffic management on a particular link.
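The mechanism described above is essentially what Random Early Detection (RED) implements. Here is a minimal sketch of the idea; the thresholds and drop probability are illustrative values, not Cisco's actual implementation or defaults:

```python
import random

# Illustrative sketch of the Random Early Detection (RED) idea: as the
# average queue depth grows between a minimum and maximum threshold, drop
# an increasing fraction of packets so TCP senders slow down before the
# queue overflows. Threshold and probability values are made up.

MIN_TH, MAX_TH = 20, 40   # average queue depth thresholds (packets)
MAX_P = 0.1               # drop probability when depth reaches MAX_TH

def should_drop(avg_queue_depth, rng=random.random):
    if avg_queue_depth < MIN_TH:
        return False          # below min threshold: never drop
    if avg_queue_depth >= MAX_TH:
        return True           # above max threshold: drop everything (tail drop)
    # between the thresholds: drop with linearly increasing probability
    p = MAX_P * (avg_queue_depth - MIN_TH) / (MAX_TH - MIN_TH)
    return rng() < p
```

An uncongested queue drops nothing; a nearly full queue drops a random sample of packets, nudging some senders to back off before tail drop kicks in for everyone.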

More about how this can be implemented can be found at the following lesson:

I hope this has been helpful!



When the host and IP phone transmit data and voice packets destined for the host and IP phone on the other side, it is likely that we get congestion on the serial link. The router will queue packets that are waiting to be transmitted but the queue is not unlimited.

Should it be “but the queue is limited”?

Hello Yonas

“The queue is not unlimited” is the same as “the queue is limited”. It’s just a different way of saying it. Indeed Rene chooses to say “not unlimited” which means “limited”, but it’s a matter of preference.

But thanks for keeping an eye out for discrepancies, it’s always helpful to get feedback of all types!



Great article! One question: I’ve noticed that the images with a router and ingress and egress actions to its left and right show Gi0/1 as the name of both the input and output interfaces. Shouldn’t they be different interfaces?

Oh, and another question: in the section about congestion avoidance, it’s mentioned that there is a maximum threshold above which all packets are dropped. However, the illustration in that section shows a space between that maximum threshold and the “queue full” arrow. What does that space represent? If packets are dropped as soon as the maximum threshold is reached, how would that space be filled with packets if packets are already being dropped and no more packets can enter the queue?


Hello José

Yes you are correct. I’ll let Rene know to make the correction on the diagrams…

This is to indicate that the Maximum Threshold is a value that you can modify. It does not necessarily have to be at the maximum of the actual queue. And remember, that the Maximum Threshold here is an average, which means that the actual queue capacity can surpass that at any given time. For more information about how WRED works and how the thresholds are defined and applied, take a look at the following lesson:

I hope this has been helpful!



Is the following summary of QoS correct?
Classification - ACL / NBAR
Marking - ToS / DS
Action - Congestion Management (Queuing) / Policing and Shaping / Congestion Avoidance

Hello Robert

Yes, that is indeed correct. Classification uses either the info in the headers (using ACLs) or payload inspection (using NBAR) to determine what kind of packet (VoIP, video, web, FTP, email, etc…) each one is. Classification is essentially used to identify the application from which packets are sent. Classification is locally significant and is not communicated to any other network device.

Marking changes the ToS in the header of the IP packet and/or the CoS in the tag of the Ethernet frame. Marking can be used to communicate to devices downstream about how to prioritize such traffic based on predefined agreements. Classification is used to mark packets/frames appropriately.

Once marked, network devices can be configured to act upon the markings based on predefined QoS mechanisms, including queuing, policing, shaping, congestion avoidance, and others.

I hope this has been helpful!



Good day,

I have a couple of questions:

  1. Is it best practice to apply shaping on the output of an interface and policing on the input, or can this vary according to the needs?

  2. An administrator must design a solution to connect a client to the Internet. The solution will include a Layer 3 circuit with a CIR of 50 Mbps from the service provider. The throughput from the provider’s switch to the customer’s router is 1 Gbps. What solution should the engineer include on the exit interface to avoid possible problems with choppy voice traffic: shaping or policing?

  3. When is it more convenient to use shaping or policing, and with what types of traffic?

Thank you

Hello @r41929998 ,

In a nutshell, shaping “buffers” your exceeding traffic and policing “drops” (or remarks) your exceeding traffic.

A service provider will use (inbound) policing to enforce a CIR rate. From the customer’s side, you could configure (outbound) shaping to prevent the service provider’s policer from dropping your traffic.

If you have a 1 Gbps connection and a 50 Mbps CIR, the service provider will police at 50 Mbps. That means they’ll drop everything above 50 Mbps.

You can configure shaping to rate-limit your traffic to 50 Mbps, but you have to be careful. Shaping increases your delay, so it’s unsuitable for delay-sensitive applications like VoIP.

It’s best to combine shaping with something like LLQ:

You put your delay-sensitive traffic like VoIP in a priority queue with a limit. Other important traffic can be shaped if needed.


Hello, everyone!

I have a few QoS questions.

  1. When it comes to QoS, I often see IP phones being represented as the originators of voice traffic. The question is, what if a computer is running an application like Discord, Skype, or even Cisco Webex? How would we be able to classify and prioritize voice traffic coming from such applications? Would NBAR also work here?
  2. If we imagine a packet being sent between two routers → R1 and R2, these delays are all involved except for the queuing delay if there is no congestion, right?

    In other words, it would take some ms to process the packet and perform all the necessary tasks to generate and forward it, it would take some ms of time to put the bits on the medium, and it would take a few ms for the frame to actually travel across the medium.
    But when it comes to the Queuing delay, that one would only be involved if we experienced congestion, right? If there is no congestion, there is no need for any sort of queuing thus there’d be no queuing delay.

Thank you.


Hello David

When it comes to QoS, IP phones are typically easy to work with because they classify and mark voice traffic by default, and their traffic can be easily identified by other network devices due to the dedicated voice VLAN they use as well as the use of CDP or LLDP to identify the devices themselves. When voice traffic originates from applications like Discord, Skype, or Cisco Webex on a computer, the classification and prioritization become less automated, but it can still be achieved using several tactics:

  • Application-level marking can be performed by certain software applications (such as VoIP clients or videoconferencing software) to mark packets sent from these applications for QoS purposes.
  • Application recognition/deep packet inspection can be used by network devices to classify traffic coming from particular applications. This is where NBAR can be useful, inspecting packets deeply to recognize various applications, including voice services like Skype or Webex. This inspection goes beyond basic port numbers and can identify the application-specific signatures within the traffic.

Yes, that is correct. Queuing delay is only experienced when there is congestion. All the other forms of delay mentioned occur always.

I hope this has been helpful!