Introduction to QoS (Quality of Service)

Hello,

When the host and IP phone transmit data and voice packets destined for the host and IP phone on the other side, it is likely that we get congestion on the serial link. The router will queue packets that are waiting to be transmitted but the queue is not unlimited.

Should it be “but the queue is limited”?

Hello Yonas

“The queue is not unlimited” is the same as “the queue is limited”; it’s just a different way of saying it. Indeed, Rene chose to say “not unlimited,” which means “limited,” but it’s a matter of preference.

But thanks for keeping an eye out for discrepancies, it’s always helpful to get feedback of all types!

Laz

Hi,

Great article! One question: I’ve noticed that the images with a router and ingress and egress actions to its left and right show Gi0/1 as the name of both the input and output interfaces. Shouldn’t they be different interfaces?

Oh, and another question: in the section about congestion avoidance, it’s mentioned that there is a maximum threshold above which all packets are dropped. However, the illustration in that section shows a space between that maximum threshold and the “queue full” arrow. What does that space represent? If packets are dropped as soon as the maximum threshold is reached, how would that space ever be filled with packets, given that packets are already being dropped and no more can enter the queue?

Regards.

Hello José

Yes, you are correct. I’ll let Rene know to make the correction on the diagrams…

This is to indicate that the Maximum Threshold is a value that you can modify. It does not necessarily have to be at the maximum of the actual queue. And remember that the Maximum Threshold here applies to an average, which means the actual queue depth can surpass it at any given time. For more information about how WRED works and how the thresholds are defined and applied, take a look at the following lesson:
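
To make this concrete, here is a minimal IOS sketch of DSCP-based WRED with explicitly tuned thresholds. The policy name, bandwidth, DSCP value, and threshold numbers are just assumptions for illustration:

policy-map CONGESTION-AVOID
 class class-default
  bandwidth percent 80
  ! use DSCP values (rather than IP precedence) to select WRED profiles
  random-detect dscp-based
  ! for AF21: random drops start at an average depth of 24 packets,
  ! all AF21 packets are dropped above 40 packets, mark probability 1/10
  random-detect dscp af21 24 40 10

interface Serial0/0
 service-policy output CONGESTION-AVOID

Because WRED acts on the average queue depth, the instantaneous queue can momentarily climb past the 40-packet maximum threshold, which is exactly the space you noticed between the threshold and the “queue full” arrow.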

I hope this has been helpful!

Laz

Is the following summary of QoS correct?
Classification - ACL / NBAR
Marking - ToS / DS
Action - Congestion Management (Queuing) / Policing and Shaping / Congestion Avoidance

Hello Robert

Yes, that is indeed correct. Classification uses either the info in the headers (using ACLs) or payload inspection (using NBAR) to determine what kind of packet each one is (VoIP, video, web, FTP, email, etc.). Classification is essentially used to identify the application from which packets are sent. Classification is locally significant and is not communicated to any other network device.

Marking changes the ToS in the header of the IP packet and/or the CoS in the tag of the Ethernet frame. Marking can be used to tell downstream devices how to prioritize such traffic based on predefined agreements. Classification is used to mark packets/frames appropriately.

Once marked, network devices can be configured to act upon the markings based on predefined QoS mechanisms, including queuing, policing, shaping, congestion avoidance, and others.
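
As a rough illustration of that pipeline on IOS, here is a minimal MQC sketch. The class/policy names, ACL number, and interface are assumptions for the example, not something from the lesson:

! Classification: by header (ACL) or by payload inspection (NBAR)
access-list 101 permit udp any any range 16384 32767
class-map match-any VOICE
 match access-group 101   ! header-based classification via ACL
 match protocol rtp       ! payload-based classification via NBAR

! Marking: set the DSCP value (ToS byte) in the IP header
policy-map MARK-TRAFFIC
 class VOICE
  set dscp ef

interface GigabitEthernet0/1
 service-policy input MARK-TRAFFIC

Downstream devices can then simply match on DSCP EF and apply their own queuing, policing, shaping, or congestion avoidance actions.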

I hope this has been helpful!

Laz

Good day,

I have a couple of questions:

  1. Is it best practice to apply shaping on the output of an interface and policing on the input, or can this vary according to need?

  2. An administrator must design a solution to connect a client to the Internet, and the solution will include a Layer 3 circuit with a CIR of 50 Mbps from the service provider. The throughput from the provider’s switch to the customer’s router is 1 Gbps. What should the engineer apply on the exit interface to avoid possible problems with choppy voice traffic: shaping or policing?

  3. When is it more convenient to use shaping versus policing, and with what types of traffic?

Thank you

Hello @r41929998 ,

In a nutshell, shaping “buffers” your exceeding traffic and policing “drops” (or remarks) your exceeding traffic.

A service provider will use (inbound) policing to enforce a CIR rate. From the customer’s side, you could configure (outbound) shaping to prevent the service provider’s policer from dropping your traffic.

If you have a 1 Gbps connection and a 50 Mbps CIR, the service provider will police at 50 Mbps. That means they’ll drop everything above 50 Mbps.
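
On the provider side, that inbound policer could look something like this minimal sketch (policy name and interface assumed):

policy-map POLICE-CIR
 class class-default
  ! transmit up to the 50 Mbps CIR, drop everything above it
  police 50000000 conform-action transmit exceed-action drop

interface GigabitEthernet0/1
 service-policy input POLICE-CIR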

You can configure shaping to rate-limit your traffic to 50 Mbps, but you have to be careful. Shaping increases your delay, so it’s unsuitable for delay-sensitive applications like VoIP.

It’s best to combine shaping with something like LLQ:

You put your delay-sensitive traffic, like VoIP, in a priority queue with a limit. Other important traffic can be shaped if needed.
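
Here is a minimal sketch of that combination on the customer side, assuming voice is already marked DSCP EF and using illustrative rates:

class-map match-all VOICE
 match dscp ef

policy-map LLQ-CHILD
 class VOICE
  priority 10000            ! low-latency queue, capped at 10 Mbps
 class class-default
  fair-queue                ! everything else is queued fairly

policy-map SHAPE-PARENT
 class class-default
  shape average 50000000    ! shape all traffic to the 50 Mbps CIR
  service-policy LLQ-CHILD  ! apply LLQ inside the shaper

interface GigabitEthernet0/1
 service-policy output SHAPE-PARENT

This way, voice skips ahead within the shaper’s queue while bulk traffic absorbs the shaping delay.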

Rene

Hello, everyone!

I have a few QoS questions.

  1. When it comes to QoS, I often see IP phones being represented as the originators of voice traffic. The question is, what if a computer is running an application like Discord, Skype, or even Cisco Webex? How would we be able to classify and prioritize voice traffic coming from such applications? Would NBAR also work here?
  2. If we imagine a packet being sent between two routers → R1 and R2, these delays are all involved except for the queuing delay if there is no congestion, right?

    In other words, it would take some ms to process the packet and perform all the necessary tasks to generate and forward it, it would take some ms of time to put the bits on the medium, and it would take a few ms for the frame to actually travel across the medium.
    But when it comes to the Queuing delay, that one would only be involved if we experienced congestion, right? If there is no congestion, there is no need for any sort of queuing thus there’d be no queuing delay.

Thank you.

David

Hello David

When it comes to QoS, IP phones are typically easy to work with because they classify and mark voice traffic by default, and their traffic can be easily identified by other network devices due to the dedicated voice VLAN they use as well as the use of CDP or LLDP to identify the devices themselves. When voice traffic originates from applications like Discord, Skype, or Cisco Webex on a computer, the classification and prioritization become less automated, but it can still be achieved using several tactics:

  • Application-level marking can be performed by certain software applications (such as VoIP clients or videoconferencing software) to mark packets sent from these applications for QoS purposes.
  • Application recognition/deep packet inspection can be used by network devices to classify traffic coming from particular applications. This is where NBAR can be useful, inspecting packets deeply to recognize various applications, including voice services like Skype or Webex. This inspection goes beyond basic port numbers and can identify application-specific signatures within the traffic, as sketched below.
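
Here is a minimal sketch of what that NBAR-based classification could look like. The exact protocol names depend on the NBAR2 protocol pack on your device, so treat them as assumptions and verify them locally:

class-map match-any SOFT-VOICE
 match protocol skype        ! NBAR2 protocol names vary per protocol pack
 match protocol webex-media

policy-map MARK-SOFT-VOICE
 class SOFT-VOICE
  set dscp ef

interface GigabitEthernet0/1
 service-policy input MARK-SOFT-VOICE

You can check the active protocol pack with show ip nbar protocol-pack active and list the recognized protocols with show ip nbar protocol-id.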

Yes, that is correct. Queuing delay is only experienced when there is congestion. All the other forms of delay mentioned always occur.

I hope this has been helpful!

Laz

One of the best, no, the best introduction and explanation I have seen thus far. Thank you Rene.
Michel

Hello, everyone.

If someone configures QoS classes and prioritizes different types of traffic (let’s say CBWFQ for any non-voice data and LLQ for voice data), these shouldn’t take effect unless the traffic exceeds the link’s or the device’s bandwidth capacity, right? In other words, the configured traffic prioritization and queuing don’t happen until we have congestion, correct?

So then it’s correct to say that if there is no congestion, the devices will forward the received traffic using the FIFO mechanism, right? After all, if there is available bandwidth for everyone, why should anyone be given preferential treatment?

My second question is, if we connect two devices using a 1 gigabit link, it’s not exactly right to say that congestion will only occur once we go over 1 gigabit, right? Because we should also factor the devices’ performance into this. Even if the link is 1000 Mbps, congestion can occur earlier if the device can only handle, let’s say, 250 Mbps, right?

Thank you.
David

Hello David

Nice to see you after a while!

Yes, that is correct. Queuing only takes place when there is congestion. Think about it this way: if you go to a bank and there is no lineup, you are served immediately. There is no queue to manage! So there is no point in employing CBWFQ/LLQ or any other queue management algorithm if there is no queue!

Ah yes. The question here is “what is congestion?” Is it when the port itself reaches its capacity or does congestion also occur when the CPU and memory resources of the network device are overwhelmed? Or is it when the backplane of the device has reached capacity, even if the specific port in question has not?

QoS mechanisms are primarily designed to deal directly with port-level congestion. When the port’s egress queue (or ingress queue, depending on the device and QoS design) fills up, QoS mechanisms like traffic shaping, policing, scheduling, and queue management kick in.

So to answer your question, congestion, as it pertains to QoS, is most commonly tied to oversubscription of physical or logical interfaces and queue overflow. CPU, memory, or backplane congestion can also exist but are not the typical focus of QoS mechanisms.

As always, it’s a pleasure to answer your questions…

I hope this has been helpful!

Laz

Hello Laz.

Thank you so much, I appreciate your help.

However, I think we might have gotten one thing wrong. The assumption is that QoS mechanisms are only there during congestion. However, I think this could be a misconception.

I did happen to ask some people (one of them is doing QoS support for Cisco) and it turns out that the QoS mechanisms are always there, regardless of whether we are congested or not.

It’s explained well in this Reddit post:
https://www.reddit.com/r/networking/comments/qtqz9i/does_qos_really_matter_when_the_bandwidth_is/

To summarize: if we had no congestion at all, so “no QoS mechanisms should kick in,” everything would be FIFOed and nothing would be prioritized.

But consider this: if you happen to not be congested at all and you suddenly receive 100 packets where the 100th one is a voice packet, you’d still have to temporarily queue the 99 packets and then send them. Although this process would be extremely fast, this could still introduce ms of latency.

So I was later told that if you put packets into an LLQ (for example), regardless of whether we are congested or not, that’s where they’ll end up.

I suppose “congestion” and “latency” are two different perspectives in QoS. You can have a link that is only 50% utilized, but if there are 100 packets and a voice packet is the 100th one, the 99 packets before it would have to go out first (FIFO), introducing a very small delay that is still crucial for voice applications.

I haven’t really found any official documentation stating that QoS isn’t active during non-congestion times either.

What do you think about this?

David

Hello David

I think we’re addressing two different issues here. The first is that there are cases where you have a 1 Gbps link and it seems underutilized at only 200 Mbps, for example. For this issue, I had an extensive conversation with @ReneMolenaar, and it turns out that there is a major difference in how routers and switches handle queuing. What I described in my previous post is correct, for the most part, for routers. However, the behavior of switches is much more complicated. We now have several NetworkLessons notes that clarify these issues:

Now the other issue has to do with how a port perceives congestion. Concerning your statement:

This is true assuming that the arrival of 100 packets on a port occurs at a rate that will trigger a threshold (whether in a simple manner on a router, or in a more complex manner on a switch). What triggers that threshold? The rate at which the packets arrive. If 100 packets, each 1500 bytes in size, arrive within 1 ms on a port, that’s 1.2 Mb in 1 ms (100 × 1500 bytes × 8 bits = 1.2 Mb). Converting to seconds, that comes to 1.2 Gbps on a GigabitEthernet port, which means you have congestion, which means you have queuing, which means the voice packet will be prioritized.

If the 100 packets arrive at a consistent rate over 100 ms, doing the math, that’s 12 Mbps, which should not hit any threshold on either a router or a switch. So no prioritization will take place.

So a statement like “if this type of packet comes in, it needs to be put on the wire ahead of any other packets that are waiting” assumes there are packets waiting. If thresholds are not met, packets are simply not waiting. Does that make sense? The question becomes “when are packets waiting?” For routers, it is usually when you get to the max throughput of the interface in question. For switches, it is likely less than the max throughput, for the reasons explained in the notes, but some threshold exists.

For such nuanced behaviors, it’s difficult to find documentation that explains such details. I believe that getting hands-on experience and seeing how devices behave in each situation is the best solution to understanding these features.

I hope this has been helpful!

Laz

Hello Laz.

Great explanation.

Unfortunately, it’s hard for me to tell at this point, since I could not find something that just… “officially” states this. It feels like there’s always a different explanation, but I agree the most with what you’ve said.

I’ll leave it as it is for now.

For such nuanced behaviors, it’s difficult to find documentation that explains such details. I believe that getting hands-on experience and seeing how devices behave in each situation is the best solution to understanding these features.

I agree. QoS is very difficult when it comes to understanding what is actually happening in the background, since it’s impossible to even lab it properly in an emulator like CML, where you just cannot realistically simulate congestion and latency.

Thank you so much.

David

Hi David,

When I studied CCIE, QoS was probably the most difficult topic.

With routing protocols like OSPF, EIGRP, BGP, etc., the results are predictable. It works or it doesn’t. The same goes for something like MPLS or multicast. When it doesn’t work, either we lack some knowledge because we thought it would work a certain way but it’s different in practice, or we made a mistake in the design.

With QoS, you have all these exceptions on different platforms and it’s not as black and white.

With congestion, you can have an interface that’s congested 100% of the time over a 60-second period. It can also get congested for just a single second because of (micro) packet bursts and then be 100% idle for the other 59 seconds. You’ll need queuing for this.

If you really want to figure out how it behaves, you’ll need to dive into how QoS works on the platform, and you’ll need real hardware. You’ll need to test and benchmark it, ideally with something that generates traffic and measures delay, latency, and jitter. A simple tool could be IP SLA:
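
For example, a minimal IP SLA udp-jitter probe could look like this; the destination address, port, and codec are assumptions:

! On the sender:
ip sla 10
 udp-jitter 192.168.1.2 16384 codec g711alaw   ! voice-like probe measuring delay, jitter, loss
ip sla schedule 10 life forever start-time now

! On the responder:
ip sla responder

You can then read round-trip time, jitter, and loss from show ip sla statistics 10.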

There are other tools though.

Rene

Hello Rene.

You’re right. I am studying for my ENCOR exam so I won’t go any further than this. If I ever get to work with QoS in production, I’ll probably learn most of the “secrets” that I didn’t know about until now.

Thank you and Laz for taking the time to discuss this with me.

I have one more question. What exactly is the difference between speed and bandwidth? I understand that speed is how fast the link can transmit data, while bandwidth is its maximum capacity. The thing is, these two always seem to be the same value, so I tend to mix them up a lot.

If that’s the case, I am not quite sure I understand what speed means here. If we have a link with a speed of 2 Gbps, would it transmit data (let’s say, 100 Mb) faster than a 1 Gbps link? Or what if we (let’s use our imagination here for a second) had a link with a bandwidth of 1000 Mbps but a speed of only 500 Mbps? The link’s capacity would not be reached, but it still wouldn’t be able to transmit all 1000 Mb within a second, right?

That’s all, thank you.

David

Hello David

Speed and bandwidth are often used interchangeably to refer to similar things. When we use them loosely in conversation, what each one means often has to do with the context they are used in. But strictly speaking, they do have specific definitions.

Bandwidth is the maximum capacity of an interface, that is, the maximum amount of data that an interface can transmit. A Gigabit interface will have a bandwidth of 1000 Mbps or 1 Gbps.

Now speed is the rate at which data moves from one point to another. More specifically, it is the average rate at which data moves from one point to another over a period of time. (You’ll see why I make this distinction in a moment). It depends on a combination of factors, including the bandwidth of the link, latency, the quality of the transmission, as well as any QoS mechanisms that may be in place.

Now to address your specific questions:

To answer this question, we must first understand that at any instant in time, an interface is either sending data at its bandwidth rate or not sending data at all. So a 1 Gbps interface, at any instantaneous moment, will be sending data at 1 Gbps or won’t be sending data at all. Any “speeds” smaller than the bandwidth are therefore achieved by sending data intermittently, such that the average amount of data sent over a period of time (i.e., per second) is the speed.

So if you measure a speed of 100 Mbps on a GigabitEthernet interface, that means that over that one second of measured time, data is being transmitted for only 1/10th of a second resulting in an average of 100 Mbps.

So a 2Gbps link will transmit 100Mbps at the same average rate as a 1Gbps link simply because the 2 Gbps link will transmit for 1/20th of a second (on average) while the 1 Gbps link will transmit for 1/10th of a second, resulting in the same average speed.

If the speed is the same on two links of different bandwidth, then by definition the transmission will take the same amount of time, since speed is simply a measure of the rate at which data moves from one point to another.

Now let me throw in another definition which is also slightly different: throughput. I won’t answer it here, but I have created a NetworkLessons note on the topic. Take a look for more info.

I hope this has been helpful!

Laz

Hello Laz.

Thank you, that makes sense.

I have some more questions, though. With LLQs, if we don’t define a policing limit, the LLQ could starve the other queues if it’s constantly receiving a stream of VoIP packets, for example.

What I don’t quite understand is this: if you reserve, let’s say, 30% of the bandwidth for VoIP traffic and then set a limit on it so that it doesn’t starve the other queues, isn’t this the same as if you had just reserved less bandwidth for VoIP traffic in the first place?

Since it’s outside of the original reservation, you’re still limiting it to a lower value.

Also, how does policing prevent queue starvation? With policing, we basically limit the number of packets that can be in that priority queue, but that doesn’t change the fact that if the priority queue fills up with packets, the scheduler will always try to empty that priority queue first before moving on to the others, potentially starving them, or not?
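
For reference, here is roughly the kind of configuration I have in mind (a minimal sketch; the class name and percentage are assumed):

policy-map LLQ-EXAMPLE
 class VOICE
  priority percent 30   ! priority queue, rate-limited to 30% of the bandwidth
 class class-default
  fair-queue

interface GigabitEthernet0/1
 service-policy output LLQ-EXAMPLE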

Thank you.
David