IP SLA (Service-Level Agreement) on Cisco IOS

I have a question: if I configure IP SLA with icmp-echo and a frequency of 10 seconds, it sends an echo request every 10 seconds. If the destination stops replying, after how many failed echo requests is the IP SLA considered down? Is there a certain count: 1 failure, 2 failures… 10?

Thanks

Hello Hisham

What you are looking for is the threshold parameter of the IP SLA configuration. By default, the threshold is set to 5000 milliseconds (5 seconds), regardless of the frequency of the IP SLA, but you can change it as needed. To find out more about these parameters and how they should be configured, take a look at this NetworkLessons note on IP SLA Parameters.

I hope this has been helpful!

Laz

Hey all,

I have a question about best practice for the SLA timeout, specifically relating to missed probes. I find mixed answers to this in the documentation.

Take, for example, the following situation:

I set up an SLA monitor for ICMP echoes to a neighbour, sent every 5 seconds, but I only want to declare the router dead after it misses 3 echoes in a row.

If I configured this with a track object and set the delay down timer, would I count the timer from the first missed ICMP echo, e.g. set the delay timer to 11 seconds rather than 16?

Here’s my logic: the first ICMP echo is missed, so the IP SLA reports a timeout. I then start a timer for 11 seconds, because that gives time for another 2 probes to be sent, 3 in total. If neither of those responds, I declare the router dead.
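The arithmetic in this reasoning can be written as a small Python sketch. It assumes a probe every `frequency` seconds, a delay-down timer that starts at the first missed probe, and a 1-second safety margin. The function and parameter names are hypothetical and are not part of any IOS feature.

```python
def delay_down_seconds(frequency: int, misses: int, margin: int = 1,
                       from_first_miss: bool = True) -> int:
    """Hypothetical helper: delay-down value (seconds) so that `misses`
    consecutive probes must fail before the track object goes down.

    If the timer starts at the first miss, only (misses - 1) further
    probe intervals need to elapse; otherwise all `misses` intervals are
    counted. `margin` leaves the last probe a little extra time.
    """
    if frequency <= 0 or misses < 1 or margin < 0:
        raise ValueError("need frequency > 0, misses >= 1, margin >= 0")
    intervals = misses - 1 if from_first_miss else misses
    return intervals * frequency + margin


print(delay_down_seconds(5, 3))                         # 11 (timer starts at first miss)
print(delay_down_seconds(5, 3, from_first_miss=False))  # 16 (three full intervals)
```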

Does that make sense? Should the delay timer be a bit longer than the echo to give it a chance? This is where I find the advice a bit lacklustre.

Appreciate any help and guidance on this one!

Hello Sebastian

The delay down timer will begin counting from the first failed ICMP echo. The IP SLA will consider an SLA failed if either the configured timeout or the configured threshold is exceeded. See the NetworkLessons note on IP SLA parameters for more information about that.

So to be precise, the down timer will begin counting after the timeout or threshold values have been exceeded. If they are set to 1 second for example, then the delay down timer will start counting 1 second after the initial ICMP echo request was sent.

Based on this, it makes sense for you to set the delay timer to be 11 seconds since you want three echoes to fail before the SLA is considered failed. Let’s take a look at the process second by second, assuming a threshold of 1 second.

0 - echo request sent
1 - echo reply was not received, SLA fails, delay down timer is started
2
3
4
5 - echo request sent, delay down timer is at 4
6 - echo reply was not received, SLA fails, delay down timer is at 5
7
8
9
10 - echo request sent, delay down timer is at 9
11 - echo reply was not received, SLA fails, delay down timer is at 10
12 - delay down timer is at 11, the down delay has expired, and the SLA now triggers a reaction event (i.e. any tracking associated with the SLA is triggered).

You could theoretically set it to 10 seconds, but that would make the timer expire at exactly the same time as the third echo fails, so leaving an extra second is helpful.
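The walkthrough above can be reproduced with a short Python simulation. It is a simplified model, not IOS internals: every probe goes unanswered, a failure is recorded `timeout` seconds after each send (1 second here, as in the example), and the delay-down timer starts at the first failure and never resets because the SLA never recovers. The function name and parameters are illustrative.

```python
def timeline(frequency: int = 5, timeout: int = 1, delay_down: int = 11,
             end: int = 12) -> list[tuple[int, list[str]]]:
    """Return (second, events) pairs for a target that never replies."""
    events = []
    first_failure = None
    for t in range(end + 1):
        notes = []
        if t % frequency == 0:
            notes.append("echo request sent")
        # Each probe is declared failed `timeout` seconds after it was sent.
        if t >= timeout and (t - timeout) % frequency == 0:
            notes.append("no reply, SLA fails")
            if first_failure is None:
                first_failure = t
                notes.append("delay down timer started")
        if first_failure is not None and t > first_failure:
            elapsed = t - first_failure
            if elapsed >= delay_down:
                notes.append(f"delay down timer at {elapsed}, expired: track goes down")
            else:
                notes.append(f"delay down timer at {elapsed}")
        events.append((t, notes))
    return events


for second, notes in timeline():
    print(second, "-", ", ".join(notes))
```

Calling `timeline(delay_down=10)` shows the timer expiring at second 11, the same second the third probe fails, which is the case the extra second avoids.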

Note that if your threshold/timeout values are larger, then you will have to appropriately change your delay down timer as well to accommodate the slightly larger times needed to define an SLA as down.

I hope this has been helpful!

Laz


Hi, I have a question about this configuration.

I would like an explanation of it: what do threshold, timeout and frequency mean?
And what does the delay down command under track 1 do?
Thanks

ip sla 1
 icmp-echo 10.1.12.100 source-interface vlan xyz
  threshold 600
  timeout 600
  frequency 10

track 1 ip sla 1
 delay down 5

Hello Ugo

Take a look at this lesson that describes the IP SLA configuration in detail.

Briefly:

  • threshold - the maximum time under which a response is considered successful. Anything above this would be considered a failure.
  • timeout - the amount of time the device will wait for a response. This should not be less than the threshold.
  • frequency - how often the SLA test will be applied. In this case, it’s how often the ping will be sent.
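The constraint in the timeout bullet (timeout not less than threshold) can be checked with a small Python sketch. It assumes the units IOS uses for these commands: threshold and timeout in milliseconds, frequency in seconds. The extra rule that the timeout should not exceed the probe interval is an added assumption, not something stated in this thread, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IpSlaParams:
    """Hypothetical container for the values typed under `ip sla N`."""
    threshold_ms: int  # threshold, in milliseconds
    timeout_ms: int    # timeout, in milliseconds
    frequency_s: int   # frequency, in seconds


def check(params: IpSlaParams) -> list[str]:
    """Return a list of problems found (empty if none)."""
    problems = []
    # From the definitions above: timeout should not be less than threshold.
    if params.timeout_ms < params.threshold_ms:
        problems.append("timeout is less than threshold")
    # Assumption: a probe should time out before the next one is sent.
    if params.timeout_ms > params.frequency_s * 1000:
        problems.append("timeout is longer than the probe interval")
    return problems


print(check(IpSlaParams(threshold_ms=600, timeout_ms=600, frequency_s=10)))   # []
print(check(IpSlaParams(threshold_ms=5000, timeout_ms=600, frequency_s=10)))  # ['timeout is less than threshold']
```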

Now, the track 1 ip sla 1 command creates track object 1, which follows the state of IP SLA operation 1. The delay down 5 command is used as a kind of dampening mechanism for links that may be flapping: the track object only goes down if the SLA has stayed down for 5 seconds. Take a look at this post for more info.
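The dampening effect of delay down can be sketched in Python. This is a simplified model under stated assumptions, not the IOS tracking process: the SLA state is sampled once per second, the track object goes down only once the SLA has been down for `delay_down` seconds without recovering, and it comes back up immediately because no delay up is configured. The function name is hypothetical.

```python
def track_states(sla_up: list[bool], delay_down: int) -> list[bool]:
    """Per-second track state (True = up) for a track with only `delay down`.

    The track goes down once the SLA has been down for `delay_down`
    seconds in a row; any recovery resets the timer and brings the
    track straight back up (no `delay up` configured).
    """
    track = []
    down_since = None  # second at which the current SLA outage began
    state = True
    for t, up in enumerate(sla_up):
        if up:
            down_since, state = None, True
        else:
            if down_since is None:
                down_since = t
            if t - down_since >= delay_down:
                state = False
        track.append(state)
    return track


flap = [True, False, False, True, True, True]   # SLA down for 2 seconds only
outage = [True] + [False] * 7                    # SLA stays down

print(track_states(flap, delay_down=5))    # [True, True, True, True, True, True]
print(track_states(outage, delay_down=5))  # [True, True, True, True, True, True, False, False]
```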

I hope this has been helpful!

Laz

Hello, everyone!

I have a few questions to confirm. I am not quite sure whether the timeout vs threshold explanation is 100% correct.

I don’t think that anything that goes above the threshold is considered a failure.

I’ve configured the timeout to be 100 seconds while the threshold was set to 100 ms, which is a significantly lower value.

To summarize and provide more clarity, here’s my configuration.

The operation timeout is set to 100 seconds
The operation frequency is set to 100 seconds
The operation threshold is set to 100 milliseconds

R1 is connected to R2 (192.168.12.2). This is an icmp-echo operation, which is used for basic connectivity verification. Once I schedule it, the first packet is successful.

Since the frequency is set to 100 seconds, the next packet should be sent once the operation ttl hits 3500 seconds. However, I will shut down the link between R1 and R2 to break the connectivity.

After 100 seconds, another packet is sent.

However, notice that the number of failures does not increment, despite the threshold having already been exceeded (20 seconds have passed since this packet was sent).

The operation is considered as failed and the counter increments only after the timeout (100 seconds) expires.
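The behaviour observed above can be modelled with a small Python function. It is a sketch of what this test showed, not of IOS internals: for a probe that never gets a reply, crossing the threshold only flags it as slow, and the failure counter increments only once the timeout is reached. Whether the comparisons at the exact boundary are `>` or `>=` is an assumption, and the function name is hypothetical.

```python
def unanswered_probe(elapsed_ms: int, threshold_ms: int, timeout_ms: int) -> dict:
    """State of a single probe that never gets a reply, `elapsed_ms` after it was sent."""
    return {
        # Crossing the threshold does not count as a failure.
        "over_threshold": elapsed_ms > threshold_ms,
        # Only reaching the timeout increments the failure counter.
        "failures": 1 if elapsed_ms >= timeout_ms else 0,
    }


# Values from this test: threshold 100 ms, timeout 100 s
print(unanswered_probe(20_000, threshold_ms=100, timeout_ms=100_000))   # {'over_threshold': True, 'failures': 0}
print(unanswered_probe(100_000, threshold_ms=100, timeout_ms=100_000))  # {'over_threshold': True, 'failures': 1}
```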



As I understand it, the threshold is a value that, once exceeded, lets the administrator configure the device to take a specific action, such as generating an SNMP trap. It’s basically a “this is taking a bit too long, let’s raise some alarms” value.

Laz or Rene, can you confirm this please and verify my understanding? Thank you!

Hello David

Yes, you are correct. The threshold specifies after how much time a “reaction event” will take place. The timeout is the amount of time an IP SLA will wait for a response from its echo request packet. If this is exceeded, then a failure is recorded. Take a look at this NetworkLessons note on the topic.
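The distinction confirmed here can be expressed as a small Python classifier for a single probe result. The labels are illustrative and are not the exact IOS return-code strings; `rtt_ms=None` stands for no reply at all, and the boundary comparisons are assumptions.

```python
from enum import Enum
from typing import Optional


class ProbeResult(Enum):
    OK = "ok"                          # reply within the threshold
    OVER_THRESHOLD = "over threshold"  # reply received but slow: reaction event
    TIMEOUT = "timeout"                # no reply before the timeout: failure


def classify(rtt_ms: Optional[int], threshold_ms: int, timeout_ms: int) -> ProbeResult:
    """Classify one probe by its round-trip time (None means no reply)."""
    if rtt_ms is None or rtt_ms > timeout_ms:
        return ProbeResult.TIMEOUT
    if rtt_ms > threshold_ms:
        return ProbeResult.OVER_THRESHOLD
    return ProbeResult.OK


for rtt in (50, 300, None):
    print(rtt, classify(rtt, threshold_ms=100, timeout_ms=100_000).value)
```

In this model only the TIMEOUT case would be recorded as a failure; OVER_THRESHOLD is the case that would drive a reaction event such as an SNMP trap.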

I know you set the parameters to extreme values to test the operation; however, there are some guidelines that should be followed when setting the threshold, timeout, and frequency values. These are outlined in the links to the related IP SLA command reference found in the note I linked to above.

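The only comment I’m questioning from your post is this: “Since the frequency is set to 100 seconds, the next packet should be sent once the operation ttl hits 3500 seconds.”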

The next packet should be sent once the 100 seconds of the frequency elapse, not once the operation time to live elapses. The Operation time to live is the duration that the IP SLA will function. If you don’t configure it, it will have a default value of 3600 seconds, or one hour, and this is why you have a value of 3595 in your output.
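The relationship between frequency and the operation's life can be sketched in Python. It assumes the default life of 3600 seconds mentioned above, a first probe at the moment the operation starts, and no probe at the exact instant the life ends. The function names are hypothetical.

```python
DEFAULT_LIFE_S = 3600  # default operation life when none is configured


def probe_times(frequency_s: int, life_s: int = DEFAULT_LIFE_S) -> list[int]:
    """Seconds after start at which probes are sent while the operation is alive."""
    return list(range(0, life_s, frequency_s))


def remaining_life(elapsed_s: int, life_s: int = DEFAULT_LIFE_S) -> int:
    """Operation time to live left after `elapsed_s` seconds."""
    return max(life_s - elapsed_s, 0)


times = probe_times(100)
print(times[:3], len(times))  # [0, 100, 200] 36
print(remaining_life(5))      # 3595
print(remaining_life(100))    # 3500
```

In this model the probe schedule is driven only by the frequency; the remaining life is a separate countdown that ends the operation when it reaches zero.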

I hope this has been helpful!

Laz