BGP Neighbor Adjacency States

Hello Kunj

Unfortunately, the answer is not very technical. As with many debugs, sometimes events occur quite quickly, and some information may simply be omitted. In this case, it seems that R1 went through the states of the FSM so quickly that the debug summarized multiple events in a single statement saying that it went from Idle to Active. On R2, the events did not happen as quickly, so the debug displayed the Idle to Connect state change.

In the lesson, Rene states that “it doesn’t show the Connect state in the debug”.
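If you want to check which transitions a debug actually reported, a small Python helper can pull the “went from X to Y” lines out of the output. This is a sketch that assumes the line format of the R1 debug posted later in this thread (the sample lines are copied from it); other IOS versions may word their debugs differently, and the function name is made up for illustration.

```python
import re

# Matches lines such as "BGP: 192.168.12.2 active went from Idle to Active"
TRANSITION = re.compile(r"BGP: (\S+) (\S+) went from (\w+) to (\w+)")


def transitions(debug_text: str) -> list[tuple[str, str, str]]:
    """Return (neighbor, old_state, new_state) for each state-change line."""
    return [
        (m.group(1), m.group(3), m.group(4))
        for m in TRANSITION.finditer(debug_text)
    ]


sample = """\
BGP: 192.168.12.2 active went from Idle to Active
BGP: 192.168.12.2 open active, local address 192.168.12.1
BGP: 192.168.12.2 active went from Active to OpenSent
BGP: 192.168.12.2 active went from OpenSent to OpenConfirm
"""

for nbr, old, new in transitions(sample):
    print(f"{nbr}: {old} -> {new}")
```

Lines that are not state changes (like the “open active” line) are simply ignored, so any state the debug skipped over, such as Connect here, will not appear in the result.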

I hope this has been helpful!

Laz

Hello Rene/Laz,
Can you please confirm why and when we would need to configure a BGP neighborship with an APIPA address?
Regards
Unni

Hello Unni

APIPA, or link-local, addresses for IPv4 should never be used as the IP addresses of BGP peers. This goes against the address usage rules dictated in RFC 3927. However, some vendors and systems do use them. One such situation is with Amazon’s AWS service, an example of which can be found here. This has been known to be used over IPsec tunnels set up to an Amazon VPC. I am not familiar with other uses of APIPA addresses for such purposes.

I hope this has been helpful!

Laz

When the router is in the OpenConfirm state, what happens if a keepalive is not received? Does it go back to Idle?

Thanks,
Michael

Hello Michael

When all else fails, check the RFC! :sunglasses: According to RFC4271 (pages 67-68), the OpenConfirm state is maintained until a Keepalive is received, in which case it goes to Established, or a notification message is received, in which case it goes to Idle. It also states that:

If the HoldTimer_Expires event (Event 10) occurs before a KEEPALIVE message is received, the local system: [among other things] changes its state to Idle.
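The OpenConfirm rules above can be summarized in a minimal Python sketch. It models only four of the RFC 4271 events (numbered as in the RFC's event list), and the class and function names are made up for illustration.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


class Event(Enum):
    # Event numbers as listed in RFC 4271
    HOLD_TIMER_EXPIRES = 10
    KEEPALIVE_TIMER_EXPIRES = 11
    NOTIF_MSG = 25
    KEEPALIVE_MSG = 26


def open_confirm_next_state(event: Event) -> State:
    """State a peer in OpenConfirm moves to for the events modeled here."""
    if event is Event.KEEPALIVE_MSG:
        # The peer accepted our OPEN: the session comes up.
        return State.ESTABLISHED
    if event is Event.KEEPALIVE_TIMER_EXPIRES:
        # Our own KeepaliveTimer fired: send a KEEPALIVE and keep waiting.
        return State.OPEN_CONFIRM
    # A NOTIFICATION, or the HoldTimer expiring before any KEEPALIVE
    # arrives, tears the session down.
    return State.IDLE


for ev in Event:
    print(ev.name, "->", open_confirm_next_state(ev).value)
```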

Now, the Hold Time value is carried in a field of the BGP OPEN message, and the value is negotiated. Again, according to page 13 of the RFC:

This 2-octet unsigned integer indicates the number of seconds the sender proposes for the value of the Hold Timer. Upon receipt of an OPEN message, a BGP speaker MUST calculate the value of the Hold Timer by using the smaller of its configured Hold Time and the Hold Time received in the OPEN message. The Hold Time MUST be either zero or at least three seconds. An implementation MAY reject connections on the basis of the HoldTime. The calculated value indicates the maximum number of seconds that may elapse between the receipt of successive KEEPALIVE and/or UPDATE messages from the sender.

In some other situations described in the RFC, such as right after a router sends its OPEN message and moves to OpenSent, the HoldTimer is set to a “large value”, for which the RFC suggests four minutes.
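The negotiation rule quoted above is simple enough to sketch in a few lines of Python. This is an illustration only: the function names are hypothetical, and deriving the KEEPALIVE interval as one third of the Hold Time follows the RFC's suggestion (for example, a negotiated Hold Time of 180 seconds gives 60-second keepalives).

```python
def negotiate_hold_time(configured: int, received: int) -> int:
    """Negotiated Hold Time (seconds): the smaller of the two proposals."""
    for value in (configured, received):
        # 2-octet unsigned field; the RFC allows only 0 or >= 3 seconds.
        if not 0 <= value <= 0xFFFF or value in (1, 2):
            raise ValueError(f"invalid Hold Time: {value}")
    return min(configured, received)


def keepalive_interval(hold_time: int) -> int:
    """One third of the Hold Time, as RFC 4271 suggests.

    A Hold Time of 0 means keepalives are not sent at all, so 0 is returned.
    """
    return hold_time // 3


hold = negotiate_hold_time(180, 90)
print(hold, keepalive_interval(hold))  # 90 30
```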

I hope this has been helpful!

Laz


@lagapidis: Thanks for the pointer, that answers my question!

Michael


Hi all,

I think this is the best explanation I’ve read, but even after this, the OCG, the RFC, CBT Nuggets, and other YouTube videos, I’m still a little confused about the relationship and interaction between the Connect and Active states.

From some playing around in GNS3 with debugging and packet captures, it seems like there may be some relationship between Connect/Active and whether a router becomes the active or passive side of the neighbor negotiation process. Is that on the right track or not?

Other than that it seems kind of random whether a router enters Connect or Active immediately following the Idle state.

Do you know how specific the ENCOR exams are with regard to this type of thing? Generally speaking, compared to prepping for the CCNA, I find myself digging really deep on every subject so far and I’m not sure if that’s the best use of my study time or if I should make myself stop at a certain point and move on.

Hello Aaron

The processes involved in the BGP finite state machine are actually more complicated than those described in the lesson. If you want to get into the full detail, check out Section 8.2.2 Finite State Machine of RFC 4271 that describes the process in detail.

The finite state machine of BGP has many timers and various events that determine the change from state to state. Concerning the Connect and Active states, according to this RFC, there is only one situation where the Connect state will move to the Active state:

 If the TCP connection fails (Event 18), the local system checks
      the DelayOpenTimer.  If the DelayOpenTimer is running, the local
      system:
        - restarts the ConnectRetryTimer with the initial value,
        - stops the DelayOpenTimer and resets its value to zero,
        - continues to listen for a connection that may be initiated by
          the remote BGP peer, and
        - changes its state to Active.

So if you’re in the Connect state and the TCP connection fails while the DelayOpenTimer is still running (the DelayOpenTimer is used to delay the sending of an OPEN message on a connection), the peer goes into the Active state.
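Here is a rough Python sketch of that Connect state rule, paraphrased from RFC 4271 section 8.2.2. It also includes the other branch of the same rule (DelayOpenTimer not running), which the excerpt above does not show. The Peer class and the action strings are hypothetical and only record what the RFC says should happen.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    state: str = "Connect"
    delay_open_timer_running: bool = False
    actions: list[str] = field(default_factory=list)


def connect_on_tcp_connection_fails(peer: Peer) -> None:
    """Connect state, Event 18 (TcpConnectionFails), per RFC 4271 8.2.2."""
    if peer.state != "Connect":
        raise ValueError("this sketch only models the Connect state")
    if peer.delay_open_timer_running:
        peer.actions += [
            "restart ConnectRetryTimer with its initial value",
            "stop DelayOpenTimer and reset it to zero",
            "keep listening for a connection from the remote peer",
        ]
        peer.delay_open_timer_running = False
        peer.state = "Active"
    else:
        # Without a running DelayOpenTimer, the peer falls back to Idle.
        peer.actions += [
            "stop ConnectRetryTimer and set it to zero",
            "drop the TCP connection",
            "release all BGP resources",
        ]
        peer.state = "Idle"


waiting = Peer(delay_open_timer_running=True)
connect_on_tcp_connection_fails(waiting)
print(waiting.state)  # Active
```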

Similarly, when in an Active state, the BGP peer will only enter the Connect state in one situation:

 In response to a ConnectRetryTimer_Expires event (Event 9), the local system:
        - restarts the ConnectRetryTimer (with initial value),
        - initiates a TCP connection to the other BGP peer,
        - continues to listen for a TCP connection that may be initiated
          by a remote BGP peer, and
        - changes its state to Connect.

You can find out more info about these events in the RFC.

There are very specific events that have to take place to get an Idle peer into either Active or Connect state. You can see those in detail in the RFC as well.
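As a sketch of those Idle-state rules, the Python table below maps the RFC 4271 start events to the state an Idle peer enters. The key difference is whether the start event asks for passive TCP establishment; how a particular vendor maps its configuration onto these events is implementation-specific and not shown here. The names in the code are illustrative.

```python
# Start events from RFC 4271's event list and the state an Idle peer enters.
IDLE_START_EVENTS = {
    1: ("ManualStart", "Connect"),
    3: ("AutomaticStart", "Connect"),
    4: ("ManualStart_with_PassiveTcpEstablishment", "Active"),
    5: ("AutomaticStart_with_PassiveTcpEstablishment", "Active"),
}


def leave_idle(event_number: int) -> str:
    """Return the state an Idle peer moves to for one of the events above."""
    if event_number not in IDLE_START_EVENTS:
        raise ValueError(f"event {event_number} is not modeled in this sketch")
    _, next_state = IDLE_START_EVENTS[event_number]
    # Connect: the peer initiates a TCP connection and also listens.
    # Active: the peer only listens for the remote side to connect.
    return next_state


for number, (name, state) in IDLE_START_EVENTS.items():
    print(f"Event {number} ({name}): Idle -> {state}")
```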

As for the certification exams, it is highly unlikely that you would be asked any questions that go into such detail. Digging deeply into every subject is great when you want to learn for learning’s sake. However, for the certifications, I suggest that you go over the content on the site only to the depth in which the content itself has gone. You can make a list of the deeper questions you may have so you can revisit them later, but if you want to focus on the certifications, go only as deep as the lessons go…

I hope this has been helpful!

Laz

R1#
BGP: 192.168.12.2 active went from Idle to Active
BGP: 192.168.12.2 open active, local address 192.168.12.1
BGP: ses global 192.168.12.2 (0x4B43F3FC:0) act Adding topology IPv4 Unicast:base
BGP: ses global 192.168.12.2 (0x4B43F3FC:0) act Send OPEN
BGP: 192.168.12.2 active went from Active to OpenSent
BGP: 192.168.12.2 active sending OPEN, version 4, my as: 1, holdtime 180 seconds, ID C0A80C01
BGP: 192.168.12.2 active rcv message type 1, length (excl. header) 34
BGP: ses global 192.168.12.2 (0x4B43F3FC:0) act Receive OPEN
BGP: 192.168.12.2 active rcv OPEN, version 4, holdtime 180 seconds
BGP: 192.168.12.2 active rcv OPEN w/ OPTION parameter len: 24
BGP: 192.168.12.2 active rcvd OPEN w/ optional parameter type 2 (Capability) len 6
BGP: 192.168.12.2 active OPEN has CAPABILITY code: 1, length 4
BGP: 192.168.12.2 active OPEN has MP_EXT CAP for afi/safi: 1/1
BGP: 192.168.12.2 active rcvd OPEN w/ optional parameter type 2 (Capability) len 2
BGP: 192.168.12.2 active OPEN has CAPABILITY code: 128, length 0
BGP: 192.168.12.2 active OPEN has ROUTE-REFRESH capability(old) for all address-families
BGP: 192.168.12.2 active rcvd OPEN w/ optional parameter type 2 (Capability) len 2
BGP: 192.168.12.2 active OPEN has CAPABILITY code: 2, length 0
BGP: 192.168.12.2 active OPEN has ROUTE-REFRESH capability(new) for all address-families
BGP: 192.168.12.2 active rcvd OPEN w/ optional parameter type 2 (Capability) len 6
BGP: 192.168.12.2 active OPEN has CAPABILITY code: 65, length 4
BGP: 192.168.12.2 active OPEN has 4-byte ASN CAP for: 2
BGP: nbr global 192.168.12.2 neighbor does not have IPv4 MDT topology activated
BGP: 192.168.12.2 active rcvd OPEN w/ remote AS 2, 4-byte remote AS 2
BGP: 192.168.12.2 active went from OpenSent to OpenConfirm

The OPEN message should be sent in the OpenSent state, right? So why is the debug showing “active sending OPEN”?

As I understand it, the TCP three-way handshake is initiated in the Idle state and the ConnectRetry timer is set to 60. If the three-way handshake succeeds, the state moves to OpenSent; otherwise, it moves to Active, resets the ConnectRetry timer to 60, and initiates another TCP three-way handshake. If the handshake completes the second time, the state moves to OpenSent, sends the OPEN message, and immediately moves to the OpenConfirm state. In this state, if the OPEN message contains the right information and the peer verifies it correctly, the peer router sends a KEEPALIVE message, and if the KEEPALIVE is not received in the OpenConfirm state, the state goes back down to Active. Kindly correct me here…

And once the state reaches OpenSent, the ConnectRetry timer is no longer running, right?

Hello Narad

The BGP Finite state machine (FSM), which is the defined set of processes that create and maintain peers, is defined in detail in RFC4271, which defines BGP in general. There, in the Finite State Machine section of the document, you will find a detailed description of the processes that are followed.

The FSM process is much more detailed and involved than how it is described in the lesson. For certification, you don’t need to know this level of detail. If you go through it, you will notice that there are some cases where the OPEN message is sent during either the Connect state or the Active state. Specifically, when in the Active state, the process says:

If the local system receives a DelayOpenTimer_Expires event (Event 12), the local system:
        - sets the ConnectRetryTimer to zero,
        - stops and clears the DelayOpenTimer (set to zero),
        - completes the BGP initialization,
        - sends the OPEN message to its remote peer,
        - sets its hold timer to a large value, and
        - changes its state to OpenSent.

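So you see, the OPEN message can be sent while the peer is still in the Active state. Note also that in the Cisco debug you posted, the word “active” after the neighbor address (as in “192.168.12.2 active sending OPEN”) most likely refers to the active TCP session, the connection this router initiated, as opposed to a passive one, rather than to the Active FSM state.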
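A minimal Python sketch of that Active state rule, with hypothetical names, makes the ordering explicit: the OPEN goes out first and the state changes to OpenSent afterwards. The 240-second value is an assumption based on the RFC's suggestion of four minutes for this “large value”.

```python
from dataclasses import dataclass, field

# RFC 4271 only says "a large value" here and suggests 4 minutes; assumed.
LARGE_HOLD_TIMER = 240


@dataclass
class Peer:
    state: str = "Active"
    connect_retry_timer: int = 120
    delay_open_timer: int = 0
    hold_timer: int = 0
    sent: list[str] = field(default_factory=list)


def active_on_delay_open_timer_expires(peer: Peer) -> None:
    """Active state, Event 12 (DelayOpenTimer_Expires), per RFC 4271 8.2.2."""
    if peer.state != "Active":
        raise ValueError("this sketch only models the Active state")
    peer.connect_retry_timer = 0
    peer.delay_open_timer = 0
    # (BGP initialization would be completed here.)
    peer.sent.append("OPEN")  # sent while the state is still Active...
    peer.hold_timer = LARGE_HOLD_TIMER
    peer.state = "OpenSent"   # ...and only then does the state change


p = Peer()
active_on_delay_open_timer_expires(p)
print(p.state, p.sent, p.hold_timer)  # OpenSent ['OPEN'] 240
```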

For the details of the three-way handshake, the timers, and the sending of BGP messages, take a look at the FSM section of the RFC for a detailed description. If you have further questions after reviewing it, let us know!

I hope this has been helpful!

Laz

Hello,
I don’t exactly understand what it means when the Connect state fails.
It is said “In case it fails, we continue to the Active state. If the ConnectRetry timer expires then we will remain in this state.”
Does that mean the TCP handshake could not complete, or that the ConnectRetry timer has expired?
And also, regarding “If the ConnectRetry timer expires then we will remain in this state.”
Does it stay in the Connect state or in Active?
I was trying to catch the Connect state by typing #show ip bgp sum while shutting down an interface or resetting the BGP session, but I couldn’t.
Does the Connect state change that fast? And when exactly does it go to Connect? When the router receives a SYN from the other router?
It would be helpful if you could explain that to me.
Best regards,
Marcin

Hello Marcin

It always helps to go back to the original official description of the BGP finite state machine (FSM). This can be found at RFC4271 in section 8.2.2.

As far as the connect state goes, the RFC details what happens in every case, whether a timer expires or another event occurs. Specifically, if the TCP connection fails, it states the following:

  If the TCP connection fails (Event 18), the local system checks
  the DelayOpenTimer.  If the DelayOpenTimer is running, the local
  system:

    - restarts the ConnectRetryTimer with the initial value,

    - stops the DelayOpenTimer and resets its value to zero,

    - continues to listen for a connection that may be initiated by
      the remote BGP peer, and

    - changes its state to Active.

Now the TCP failure is defined as Event 18. Event 18 is defined as:

  Event 18: TcpConnectionFails

     Definition: Event indicating that the local system has received
                 a TCP connection failure notice.

                 The remote BGP peer's TCP machine could have sent a
                 FIN.  The local peer would respond with a FIN-ACK.
                 Another possibility is that the local peer
                 indicated a timeout in the TCP connection and
                 downed the connection.

So to answer your question, a connect failure occurs when the TCP handshake does not complete, or when the BGP peer sends a FIN, or there is a timeout in the TCP connection. (This timeout is not to be confused with the timers that BGP has).

This is the only case in which the Connect state moves to the Active state. The other events described in the RFC either keep the FSM in Connect or move it to Idle, OpenSent, or, when the DelayOpen option is used, OpenConfirm.

As for what happens when the ConnectRetry timer expires, the router actually stays in the Connect state. If you look at the RFC, you’ll see that when this timer expires, the TCP connection is dropped and a new TCP connection is initiated, so there is no change of state here.
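Sketched in Python (names hypothetical, actions paraphrased from the RFC's Connect state rules), repeated ConnectRetryTimer expiries simply loop in Connect:

```python
def connect_on_connect_retry_timer_expires(state: str) -> tuple[str, list[str]]:
    """Connect state, Event 9 (ConnectRetryTimer_Expires), per RFC 4271 8.2.2."""
    if state != "Connect":
        raise ValueError("this sketch only models the Connect state")
    actions = [
        "drop the TCP connection",
        "restart the ConnectRetryTimer",
        "stop the DelayOpenTimer and reset it to zero",
        "initiate a new TCP connection to the peer",
        "keep listening for a connection from the peer",
    ]
    return "Connect", actions  # note: no state change


state = "Connect"
for attempt in range(1, 4):
    state, actions = connect_on_connect_retry_timer_expires(state)
    print(f"expiry {attempt}: still {state}")
```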

A router will remain in the Connect state for as long as it takes to complete the TCP handshake. A successful TCP handshake takes milliseconds, so you will most likely not catch it. However, you may be able to simulate a failure by creating a control plane policing scheme to block TCP port 179 on one of the peers. This will prevent the handshake from completing and give you the opportunity to see the router in the Connect state.

So when does a router actually go into the connect state? Again, based on the RFC, when the TCP connection to the other BGP peer begins.

I hope this has been helpful!

Laz

Hello!

I’ve read that “Errors cause the state to revert back to Idle and the ConnectRetryTimer to be set to 60 seconds initially, doubling on subsequent failures”.

Is there any way to produce something like this in a lab? I’ve tried all sorts of different scenarios, yet I’ve never seen this timer in action, nor seen it double on subsequent failures.

For example, I’ve purposely mismatched the AS numbers specified in the neighbor statements which caused the connection to be torn down, the routers moved back into the Idle state and retried the connection for several attempts in a row. It took them just a few seconds, so when exactly does this ConnectRetryTimer come into play?

I’ve packet-captured the scenario I mentioned above.

David

Hello David

The BGP finite state machine is the name given to the process by which BGP peerings are established or fail to be established. This is described in full in RFC4271 in the Finite State Machine section. The RFC further states details concerning the ConnectRetry timer as well.

Now, it is difficult to recreate a situation in which this timer expires. This is because the expiry is caused by a TCP connection attempt that goes unanswered, and not by a misconfiguration. So if you mismatch the AS numbers as you did, BGP fails because incorrect information is exchanged, not because the timer itself expires.

What must happen in order for this timer to expire? Well, one particular scenario is this:

  1. A BGP peer enters the idle state and initiates a TCP connection to the BGP peer. It then changes its state to Connect.
  2. In the Connect state, the BGP peer is waiting for the TCP connection to be completed. If there is no response from the peer, then the ConnectRetryTimer may expire.

So it is a failure in the TCP exchange that will cause the timer to expire. How can you simulate this? You can create an access list on the remote peer that blocks TCP port 179 in the incoming direction, so the TCP session never establishes and the BGP peering never takes place. Make sure to apply the ACL in the incoming direction, and not outgoing.

If you try it out, let us know how you get along!

I hope this has been helpful!

Laz

Hello Laz

Thank you for providing me with information on how to simulate this. I’ve configured R1 for BGP to peer with R2 and configured an ACL on R2 which denies any TCP traffic from R1 with the destination port of 179.

However, I’ve observed a similar behaviour as with my example above. The configured BGP speaker attempted the connection every few seconds or so.

Kind regards,
David

Hi David,

This one has me scratching my head. The default Cisco IOS ConnectRetry timer should be 120 seconds.

Looking at the latest RFC (4271):

Idle state:
  Initially, the BGP peer FSM is in the Idle state.  Hereafter, the
  BGP peer FSM will be shortened to BGP FSM.

  In this state, BGP FSM refuses all incoming BGP connections for
  this peer.  No resources are allocated to the peer.  In response
  to a ManualStart event (Event 1) or an AutomaticStart event (Event
  3), the local system:

    - initializes all BGP resources for the peer connection,

    - sets ConnectRetryCounter to zero,

    - starts the ConnectRetryTimer with the initial value,

I tried a setup similar to yours. I first tried filtering with an access-list. I also tried using neighbor X update-source Loopback with loopback interfaces that are not advertised.

Whatever I try, I never run into that 120 seconds delay :slight_smile: I tried this on IOS 15.x so I’m curious to see what older versions would do…

Rene

Hello Rene.

I appreciate your and Laz’s help here. We can leave it as it is, as it’s probably not that important, it’s just something I was curious about.

I have one more question.
What’s up with these empty UPDATE messages? They are sent each time a neighborship is successfully established.

The first one makes perfect sense since that is a network that I am advertising into BGP but what’s up with the second empty UPDATE message?

This occurs every time a neighbor adjacency is established, regardless of whether any networks are actually being advertised. Both routers send an empty UPDATE message. I’d assume that it’s some sort of KEEPALIVE, but KEEPALIVE messages are literally sent before it during the OpenConfirm state. Any ideas here?

Kind regards,
David

Hello David

It’s always worth investigating these things as they help us to dig deeper into the inner workings of BGP and the “why” concerning the way the protocol has been designed. Sometimes we encounter situations where it is difficult to interpret how they operate, and sometimes it just happens to be the way that a particular vendor implements the protocol on their devices.

Now concerning your other question, this is what is known as an End-of-RIB marker. In RFC 4724, which describes the Graceful Restart Mechanism for BGP, this is further explained like so:

An UPDATE message with no reachable Network Layer Reachability
Information (NLRI) and empty withdrawn NLRI is specified as the End-
of-RIB marker that can be used by a BGP speaker to indicate to its
peer the completion of the initial routing update after the session
is established…

Although the End-of-RIB marker is specified for the purpose of BGP
graceful restart, it is noted that the generation of such a marker
upon completion of the initial update would be useful for routing
convergence in general, and thus the practice is recommended.

Also, note that multiple BGP messages can be grouped together within a single TCP segment rather than being sent separately. In the Wireshark output that you shared, we see that the End-of-RIB marker is actually sent as a separate UPDATE message. Whereas the first UPDATE message has a non-zero Total Path Attribute Length, the second UPDATE is indeed an End-of-RIB marker, since both its Withdrawn Routes Length and its Total Path Attribute Length are set to 0. Does that make sense?
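To make the End-of-RIB format concrete, here is a Python sketch that builds and recognizes the IPv4 unicast marker: a bare 23-byte UPDATE (19-byte header plus two zero length fields, no NLRI). This assumes IPv4 unicast only; for other address families, RFC 4724 defines the marker as an UPDATE carrying only an empty MP_UNREACH_NLRI attribute, which this sketch does not handle. The function names are illustrative.

```python
import struct

MARKER = b"\xff" * 16  # 16-byte all-ones marker that starts every BGP message
UPDATE = 2             # BGP message type code for UPDATE
HEADER_LEN = 19        # marker (16) + length (2) + type (1)


def build_ipv4_end_of_rib() -> bytes:
    """UPDATE with no withdrawn routes, no path attributes and no NLRI."""
    body = struct.pack("!HH", 0, 0)  # Withdrawn Routes Length, Total Path Attribute Length
    return MARKER + struct.pack("!HB", HEADER_LEN + len(body), UPDATE) + body


def is_ipv4_end_of_rib(msg: bytes) -> bool:
    """True if msg is exactly the IPv4 unicast End-of-RIB marker."""
    if len(msg) != HEADER_LEN + 4 or msg[:16] != MARKER:
        return False
    length, msg_type = struct.unpack("!HB", msg[16:19])
    if msg_type != UPDATE or length != len(msg):
        return False
    withdrawn_len, attr_len = struct.unpack("!HH", msg[19:23])
    return withdrawn_len == 0 and attr_len == 0


eor = build_ipv4_end_of_rib()
print(len(eor), is_ipv4_end_of_rib(eor))  # 23 True
```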

I hope this has been helpful!

Laz

Hello, everyone!

The most confusing thing to me that I never wrapped my head around are the first 3 BGP neighbor states (well the first 2, the 3rd one is optional).

Idle
So when we configure BGP and specify a start event (a neighbor command, for example) the router sets the state for that neighbor to Idle. Here it will find a matching route, initialize all the necessary resources and timers, start listening for TCP port 179 connections and send a SYN message and then move on to the Connect state, is this correct?

Connect
In this state, the routers are trying to establish the TCP connection using the 3-way handshake. Once this process is successful, an OPEN message is sent and the router (whoever is the first to do this) moves into the OpenSent state, and so does the other router.

What confuses me is all the additional… hassle around it :smiley:. If an error occurs during the adjacency formation which causes it to move back to Idle, the ConnectRetryTimer is set to 60 seconds and doubles on subsequent failures?

Active
Then, if the TCP connection in the Connect state fails and the ConnectRetryTimer depletes, a new TCP connection is attempted, the timer is reset and the adjacency state is moved to Active?

And then when the ConnectRetryTimer depletes itself during the Active state… we move back to the Connect state and reset the ConnectRetryTimer? :smiley:

Isn’t this just redundant? It feels like the two states are just playing ping pong with each other. If the ConnectRetryTimer depletes itself in Connect, the state is moved to Active and vice versa. This is what confuses me: the ConnectRetryTimer and all this state moving/timing behind it.

Not to mention that I was never successful in labbing these (as indicated by the posts above) :smiley:

Can someone please shed some light onto this?

Thank you.
David

Hello David

Yes, that is correct. The Idle state is essentially a preparation state, preparing all that’s necessary to start communication. As soon as the SYN message is sent or received, it moves to the Connect state.

The Connect state can be seen as the TCP three-way handshake itself. Once it is successful, it moves on to the OpenSent state. Now, what if there is a failure? In that case, the ConnectRetryTimer is indeed set to 60 seconds and it doubles on subsequent failures. Why? This is a mechanism to prevent constant, rapid attempts to establish a connection, which could consume significant resources. It is a kind of dampening method to avoid flapping.
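A quick Python sketch of the backoff described above, assuming a 60-second initial value that doubles after every consecutive failure. Treat it as an illustration of the idea only: real implementations may differ, and the lab tests earlier in this thread did not show this behavior on IOS 15.x. The function name is hypothetical.

```python
def connect_retry_delays(failures: int, initial: int = 60) -> list[int]:
    """Delay before each retry if the timer starts at `initial` seconds and
    doubles after every consecutive failure."""
    return [initial * 2 ** n for n in range(failures)]


delays = connect_retry_delays(4)
print(delays, sum(delays))  # [60, 120, 240, 480] 900
```

The growing gaps are what provide the dampening effect: four failures in a row already spread the attempts over 15 minutes.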

BGP is not designed to converge quickly like IGPs. Its stability is much more important than its speed of convergence, because flapping BGP routes can have devastating effects on the network, and on the Internet as a whole.

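If the TCP connection fails in the Connect state while the DelayOpenTimer is running, the router restarts the ConnectRetryTimer, keeps listening for the peer, and transitions to the Active state. (If the ConnectRetryTimer simply expires in the Connect state, the router starts a new TCP connection attempt but stays in Connect.)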

When the ConnectRetryTimer expires in the Active state, the router transitions back to the Connect state and resets the timer. This is not redundant but rather a way to continuously attempt to establish a connection until successful, with a delay between attempts to conserve resources.
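The back-and-forth between Connect and Active can be traced with a small Python sketch of the RFC 4271 rules for these two states. The event names follow the RFC; the function names are hypothetical, and assuming the DelayOpenTimer is running is only there so the Active path appears.

```python
def next_state(state: str, event: str, delay_open_running: bool = True) -> str:
    """Connect/Active transitions discussed in this thread (RFC 4271, 8.2.2).

    delay_open_running defaults to True purely so the Active path shows up;
    with it False, a TCP failure in Connect drops the peer to Idle instead.
    """
    if state == "Connect" and event == "ConnectRetryTimer_Expires":
        return "Connect"  # drop the attempt, start a new one, same state
    if state == "Connect" and event == "TcpConnectionFails":
        return "Active" if delay_open_running else "Idle"
    if state == "Active" and event == "ConnectRetryTimer_Expires":
        return "Connect"  # initiate a new TCP connection while still listening
    raise ValueError(f"{event} in {state} is not modeled in this sketch")


def trace(events: list[str], start: str = "Connect") -> list[str]:
    states = [start]
    for event in events:
        states.append(next_state(states[-1], event))
    return states


print(trace(["TcpConnectionFails", "ConnectRetryTimer_Expires",
             "ConnectRetryTimer_Expires", "TcpConnectionFails"]))
# ['Connect', 'Active', 'Connect', 'Connect', 'Active']
```

Each state change is triggered by a different event, which is why the two states are not simply redundant.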

The confusion might arise from the fact that the Active state is often skipped. It is only entered when something goes wrong with the connection attempt: when the TCP handshake succeeds on the first try, routers move directly from the Connect state to the OpenSent state, skipping the Active state altogether.

In a lab environment, you might not see these states due to the speed of modern networks and devices. The transition between states usually happens too fast to be observed.

I hope this has been helpful!

Laz