TCP Window Size Scaling


(AZM U) #21

Would someone please help me understand the relation among sequence number, acknowledgement and window size? I am totally confused when I am putting all three things together. Thank you so much.


(Lazaros Agapides) #22

Hello Azm

Yes, these numbers can be confusing. Here is an attempt to clarify these parameters:

The first thing to keep in mind is that in any TCP communication, there are actually TWO sequence numbers and TWO acknowledgement numbers: those of each party in the exchange of data. For the sake of this example, and for the diagram below, let’s call these SNL and SNR for Sequence Number Left and Sequence Number Right for the left and right hosts. Similarly, the acknowledgement numbers will be called ANL and ANR.

Note, these abbreviations are my own and are not generally accepted. I am using them only for the purpose of this example.

Take a look at this diagram:

The current state of this diagram is when the three way handshake has already been completed and transmission of data has begun. The current window size is 10.

So, the left host begins transmitting and sends a frame where SNL is 1 and ANL is 1. The SNL and ANL have been determined after the procedure of the three way handshake. (To find out how these are initially determined, take a look at Rene’s lesson here: https://networklessons.com/cisco/ccna-routing-switching/introduction-to-tcp-and-udp/ )

Since the window size is 10, the left host will send 10 bytes (this can be sent in one or more segments) and the header of the segment will have an SNL of 1 and an ANL of 1. Once 10 bytes are sent (the window size) the left host will stop.

The right host will continue to receive data and will do nothing until 10 bytes have been received. Once they have been received, it will compose an acknowledgement segment with the following information:

SNR = 1 This has been determined after the three way handshake. Note this is independent of the SNL
ANR = SNL + window size = 11 This essentially is the number of the next expected byte

Once this acknowledgement segment is received by the left host, it prepares the next batch of bytes to be sent, specifically, 10 since the window size is 10. In the segment it sends, it puts the following values:

SNL = ANR = 11
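ANL = SNR = 1 (the acknowledgement from the right host carried no data, so the next byte expected from it is still 1)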

And the process continues.
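The exchange above can be sketched in a few lines of Python. This is only an illustration under the simplifying assumptions of the example (a fixed 10-byte window, no loss, one acknowledgement per window). The function name `simulate_windows` is my own and is not part of any library.

```python
def simulate_windows(total_bytes, window_size, first_seq=1):
    """Return a list of (seq_sent, bytes_sent, ack_returned), one per window.

    Simplified model: the sender transmits one full window, stops, and
    the receiver answers with the number of the next expected byte.
    """
    rounds = []
    seq = first_seq
    remaining = total_bytes
    while remaining > 0:
        sent = min(window_size, remaining)
        ack = seq + sent            # next byte the receiver expects (ANR)
        rounds.append((seq, sent, ack))
        seq = ack                   # the next SNL equals the ANR just received
        remaining -= sent
    return rounds


for seq, sent, ack in simulate_windows(total_bytes=30, window_size=10):
    print(f"send SEQ={seq} ({sent} bytes) -> receive ACK={ack}")
```

With 30 bytes and a window of 10, the sender's sequence numbers are 1, 11 and 21, and the acknowledgements that come back are 11, 21 and 31.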

I hope this has been helpful!

Laz


(Avinash C) #23

Hi Rene,

Can you please clarify the difference between Window Size Value and Calculated Window Size?

Also is the Scaling Factor value fixed i.e. 64?

Regards
Avinash


(Lazaros Agapides) #24

Hello Avinash

* Window size is the current size of the window in bytes.
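* Scaling factor is a multiplier that each host announces once, in the TCP options of its SYN (or SYN,ACK) segment during the three way handshake. It stays the same for the life of the connection and is not fixed at 64: it is always a power of two, anywhere from 1 to 16384.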
* Calculated window size is the new window size that has been requested. This is determined like so: Window Size * Scaling factor = Calculated window size.

So, you would see on your wireshark output something like this:

window size value:593
{Calculated window size: 151808}
{Window size scaling factor: 256}

You can confirm the above by doing the math: 593*256 = 151808
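Here is the same arithmetic as a small Python sketch. The function name is hypothetical; the only values taken from the capture are 593 and 256. The last line recovers the shift count behind the multiplier, on the assumption that the multiplier is a power of two.

```python
def calculated_window_size(window_size_value, scaling_factor):
    """Multiply the 16-bit window field by the scaling factor."""
    return window_size_value * scaling_factor


window_size_value = 593
scaling_factor = 256

print(calculated_window_size(window_size_value, scaling_factor))  # 151808

# A power-of-two multiplier corresponds to a shift count: 256 == 2 ** 8
shift_count = scaling_factor.bit_length() - 1
print(shift_count)  # 8
```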

I hope this has been helpful!

Laz


(AZM U) #25

Hello Laz,
Thanks for taking the time to explain it to me and I am feeling like I am almost there. I just need clarifications for a few more questions. I was always thinking that when one host is sending some data to another host, the sender will break that piece of data into smaller pieces(called segment in transport layer) and puts a tag(sequence number) on every single segment so the other end(receiver) can organize them in the right order by using the sequence numbers.
According to your explanation, it seems like what I knew was completely wrong because you are referring sequence number to window size(the amount of data is being sent). So what is the mechanism a receiver uses to track every single segment and organize them in the proper meaningful order?
One more thing is very confusing to me. In Rene’s tutorial, he is saying “H1 has setup a connection with H2 by using the 3 way handshake. We are sending 10 bytes of Data which means our “window size” is 10 bytes. The sequence number is 10”, but in your example even though when Left device is sending the first chunk of data of 10 bytes, still the sequence number is 1. I am not sure why. Would you please explain it to me? I thought the sequence number would be 10 since Left device is sending 10 bytes of data.
What field of a packet represents how much data that packet is carrying?

Thank you so much for your great help.

Best Regards,
Azm Uddin


(Lazaros Agapides) #26

Hello AZM

I can understand the confusion. Keep in mind that the window size, the sequence number and the number of segments sent are somewhat independent from each other. What do I mean?

Well, let’s say we have a window size of 21000 bytes. It is very unlikely that this will all be sent in one segment. It will definitely be split into several segments. The window size is “the number of bytes sent before an acknowledgement is required from the receiver.” These bytes can be sent in one or more segments. So, let’s take the following example:

Host A is sending a total of 100000 bytes (about 100 KB) of data to Host B. The window size is 7000 bytes. Let’s say the maximum segment size (which is affected by the Layer 2 and Layer 3 MTU configuration) is 1500 bytes. Host A will begin sending a series of segments with the following elements (SEQa is the sequence number sent by host A):

**Segment 1: SEQa = 1, Window Size = 7000, segment size = 1500 Bytes**
**Segment 2: SEQa = 1501, Window Size = 7000, segment size = 1500 Bytes**
**Segment 3: SEQa = 3001, Window Size = 7000, segment size = 1500 Bytes**
**Segment 4: SEQa = 4501, Window Size = 7000, segment size = 1500 Bytes**
**Segment 5: SEQa = 6001, Window Size = 7000, segment size = 1000 Bytes**

At this point, 7000 bytes have been sent. The TCP window has been exhausted. Host A stops sending and waits for a response. Assuming all went well, Host B sends the following response:

**Acknowledgement from Host B to Host A: ACKb = 7001**

The ACKb number returned is the next expected byte, that is, byte 7001.

Once Host A successfully receives this acknowledgement, it continues with the next batch of segments:

**Segment 1: SEQa = 7001, Window Size = 7000, segment size = 1500 Bytes**
**Segment 2: SEQa = 8501, Window Size = 7000, segment size = 1500 Bytes**
**Segment 3: SEQa = 10001, Window Size = 7000, segment size = 1500 Bytes**
**Segment 4: SEQa = 11501, Window Size = 7000, segment size = 1500 Bytes**
**Segment 5: SEQa = 13001, Window Size = 7000, segment size = 1000 Bytes**

Host B sends the following response

**Acknowledgement from Host B to Host A: ACKb = 14001**

… and so on…
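If you want to reproduce the numbers above yourself, here is a minimal Python sketch. It assumes the simplifications of this example (a fixed window, a maximum segment size of 1500 bytes, no loss), and the function name `segments_in_window` is made up for the illustration.

```python
def segments_in_window(first_seq, window_size, mss):
    """Split one window of data into (seq, size) segments of at most mss bytes.

    Returns the list of segments and the ACK number the receiver would
    send back once the whole window has arrived.
    """
    segments = []
    seq = first_seq
    end = first_seq + window_size          # first byte beyond this window
    while seq < end:
        size = min(mss, end - seq)
        segments.append((seq, size))
        seq += size
    return segments, seq                   # seq is now the next expected byte


segs, ack = segments_in_window(first_seq=1, window_size=7000, mss=1500)
for number, (seq, size) in enumerate(segs, start=1):
    print(f"Segment {number}: SEQa = {seq}, size = {size} bytes")
print(f"ACKb = {ack}")
```

Calling it again with `first_seq=7001` gives the second batch of segments and an acknowledgement of 14001.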

So to specifically answer your question, yes sequence number is used to put the segments back in the proper order while window size is the number of bytes that must be sent before an acknowledgement must be received.

It just happens that in both my example and Rene’s example, the window size was very small, so only one segment was necessary to exhaust the window. That is why it seemed that the sequence number was directly related to the window size.

As for your second question:

One more thing is very confusing to me. In Rene’s tutorial, he is saying “H1 has setup a connection with H2 by using the 3 way handshake. We are sending 10 bytes of Data which means our “window size” is 10 bytes. The sequence number is 10”, but in your example even though when Left device is sending the first chunk of data of 10 bytes, still the sequence number is 1. I am not sure why. Would you please explain it to me?

When beginning a transfer, the SEQ number in the header refers to the first byte that is being sent in that specific segment. But you must also keep in mind that the first SEQ number is generated randomly, so it can be anything between 0 and 4,294,967,295 (the SEQ field is 32 bits wide). It is possible that @ReneMolenaar is using a random number in that example and it just happened to be 10. This can be confusing, and I can ask Rene to look it over and see if it needs revising.
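This is also why captures are easier to read with relative sequence numbers, which Wireshark displays by default: the initial value is subtracted so that every connection appears to start near 0. A rough Python sketch of that idea follows. The names are my own, and `secrets.randbelow` is only a stand-in for however a real TCP stack picks its initial sequence number.

```python
import secrets

SEQ_SPACE = 2 ** 32          # the SEQ field is 32 bits wide


def relative_seq(absolute_seq, initial_seq):
    """Offset of a sequence number from the initial one, with wraparound."""
    return (absolute_seq - initial_seq) % SEQ_SPACE


isn = secrets.randbelow(SEQ_SPACE)          # stand-in for a random initial SEQ
first_data_byte = (isn + 1) % SEQ_SPACE     # the SYN itself consumes one number

print(relative_seq(first_data_byte, isn))   # 1, whatever the random value was
print(relative_seq(5, SEQ_SPACE - 10))      # 15, the counter wrapped past zero
```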

As for the field in the header that indicates the amount of data, there is no such field per se. However, this can be determined by examining the subsequent SEQ numbers in each received segment. Also, when the segments are received on the other end, at the end of the current window, the ACK number that is returned indicates the next expected byte, which in essence reveals the size of the last segment.
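Besides comparing SEQ numbers, the receiver can work out the payload size of a single segment from length fields that do exist: the IPv4 Total Length, minus the IP header length, minus the TCP header length (the Data Offset field). A small sketch, assuming IPv4 and using a function name of my own:

```python
def tcp_payload_length(ip_total_length, ip_ihl, tcp_data_offset):
    """Bytes of TCP payload carried in one IPv4 packet.

    ip_total_length : IPv4 Total Length field, in bytes
    ip_ihl          : IPv4 header length field, in 32-bit words
    tcp_data_offset : TCP Data Offset field, in 32-bit words
    """
    return ip_total_length - ip_ihl * 4 - tcp_data_offset * 4


# A 1500-byte IP packet with minimal 20-byte IP and TCP headers
print(tcp_payload_length(1500, ip_ihl=5, tcp_data_offset=5))   # 1460

# The same packet when the TCP header carries 12 bytes of options
print(tcp_payload_length(1500, ip_ihl=5, tcp_data_offset=8))   # 1448
```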

I hope this has been helpful!

Laz


(AZM U) #27

Hello Laz,
GREAT EXPLANATION. THAT IS SPECTACULAR.
I have one more little question. I will refer to the below example. When we look at the wireshark capture, does it show us the complete breakdown segment by segment/ packet by packet? My understanding is it does not show segment by segment/packet by packet. I just see host A is sending some data to host B with sequence number 1 in line number 1 and right after that line, host B sends an acknowledgement with 7001. It is not possible to send 7000 bytes in one segment/packet since the MTU is 1500 bytes. Is it? How does wireshark work to show information about communication?

Segment 1: SEQa = 1, Window Size = 7000, segment size = 1500 Bytes
Segment 2: SEQa = 1501, Window Size = 7000, segment size = 1500 Bytes
Segment 3: SEQa = 3001, Window Size = 7000, segment size = 1500 Bytes
Segment 4: SEQa = 4501, Window Size = 7000, segment size = 1500 Bytes
Segment 5: SEQa = 6001, Window Size = 7000, segment size = 1000 Bytes

Acknowledgement from Host B to Host A: ACKb = 7001

Thank you so much.

Azm


(Lazaros Agapides) #28

Hello Azm

When looking at a wireshark capture and you want to record a situation where you would have a window size that is much larger than a segment size, you have to generate traffic that will give you that result. Most idle traffic on your PC will generate segments that are very small. So sessions are opened that will essentially send only one segment. Window size scaling does not come into the picture at all in those cases.

I suggest you generate traffic by sending a large file either via SMB or FTP so that window sizes will increase and be significantly larger than the segment size. In this case, you will see something like this:

I copied a large file from a NAS drive I have (192.168.10.210) to my PC (192.168.10.70) and this is what wireshark captured. Notice that between consecutive acknowledgements, there are seven segments, each with a size of 1514 bytes including headers. If you take a look at the segments themselves, they each contain 1460 bytes of data. The Info column states that each one is a segment of a reassembled Protocol Data Unit (PDU is the generic term for the unit of data at each layer of the OSI model). So the PDU in question is the larger block of application data (an SMB message, for example) that TCP has split into segments.
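The difference between 1514 and 1460 is simply the headers. A quick check in Python, assuming an Ethernet frame with IPv4 and TCP headers that carry no options:

```python
ETHERNET_HEADER = 14    # destination MAC, source MAC, EtherType
IPV4_HEADER = 20        # assumes no IP options
TCP_HEADER = 20         # assumes no TCP options

frame_size = 1514       # frame length reported in the capture
payload = frame_size - ETHERNET_HEADER - IPV4_HEADER - TCP_HEADER
print(payload)          # 1460

# Data carried by the seven segments seen between two acknowledgements
print(7 * payload)      # 10220
```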

Try it out and see what you get.

I hope this has been helpful!

Laz


(AZM U) #29

Hello Laz,
I have done some experiment where I was downloading some file from a site. Pretty much everything is clear to me so far. However, couple more questions have popped up. I am going to refer to the below screenshots for my questions.


  1. If we look at the first highlighted line in yellow, it has two fields, window size value and calculated window size. What are the differences between them?

  2. As we know, window size will keep on going up until the receiver is unable to handle all the packets sent by the sender. How do two hosts let each other know what the window size will be for the next transmission( set of segments)?

  3. If we look at the second highlighted line. It has two fields related to window size(window size value=11 and calculated window size=2816) and the acknowledgement of 10441937. I am not sure which one refers to the actual window size. What would be the window size 11 or 2816? However, after the second highlighted line, the sender has sent one packet of 1460 bytes to the receiver 192.168.1.9. Right after that, the receiver sent an acknowledgement to the sender of 10443397. So, that makes sense because 10443397-10441937= 1460 which is the payload, but that 1460 bytes does not match window size(11) and calculated window size(2816) either. Would you please explain it to me?

  4. Let’s say in a communication, two hosts have agreed upon a window size 400 bytes. Then the sender sent four packets to the receiver successfully. Isn’t the total of those four packets supposed to be 400 bytes?

window size 400 determined.
sender receiver
100 bytes-------------------------------->
100 bytes-------------------------------->
100 bytes-------------------------------->
100 bytes-------------------------------->

total of four packets =400 bytes

then the receiver will send an acknowledgement of 401. Am I correct?

Thank you so much for your endless help.

Azm


(Lazaros Agapides) #30

Hello AZM!

I’ll try to answer your questions below:

Take a look at the third field below the two you indicated: Window size scaling factor: 256. If you’ll notice,

Calculated window size = Window size value * Window size scaling factor

If you do the calculations, you’ll find that 256*28=7168. It is this Calculated window size that is the window size that is actually used in the transmission.

Now this goes beyond CCNA or even CCNP as far as I can remember, but it’s good to know. Take a look at the TCP header:

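The TCP window size field is 16 bits in length, so the largest value it can carry is 2^16 - 1 = 65535 bytes. In modern networks, window sizes can be much larger than this. To allow for that, a TCP option called window scale was added. It is exchanged in the options field of the SYN segments during the three way handshake and carries a one-byte shift count (the whole option is three bytes long). The window size value is multiplied by 2 raised to that shift count, which is what Wireshark displays as the window size scaling factor. The option was introduced in RFC 1323, which has since been replaced by RFC 7323, and it is widely used today.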
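To get a feel for how far scaling stretches the window, here is a short Python sketch. It assumes the limits of the window scale option as I understand them (a 16-bit window field and a shift count of at most 14). The function name is hypothetical.

```python
MAX_WINDOW_FIELD = 2 ** 16 - 1      # largest value a 16-bit field can hold
MAX_SHIFT = 14                      # largest shift count assumed for the option


def scaled_window(window_field, shift_count):
    """Effective window in bytes for a given field value and shift count."""
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift_count <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift_count


print(scaled_window(28, 8))                         # 7168, as in the capture
print(scaled_window(MAX_WINDOW_FIELD, 0))           # 65535, no scaling
print(scaled_window(MAX_WINDOW_FIELD, MAX_SHIFT))   # 1073725440, about 1 GB
```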

The window size is determined by the receiver. It is the acknowledgement that is sent back to the sender that requests changes in size to the window. Rene describes this well in his lesson for TCP Window Size Scaling.

Like I said for question 1, it is the Calculated Window Size that is the size of the window used. In order to see that the window size matches the number of bytes sent before an acknowledgement, you must also be sure that the captured segments have reached the destination successfully. You may be looking at a segment that had an error and was resent, so the window sizes don’t match up. Also keep in mind that the window is an upper limit on how much unacknowledged data may be outstanding, so a receiver can acknowledge sooner, for example after every one or two segments, and the bytes between two acknowledgements can be smaller than the window. Look through the captured data to find a clean window so to speak. Because I cannot see details of all the segments of your screenshot, I am unable to be clearer for this question. Sorry!

Yes, you are exactly correct assuming that the maximum segment size allowed on the network is 100 bytes.

I hope this has been helpful!

Laz


(AZM U) #31

Hello Laz,
Once again SPECTACULAR. Thanks for your great help.

Azm


(sims) #32

Hi, the interface gets congested and packets of all TCP connections are dropped,
but the interface utilization is not high.
Can you explain?
Thanks


(Lazaros Agapides) #33

Hello sims

This could be many things. Here are some questions you should ask yourself:

  1. You say the interface gets congested. How do you know this? If utilization is not high then it is not congested. The TCP segments must then be dropped for other reasons.
  2. Are ALL TCP connections dropped? Are they dropped from the beginning or after a session has begun? If they are dropped from the beginning, then check Access Lists that may be blocking ports.
  3. If they are not dropped from the beginning, when are they dropped? After a certain event? What event could this be?

You need to be able to look at more information about the circumstances surrounding the problem, when it occurs and what kinds of sessions are being dropped.

I hope this gives you a little more guidance as to how to troubleshoot. Please share your results with us so we can continue to help you out in determining the problem.

I hope this has been helpful!

Laz


(rosna s) #34

Hi Rene,

You have mentioned " Above you can see that in the SYN,ACK message that the raspberry pi wants to use a window size of 29200. My computer wants to use a window size of 4194304 which is irrelevant now since we are sending data to the raspberry pi."
But in the Wireshark, win = 65535 and ws = 128. Hence window size would be 8,388,480 which is double of your mentioned value.


(Rene Molenaar) #35

Hi Rosna,

You are correct, not sure how I came up with 4194304 but it has been fixed.

Thanks for letting me know!

Rene


(Ray J) #36

Hello Rene,
I’ve got a question regarding this statement:

“the raspberry pi is unable to receive any more data at this moment and the TCP transmission will be paused for awhile while the receive buffer is processed.”

* This “awhile”, how long exactly will the sender wait before it times out?
* Is the timeout value something dependent on the device OS/NIC, given that there is no TTL value specified in the TCP header?

Thanks!


(Rene Molenaar) #37

Hello Ray,

This all depends on the OS and its applications. Here’s an example for Windows 2000/XP/2003/Vista/7:

https://support.microsoft.com/nl-nl/help/170359/how-to-modify-the-tcp-ip-maximum-retransmission-time-out

TCP starts a retransmission timer when each outbound segment is handed down to IP. If no acknowledgment has been received for the data in a given segment before the timer expires, the segment is retransmitted, up to the TcpMaxDataRetransmissions value. The default value for this parameter is 5.

The retransmission timer is initialized to three seconds when a TCP connection is established. However, it is adjusted on the fly to match the characteristics of the connection by using Smoothed Round Trip Time (SRTT) calculations as described in RFC793. The timer for a given segment is doubled after each retransmission of that segment. By using this algorithm, TCP tunes itself to the normal delay of a connection. TCP connections that are made over high-delay links take much longer to time out than those that are made over low-delay links.

By default, after the retransmission timer hits 240 seconds, it uses that value for retransmission of any segment that has to be retransmitted. This can cause long delays for a client to time-out on a slow link.
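The doubling behaviour described above can be sketched in Python. This uses only the defaults quoted in the text (an initial timer of 3 seconds, 5 retransmissions, a ceiling of 240 seconds) and ignores the SRTT adjustment, so treat it as a rough model rather than what Windows actually computes. The function name is my own.

```python
def retransmission_timeouts(initial_rto=3.0, max_retransmissions=5, cap=240.0):
    """Timer value (seconds) that runs out before each retransmission."""
    timeouts = []
    rto = initial_rto
    for _ in range(max_retransmissions):
        timeouts.append(rto)
        rto = min(rto * 2, cap)     # double after every retransmission
    return timeouts


schedule = retransmission_timeouts()
print(schedule)         # [3.0, 6.0, 12.0, 24.0, 48.0]

# Seconds after the original transmission at which the fifth retransmission is sent
print(sum(schedule))    # 93.0
```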

You can see these TCP settings on Windows with this command:

netsh interface tcp show global
Querying active state...

TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Chimney Offload State               : disabled
Receive Window Auto-Tuning Level    : normal
Add-On Congestion Control Provider  : default
ECN Capability                      : disabled
RFC 1323 Timestamps                 : disabled
Initial RTO                         : 3000
Receive Segment Coalescing State    : enabled
Non Sack Rtt Resiliency             : disabled
Max SYN Retransmissions             : 2
Fast Open                           : enabled
Fast Open Fallback                  : enabled
Pacing Profile                      : off

A TCP connection can stay up indefinitely unless keepalives are used. By default, Windows sends a keepalive after 2 hours of idle time.

I hope this helps!

Rene


(Juan C) #38

Nice explanation.

I’ve read these TCP and TCP Window Size Scaling lessons carefully.

Sometimes I have to tune the MTU subinterface value to get to the BGP established state. I only know it is due to an MTU mismatch with the remote peer but I don’t know how it really works.

Could you explain it in more detail ?


(Lazaros Agapides) #39

Hello Juan

Just like any other protocol communicating on the network, BGP requires the appropriate MTU sizes to be set in order for communication to occur successfully. If you have to tune the MTU value to get BGP to work, then it seems that BGP is sending IP packets that are larger than the interface MTU and have the DF bit set to 1, which means they must not be fragmented. By adjusting the MTU (IP or interface) of the subinterface, you are essentially adjusting the allowable MTUs such that the IP MTU will be small enough to fit into the interface MTU. It also depends on what other protocols you are running, such as QinQ, tunneling or encryption, that may add overhead to a packet or frame, making it larger than the allowable MTU.
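To make the DF behaviour concrete, here is a small Python sketch. The function names are hypothetical, and the 1476-byte MTU is just an example of a path where some encapsulation overhead has eaten into a 1500-byte MTU. It is not taken from your setup.

```python
def must_be_dropped(packet_size, mtu, df_set):
    """True if a packet cannot be forwarded: too big and not allowed to fragment."""
    return packet_size > mtu and df_set


def tcp_mss_for(ip_mtu, ip_header=20, tcp_header=20):
    """Largest TCP payload that fits in one unfragmented packet."""
    return ip_mtu - ip_header - tcp_header


print(must_be_dropped(1500, mtu=1500, df_set=True))   # False, it fits
print(must_be_dropped(1500, mtu=1476, df_set=True))   # True, too big and DF is set
print(must_be_dropped(1500, mtu=1476, df_set=False))  # False, it can be fragmented

print(tcp_mss_for(1500))   # 1460
print(tcp_mss_for(1476))   # 1436
```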

For more about MTU sizes, take a look at this lesson:

I hope this has been helpful!

Laz


(Lazaros Agapides) #41

Hello Deepak

The segment size will not change over the course of the transmission. For most transmission media, TCP segment sizes are usually (but not always!) around 1500 bytes. This lesson and its example deal only with the change in size of the window. It is the window size that grows exponentially (when using slow start). So in the example I gave, the next window size should be 14000 bytes. This means that we would have the following set of segments sent:

Segment 1: SEQa = 7001, Window Size = 14000, segment size = 1500 Bytes
Segment 2: SEQa = 8501, Window Size = 14000, segment size = 1500 Bytes
Segment 3: SEQa = 10001, Window Size = 14000, segment size = 1500 Bytes
Segment 4: SEQa = 11501, Window Size = 14000, segment size = 1500 Bytes
Segment 5: SEQa = 13001, Window Size = 14000, segment size = 1500 Bytes
Segment 6: SEQa = 14501, Window Size = 14000, segment size = 1500 Bytes
Segment 7: SEQa = 16001, Window Size = 14000, segment size = 1500 Bytes
Segment 8: SEQa = 17501, Window Size = 14000, segment size = 1500 Bytes
Segment 9: SEQa = 19001, Window Size = 14000, segment size = 1500 Bytes
Segment 10: SEQa = 20501, Window Size = 14000, segment size = 500 Bytes

So the window size doubles, the segment size remains the same (except for the last one, which is 500 bytes) and now 10 segments are sent within that window: nine of 1500 bytes plus one of 500 bytes make 14000 bytes, so the receiver acknowledges with ACKb = 21001. The window will double again after every successful acknowledgement from the receiver until some data is dropped. If this happens, the window size goes down to one segment again, that is 1500 bytes. It will grow exponentially again until the window is at half of what it was when the original congestion occurred. Window sizes will then grow linearly.
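Here is a rough Python sketch of that growth pattern. It models only the simplified behaviour described above (doubling per acknowledged window, a drop back to one segment on loss, exponential growth up to half the earlier window, then one extra segment per round). The function name, the parameters and the choice of which round sees the loss are all assumptions for the illustration. Real TCP stacks are more elaborate.

```python
def window_growth(rounds, mss=1500, start=7000, loss_at=None):
    """Window size in bytes for each round under the simplified model.

    loss_at is the index of the round in which a drop is detected (or None).
    """
    sizes = []
    window = start
    threshold = None                     # no limit until the first loss
    for rnd in range(rounds):
        sizes.append(window)
        if rnd == loss_at:
            threshold = window // 2      # half the window at the time of loss
            window = mss                 # restart from a single segment
        elif threshold is None or window * 2 <= threshold:
            window *= 2                  # slow start: exponential growth
        elif window < threshold:
            window = threshold           # do not overshoot the threshold
        else:
            window += mss                # then grow linearly, one segment per round
    return sizes


print(window_growth(rounds=12, loss_at=3))
```

With a loss in the fourth round, the window climbs 7000, 14000, 28000, 56000, falls to 1500, doubles up to 28000 (half of 56000) and then grows by 1500 bytes per round.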

I hope that has been helpful!

Laz