TCP Window Size Scaling

Hi Rene,

You mentioned: "Above you can see that in the SYN,ACK message that the raspberry pi wants to use a window size of 29200. My computer wants to use a window size of 4194304 which is irrelevant now since we are sending data to the raspberry pi."
But in Wireshark, win = 65535 and ws = 128. Hence the window size would be 65535 × 128 = 8,388,480, which is double the value you mentioned.
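
The arithmetic can be checked with a minimal sketch. The function name is only illustrative; the numbers are the ones from the capture quoted above.

```python
def scaled_window(window_field, scale_factor):
    """Calculated window = 16-bit window field multiplied by the scale factor."""
    return window_field * scale_factor

# Values from the capture: win = 65535, ws = 128
print(scaled_window(65535, 128))   # 8388480

# The scale factor is a power of two; 128 corresponds to a shift count of 7
print((128).bit_length() - 1)      # 7
print(65535 << 7)                  # 8388480
```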

Hi Rosna,

You are correct. I’m not sure how I came up with 4194304, but it has been fixed.

Thanks for letting me know!

Rene

Hello Rene,
I’ve got a question regarding this statement:

“the raspberry pi is unable to receive any more data at this moment and the TCP transmission will be paused for awhile while the receive buffer is processed.”

* This “awhile”: how long exactly will the sender wait before it times out?
* Is the timeout value dependent on the device OS/NIC, given that there is no TTL value specified in the TCP header?

Thanks!

Hello Ray,

This all depends on the OS and its applications. Here’s an example for Windows 2000/XP/2003/Vista/7:

https://support.microsoft.com/nl-nl/help/170359/how-to-modify-the-tcp-ip-maximum-retransmission-time-out

TCP starts a retransmission timer when each outbound segment is handed down to IP. If no acknowledgment has been received for the data in a given segment before the timer expires, the segment is retransmitted, up to the TcpMaxDataRetransmissions value. The default value for this parameter is 5.

The retransmission timer is initialized to three seconds when a TCP connection is established. However, it is adjusted on the fly to match the characteristics of the connection by using Smoothed Round Trip Time (SRTT) calculations as described in RFC793. The timer for a given segment is doubled after each retransmission of that segment. By using this algorithm, TCP tunes itself to the normal delay of a connection. TCP connections that are made over high-delay links take much longer to time out than those that are made over low-delay links.

By default, after the retransmission timer hits 240 seconds, it uses that value for retransmission of any segment that has to be retransmitted. This can cause long delays for a client to time-out on a slow link.
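
The doubling behaviour described above can be sketched roughly as follows. This is a simplified model, not the actual Windows implementation: it ignores the SRTT adjustment and only uses the numbers quoted above (3 second initial timer, 5 retransmissions, 240 second ceiling). The function and parameter names are hypothetical.

```python
def retransmission_schedule(initial_rto=3.0, max_retransmissions=5, rto_cap=240.0):
    """Return the timer value (seconds) that expires before each retransmission.

    Simplified: the timer starts at initial_rto and doubles after every
    retransmission of the same segment, never exceeding rto_cap.
    """
    timers = []
    rto = initial_rto
    for _ in range(max_retransmissions):
        timers.append(rto)
        rto = min(rto * 2, rto_cap)
    return timers

print(retransmission_schedule())  # [3.0, 6.0, 12.0, 24.0, 48.0]

# On a high-delay link the starting value is larger, so the cap is reached sooner
print(retransmission_schedule(initial_rto=100.0, max_retransmissions=4))
# [100.0, 200.0, 240.0, 240.0]
```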

You can see these TCP settings on Windows with this command:

netsh interface tcp show global
Querying active state...

TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Chimney Offload State               : disabled
Receive Window Auto-Tuning Level    : normal
Add-On Congestion Control Provider  : default
ECN Capability                      : disabled
RFC 1323 Timestamps                 : disabled
Initial RTO                         : 3000
Receive Segment Coalescing State    : enabled
Non Sack Rtt Resiliency             : disabled
Max SYN Retransmissions             : 2
Fast Open                           : enabled
Fast Open Fallback                  : enabled
Pacing Profile                      : off

A TCP connection can stay up indefinitely unless keepalives are used. By default, Windows sends its first keepalive after 2 hours of idle time.
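
An application can also ask for keepalives itself. Below is a minimal sketch using Python's standard socket module. The helper name and the 7200 second default are assumptions for illustration, and the TCP_KEEPIDLE option is not available on every platform, so it is only set when it exists.

```python
import socket

def enable_keepalive(sock, idle_seconds=7200):
    """Turn on TCP keepalives for a socket and return the SO_KEEPALIVE setting."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE (idle time before the first probe) is platform dependent
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    # A non-zero value means keepalives are enabled on this socket
    print(enable_keepalive(s) != 0)
```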

I hope this helps!

Rene

Nice explanation.

I’ve read these TCP and TCP Window Size Scaling lessons carefully.

Sometimes I have to tune the MTU value on a subinterface to get BGP to the Established state. I only know it is due to an MTU mismatch with the remote peer, but I don’t know how it really works.

Could you explain it in more detail?

Hello Juan

Just like any other protocol communicating on the network, BGP requires the appropriate MTU sizes to be set in order for communication to occur successfully. If you have to tune the MTU value to get BGP to work, then BGP is probably sending IP packets that are larger than the interface MTU and have the DF bit set to 1, which means they must not be fragmented. By adjusting the IP MTU or interface MTU of the subinterface, you make the IP packets small enough to fit into the interface MTU. It also depends on which other features you are running, such as QinQ, tunneling or encryption, which may add overhead to a packet or frame and make it larger than the allowable MTU.
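
To make the overhead point concrete, here is a rough sketch. It assumes IPv4 and TCP headers without options (20 bytes each) and uses 24 bytes of GRE-over-IPv4 overhead purely as an illustrative tunnel; the function names are hypothetical and your platform may count overhead differently.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options

def largest_tcp_payload(interface_mtu, tunnel_overhead=0):
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return interface_mtu - tunnel_overhead - IPV4_HEADER - TCP_HEADER

def needs_fragmentation(packet_size, interface_mtu, tunnel_overhead=0):
    """True if the packet plus overhead no longer fits in the interface MTU.

    With DF set to 1, such a packet is dropped instead of fragmented.
    """
    return packet_size + tunnel_overhead > interface_mtu

print(largest_tcp_payload(1500))             # 1460
print(largest_tcp_payload(1500, 24))         # 1436
print(needs_fragmentation(1500, 1500, 24))   # True
print(needs_fragmentation(1476, 1500, 24))   # False
```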

For more about MTU sizes, take a look at this lesson:

I hope this has been helpful!

Laz

Hello Deepak

The segment size will not change over the course of the transmission. For most transmission media, TCP segment sizes are usually (but not always!) around 1500 bytes. This lesson and its example deal only with the change in size of the window. It is the window size that grows exponentially (when using slow start). So in the example I gave, the next window size should be 14000 bytes. This means that we would have the following set of segments sent:

Segment 1: SEQa = 7001, Window Size = 14000, segment size = 1500 bytes
Segment 2: SEQa = 8501, Window Size = 14000, segment size = 1500 bytes
Segment 3: SEQa = 10001, Window Size = 14000, segment size = 1500 bytes
Segment 4: SEQa = 11501, Window Size = 14000, segment size = 1500 bytes
Segment 5: SEQa = 13001, Window Size = 14000, segment size = 1500 bytes
Segment 6: SEQa = 14501, Window Size = 14000, segment size = 1500 bytes
Segment 7: SEQa = 16001, Window Size = 14000, segment size = 1500 bytes
Segment 8: SEQa = 17501, Window Size = 14000, segment size = 1500 bytes
Segment 9: SEQa = 19001, Window Size = 14000, segment size = 1500 bytes
Segment 10: SEQa = 20501, Window Size = 14000, segment size = 500 bytes

So the window size doubles, the segment size remains the same (except for the last one, which is 500 bytes, since 9 × 1500 + 500 = 14000) and 10 segments are now sent within that window. The next window starts at SEQa = 21001. The window will double again after every successful acknowledgement from the receiver until some data is dropped. If this happens, the window size goes down to one segment again, that is 1500 bytes. It will grow exponentially again until the window is at half of what it was when the original congestion occurred. Window sizes will then grow linearly.
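
The growth pattern described above can be sketched per round trip as follows. This is a simplified model under stated assumptions: real TCP grows the window per acknowledgement rather than per round, and the starting window of 7000 bytes and the round in which the drop happens are illustrative values, not taken from a capture.

```python
SEGMENT = 1500  # bytes, the segment size used in the example above

def simulate(rounds, start_window, drop_round):
    """Return the window size (bytes) used in each round trip."""
    window = start_window
    threshold = None  # no threshold exists until the first drop
    history = []
    for r in range(rounds):
        history.append(window)
        if r == drop_round:
            threshold = window // 2   # half of the window at the time of the drop
            window = SEGMENT          # back to one segment
        elif threshold is None or window * 2 <= threshold:
            window *= 2               # exponential growth (slow start)
        elif window < threshold:
            window = threshold        # do not overshoot the threshold
        else:
            window += SEGMENT         # linear growth after the threshold
    return history

print(simulate(rounds=10, start_window=7000, drop_round=2))
# [7000, 14000, 28000, 1500, 3000, 6000, 12000, 14000, 15500, 17000]
```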

I hope that has been helpful!

Laz

Hi, I’m getting confused with scaling here. On Rene’s capture with the Raspberry Pi:
1# On the initial SYN, the PC’s advertised window size is 8192 and the scaling factor is 256. How are those calculated? Are they OS dependent?
2# Same question for the Raspberry Pi [29200 / WS = 64] on the SYN/ACK packet. On the packet itself the scaling factor is 64, but the calculated window remains 29200. How does the underlying algorithm work, given that the window size should be multiplied by WS to get the calculated window size?

Moreover, after the handshake is completed, how come the ACK becomes 29?

I am talking about Rene’s other capture that is in the comment section.

Hi Deep,

The window size depends on the OS. Windows uses a window size of 8192 bytes by default. Here’s a Microsoft document that shows some of the default values and how they can be changed in the registry:

https://technet.microsoft.com/library/bb878127

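The Raspberry Pi shows a window size of 29200 bytes. The highest value of the window size without scaling is 65535 bytes. You can ignore the scaling factor here: the window field in a SYN or SYN,ACK segment is never scaled, and 29200 fits in the window field anyway since it is less than 65535. The scaling factor only applies to the segments that follow the handshake.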

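I just checked the capture file again. The ACK you mention is packet #5. The ACK has a value of 29 since packet #4 has a length of 28 bytes: the ACK number is the next byte the receiver expects, so with a relative sequence number of 1 it is 1 + 28 = 29.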
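
Both points can be put into a small sketch. The function names are hypothetical; the shift count of 6 corresponds to WS = 64, and the relative sequence number of 1 for packet #4 is an assumption (the first data segment after the handshake).

```python
def calculated_window(window_field, shift_count, is_syn):
    """Window as Wireshark would calculate it.

    The window field of a segment with the SYN bit set is never scaled;
    later segments are shifted left by the advertised shift count.
    """
    if is_syn:
        return window_field
    return window_field << shift_count

def next_ack(relative_seq, payload_len):
    """ACK number = next byte expected = sequence number + payload length."""
    return relative_seq + payload_len

print(calculated_window(29200, 6, is_syn=True))   # 29200
print(calculated_window(1000, 6, is_syn=False))   # 64000
print(next_ack(1, 28))                            # 29
```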

Hope this helps!

Rene

Hi Deepak,

I’ll create a capture for that when I write something on WRED. RED is a technique that randomly drops TCP traffic in order to slow it down and prevent congestion. It’s a perfect way to see how TCP behaves with drops.

Rene

Hello Networklessons.com team

I would like to ask a question regarding the topic. You said in the article that if a packet is dropped, the sender will initially send only one segment and then wait for the acknowledgement from the receiver for that one segment. So, does the receiver send a window size of, let’s say, 1460 bytes before the sender sends that one segment? Does the receiver know the MTU of the sender so that it can adjust the window size? Thank you in advance!

Best Regards
Markos

Hello Markos

First of all, the MTU is not involved in the process of windowing. The segment sizes are set based on the MTUs that are configured on the interfaces and the devices that are communicating.

So if there is congestion on the network and a segment is lost, the receiver will respond by sending an acknowledgement that has a smaller window size, not necessarily a window size of one segment, but a window size much smaller than the current one. Once that is done, the sender resumes transmission from the last successfully acknowledged data. The window size will slowly increase once again.

I hope this has been helpful!

Laz

Hi Rene,

How do you configure window size in routers?

Thank you,

Shawn

Hello Shawn

It’s important to keep in mind where precisely window size scaling takes place. It is a mechanism that is implemented in TCP and is adjusted by the receiver of a TCP session. This means that if a router is forwarding traffic between two hosts that are in a TCP session, it is unable to interfere with any of the TCP parameters. Only the two devices involved in that particular TCP session are able to do so. The only way that a router could adjust the window size is if it is itself the destination of a TCP session.

It is also important to keep in mind that a router is primarily a layer 3 device and is involved in mechanisms that take place within the realm of the IP protocol. It rarely deals with layer 4 except in specialized services such as NAT and others.

I hope this has been helpful!

Laz

Sir, can you explain what a PDU is?

I got a question. Does delay in the RTT cause TCP slow start?

Hello Harshit

PDU is short for Protocol Data Unit. It is a generic term used to refer to a piece of data at various layers of the OSI model. For example, at the Transport Layer, if we’re using TCP, the PDU is called a Segment. If we’re using UDP, it is called a Datagram. At the Network Layer, it is called a Packet. At the Datalink Layer, it is called a Frame. We can use the term PDU to refer to these units in general. Here’s an example of the OSI model that includes the names of the PDUs at the various layers.

[image: OSI model with the PDU names at each layer]

I hope this has been helpful!

Laz

Hello Carlo

TCP slow start is caused when segments are dropped. A long round trip delay time (RTT) alone will not induce packet drops and will thus not directly cause TCP slow start. Now if the RTT is so long that TCP timeouts are reached and segments are considered lost or dropped, then yes, it can result in TCP slow start, but only because segments are considered dropped.

I hope this has been helpful!

Laz

Thanks, it has been helpful.
