TCP protocol
============

Last updated: 9 February 2008

Contents
========

- Congestion control
- How the new TCP output machine [nyi] works

Congestion control
==================

The following variables are used in the tcp_sock for congestion control:

snd_cwnd            The size of the congestion window.
snd_ssthresh        Slow start threshold. We are in slow start if
                    snd_cwnd is less than this.
snd_cwnd_cnt        A counter used to slow down the rate of increase
                    once we exceed the slow start threshold.
snd_cwnd_clamp      The maximum size that snd_cwnd can grow to.
snd_cwnd_stamp      Timestamp of when the congestion window was last
                    validated.
snd_cwnd_used       Used as a high-water mark for how much of the
                    congestion window is in use. It is used to adjust
                    snd_cwnd down when the link is limited by the
                    application rather than the network.
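
To make the roles of snd_cwnd, snd_ssthresh, snd_cwnd_cnt and snd_cwnd_clamp concrete, here is a minimal user-space sketch of Reno-style window growth on each ACK, with the window counted in segments. The struct cwnd_state and the function on_ack() are hypothetical stand-ins, not kernel code; snd_cwnd_stamp and snd_cwnd_used are not modelled.

```c
#include <stdint.h>

/* Hypothetical stand-in for the tcp_sock congestion fields (segments). */
struct cwnd_state {
    uint32_t snd_cwnd;
    uint32_t snd_ssthresh;
    uint32_t snd_cwnd_cnt;
    uint32_t snd_cwnd_clamp;
};

/* Grow the window for one newly acknowledged segment (Reno-style sketch). */
void on_ack(struct cwnd_state *s)
{
    if (s->snd_cwnd < s->snd_ssthresh) {
        /* Slow start: +1 segment per ACK, roughly doubling each RTT. */
        s->snd_cwnd++;
    } else if (++s->snd_cwnd_cnt >= s->snd_cwnd) {
        /* Congestion avoidance: snd_cwnd_cnt counts ACKs, and the
         * window grows by one segment once a full window is ACKed. */
        s->snd_cwnd_cnt = 0;
        s->snd_cwnd++;
    }
    /* Never grow past the clamp. */
    if (s->snd_cwnd > s->snd_cwnd_clamp)
        s->snd_cwnd = s->snd_cwnd_clamp;
}
```

Starting from snd_cwnd = 1 and snd_ssthresh = 4, three ACKs bring the window to 4; from then on it takes four more ACKs to reach 5.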

As of 2.6.13, Linux supports pluggable congestion control algorithms.
A congestion control mechanism can be registered through functions in
tcp_cong.c. The functions used by the congestion control mechanism are
registered by passing a tcp_congestion_ops struct to
tcp_register_congestion_control. At a minimum, the name, ssthresh and
cong_avoid members must be valid.
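
A rough user-space model of the registration check, assuming only the three mandatory members named above. struct cc_ops, cc_validate() and the demo_* callbacks are hypothetical; the real struct tcp_congestion_ops has more members, and the exact set of required callbacks may differ between kernel versions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down model of struct tcp_congestion_ops. */
struct cc_ops {
    const char *name;
    uint32_t (*ssthresh)(uint32_t cwnd);   /* threshold after a loss */
    void (*cong_avoid)(uint32_t *cwnd);    /* window growth per ACK */
};

/* Model of the registration-time check: name, ssthresh and
 * cong_avoid must all be present. Returns 0 or -EINVAL. */
int cc_validate(const struct cc_ops *ca)
{
    if (ca == NULL || ca->name == NULL || ca->name[0] == '\0')
        return -EINVAL;
    if (ca->ssthresh == NULL || ca->cong_avoid == NULL)
        return -EINVAL;
    return 0;
}

/* A Reno-like pair of callbacks, for illustration only. */
uint32_t demo_ssthresh(uint32_t cwnd)
{
    return cwnd / 2 > 2 ? cwnd / 2 : 2;   /* halve, floor of 2 */
}

void demo_cong_avoid(uint32_t *cwnd)
{
    (*cwnd)++;
}
```

A real module would instead fill in a struct tcp_congestion_ops and pass it to tcp_register_congestion_control from its init function.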

Private data for a congestion control mechanism is stored in tp->ca_priv.
tcp_ca(tp) returns a pointer to this space. This space is preallocated, so
it is important to check that your private data fits in it; alternatively,
the data can be allocated elsewhere and a pointer to it stored here.
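
The two options above (fit in place, or store a pointer) can be sketched in user space as follows. CA_PRIV_SIZE, struct sock_model and the big_ca_* helpers are hypothetical stand-ins for tp->ca_priv and tcp_ca(tp); in-tree modules typically do the size check with BUILD_BUG_ON() rather than _Static_assert.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical size of the preallocated per-socket area; the real
 * limit is a kernel constant whose value varies between versions. */
#define CA_PRIV_SIZE 64

struct sock_model {
    uint64_t ca_priv[CA_PRIV_SIZE / sizeof(uint64_t)];
};

/* Small state: verify at compile time that it fits in place. */
struct small_ca {
    uint32_t base_rtt;
    uint32_t cnt_rtt;
};
_Static_assert(sizeof(struct small_ca) <= CA_PRIV_SIZE,
               "private data does not fit in ca_priv");

/* Large state: too big for ca_priv, so allocate it elsewhere and
 * keep only a pointer in the preallocated space. */
struct big_ca {
    uint32_t history[256];
};
_Static_assert(sizeof(struct big_ca *) <= CA_PRIV_SIZE,
               "pointer does not fit in ca_priv");

struct big_ca *big_ca_get(struct sock_model *sk)
{
    struct big_ca *p;
    memcpy(&p, sk->ca_priv, sizeof(p));   /* read the stored pointer */
    return p;
}

int big_ca_init(struct sock_model *sk)
{
    struct big_ca *p = calloc(1, sizeof(*p));
    if (p == NULL)
        return -1;
    memcpy(sk->ca_priv, &p, sizeof(p));   /* store just the pointer */
    return 0;
}

void big_ca_release(struct sock_model *sk)
{
    struct big_ca *p = big_ca_get(sk);
    free(p);
    p = NULL;
    memcpy(sk->ca_priv, &p, sizeof(p));   /* leave a NULL behind */
}
```

The pointer is copied with memcpy() to avoid type-punning the storage in portable C; kernel code usually just casts.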

There are currently three kinds of congestion control algorithms. The
simplest ones are derived from TCP Reno (HighSpeed, Scalable) and just
provide an alternative congestion window calculation. More complex ones,
like BIC, look at other events to provide better heuristics. There are
also round trip time based algorithms like Vegas and Westwood+.
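
As an example of a Reno-derived alternative calculation, this sketch compares Reno's response to loss (halve the window) with Scalable TCP's (reduce it by one eighth), and how many ACKs each needs before growing the window by one segment. The constants follow the published Scalable TCP parameters (increase of about 0.01 segment per ACK, decrease factor 0.125); the function names are hypothetical.

```c
#include <stdint.h>

/* Reno: halve the window on loss (never below 2 segments). */
uint32_t reno_ssthresh(uint32_t cwnd)
{
    uint32_t t = cwnd >> 1;
    return t > 2 ? t : 2;
}

/* Scalable TCP: back off by only 1/8 (b = 0.125). */
uint32_t scalable_ssthresh(uint32_t cwnd)
{
    uint32_t t = cwnd - (cwnd >> 3);
    return t > 2 ? t : 2;
}

/* ACKs needed before adding one segment in congestion avoidance.
 * Reno: one window's worth, i.e. about +1 segment per RTT. */
uint32_t reno_acks_per_increment(uint32_t cwnd)
{
    return cwnd;
}

/* Scalable: at most 100 ACKs, i.e. +0.01 segment per ACK once
 * cwnd >= 100, so growth per RTT is proportional to the window. */
uint32_t scalable_acks_per_increment(uint32_t cwnd)
{
    return cwnd < 100 ? cwnd : 100;
}
```

With a 1000-segment window, a loss leaves Reno at 500 segments and Scalable at 875, and Scalable then grows ten times faster per ACK.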

Good TCP congestion control is a complex problem because the algorithm
needs to maintain both fairness and performance. Please review current
research and RFCs before developing new modules.

The congestion control mechanism in use is determined by the setting of
the sysctl net.ipv4.tcp_congestion_control. The default congestion
control will be the last one registered (LIFO), so if you built
everything as modules, the default will be reno. If you build with the
defaults from Kconfig, then CUBIC will be built in (not a module) and it
will end up as the default.

If you really want a particular default value, you will need to set it
with the sysctl. If you use the sysctl, the module will be autoloaded if
needed and you will get the expected protocol. If you ask for an unknown
congestion control method, the sysctl attempt will fail.

If you remove a TCP congestion control module, then you will get the next
available one. Since reno cannot be built as a module and cannot be
deleted, it will always be available.
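
The selection rules above can be modelled with a singly linked list in which registration pushes onto the front and the head of the list is the default. This is a sketch under stated assumptions: struct cc_entry and the cc_* functions are hypothetical, setting the default is modelled as moving the chosen entry to the front, and module autoloading is not modelled.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry standing in for tcp_congestion_ops. */
struct cc_entry {
    const char *name;
    int removable;           /* 0 for built-in reno */
    struct cc_entry *next;
};

static struct cc_entry *cc_list;   /* head of the list is the default */

/* LIFO: the most recently registered algorithm becomes the default. */
void cc_register(struct cc_entry *e)
{
    e->next = cc_list;
    cc_list = e;
}

struct cc_entry *cc_default(void)
{
    return cc_list;
}

/* Model of writing the sysctl: move the named entry to the front,
 * or fail if no such algorithm is registered. */
int cc_set_default(const char *name)
{
    for (struct cc_entry **pp = &cc_list; *pp; pp = &(*pp)->next) {
        struct cc_entry *e = *pp;
        if (strcmp(e->name, name) == 0) {
            *pp = e->next;       /* unlink ... */
            e->next = cc_list;   /* ... and reinsert at the front */
            cc_list = e;
            return 0;
        }
    }
    return -ENOENT;
}

/* Model of unloading a module: unlink it, so the next entry on the
 * list becomes the default. Built-in reno can never be removed. */
int cc_unregister(struct cc_entry *e)
{
    if (!e->removable)
        return -EBUSY;
    for (struct cc_entry **pp = &cc_list; *pp; pp = &(*pp)->next) {
        if (*pp == e) {
            *pp = e->next;
            e->next = NULL;
            return 0;
        }
    }
    return -ENOENT;
}
```

For example, registering reno, bic and then cubic makes cubic the default; selecting an unknown name fails; selecting bic and then unloading it leaves cubic as the default, while reno can never be removed.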

How the new TCP output machine [nyi] works
==========================================

Data is kept on a single queue. The skb->users flag tells us if the frame
is one that has been queued already. To add a frame we throw it on the
end. ACK processing walks down the list from the start.
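
A minimal sketch of the single queue: new frames go on the tail, and ACK processing frees frames from the head once they are fully acknowledged. struct frame, struct txq and the txq_* functions are hypothetical; the queued field stands in for the skb->users check, and sequence number wraparound is ignored for brevity.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical frame on the single output queue. */
struct frame {
    uint32_t seq;            /* first sequence number */
    uint32_t len;            /* payload length */
    int queued;              /* stands in for the skb->users check */
    struct frame *next;
};

struct txq {
    struct frame *head;      /* oldest unacknowledged frame */
    struct frame *tail;      /* where new frames are added */
};

/* Throw a frame on the end; refuse one that is already queued. */
int txq_add(struct txq *q, struct frame *f)
{
    if (f->queued)
        return -1;
    f->queued = 1;
    f->next = NULL;
    if (q->tail)
        q->tail->next = f;
    else
        q->head = f;
    q->tail = f;
    return 0;
}

/* Walk from the start, freeing every frame wholly covered by ack.
 * Returns the number of frames released (no wraparound handling). */
unsigned txq_ack(struct txq *q, uint32_t ack)
{
    unsigned n = 0;

    while (q->head && q->head->seq + q->head->len <= ack) {
        struct frame *f = q->head;
        q->head = f->next;
        if (q->head == NULL)
            q->tail = NULL;
        free(f);
        n++;
    }
    return n;
}
```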

We keep a set of control flags:

    sk->tcp_pend_event

        TCP_PEND_ACK        ACK needed
        TCP_ACK_NOW         ACK needed immediately
        TCP_WINDOW          Window update check
        TCP_WINZERO         Zero-window probing

    sk->transmit_queue      Start of the transmission queue
    sk->transmit_new        Pointer to the first new frame
    sk->transmit_end        Where to add frames

    sk->tcp_last_tx_ack     Last ACK seen
    sk->tcp_dup_ack         Duplicate ACK count for fast retransmit
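
The pending events can be kept as a bitmask, and the duplicate ACK count drives fast retransmit. The bit values are hypothetical (the document names the flags but not their encoding), and the threshold of three duplicate ACKs is the conventional fast retransmit trigger from the TCP congestion control RFCs rather than something stated here.

```c
#include <stdint.h>

/* Hypothetical encodings for the sk->tcp_pend_event flags. */
enum {
    TCP_PEND_ACK = 1u << 0,  /* ACK needed */
    TCP_ACK_NOW  = 1u << 1,  /* ACK needed immediately */
    TCP_WINDOW   = 1u << 2,  /* window update check */
    TCP_WINZERO  = 1u << 3,  /* zero-window probing */
};

#define DUP_ACK_THRESHOLD 3  /* conventional fast retransmit trigger */

struct pend_state {
    unsigned tcp_pend_event;
    uint32_t tcp_last_tx_ack;  /* last ACK seen */
    unsigned tcp_dup_ack;      /* duplicate ACK count */
};

/* Record an incoming ACK number; returns 1 exactly when the
 * duplicate count reaches the fast retransmit threshold. */
int note_ack(struct pend_state *s, uint32_t ack)
{
    if (ack != s->tcp_last_tx_ack) {
        s->tcp_last_tx_ack = ack;  /* new data acknowledged */
        s->tcp_dup_ack = 0;
        return 0;
    }
    return ++s->tcp_dup_ack == DUP_ACK_THRESHOLD;
}
```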

Frames are queued for output by tcp_write. We do our best to send the
frames off immediately if possible; otherwise we queue them and compute
the body checksum during the copy.
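
Computing the body checksum during the copy means the data is read only once. A sketch, assuming the standard Internet checksum (16-bit ones' complement sum, as described in RFC 1071); copy_and_sum() and sum16_fold() are hypothetical names, and the partial sum is returned unfolded so the headers can be added to it later.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy len bytes from src to dst and return the unfolded ones'
 * complement sum of the data, read as big-endian 16-bit words.
 * A 64-bit accumulator avoids overflow for any realistic length. */
uint64_t copy_and_sum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }
    if (i < len) {                    /* odd final byte: pad with zero */
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }
    return sum;
}

/* Fold the carries back into 16 bits and take the ones' complement. */
uint16_t sum16_fold(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

For the RFC 1071 example bytes 00 01 f2 03 f4 f5 f6 f7, the folded sum is 0xddf2 and the checksum is 0x220d.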

When a write is done, we try to clear any pending events and piggyback
them. If the window is full, we queue full-sized frames. On the first
timeout in a zero window, we split the queued frame.
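
Piggybacking works because every outgoing data segment already carries an acknowledgment number and a window, so a pending pure ACK or window update is satisfied by the write. A sketch with hypothetical bit values and a hypothetical struct seg_hdr; zero-window probing is assumed to stay pending, since it concerns the peer's window rather than ours.

```c
#include <stdint.h>

/* Hypothetical encodings; the document gives the flag names only. */
enum {
    TCP_PEND_ACK = 1u << 0,
    TCP_ACK_NOW  = 1u << 1,
    TCP_WINDOW   = 1u << 2,
    TCP_WINZERO  = 1u << 3,
};

struct seg_hdr {
    uint32_t ack;      /* acknowledgment number carried */
    uint16_t window;   /* advertised receive window */
};

/* Fill in the ACK and window of an outgoing data segment and return
 * the pending events left over: the ACK and window-update events are
 * satisfied by this segment, anything else stays pending. */
unsigned piggyback_pending(unsigned pend, struct seg_hdr *h,
                           uint32_t rcv_nxt, uint16_t rcv_wnd)
{
    h->ack = rcv_nxt;
    h->window = rcv_wnd;
    return pend & ~(unsigned)(TCP_PEND_ACK | TCP_ACK_NOW | TCP_WINDOW);
}
```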

On a timer we walk the retransmit list to send any retransmits, update
the backoff timers, etc. A change of the route table stamp causes the
header to change and be recomputed. We add any new TCP-level headers and
refinish the checksum before sending.
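
Because the body's partial sum was saved when the data was copied in, refinishing the checksum only needs a sum over the freshly built header. The sketch below shows that, plus a capped exponential retransmit backoff. The helper names, the millisecond units and the cap are assumptions; the TCP pseudo-header is omitted, and the header's checksum field must be zero when it is summed.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the big-endian 16-bit words of p to an unfolded partial sum. */
static uint64_t sum16_partial(const uint8_t *p, size_t len, uint64_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (i < len)
        sum += (uint32_t)p[i] << 8;   /* odd final byte, zero-padded */
    return sum;
}

static uint16_t sum16_fold(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Refinish before (re)transmission: body_sum was cached at copy time;
 * only the new header is summed now. hlen must be even so the body's
 * 16-bit word alignment is unaffected. */
uint16_t refinish_csum(uint64_t body_sum, const uint8_t *hdr, size_t hlen)
{
    return sum16_fold(sum16_partial(hdr, hlen, body_sum));
}

/* Exponential backoff: double the timeout on each expiry, capped. */
uint32_t backoff_rto(uint32_t base_ms, unsigned backoff, uint32_t max_ms)
{
    uint64_t rto = (uint64_t)base_ms << (backoff < 32 ? backoff : 32);
    return rto < max_ms ? (uint32_t)rto : max_ms;
}
```

Since ones' complement addition is commutative, the cached body sum and the new header sum can be combined in either order, which is what makes reusing the body checksum after a header change possible.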