
Welcome to docum.org

 sven_ 2014-04-16


Dear reader, I'm not updating these pages anymore. If you have tc or ip related questions, you can post them on the LARTC mailing list.



Intro

On the LARTC mailing list, there was a long discussion about how a packet is handled by the kernel. Finally, there was a post by Leonardo Balliache that I copied onto this page. I hope this helps people to better understand how it all works.

I added extra information about the IMQ device; I hope I didn't make any mistakes. All information, updates and corrections are welcome.

Kernel Packet Traveling Diagram

                            Network
                    -----------+-----------
                               |
                  +--------------------------+
          +-------+-------+        +---------+---------+
          |    IPCHAINS   |        |      IPTABLES     |
          |     INPUT     |        |     PREROUTING    |
          +-------+-------+        | +-------+-------+ |
                  |                | |   conntrack   | |
                  |                | +-------+-------+ |
                  |                | |    mangle     | | <- MARK WRITE  
                  |                | +-------+-------+ |
                  |                | |      IMQ      | |
                  |                | +-------+-------+ |
                  |                | |      nat      | | <- DEST REWRITE
                  |                | +-------+-------+ |     DNAT or REDIRECT or DE-MASQUERADE
                  |                +---------+---------+
                  +------------+-------------+
                               |
                       +-------+-------+
                       |      QOS      |
                       |    INGRESS    |
                       +-------+-------+
                               |
         packet is for +-------+-------+ packet is for
          this machine |     INPUT     | another address
        +--------------+    ROUTING    +--------------+
        |              |    + RPDB     |              |
        |              +---------------+              |
+-------+-------+                                     |
|   IPTABLES    |                                     |
|     INPUT     |                                     |
| +-----+-----+ |                                     |
| |   mangle  | |                                     |
| +-----+-----+ |                                     |
| |   filter  | |                                     |
| +-----+-----+ |                                     |
+-------+-------+                                     |
        |                               +---------------------------+
+-------+-------+                       |                           |
|     Local     |               +-------+-------+           +-------+-------+
|    Process    |               |    IPCHAINS   |           |    IPTABLES   |
+-------+-------+               |    FORWARD    |           |    FORWARD    |
        |                       +-------+-------+           | +-----+-----+ |
+-------+-------+                       |                   | |  mangle   | | <- MARK WRITE
|    OUTPUT     |                       |                   | +-----+-----+ |
|    ROUTING    |                       |                   | |  filter   | |
+-------+-------+                       |                   | +-----+-----+ |
        |                               |                   +-------+-------+
+-------+-------+                       |                           |
|    IPTABLES   |                       +---------------------------+
|     OUTPUT    |                                     |
| +-----------+ |                                     |
| | conntrack | |                                     |
| +-----+-----+ |                                     |
| |   mangle  | | <- MARK WRITE                       |
| +-----+-----+ |                                     |
| |    nat    | | <-DEST REWRITE                      |
| +-----+-----+ |     DNAT or REDIRECT                |
| |   filter  | |                                     |
| +-----+-----+ |                                     |
+-------+-------+                                     |
        |                                             |
        +----------------------+----------------------+
                               |
                  +------------+------------+
                  |                         |
          +-------+-------+       +---------+---------+
          |    IPCHAINS   |       |      IPTABLES     |
          |     OUTPUT    |       |    POSTROUTING    |
          +-------+-------+       | +-------+-------+ |
                  |               | |    mangle     | | <- MARK WRITE  
                  |               | +-------+-------+ |
                  |               | |      nat      | | <- SOURCE REWRITE
                  |               | +-------+-------+ |      SNAT or MASQUERADE
                  |               | |      IMQ      | |
                  |               | +-------+-------+ |
                  |               +---------+---------+
                  +------------+------------+
                               |
                        +------+------+
                        |     QOS     |
                        |    EGRESS   |
                        +------+------+
                               |
                    -----------+-----------
                            Network
  • Name of firewall chain (in bold)
  • Controlled by iptables/ipchains (in blue)
  • Controlled by ip/tc (in red)
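To make the order of the boxes easy to query, the diagram can be written down as data. This is a minimal sketch that only encodes the iptables column as drawn (the IPCHAINS boxes are an either/or alternative and are left out); the stage names are plain labels chosen for illustration, not kernel identifiers.

```python
# Sketch: the traversal order from the diagram above, iptables column only.
# Stage names are illustrative labels, not kernel symbols.

INCOMING = [
    "PREROUTING:conntrack",
    "PREROUTING:mangle",     # MARK WRITE
    "PREROUTING:IMQ",
    "PREROUTING:nat",        # DEST REWRITE (DNAT, REDIRECT, de-masquerade)
    "QOS:ingress",
    "INPUT ROUTING + RPDB",  # decides: this machine or another address
]
OUTGOING = [
    "POSTROUTING:mangle",    # MARK WRITE
    "POSTROUTING:nat",       # SOURCE REWRITE (SNAT, MASQUERADE)
    "POSTROUTING:IMQ",
    "QOS:egress",
]
LOCAL_OUT = [
    "OUTPUT ROUTING",
    "OUTPUT:conntrack",
    "OUTPUT:mangle",         # MARK WRITE
    "OUTPUT:nat",            # DEST REWRITE (DNAT, REDIRECT)
    "OUTPUT:filter",
]


def traverse(origin: str, for_this_machine: bool = False) -> list[str]:
    """Return the stages a packet passes through, following the diagram.

    origin: "network" for packets arriving on an interface,
            "local" for packets generated by a local process.
    for_this_machine: stands in for the input routing decision
            (only meaningful when origin == "network").
    """
    if origin == "local":
        return LOCAL_OUT + OUTGOING
    if origin != "network":
        raise ValueError(f"unknown origin: {origin!r}")
    if for_this_machine:
        return INCOMING + ["INPUT:mangle", "INPUT:filter", "LOCAL PROCESS"]
    return INCOMING + ["FORWARD:mangle", "FORWARD:filter"] + OUTGOING


if __name__ == "__main__":
    cases = [
        ("incoming, for this machine", traverse("network", True)),
        ("incoming, forwarded", traverse("network", False)),
        ("locally generated", traverse("local")),
    ]
    for label, path in cases:
        print(label + ":\n  " + " -> ".join(path))
```

Keeping the stages as plain lists makes orderings such as "IMQ before nat, ingress after nat" easy to check with `list.index`.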

My remarks on the diagram

  • Output routing: the local process selects a source address and a route. This route is attached to the packet and used later.
  • Postrouting: rerouting is also possible if netfilter changes parts of the packet, such as the address or the TOS.
  • RPDB: the routing policy database, controlled by ip. This is also where the kernel performs source validation and the next-hop decision.
  • IMQ: packets put in the imq device also travel through the "EGRESS" part of the diagram, so you can use htb/cbq to control the packets in the imq device.
  • ipchains: yes, there is some ipchains code in kernel 2.4. If you load the ipchains module, you can no longer use iptables. You can even load the ipfwadm module if you want ipfwadm support. So it's either iptables, ipchains or ipfwadm; they cannot be combined.
  • mangle: since kernel 2.4.18, there is a mangle table in all 5 netfilter hooks.
  • IMQ in the input path comes before nat, so IMQ does not see the real (translated) IP address. Ingress comes after nat, so ingress does see the real IP address.
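The RPDB remark can be pictured as an ordered list of rules, each pointing at a routing table that is then searched by longest prefix. The sketch below is a simplified model under stated assumptions: rules match only on source prefix and fwmark (real `ip rule` selectors offer more, and fwmark matching can use a mask), the priority-0 `local` table is omitted, source validation is not modelled, and the `vpn` table, its rule, and all addresses and devices are hypothetical. The priorities 32766 (`main`) and 32767 (`default`) follow the usual default rule set.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    priority: int
    table: str
    src: str = "0.0.0.0/0"        # "from" selector; all sources by default
    fwmark: Optional[int] = None  # None means: match any mark


# Illustrative tables: prefix -> route description (made-up addresses/devices).
TABLES = {
    "main": {
        "0.0.0.0/0": "via 192.0.2.1 dev eth0",
        "192.0.2.0/24": "dev eth0",
    },
    "vpn": {"0.0.0.0/0": "via 198.51.100.1 dev tun0"},  # hypothetical table
    "default": {},
}

RULES = [
    Rule(100, "vpn", fwmark=1),  # hypothetical policy rule
    Rule(32766, "main"),
    Rule(32767, "default"),
]


def lookup(dst: str, src: str, mark: int = 0) -> Optional[str]:
    """Return the route description for dst, or None if no rule yields one."""
    d = ipaddress.ip_address(dst)
    s = ipaddress.ip_address(src)
    for rule in sorted(RULES, key=lambda r: r.priority):
        # All selectors of a rule must match, otherwise try the next rule.
        if s not in ipaddress.ip_network(rule.src):
            continue
        if rule.fwmark is not None and rule.fwmark != mark:
            continue
        # Longest-prefix match inside the table this rule points to.
        table = TABLES[rule.table]
        matching = [ipaddress.ip_network(p) for p in table
                    if d in ipaddress.ip_network(p)]
        if matching:
            best = max(matching, key=lambda n: n.prefixlen)
            return table[str(best)]
        # No route in this table: fall through to the next rule.
    return None
```

For example, `lookup("203.0.113.5", "192.0.2.10")` uses the `main` default route, while the same lookup with `mark=1` is steered into the hypothetical `vpn` table.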

Leonardo notes

  • Input routing decides between local delivery and forwarding.
  • ip rule (the routing policy database, RPDB) is input routing, or more precisely, part of input routing.
  • Output routing is invoked from a "higher layer".
  • The next hop and the output device are determined by both input routing and output routing.
  • The forwarding process is called at input routing by functions at specific places in the code. It runs after input routing and does not perform next-hop or output-device selection. Conceptually it is the process of receiving and sending the same packet; in the context of all these hooks, it is the code that sends ICMP redirects (requested by input routing), decrements the IP TTL, performs dumb NAT and calls the filter chain. This code is used only for forwarded packets.
  • Sometimes "Forwarding" with a capital F is used to refer to both the routing and the forwarding process.
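The forwarding step from Leonardo's notes can be sketched as a function that receives a route already chosen by input routing and only applies the per-packet work. This is a hypothetical model, not kernel code: the redirect condition is simplified to "the packet leaves through the device it arrived on", dumb NAT is left out, and the FORWARD filter chain is stood in for by a caller-supplied function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Packet:
    src: str
    dst: str
    ttl: int
    in_dev: str


@dataclass
class Route:
    # Chosen earlier by input routing; forwarding does not select these.
    out_dev: str
    next_hop: str


def forward(pkt: Packet, route: Route,
            filter_forward: Callable[[Packet], str]) -> tuple[str, list[str]]:
    """Return (verdict, events) for one forwarded packet.

    filter_forward stands in for the FORWARD filter chain and
    returns "ACCEPT" or "DROP".
    """
    if pkt.ttl <= 1:
        # TTL would reach zero: drop and tell the sender.
        return "DROP", [f"icmp time-exceeded to {pkt.src}"]
    events = []
    # Simplified redirect condition (assumption): same in and out device.
    if route.out_dev == pkt.in_dev:
        events.append(f"icmp redirect to {pkt.src} (use {route.next_hop})")
    pkt.ttl -= 1
    verdict = filter_forward(pkt)
    if verdict == "ACCEPT":
        events.append(f"send via {route.next_hop} dev {route.out_dev}")
    return verdict, events
```

Passing the route in as an argument mirrors the note that next-hop and output-device selection happen in input routing, not in the forwarding code.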

Updates

I removed conntrack from POSTROUTING. More info at http://iptables-tutorial./iptables-tutorial.html#STATEMACHINE; see the last part of section 4:

"All connection tracking is handled in the PREROUTING chain, except locally generated packets which are handled in the OUTPUT chain. What this means is that iptables will do all recalculation of states and so on within the PREROUTING chain. If we send the initial packet in a stream, the state gets set to NEW within the OUTPUT chain, and when we receive a return packet, the state gets changed in the PREROUTING chain to ESTABLISHED, and so on. If the first packet is not originated by ourself, the NEW state is set within the PREROUTING chain of course. So, all state changes and calculations are done within the PREROUTING and OUTPUT chains of the nat table."
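The state bookkeeping in the quote can be sketched as a small table keyed by a direction-independent connection tuple. This is a deliberately reduced model: it ignores the protocol, RELATED, timeouts and TCP flags, and the class and field names are hypothetical. It only shows where NEW is set (OUTPUT for connections we start, PREROUTING for connections from the network) and that the first reply turns the entry ESTABLISHED.

```python
def conn_key(src, sport, dst, dport):
    # Same key for a packet and its reply, regardless of direction.
    return frozenset({(src, sport), (dst, dport)})


class ConnTracker:
    def __init__(self):
        # key -> {"state": ..., "origin": (src, sport), "set_in": hook}
        self.table = {}

    def see(self, hook, src, sport, dst, dport):
        """Record a packet seen in PREROUTING or OUTPUT; return its state."""
        if hook not in ("PREROUTING", "OUTPUT"):
            raise ValueError("state is only computed in PREROUTING and OUTPUT")
        key = conn_key(src, sport, dst, dport)
        entry = self.table.get(key)
        if entry is None:
            # First packet of the connection: NEW, in whichever hook saw it.
            self.table[key] = {"state": "NEW", "origin": (src, sport),
                               "set_in": hook}
            return "NEW"
        if entry["state"] == "NEW" and (src, sport) != entry["origin"]:
            # A packet in the reply direction: the connection is established.
            entry["state"] = "ESTABLISHED"
        return entry["state"]
```

A locally started connection is marked NEW in OUTPUT and becomes ESTABLISHED when its reply passes PREROUTING; for a connection from outside, the roles of the two hooks are swapped.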


I received this email:

Since I've recently failed in a few iproute2 experiments, I have the following comment on the packet travelling guide: it is incomplete with respect to locally generated packets.

The traversal guide states that routing actually happens before the packet enters the netfilter OUTPUT queue. However, this is not all that happens in current kernels. If the packet is somehow modified while traversing the output queue (for example, by putting a fwmark on it), netfilter recognizes that the packet needs to be routed again and does so. So, there is possibly another 'OUTPUT ROUTING' rectangle after the netfilter OUTPUT chain.

However, if I'm not mistaken, there are some problems with that (as per my recent posting to the lartc and netdev lists, it seems that the source address is chosen before the netfilter OUTPUT chain is traversed, and the src attribute of the route chosen afterwards no longer affects the source address of the socket/packet).

Now, I also might be totally wrong, but so far nobody has been able to point out what exactly I'm misunderstanding...
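The behaviour described in this email can be modelled in a few lines. This is a hypothetical sketch of the email writer's claim, which he himself was not sure about, not a description of verified kernel internals: output routing picks a route and the source address, the OUTPUT chain may change the fwmark, a changed mark triggers a second route lookup, and the source address from the first lookup is kept. All names, devices and addresses are made up.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Route:
    dev: str
    prefsrc: str  # the route's "src" attribute


@dataclass
class OutPacket:
    dst: str
    mark: int = 0
    src: Optional[str] = None
    route: Optional[Route] = None


def send_local(pkt: OutPacket,
               route_lookup: Callable[[str, int], Route],
               output_chain: Callable[[OutPacket], None]) -> OutPacket:
    # 1. Output routing: route and source address are chosen here.
    pkt.route = route_lookup(pkt.dst, pkt.mark)
    pkt.src = pkt.route.prefsrc
    mark_before = pkt.mark
    # 2. netfilter OUTPUT chain, e.g. a mangle rule that sets a fwmark.
    output_chain(pkt)
    # 3. The mark changed, so route again. As the email reports, the source
    #    address is NOT re-derived from the new route's src attribute.
    if pkt.mark != mark_before:
        pkt.route = route_lookup(pkt.dst, pkt.mark)
    return pkt


ROUTES = {0: Route("eth0", "192.0.2.10"), 1: Route("tun0", "10.8.0.2")}


def mark_everything(pkt: OutPacket) -> None:
    pkt.mark = 1  # stands in for a hypothetical mangle MARK rule


if __name__ == "__main__":
    out = send_local(OutPacket("203.0.113.5"),
                     lambda dst, mark: ROUTES[mark], mark_everything)
    print(out.route.dev, out.src)
```

The run ends with the packet routed out of `tun0` while still carrying the `eth0` source address, which is the mismatch the email describes.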

More info

http://www./ftp/pub/doc/packet-journey-2.4.html


http://www./Joseph.Mack/HOWTO/LVS-HOWTO-19.html#ss19.21
