$Id$

Here is my proposal for v4/v6 addressing and gateway selection on the
network that we are developing with the OSI funds. I give a functional
overview, and then I describe an IP network protocol for gateway
discovery.

In this proposal, I assume the use of Hazy Sighted Link State (HSLS)
routing. I anticipate that our HSLS link-state updates (LSUs) will
carry opaque extension fields in length-type-value tuples. We will
reserve an extension field, HSLS_LEXT_GAD, for Gateway ADvertisements
(GADs). I think that GADs can also be distributed using OSPF Opaque
LSAs; we might test gateway discovery that way.

All routers route and forward both IPv6 and IPv4. They use IPv6
transport for the routing protocol messages.

At bootstrap time, each router picks an arbitrary globally-routable
IPv6 number for its rooftop interface. Each router picks its number
from the same prefix as the others (f00d, say). The prefix is a static
configuration item.

A router also chooses an arbitrary /28 IPv4 subnet in 10/8. It assigns
the first number in the /28 to its rooftop interface, with netmask
0xff000000. It assigns one /29 and one /30 from the /28 to its
remaining interfaces. In another document, we will define an algorithm
that detects when a router duplicates another router's selection in
f00d/16 or 10/8, and forces one of the routers to make a new selection.

Also at bootstrap time, a router tries to get an IPv4 number on each of
its ethernets, refusing any number in 10/8. Depending on whether the
router receives a routable IPv4 number, a private IPv4 number, or no v4
number on an ethernet, it sets different link flags for that ethernet.
The flags are 6to4, nat, and natr. If a DHCP server assigns a routable
IPv4 number to a router's ethernet, then that router sets all three of
the "6to4," "natr," and "nat" properties for that ethernet to the IP
number that was assigned. If a DHCP server assigns a private IPv4
number, then only the ethernet's "nat" property is set.
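The bootstrap rule that maps a DHCP outcome to link flags can be
sketched in C as follows. This is a minimal sketch, not part of the
proposal: the LINK_* bit values, LINK_REFUSE, and dhcp_link_flags() are
hypothetical names, addresses are taken in host byte order, and
"routable" is approximated as "outside the RFC 1918 private blocks,"
which ignores other special-purpose ranges such as 127/8 and
169.254/16.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical link-flag bits (not the GAD_P_* values). */
#define LINK_NAT    0x01
#define LINK_NATR   0x02
#define LINK_6TO4   0x04

/* Hypothetical result for a lease in 10/8, which the router refuses
 * because 10/8 is reserved for rooftop numbering. */
#define LINK_REFUSE (-1)

/* True if a (host byte order) lies in an RFC 1918 private block. */
static bool
is_rfc1918(uint32_t a)
{
    return (a >> 24) == 10 ||                 /* 10/8 */
        (a >> 20) == ((172u << 4) | 1u) ||    /* 172.16/12 */
        (a >> 16) == ((192u << 8) | 168u);    /* 192.168/16 */
}

/* Map the DHCP outcome on one ethernet to that ethernet's link flags.
 * got_addr is false when the router obtained no IPv4 number. */
static int
dhcp_link_flags(bool got_addr, uint32_t addr)
{
    if (!got_addr)
        return 0;                             /* no v4 number: no flags */
    if ((addr >> 24) == 10)
        return LINK_REFUSE;                   /* never accept 10/8 */
    if (is_rfc1918(addr))
        return LINK_NAT;                      /* private: nat only */
    return LINK_NAT | LINK_NATR | LINK_6TO4;  /* routable: all three */
}
```

A router would call dhcp_link_flags() once per ethernet after DHCP
completes, and re-run DHCP (or leave the ethernet unnumbered) when it
returns LINK_REFUSE.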
I will call a link with either a "nat" or a "6to4" property a "gateway
link," and the rooftop router that the link belongs to a "gateway
router." Gateways advertising a link with the "6to4" property are
"6to4 gateways." Similarly, there are "nat gateways" and "natr
gateways." HSLS link-state updates will carry an interface's
nat/natr/6to4 properties.

Routers wanting v6 connectivity will select the 6to4 gateway with the
best path metric, breaking ties by choosing the gateway whose 6to4
number is smallest when the numbers are compared as 32-bit unsigned
integers (0.0.0.128 is smaller than 128.0.0.0). When a router selects a
6to4 gateway, its role changes to "gateway supplicant," or simply
"supplicant." A v6 supplicant asks its chosen 6to4 gateway for a /61
network. If any unused /61s remain in the gateway's 6to4 subnet, the
gateway answers the supplicant with one. The supplicant assigns /64s
from the /61 to each of its interfaces, including its rooftop
interface; however, it does not shed its f00d address. Then it creates
an IPv6-in-IPv6 tunnel to the 6to4 gateway and points the default route
through the tunnel. HSLS will propagate routes to the supplicant's
interfaces.

What you end up with is tunnel-encapsulated IPv6 traffic from the
supplicant to the 6to4 gateway, and unencapsulated traffic in the other
direction. This breaks strict layering; however, it has two advantages:
a host can be addressed by the same IPv6 number on both the rooftop
network and the Internet, and the encapsulation overhead of a two-way
tunnel is avoided.

Routers wanting v4 connectivity choose a nat gateway, by the same
criteria as a 6to4 gateway. A nat supplicant will not assign its
ethernets any numbers other than the 10/8 numbers already assigned. It
will tunnel to its gateway, but its gateway will send unencapsulated
packets back.

Routers that want their v4 hosts to be reachable from the Internet will
choose a natr gateway.
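To make the /61 arithmetic concrete: a 6to4 gateway whose IPv4 number
is A.B.C.D owns 2002:AABB:CCDD::/48, and the 16-bit subnet field of
that /48 holds 8192 /61s of eight /64s each. The C sketch below assumes,
hypothetically, that a gateway numbers its /61s 0 through 8191 in
subnet-field order; sixtofour_subnet_id() and sixtofour_format_64() are
illustrative names, not part of the proposal.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* A /61 spans 2^(64 - 61) = 8 consecutive /64 subnet IDs. */
#define SIXTOFOUR_64S_PER_61    8u
/* The 16-bit subnet field of a 6to4 /48 holds 65536 / 8 = 8192 /61s. */
#define SIXTOFOUR_NUM_61        8192u

/* Subnet ID of the n-th /64 (0 <= n < 8) inside /61 number k. */
static uint16_t
sixtofour_subnet_id(unsigned k, unsigned n)
{
    assert(k < SIXTOFOUR_NUM_61 && n < SIXTOFOUR_64S_PER_61);
    return (uint16_t)(k * SIXTOFOUR_64S_PER_61 + n);
}

/* Format the /64 with the given subnet ID under the 6to4 prefix of v4
 * (an IPv4 number in host byte order), e.g. "2002:0102:0304:0008::/64".
 * Returns the snprintf() result. */
static int
sixtofour_format_64(char *buf, size_t len, uint32_t v4, uint16_t subnet_id)
{
    return snprintf(buf, len, "2002:%04x:%04x:%04x::/64",
        (unsigned)(v4 >> 16), (unsigned)(v4 & 0xffffu),
        (unsigned)subnet_id);
}
```

For example, /61 number 1 of gateway 1.2.3.4 covers subnet IDs 8
through 15, that is, 2002:0102:0304:0008::/64 through
2002:0102:0304:000f::/64, which the supplicant would spread across its
interfaces.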
A natr supplicant will not change the numbers on its ethernets from the
10/8 numbers. A supplicant negotiates with its natr gateway for
selected 10/8 numbers to be NAT'd one-to-one to routable numbers.

A router will select a new gateway with some hysteresis: it will not
choose a new gateway unless that gateway has a "much better" metric for
a "short time," or else a slightly better metric for a long time. A
candidate for a new choice of gateway, G, is assigned a score according
to the following formula:

    improvement-duration * duration-coeff + delta-metric * delta-coeff

where delta-metric is the average improvement in path metric that G
offers over the current gateway, improvement-duration is the amount of
time that G's path metric has offered an improvement over the current
gateway's metric, and duration-coeff and delta-coeff are non-negative
configuration variables set by the operator.

A candidate gateway's score has to be at least new-gateway-threshold
before a router installs a tunnel and creates a default route through
it, unless the existing gateway becomes unreachable. A router always
tries to tunnel to the candidate with the highest score first.

So that sessions that go through an old gateway are not needlessly
terminated, a router will leave the tunnel to its old gateway open even
after it has chosen a new gateway, and, in the case of a 6to4 gateway,
a router will leave the old IPv6 addresses on its interfaces. However,
the default route will be changed to point through the new tunnel (this
should not break existing sessions, since they will have host routes
through the old tunnel), and new IPv6 aliases will be assigned (this
also should not break sessions, but I am less sure of it).
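The scoring and switching rule can be sketched in C as below. This is a
sketch under stated assumptions: struct gw_params and the gw_* function
names are hypothetical, improvement-duration and delta-metric are plain
doubles in whatever units an implementation picks, and "highest score
first" is shown as a simple linear scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Operator-set configuration variables from the text (hypothetical). */
struct gw_params {
    double duration_coeff;          /* duration-coeff, >= 0 */
    double delta_coeff;             /* delta-coeff, >= 0 */
    double new_gateway_threshold;   /* new-gateway-threshold */
};

/* score(G) = improvement-duration * duration-coeff
 *          + delta-metric * delta-coeff */
static double
gw_score(const struct gw_params *p, double improvement_duration,
    double delta_metric)
{
    assert(p->duration_coeff >= 0.0 && p->delta_coeff >= 0.0);
    return improvement_duration * p->duration_coeff +
        delta_metric * p->delta_coeff;
}

/* Install a tunnel to the candidate only if its score reaches the
 * threshold, unless the current gateway has become unreachable. */
static bool
gw_should_switch(const struct gw_params *p, double score,
    bool current_reachable)
{
    return !current_reachable || score >= p->new_gateway_threshold;
}

/* Index of the highest-scoring candidate, which is tried first.
 * n must be > 0. */
static size_t
gw_best_candidate(const double *scores, size_t n)
{
    size_t best = 0;

    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (scores[i] > scores[best])
            best = i;
    return best;
}
```

For instance, with duration-coeff = 1, delta-coeff = 2, and
new-gateway-threshold = 10, a candidate that has offered an average
improvement of 4 for 3 time units scores 3 * 1 + 4 * 2 = 11 and
qualifies. Raising delta-coeff favors a large improvement seen briefly;
raising duration-coeff favors a small improvement sustained over time.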
After a router chooses a new gateway, the tunnel to the old gateway is
not eligible to be shut down until (teardown-coefficient *
tunnel-uptime) seconds after the last packet was sent through it, where
tunnel-uptime is the amount of time that the router used the old
gateway, and teardown-coefficient is a non-negative configuration
variable set by the operator.

In the future, there will be v4 and v6 link properties that indicate a
gateway without NAT or 6to4. Ordinarily, a gateway will speak OSPF or
BGP on these interfaces; however, routers will still tunnel to the
gateway.

We will not let HSLS propagate a default route in its default
configuration. We expect gateway selection and migration to get more
complicated before they get less complicated, so we will put that
function in its own module, separate from HSLS, until we have given new
ideas about gateway selection a good shake-out.

Implementation Proposal
-------------- --------

In an HSLS LSU, the GAD extension field carries the gateway-property
flags: nat, natr, 6to4, v4, and v6. The extension field will probably
be a length-type-value tuple. The GAD value is 32 bits long. Each bit
indicates the presence (1) or absence (0) of a gateway property. I
assign these flags to the bits:

    #define GAD_P_NAT   0x01    /* NAT to unroutable number, for example,
                                 * 192.168.1.1
                                 */
    #define GAD_P_NATR  0x02    /* NAT to routable numbers, for example,
                                 * 64.138/16.
                                 */
    #define GAD_P_6TO4  0x04    /* an IPv6 subnet in 2002/16 is available */

    /* XXX not fully defined yet */
    #define GAD_P_V4    0x08    /* v4 routing */
    #define GAD_P_V6    0x10    /* v6 routing */

After a router has chosen candidate gateways according to its criteria,
it uses the Gateway Negotiation Protocol (GNP) to negotiate for the
type and number of subnets/addresses that it needs. The negotiation
resembles DHCP address assignment: the supplicant sends a message
requesting a certain number of subnets/addresses that meet certain
criteria, for a certain amount of time.
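The 32-bit GAD value might be packed into, and read back out of, an
HSLS extension field as follows. This sketch assumes the value travels
in network byte order (the proposal fixes byte order only for GNP
messages), and gad_encode(), gad_decode(), and gad_is_gateway() are
hypothetical helpers; gad_is_gateway() applies the earlier definition
of a gateway link as one with the nat or the 6to4 property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GAD_P_NAT   0x01
#define GAD_P_NATR  0x02
#define GAD_P_6TO4  0x04
#define GAD_P_V4    0x08
#define GAD_P_V6    0x10

/* Store the 32-bit GAD value most-significant byte first. */
static void
gad_encode(uint8_t buf[4], uint32_t flags)
{
    buf[0] = (uint8_t)(flags >> 24);
    buf[1] = (uint8_t)(flags >> 16);
    buf[2] = (uint8_t)(flags >> 8);
    buf[3] = (uint8_t)flags;
}

/* Recover the 32-bit GAD value from its big-endian encoding. */
static uint32_t
gad_decode(const uint8_t buf[4])
{
    return (uint32_t)buf[0] << 24 | (uint32_t)buf[1] << 16 |
        (uint32_t)buf[2] << 8 | (uint32_t)buf[3];
}

/* A "gateway link" has the nat or the 6to4 property. */
static bool
gad_is_gateway(uint32_t flags)
{
    return (flags & (GAD_P_NAT | GAD_P_6TO4)) != 0;
}
```

An ethernet that received a routable DHCP number would advertise
GAD_P_NAT | GAD_P_NATR | GAD_P_6TO4, which encodes as the four bytes
00 00 00 07.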
The gateway responds with a refusal, or else with an offer that meets
the supplicant's request in whole or in part. The supplicant responds,
indicating its acceptance or refusal of the offer, and then the gateway
completes the negotiation by sending an acknowledgement (indicating
that tunneling may commence) or a refusal (indicating a lack of
resources or a policy breach). Call the result of the negotiation a
"lease."

A supplicant negotiates for a lease lasting between ten minutes and 24
hours. A supplicant starts trying to renew a lease at least five
minutes before it is set to expire. There are short messages reserved
for renewal. After a lease ends, a gateway may close down the
associated tunnel.

    /* A router may request a lease with lifetime in
     * [GNP_LEASE_MIN, GNP_LEASE_MAX].
     *
     * It starts trying to renew GNP_LEASE_RENEW milliseconds before the
     * lease expires.
     */

    /* 10 minutes in milliseconds. */
    #define GNP_LEASE_MIN   (600 * 1000)

    /* 5 minutes in milliseconds. */
    #define GNP_LEASE_RENEW (GNP_LEASE_MIN / 2)

    /* 24 hours in milliseconds. */
    #define GNP_LEASE_MAX   (24 * 3600 * 1000)

GNP messages are IP packets with IP protocol number TBD. All GNP
messages begin with a uint16_t, the message type. I define the message
types here:

    /* A message type is a combination of a gateway role and
     * a message function.
     */

    /* Gateway roles. */
    #define GN_6TO4     0x0000  /* 6to4 gateway */
    #define GN_NAT      0x0001  /* NAT gateway, unroutable numbers */
    #define GN_NATR     0x0002  /* NAT gateway, routable numbers */

    /* Message functions. */
                                /* SENDER       FUNCTION */
    #define GN_ACCEPT   0x0000  /* supplicant   accept lease */
    #define GN_ACK      0x0010  /* gateway      indicate success */
    #define GN_OFFER    0x0020  /* gateway      offer lease */
    #define GN_REFUSE   0x0030  /* gw/supp      terminate transaction */
    #define GN_RENEW    0x0040  /* supplicant   renew lease */
    #define GN_REQ      0x0050  /* supplicant   request lease */

    /* Message types.
     */
    #define GN_ACCEPT_6TO4  (GN_ACCEPT | GN_6TO4)
    #define GN_ACCEPT_NAT   (GN_ACCEPT | GN_NAT)
    #define GN_ACCEPT_NATR  (GN_ACCEPT | GN_NATR)

    #define GN_ACK_6TO4     (GN_ACK | GN_6TO4)
    #define GN_ACK_NAT      (GN_ACK | GN_NAT)
    #define GN_ACK_NATR     (GN_ACK | GN_NATR)

    #define GN_OFFER_6TO4   (GN_OFFER | GN_6TO4)
    #define GN_OFFER_NAT    (GN_OFFER | GN_NAT)
    #define GN_OFFER_NATR   (GN_OFFER | GN_NATR)

    #define GN_REFUSE_6TO4  (GN_REFUSE | GN_6TO4)
    #define GN_REFUSE_NAT   (GN_REFUSE | GN_NAT)
    #define GN_REFUSE_NATR  (GN_REFUSE | GN_NATR)

    #define GN_RENEW_6TO4   (GN_RENEW | GN_6TO4)
    #define GN_RENEW_NAT    (GN_RENEW | GN_NAT)
    #define GN_RENEW_NATR   (GN_RENEW | GN_NATR)

    #define GN_REQ_6TO4     (GN_REQ | GN_6TO4)
    #define GN_REQ_NAT      (GN_REQ | GN_NAT)
    #define GN_REQ_NATR     (GN_REQ | GN_NATR)

Every well-formed GNP message begins with a type code and a transaction
number. A supplicant chooses a new transaction number before it
initiates a new negotiation by sending a request. Network byte order is
used for all multiple-byte GNP message fields.

A REFUSE message is the shortest. It contains a code that tells the
reason that either a lease was denied or a transaction was cancelled.
The transaction number is copied from the last message in the cancelled
transaction.

    struct gn_msg_refuse {
        uint16_t    mk_type;
        uint16_t    mk_txno;
        uint32_t    mk_reason;
    };

CANCEL messages are equally short. They consist of a type code, a
transaction number, and a lease handle. A supplicant chooses a new
transaction number for every CANCEL message. A gateway answers with an
ACK message containing the same transaction number.

    struct gn_msg_cancel {
        uint16_t    mc_type;
        uint16_t    mc_txno;
        uint32_t    mc_handle;
    };

ACK and RENEW messages have the same format. A supplicant chooses a new
transaction number for each RENEW message. A gateway copies the
transaction number from the ACCEPT, CANCEL, or RENEW message that it is
acknowledging into the ACK. ma_handle identifies the lease to
acknowledge or renew. For RENEWals, ma_leasetime indicates the duration
for the new lease.
For ACKs, ma_leasetime indicates either the minimum of the OFFERed and
the ACCEPTed lease times, or else the minimum of the RENEWed lease time
and the gateway's preferred lease time.

    struct gn_msg_ack {
        uint16_t    ma_type;
        uint16_t    ma_txno;
        uint32_t    ma_leasetime;
        uint32_t    ma_handle;
    };

The formats for request/offer messages follow. Here is a NAT
request/offer/accept message:

    struct gn_msg_nat {
        uint16_t    mn_type;        /* GN_REQ_NAT, GN_OFFER_NAT,
                                     * GN_ACCEPT_NAT
                                     */
        uint16_t    mn_txno;
        uint32_t    mn_leasetime;
        uint32_t    mn_nsubnet;
        struct {
            in_addr_t   network;
            uint8_t     masklen;
        } mn_subnets[];             /* XXX C99-ism */
    };

A supplicant sends a NAT REQUEST in this format. There are mn_nsubnet
different subnets, where mn_nsubnet > 0. Subnet i, for
0 <= i < mn_nsubnet, is described by an IP number and a mask length,
mn_subnets[i].network and mn_subnets[i].masklen. Each of the IPv4
subnets in mn_subnets is REQUESTed for mn_leasetime milliseconds, where
GNP_LEASE_MIN <= mn_leasetime <= GNP_LEASE_MAX. A gateway either
responds with an OFFER of at least one subnet, in the same format, or
else it sends a REFUSE with a suitable reason code. [TBD define reasons
for refusal.]

A gateway copies subnets into an OFFER verbatim from the REQUEST. A
supplicant copies subnets into an ACCEPT verbatim from the OFFER, after
verifying them against its REQUEST. A NAT OFFER indicates that the
gateway offers NAT for each of the mn_nsubnet subnets in mn_subnets. A
NAT ACCEPT indicates which subnets the supplicant still desires to NAT.
A NAT ACK indicates that the lease is granted, and that v4-in-v6
tunneling to the gateway may commence.
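A supplicant's checks on a NAT OFFER, together with the lease
arithmetic described above, might look like this in C. The decoded
struct nat_subnet (fields in host byte order) and every function name
here are hypothetical. The sketch checks that an OFFER is non-empty,
that its lease time lies in [GNP_LEASE_MIN, GNP_LEASE_MAX], and that
each OFFERed subnet appears verbatim in the REQUEST; it also derives
the ACK lease time and the latest time at which renewal should begin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GNP_LEASE_MIN   (600 * 1000)            /* 10 minutes, ms */
#define GNP_LEASE_RENEW (GNP_LEASE_MIN / 2)     /* 5 minutes, ms */
#define GNP_LEASE_MAX   (24 * 3600 * 1000)      /* 24 hours, ms */

/* Hypothetical host-order view of one mn_subnets[] record. */
struct nat_subnet {
    uint32_t network;
    uint8_t  masklen;
};

static bool
gnp_leasetime_ok(uint32_t ms)
{
    return ms >= GNP_LEASE_MIN && ms <= GNP_LEASE_MAX;
}

/* Supplicant-side check of an OFFER against its own REQUEST. */
static bool
gnp_nat_offer_ok(const struct nat_subnet *req, size_t nreq,
    const struct nat_subnet *off, size_t noff, uint32_t off_leasetime)
{
    if (noff == 0 || !gnp_leasetime_ok(off_leasetime))
        return false;
    for (size_t i = 0; i < noff; i++) {
        bool found = false;

        /* Each OFFERed subnet must match a REQUESTed one exactly. */
        for (size_t j = 0; j < nreq && !found; j++)
            found = req[j].network == off[i].network &&
                req[j].masklen == off[i].masklen;
        if (!found)
            return false;
    }
    return true;
}

/* ACK lease time for a new lease: the minimum of OFFERed and ACCEPTed. */
static uint32_t
gnp_ack_leasetime(uint32_t offered, uint32_t accepted)
{
    return offered < accepted ? offered : accepted;
}

/* Latest time (ms, on the same clock as expiry_ms) at which the
 * supplicant should start trying to renew. */
static uint64_t
gnp_renew_at(uint64_t expiry_ms)
{
    assert(expiry_ms >= GNP_LEASE_RENEW);
    return expiry_ms - GNP_LEASE_RENEW;
}
```

Because the gateway copies subnets verbatim, exact field equality is
enough for the verification step; a gateway that rewrote subnets would
need a containment test instead.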
Here is a natr request/offer/accept message:

    struct gn_msg_natr {
        uint16_t    mnr_type;       /* GN_REQ_NATR, GN_OFFER_NATR,
                                     * GN_ACCEPT_NATR
                                     */
        uint16_t    mnr_txno;
        uint32_t    mnr_leasetime;
        uint32_t    mnr_nsubnet;
        struct {
            in_addr_t   network;
            uint16_t    masklen;
            uint16_t    naddr;
            in_addr_t   onetwork;
            uint16_t    omasklen;
        } mnr_subnets[];            /* XXX C99-ism */
    };

The messages' meanings are analogous to the NAT case, but the NATR case
adds the fields naddr, onetwork, and omasklen to each subnet record. In
a REQUEST, naddr tells how many consecutive addresses in the subnet,
beginning with the first, should be translated one-to-one to routable
IPv4 addresses. Also in a REQUEST, a supplicant indicates its
preference for a particular range of routable IPv4 addresses using
onetwork and omasklen. A supplicant indicates no preference using
omasklen == 0. A gateway's OFFER indicates the actual number and range
of routable IPv4 addresses in naddr, onetwork, and omasklen. A
supplicant copies one or more of the OFFERed subnets, verbatim, into
its ACCEPT message, after verifying the subnets against its REQUEST.

Here is the packet format for a 6to4 gateway negotiation:

    struct gn_msg_6to4 {
        uint16_t    m6_type;        /* GN_REQ_6TO4, GN_OFFER_6TO4,
                                     * GN_ACCEPT_6TO4
                                     */
        uint16_t    m6_txno;
        uint32_t    m6_leasetime;
        uint32_t    m6_nsubnet;
        struct {
            in_addr_t   addr;
            uint16_t    subnet0;
            uint16_t    nsubnet;
        } m6_subnets[];             /* XXX C99-ism */
    };

The 6to4 negotiation works by analogy to the NATR negotiation. A
supplicant indicates its preference for a minimum number of IPv6
subnets with the same 6to4 prefix using the nsubnet field. In the addr
field, it indicates either a preference for a particular 6to4 prefix,
2002:AABB:CCDD::/48 for the IPv4 number A.B.C.D, or else no preference
(A.B.C.D = 0.0.0.0). If a supplicant indicates a prefix preference, it
gives the first of the nsubnet consecutive 6to4 subnets in subnet0. For
example, addr = 1.2.3.4, nsubnet = 5, subnet0 = 0x0fff requests the
subnets 2002:0102:0304:0fff::/64 through 2002:0102:0304:1003::/64.

A few words about REFUSE messages.
A transaction may be cancelled at any time, either by the supplicant or
by the gateway, using a REFUSE message. An ACK ends every successful
transaction; a REFUSE cannot be used to renege on an ACKed transaction.
If a gateway REFUSEs a RENEW, it does not thereby cancel the unRENEWed
lease. A gateway answers with a REFUSE when a supplicant asks to CANCEL
a lease that the supplicant does not hold.

TBD: Define the API for finding out route metrics from the routing
daemon. (Also, define the route metrics API for our HSLS daemon.)