The Internetwork Layer and the Nimrod Routing Architecture
Overview of Nimrod
Nimrod
is a project which aims, in part, to produce a next-generation
routing architecture for the Internet, but also, more generally, to try to
produce a basic design for routing in a single global-scale communication
substrate, a design which will prove sufficiently flexible and powerful to
serve into a future as yet unforeseeable.
Nimrod does this through the conjunction of two powerful basic mechanisms:
- distribution of maps, as opposed to distribution of routing tables; and
- selection of routes by clients of the network, not by the switches in the
network.
(This latter approach is sometimes called source routing, but
this term
can be a bit misleading, since in Nimrod the route does not have to be
chosen by the actual source, but can be the responsibility of an agent
working on the source's behalf. The terms unitary routing or
explicit routing
have therefore been used to describe this approach. The important
thing to realize is that the path is not selected in a fully distributed,
hop-by-hop manner, in which each switch has an equal role to play in
selecting the path.)
The actual operation is fairly simple, in principle. Maps of the network's
actual connectivity (maps which will usually include high-level
abstractions for large parts of that connectivity, in the same way road
maps of an area may not show all the roads, just the 'important' ones) are
made available to all the entities which need to select paths. Those
entities use these maps to compute paths, and those paths are passed to the
actual switches, along with the data, as directions on how to forward the
data.
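The map-and-directions operation described above can be sketched in a few lines of Python. Everything here (the map contents, node names, and the single cost metric) is invented for illustration; a real Nimrod client would weigh the policy and QOS attributes attached to each map element, not just one cost.

```python
import heapq

def select_route(link_map, src, dst):
    """Client-side route selection over a (possibly abstracted) map.

    link_map: {node: [(neighbor, cost), ...]} -- the distributed map.
    Returns the chosen path as an explicit list of node names, which
    would accompany the data as directions on how to forward it.
    """
    # Plain Dijkstra search; the switches play no part in this choice.
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == dst:
            break
        for nbr, cost in link_map.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dst not in prev and src != dst:
        return None
    # Reconstruct the explicit path from the predecessor links.
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path))

# An abstract map: high-level clusters A..D rather than every router,
# just as a road map shows only the 'important' roads.
abstract_map = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 1), ("D", 5)],
    "C": [("D", 1)],
}
print(select_route(abstract_map, "A", "D"))  # ['A', 'B', 'C', 'D']
```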
The rest of this section discusses how the Nimrod routing and addressing
architecture interacts with the rest of the internetwork layer, and what
requirements it has upon the internetwork layer protocol's packet format.
The Nimrod Subsystems of the Internetwork Layer
As mentioned elsewhere, Nimrod springs, in part, from a design vision which
sees the entire internetwork layer, even though it is distributed across
all the hosts and routers of the internetwork, as a single system; in such a
vision, one starts to think about what subsystems are needed, and what the
interactions among them should look like. (See
this
for more information.)
The Nimrod routing architecture is simply a number of the subsystems of
this larger system, the internetwork layer. It is not intended to be a
purely standalone set of subsystems, but rather, to work together in close
concert with the other subsystems of the internetwork layer (resource
allocation, security, charging, etc.) to provide the internetwork layer
service model.
One reason that Nimrod is not simply a monolithic subsystem is that some of
the interaction with the other subsystems of the internetwork layer, for
instance the resource allocation subsystem, is much clearer and easier to
manage if the routing is broken up into several subsystems, with the
interaction between them open.
It is important to realize that Nimrod was initially broken up into
separate subsystems for purely internal reasons. It so happens that,
considered as a separate problem, the fundamental boundary lines for
dividing routing up into subsystems are the same boundaries that make
interaction with other subsystems cleaner; this provides added evidence
that these boundaries are in fact the right ones.
The subsystems which comprise the functionality covered by Nimrod are
- Routing information distribution (in the case of Nimrod, topology map
distribution, along with the attributes [policy, QOS, etc.] of the topology
elements),
- Route selection (strictly speaking, not part of the Nimrod system
per se, but a local matter, although it is necessary), and
- User traffic handling.
The first can be defined fairly well without reference to other subsystems,
but the second and third (route selection and user traffic handling) are
necessarily more involved. For instance, route selection might involve
finding out which links have the resources available to handle some
required level of service. For user traffic handling, if a particular
application needs a resource reservation, getting that resource reservation
to the routers is as much a part of getting the routers ready as making
sure they have the correct routing information, so here too, routing is
tied in with other subsystems.
It is possible to talk about the relationship between the Nimrod
subsystems, and the other functional subsystems of the internetwork layer,
but until the service model of the internetwork layer is more clearly
visible, along with the functional boundaries within that layer, such a
discussion is necessarily rather nebulous.
Specific Interaction Issues
Here is an incomplete list of the things that Nimrod would like to see from
the internetwork layer as a whole:
- A unified definition of flows in the internetwork layer, and a unified
way of identifying, through a separate flow-id field, which packets belong
to a given flow.
- A unified mechanism (potentially distributed) for installing state about
flows (including multicast flows) in routers.
- A method for getting information about whether a given resource allocation
request has failed along a given path; this might be part of the unified flow
setup mechanism.
- An interface to a (potentially distributed) mechanism for maintaining the
membership in a multicast group.
- Support for multiple interfaces; that is, multi-homing. Nimrod does this
by decoupling transport level entity (TLE) identification from
interface identification (called locators in Nimrod).
For example, a packet with any valid destination locator
should be accepted by a TCP module, if the destination TLE identifier
(TLEI) is the one assigned to that entity.
- Support for multiple locators per network interface. This is
needed for a number of reasons, among them to allow for less painful
transitions in the locator abstraction hierarchy as the topology changes.
- Support for multiple TLEI's per TLE. This would definitely include both
multiple multicast TLEI's (i.e. TLEI's that refer to a group of TLE's), and
at least one unicast TLEI (the need for multiple unicast TLEI's per TLE is
not obvious).
- Support for distinction between a multicast group as a named entity, and
a multicast flow which may not reach all the members.
- A distributed, replicated, user name translation system (perhaps using
the existing Domain Name System - DNS) that maps such user names into (TLEI,
locator0, ... locatorN) bindings.
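The last item above, and the decoupling of TLE identification from interface identification, can be illustrated with a toy name-translation table. The names, TLEI values, and dotted locator syntax here are all invented for the sketch; a real system would be a distributed, replicated service, perhaps built on the DNS.

```python
# A toy name-translation table: user names map to a
# (TLEI, [locator, ...]) binding.  All values are invented.
NAME_TABLE = {
    "mail.example.net": ("tlei-0x2a",
                         ["us.east.site1.host7", "us.west.site4.host7"]),
}

def resolve(name):
    """Return the (tlei, locators) binding for a user name, mimicking a
    DNS-style distributed lookup with a local table."""
    return NAME_TABLE.get(name)

tlei, locators = resolve("mail.example.net")
# A multi-homed host has one TLEI but several locators; a packet
# arriving via any valid locator is accepted if the TLEI matches,
# since TLE identity is decoupled from interface identity.
assert tlei == "tlei-0x2a" and len(locators) == 2
```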
General Principles for Packet Formats
As a general rule, the design philosophy of Nimrod is 'maximize the
lifetime (and flexibility) of the architecture'. Design tradeoffs (i.e.
optimizations) that will adversely affect the flexibility, adaptability and
lifetime of the design are not necessarily wise choices; they may cost
more than they save. Such optimizations might be the correct choices in a
stand-alone system, where the replacement costs are relatively small; in
the global communication network, the replacement costs are very much
higher.
Providing the Nimrod functionality requires the carrying of certain
information in the packets. The design principle noted above has a number
of corollaries in specifying the fields to contain that information.
First, the design should be 'simple and straightforward', which means that
various functions should be handled by completely separate mechanisms, and
fields in the packets. It may seem that an opportunity exists to save space
by overloading two functions onto one mechanism or field, but general
experience is that, over time, this attempt at optimization costs more, by
restricting flexibility and adaptability.
Second, fixed field lengths should be specified to be somewhat larger than
can conceivably be used; the history of system architecture is replete with
examples (processor address size being the most notorious) where fields
became too short over the lifetime of the system. Below, for each item, an
indication is given as to what the smallest reasonable 'adequate' lengths are,
but this is more of a 'critical floor' than a recommendation. A
'recommended' length is also given, which is the length which corresponds
to the application of this principle, but note that these are not upper
bounds; modulo efficiency concerns, fields cannot be 'too long', only 'too
short'!
It is important to note that this does not mean that implementations must
support the maximum value possible in a field of that size. In practice,
system-wide administrative limits would be placed on the maximum values
which must be supported. Then, as the need arises, the administrative limit
can be increased. This allows an easy, and completely interoperable (with
no special mechanisms) path to upgrade the capability of the network. If
the maximum supported value of a field needs to be increased from M to N,
an announcement is made that this is coming; during the interim period, the
system continues to operate with M, but new implementations are deployed;
while this is happening, interoperation is automatic, with no transition
mechanisms of any kind needed. When things are 'ready' (i.e. the
proportion of old equipment is small enough), use of the larger value
commences.
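The separation between the fixed field capacity and the settable administrative limit can be made concrete with a short sketch; the field width and limit values here are illustrative, not a proposal.

```python
# A field sized well beyond conceivable need, with a separate,
# system-wide administrative limit.  Raising the limit from M to N
# never changes the wire format, only the configured maximum.
FIELD_BITS = 16                 # capacity fixed by the packet format
FIELD_MAX = (1 << FIELD_BITS) - 1

admin_limit = 255               # M: current supported maximum

def accept(value):
    """A router's check: reject values above the administrative limit,
    even though the field itself could carry far larger ones."""
    if value > FIELD_MAX:
        raise ValueError("does not fit the field at all")
    return value <= admin_limit

assert accept(200) and not accept(300)
admin_limit = 4095              # N: the announced increase takes effect
assert accept(300)              # same packet format, no transition mechanism
```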
Packet Format Fields
In considering the packet format, one can distinguish between the
host-router part of the packet's path, and the router-router part. It is
not absolutely necessary that they be the same (although it is simpler),
and if they are different, a format that is good for one may not be best
suited to the other. In addition, Nimrod has three forwarding modes
(flows, datagram, and source-routed packets), and some modes use fields
that are not used by other modes.
Bearing these points in mind, what Nimrod would like to see in the
internetworking packet is:
- Source and destination TLE identification. There are several
possibilities here, such as:
- 'Shortish', fixed length fields which appear in
each packet, in the internetwork header, which contain globally unique,
topologically insensitive identifiers for TLE's. For these 48 bits is
adequate, but at least 64 bits is recommended.
- Some shorter field which appears in all packets which contains a value
which is not globally unique, but only unique in mapping tables on each end,
tables which map from the small value to a globally unique value, such as
a DNS name. For these 8 bits is adequate, but at least 16 bits is recommended.
- Some overall design which
does not include any TLE identification in the packet at all, but
transfers it at the start of a communication, and infers it from then on.
This alternative would have to have some other means of telling which
TLE a given packet is for, if there are several TLE's at a
given destination. Some coordination on allocation of flow-ids, or
higher level port numbers, etc, might do this.
Any of these will do, but the exact choice of which is a question that
includes factors outside the routing, so a specific recommendation cannot
be made here.
- Flow identification. There are two basic approaches here, depending on
whether flows are aggregated (in intermediate switches) or not. (It
should be emphasized at this point that it is not yet known whether
flow aggregation will be needed. The only reason to do it is to control
the growth of state in intermediate routers, but there is no definite case
made that either this growth will be unmanageable, or that aggregating
flows will be feasible.)
For the non-aggregated case, a single flow-id field will suffice.
This must not use one of the two previous TLE fields, as in
datagram mode the flow-id will be over-written during transit of the network.
It is possible to use non-globally
unique flow-ids, (which would allow a shorter length to this field),
although this would mean that collisions would result, and have to be
dealt with. For a non-globally unique flow-id, 24 bits would be adequate,
32 are recommended.
For the aggregated case, three broad classes of mechanism are possible.
- The packet contains a sequence of flow-ids. Whenever aggregation
or deaggregation occurs, the next one along the list is used. This takes the
most space, but is otherwise the least work for the routers.
- The packet contains a stack of flow-ids, with the current one on the top.
On aggregation, push a new one on; on de-aggregation, take one off. This
takes more work, but less space in the packet. Encapsulating packets to do
aggregation does basically this, but entire headers are being stacked, not
just flow-ids. The clever way to do this flow-id stacking is to find out
from flow-setup how deep the stack will get, and allocate the necessary
space in the packet when it's created. That way, all that
ever has to happen is adding a new flow-id, or removing one; it is never
necessary to modify the header to make room for more flow-ids.
- The packet contains only the 'base' flow-id, and the current flow-id.
On aggregation, the latter is updated. The tricky part comes on
de-aggregation: the correct value must be restored. To do this, there has
to be state in the router at the end of the aggregated flow, which says
what the de-aggregated flow for each base flow is. The tradeoff
here is obvious: there is no need for individual flow state for each of
the constituent flows in the routers along the path of that
aggregated flow, except for the last one, which must hold such state.
Other than encapsulation, which incurs significant space overhead after
just a few layers of aggregation, there
appears to be no way to do it with just one flow-id in the packet header.
Even if there is no modification of the packets, but the aggregation is
done by mapping some number of 'base' flow-id's to a single aggregated flow
in the routers along the path of the aggregated flow, the table that does
the mapping is still going to have to have a number of entries directly
proportional to the number of base flows going through the switch.
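The stack-of-flow-ids mechanism (the second class above), with stack space pre-allocated at flow-setup time, might look like the following sketch. The flow-id values and the maximum depth are invented for illustration.

```python
class FlowIdStack:
    """Stack-of-flow-ids sketch: room for the maximum aggregation depth
    (learned at flow setup) is reserved when the packet is created, so
    aggregation only fills a slot and never reshapes the header."""

    def __init__(self, base_flow_id, max_depth):
        self.slots = [None] * max_depth   # pre-allocated at creation
        self.slots[0] = base_flow_id
        self.top = 0

    def aggregate(self, flow_id):
        """Push: a router folds this packet into an aggregated flow."""
        self.top += 1
        self.slots[self.top] = flow_id

    def deaggregate(self):
        """Pop: the far end of the aggregated flow restores the old id."""
        self.slots[self.top] = None
        self.top -= 1

    def current(self):
        return self.slots[self.top]

hdr = FlowIdStack(base_flow_id=0xAB01, max_depth=3)
hdr.aggregate(0xCC02)          # an intermediate router aggregates the flow
assert hdr.current() == 0xCC02
hdr.deaggregate()              # de-aggregation restores the base flow-id
assert hdr.current() == 0xAB01
```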
- A looping packet detector. This is any mechanism that will detect a
packet which is 'stuck' in the network; a timeout value in packets,
together with a check in routers, is an example. If this is a hop-count, it
has to be more than 8 bits; 12 bits would be adequate, and 16 bits is
recommended (which also makes the field easy to update). This is not to say that
networks with diameters larger than 256 are good, or that one should design
such nets, but limiting the maximum path through the network to 256 hops is
likely to provide a very painful limit, later on, just as making 'infinity'
16 in RIP did, eventually. When that ceiling is reached, it's going to be
very painful, and there won't be an easy fix. Note that path lengths of over
30 hops have already been seen in the Internet.
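A hop-count variant of the looping packet detector can be sketched as follows; the 12-bit field width follows the 'adequate' figure above, while the ceiling value is an invented administrative setting.

```python
# A hop-count looping-packet detector.  Each router increments the
# count and drops the packet once it exceeds the (administratively
# settable) ceiling; the field itself is far larger than the ceiling.
HOP_FIELD_BITS = 12
HOP_CEILING = 1024              # well above today's 30+ hop paths

def forward_hop(hop_count):
    """Return the incremented hop count, or None to signal a drop."""
    hop_count += 1
    assert hop_count < (1 << HOP_FIELD_BITS)  # field can always hold it
    if hop_count > HOP_CEILING:
        return None             # packet presumed stuck in a loop
    return hop_count

h = 0
for _ in range(40):             # a long but legitimate 40-hop path
    h = forward_hop(h)
assert h == 40
```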
- Optional source and destination locators. These are
structured, variable length
items which are topologically sensitive identifiers for the
place in the network from which the traffic originates or to which the
traffic is destined. The locator will probably contain internal
separators which divide up the fields, so that a particular field can be
enlarged without creating a great deal of upheaval. An adequate value
for maximum length supported would be up to 32 bytes per locator, and
longer would be even better; up to 256 bytes per locator is recommended.
- Perhaps (paired with the above), an optional pointer into the locators.
This is forwarding state
(i.e. state in the packet which records something
about its progress across the network)
which is used in the datagram forwarding mode to help ensure that the
packet does not loop. It can also improve the forwarding processing
efficiency. It is thus not absolutely essential, but is very desirable
from a real-world engineering view point. It needs to be large enough to
identify locations in either locator; e.g. if locators can be up to 256
bytes, it would need to be 9 bits.
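The combination of an internally separated locator and a pointer recording forwarding progress might look like the sketch below. The dotted locator syntax and element names are invented; only the pointer-advance idea comes from the text above.

```python
# A variable-length locator with internal separators, plus the
# forwarding-state pointer described above.  The syntax is invented.
def locator_elements(locator):
    """Split a locator into its hierarchy elements."""
    return locator.split(".")

def advance(locator, pointer):
    """An 'active' router consumes the element the pointer indicates
    and advances the pointer; since the pointer only moves forward,
    it helps ensure the packet does not loop."""
    elems = locator_elements(locator)
    assert pointer < len(elems)
    return pointer + 1, elems[pointer]

dest = "us.east.site1.host7"    # hypothetical destination locator
ptr = 0
ptr, elem = advance(dest, ptr)  # first active router consumes 'us'
assert (ptr, elem) == (1, "us")
ptr, elem = advance(dest, ptr)
assert (ptr, elem) == (2, "east")
```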
- An optional source route. This is used to support the source routed
packet forwarding mode. Although not designed in detail yet,
two possible approaches can be discussed.
- In one, used with semi-strict source routing
(in which a contiguous
series of entities is named, albeit perhaps at a high layer of
abstraction), the syntax will likely look much like source routes in PIP;
in Nimrod they would be a sequence of Nimrod entity identifiers (i.e.
locator elements, not complete locators), along with clues as to the
context in which each identifier is to be interpreted (e.g. up, down,
across, etc). Since those identifiers themselves are variable length
(although probably most will be two bytes or less, otherwise the routing
overhead inside the named object would be excessive), and the hop count
above contemplates the possibility of paths of over 256 hops, it would
seem that these might possibly some day exceed 512 bytes, if a lengthy
path was specified in terms of the actual physical assets used. An
adequate length would be 512 bytes; the recommended length would be 2^16
bytes (although this length would probably not be supported in practice;
rather, the field length would allow it).
- In the other, used with classical loose
source routes, the source route
consists of a number of locators. It is not yet clear if this mode will
be supported. If so, the header would need to be able to store a
sequence of locators (as described above). Space might be saved by
not repeating locator prefixes that match that of the previous locator
in the sequence; Nimrod will probably allow use of such "locally useful"
locators. It is hard to determine what an adequate length would be for
this case; the recommended length would be 2^16 bytes (again, with the
previous caveat).
- Perhaps (paired with the above), an optional pointer into the source
route. This is also optional forwarding state. It needs to be large
enough to identify locations anywhere in the source route; e.g. if the
source route can be up to 1024 bytes, it would need to be 10 bits.
- An internetwork header length. This is mentioned since the above fields
could easily exceed 256 bytes; if they are all to be carried in the
internetwork header (see comments below as to where to carry all this
information), the header length field needs to be more than 8 bits;
12 bits would be adequate, and 16 bits is recommended. The approach of
putting some of the data items above into an interior header, to limit
the size of the basic internetworking header, does not really seem optimal,
as this data is for use by the intermediate routers, and it needs to be
easily accessible.
- Authentication of some sort is needed. See the recent IAB document
which was produced as a result of the IAB architecture retreat on security,
section 4, and especially section 4.3. There is currently no set way of
doing "denial/theft of service" in Nimrod, but this topic is well explored in
that document; Nimrod would use whatever mechanism(s) seem appropriate to
those knowledgeable in this area.
- A version number. Future forwarding mechanisms might need other
information (i.e. fields) in the packet header; a version number would
allow the header to be modified to contain what's needed. (This would not
necessarily be information that is visible to the hosts, so this does
not necessarily mean that the hosts would need to know about this new
format.) 4 bits is adequate; it's not clear if a larger value needs to be
recommended.
Field Requirements and Addition Methods
It is possible to use Nimrod in a mode where needed information/fields are
added by the first-hop router. It is thus useful to ask 'which of the fields
must be present in the host-router header, and which could be added by the
router?' The only ones which are absolutely necessary in all packets are the
TLEI's (provided that some means is available to the routers to map them into
locators).
As to the others, if the user wishes to use flows, and wants to guarantee
that their packets are assigned to the correct flow, the flow-id field is
needed. If the user wishes efficient use of the datagram mode, it's
probably necessary to include the locators in the packet sent to the
router. If the user wishes to specify the route for the packets, and does
not wish to set up a flow, they need to include the source route.
How would additional information/fields be added to the packet, if
the packet is emitted from the host in incomplete form? (By this is meant the
simple question of how, mechanically, not the more complex issue of where
any needed information comes from. It is assumed here that the packet does
not contain empty fields for this information, but rather a format which
does not include those fields at all.)
This question is complex, since all the IPng candidates (and in fact, any
reasonable inter-networking protocol) are extensible protocols; those
extension mechanisms could be used. Also, it would be possible to carry some
of the required information as user data in the internetworking packet,
with the original user's data encapsulated further inside. Finally, a
private inter-router packet format could be defined.
It's not clear which choice is best, but it is possible to discuss which
fields the Nimrod routers need access to, and how often; less used ones
could be placed in harder-to-get-to locations (such as in an encapsulated
header). The fields to which the routers need access on every hop are the
flow-id and the looping packet detector. The locator/pointer fields are
only needed at intervals (in what Nimrod's datagram forwarding mode calls
active routers), as is the source route (the latter on entry to every
object which is named in the source route).
Depending on how access control is done, and which forwarding mode is used,
the TLEI's and/or locators might be examined for access control purposes,
wherever that function is performed.
This is not a complete exploration of the topic, but should give a rough
idea of the issues.
The Future of Routing
It is worth noting that although this section is specifically about the
requirements of the Nimrod routing architecture, there are reasons to
believe that any routing architecture for a large, ubiquitous global
network will have many of the same basic fundamental principles as the
Nimrod architecture, and the requirements that these generate.
While contemporary routing technologies do not yet have the characteristics
and capabilities that generate these requirements, they also do not seem to
be completely suited to routing in the next-generation Internet. As routing
technology moves towards what is needed for the next generation Internet,
the underlying fundamental laws and principles of routing will almost
inevitably drive the design, and hence the requirements, toward things
which look like the material presented here.
Therefore, even if Nimrod is not the routing architecture of the
next-generation Internet, the basic routing architecture of that Internet
will have requirements that, while differing in detail, will almost
inevitably be similar to these. This makes the requirements of Nimrod worth
looking at, even if the specific Nimrod design is destined to be nothing
more than a historical curiosity.
Bibliography
[Nimrod] Isidro Castineyra, Noel Chiappa, Martha Steenstrup, "The Nimrod
Routing Architecture", RFC 1992, August 1996.