Will The Real "End-End Principle" Please Stand Up?




Introduction

This note was prompted by Yet One More case of having someone claim that (to paraphrase) "the end-end principle says you shouldn't keep state in the network". This never fails to cause smoke to come out of my ears, for several reasons.

First, the "real" end-end principle, as stated in the original classic Saltzer/Reed/Clark paper, "End-To-End Arguments in System Design", actually says something quite different from what most people seem to think it does. Its name is much taken in vain - sadly, because it makes an important point, one that ought not to be lost behind an alternative masquerading as the real thing.

Second, neither:

- the "real" end-end principle, nor
- the so-called "end-end principle" (in reality, the fate-sharing principle - of which more below)

says that about state in the network.

(I often feel like one of the animals in "Animal Farm", where every time someone says something unpalatable, the chorus of sheep pipes up with the de-rationalizing refrain "four legs good, two legs bad" - except that in the Internet community, it's either "soft state good, hard state bad", or, as in this case, "end-end good, state-in-network bad".)


The "Real" End-End Principle

The "real" end-end principle, as stated in the original Salzter/Reed/Clark paper is, quoting from the paper:
functions placed at low levels of a system may be redundant or of little value when compared with the cost of providing them at that low level. Examples discussed in the paper include .. duplicate message suppression .. and delivery acknowledgement.
or:
The function in question can completely and correctly be implemented only with the knowledge and help of the application standing at the end points of the communication system. Therefore, providing that questioned function as a feature of the communication system itself is not possible.
Thus, it is clear that the end-end principle (as originally given) has nothing to say about the placement of state in the network - except insofar as that may be done in an attempt to avoid having state in the endpoints.
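To make that concrete with the kind of example the paper itself uses (end-end reliability): here is a minimal sketch, in Python, of why a transfer check belongs at the endpoints - the function names and the "channel" are illustrative, not any real API. The network underneath may checksum every hop, but only the communicating applications can verify the transfer as a whole.

    import hashlib
    import queue

    def send_file(data: bytes, channel: "queue.Queue") -> None:
        # Lower layers may checksum each hop, but only the endpoints can
        # vouch for the whole transfer - so compute an end-to-end digest
        # at the sending application and carry it with the data.
        channel.put((data, hashlib.sha256(data).hexdigest()))

    def receive_file(channel: "queue.Queue") -> bytes:
        data, digest = channel.get()
        # The end-to-end check: corruption anywhere along the path (even
        # inside individually "reliable" hops) is caught here, and only
        # here, completely and correctly.
        if hashlib.sha256(data).hexdigest() != digest:
            raise IOError("end-to-end check failed; retry the transfer")
        return data

    ch = queue.Queue()        # stands in for the whole network
    send_file(b"some file contents", ch)
    assert receive_file(ch) == b"some file contents"

Note that nothing here forbids the lower layers from also doing checks (as a performance enhancement); the point is only that the complete, correct check can live nowhere but the ends.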

The So-Called "End-End Principle"

Many people seem to think of 'end-end' as an alternative name for a related principle called 'fate-sharing'; this confusion may have started with RFC-1958 ("Architectural Principles of the Internet", by Carpenter), which says of the end-end principle (the real one):
This principle has important consequences if we require applications to survive partial network failures. An end-to-end protocol design should not rely on the maintenance of state (i.e. information about the state of the end-to-end communication) inside the network. Such state should be maintained only in the endpoints, in such a way that the state can only be destroyed when the endpoint itself breaks (known as fate-sharing). An immediate consequence of this is that datagrams are better than classical virtual circuits. The network's job is to transmit datagrams as efficiently and flexibly as possible. Everything else should be done at the fringes.
The fate-sharing principle was first enunciated by Dave Clark in the classic "Design Philosophy of the DARPA Internet Protocols". He says:
state information which describes the on-going conversation must be protected. Specific examples of state information would be the number of packets transmitted, the number of packets acknowledged .. If the lower layers of the architecture lose this information, they will not be able to tell if data has been lost, and the application layer will have to cope with the loss of synchrony. .. In some network architectures, this state is stored in the intermediate packet switching nodes of the network. .. The alternative, which this architecture chose, is to take this information and gather it at the endpoint of the net, at the entity which is utilizing the service of the network. I call this approach to reliability "fate-sharing." The fate-sharing model suggests that it is acceptable to lose the state information associated with an entity if, at the same time, the entity itself is lost.
It's clear that the "end-to-end protocol design" discussed in RFC-1958 is in fact referring to the fate-sharing concept discussed in the Clark paper, not the (possibly more general) principle of the original Saltzer/Reed/Clark End-To-End paper.

(It's also worth noting that although the referenced paper which describes fate-sharing dates to 1988, the fate-sharing principle, with that exact name, was understood and followed many years before. I recall it being explained and discussed in a presentation that Dave Clark prepared for DARPA, explaining the basic concepts behind TCP, sometime in the late 70's, when I first started working at LCS with Dave.)


State in the Network

The fate-sharing principle (and its reformulation in RFC-1958) does not say (in effect) "do not have any state in the network"; rather, it says, as stated in the Clark paper (emphasis mine):
the intermediate packet switching nodes, or gateways, must not have any *essential* state information about on-going connections. Instead, they are stateless packet switches, a class of network design sometimes called a "datagram" network.
The state that the fate-sharing principle is concerned about is state which is critical to the end-end communication, such as information on what data has been fully acknowledged from the far end. It is that state that must be co-located with the application, so that "they all go together when they go".
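As a concrete (if toy) illustration of what that co-location means, here is a sketch in Python; the field names deliberately mirror TCP's send/receive variables, but this is only a sketch under that assumption, not TCP itself.

    from dataclasses import dataclass, field

    @dataclass
    class ConnectionState:
        # This is the state fate-sharing insists must live in the
        # endpoint: if the host dies, the state dies with it - and with
        # the application, so nothing is left dangling in the network.
        snd_una: int = 0    # oldest sent-but-unacknowledged sequence number
        snd_nxt: int = 0    # next sequence number to send
        rcv_nxt: int = 0    # next sequence number expected from the far end
        retransmit_queue: dict = field(default_factory=dict)  # seq -> payload

    def on_ack(conn: ConnectionState, ack: int) -> None:
        # Only the endpoint can know what has been "fully acknowledged
        # from the far end"; a router that held this instead could lose
        # it while both endpoints survived - breaking the connection.
        for seq in [s for s in conn.retransmit_queue if s < ack]:
            del conn.retransmit_queue[seq]
        conn.snd_una = max(conn.snd_una, ack)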

The fate-sharing principle actually says nothing about state in the network where that state is related to the operation of the network itself (such as routing tables, which are obviously necessary state) - although it is admittedly a related topic.

Network State

Looking past this "no critical end-end state in the network" idea: all real network designs, including the one bringing these bits to you, have state in the network core (albeit not end-end state) - as has every packet network design since Baran's original hot-potato routing.

Moreover, that state is such that, if it is not properly maintained, packets will not flow. In other words, it too is critical state (in the sense of "it has to be right, or the communication doesn't work"). RIP, OSPF, IS-IS, BGP - they all maintain state, and often include extremely complex and powerful mechanisms to ensure that that state is perfectly synchronized - because if it isn't synchronized, and up-to-date, things will go to hell in a handbasket.
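The distinction is easy to see in code. Here is a minimal sketch, in Python, of a stateless datagram forwarder (the table entries are made up): the routing table is genuine, critical, network state - but no per-conversation entry exists anywhere in it.

    import ipaddress

    # Network state: the forwarding table, installed and repaired by a
    # routing protocol (RIP, OSPF, IS-IS, BGP, ...) - not by any
    # end-end conversation, none of which appears here.
    ROUTES = {
        ipaddress.ip_network("10.0.0.0/8"):  "if0",
        ipaddress.ip_network("10.1.0.0/16"): "if1",
        ipaddress.ip_network("0.0.0.0/0"):   "if2",  # default route
    }

    def forward(dst: str) -> str:
        # Each packet is handled using only its own header plus the
        # routing state; the switch neither consults nor records
        # anything about the conversation the packet belongs to.
        addr = ipaddress.ip_address(dst)
        best = max((net for net in ROUTES if addr in net),
                   key=lambda net: net.prefixlen)
        return ROUTES[best]

    print(forward("10.1.2.3"))   # -> if1 (longest-prefix match)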

RFC-1958 goes on to say:

To perform its services, the network maintains some state information: routes, QoS guarantees that it makes, session information where that is used in header compression, compression histories for data compression, and the like. This state must be self-healing; adaptive procedures or protocols must exist to derive and maintain that state, and change it when the topology or activity of the network changes. The volume of this state must be minimized, and the loss of the state must not result in more than a temporary denial of service given that connectivity exists. Manually configured state must be kept to an absolute minimum.
all of which is sound - but it does ignore the need for some communication between the network and the end-system about that state.

E.g. if a network component failure means that a previously guaranteed QoS allocation can no longer be met, the network can't "fix that" on its own - it has to inform the application, which is the only entity that knows whether it can accept a lower rate of service, or must terminate - a perfect example of the real end-end principle in action.
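In code terms, the division of labor might look like the following sketch (Python; the Session class and the callback are hypothetical, not any real QoS API):

    from dataclasses import dataclass

    @dataclass
    class Session:
        min_usable_rate: int   # bits/sec; a policy only the app knows

        def adapt(self, rate: int) -> None:
            print(f"adapting to {rate} b/s")   # e.g. pick a lower-rate codec

        def terminate(self, reason: str) -> None:
            print(f"terminating: {reason}")

    def on_qos_change(session: Session, new_rate: int) -> None:
        # The network can only *report* that its guarantee changed; it
        # cannot decide what to do about it. Only the application knows
        # whether the degraded service is still usable.
        if new_rate >= session.min_usable_rate:
            session.adapt(new_rate)
        else:
            session.terminate("guaranteed rate no longer available")

    # A voice session that needs at least 64 kb/s:
    on_qos_change(Session(min_usable_rate=64_000), new_rate=32_000)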

Conclusion

To close: a useful criticism of any design has to say something more than just "it has state in the core", or even "it has critical state in the core". All real networks already have that (as should be obvious to anyone who really understands them). A valid and potentially useful criticism has to consider what that state buys you, the cost and complexity of maintaining it, what happens when something goes wrong, etc, etc, etc.



