I kept meaning to send a message about this since I'm a partial source
of the counter-opinion about the manageability of per-flow state growth, but I
had previously kept putting off sinking the time.
I think it is true that the growth in the amount of flow state (for any system using per-flow state, be it RSVP or anything else) in a given switch *will* be bounded by the growth of the bandwidth through the switch, and not by the size of the network as a whole. (This assumes that per-flow bandwidth does not decrease over time, and no, I don't consider most WWW requests to be "flows", or at any rate flows which I would bother setting up state for, so I think this assumption is pretty good.)
So, the amount of memory needed per switch for flow-state will not grow as O(N^2), or anything like that (which is the canard you used to hear). It will only grow as *local* conditions change - which is good, it means you have control over that growth, so that if you have a big fast switch, you need more memory, but the two will grow together.
(It's also the case that memory is getting larger at a higher rate
than speeds are getting faster, since memory capacity goes as the square of
the device feature size, whereas device speed is generally linear in feature
size - another nice feature.)
In realizing all that, though, I missed something very important - which is that what's important is the *per flow cost* of the state - and although the total *amount* of per-flow memory needed goes up linearly with the amount of traffic, the total *cost* of that memory does not, but at a *higher* rate, since faster switches generally need higher-speed state memory, and memory cost is generally somewhat proportional to speed.
In other words, the unit (i.e. per-flow) cost of per-flow memory is
higher in high-speed switches than in low-speed switches. This is a negative
economy of scale on high-speed switches, which is the inverse of what you
want.
The implications of this for flow aggregation are pretty obvious. I had previously opposed flow aggregation on the grounds that i) it was extra complexity, and ii) it didn't buy the users any additional capability. That analysis led me to reject flow aggregation. (Believe it or not, I don't at all like "kitchen sink" designs, although I expect many people who look at Nimrod don't believe that! :-)
However, that simplistic analysis hits the complex hard rock of reality, which is that you have to have aggregation for economic reasons. (And a tip of the hatly hat to the people at BBN who disagreed with me about putting aggregation in Nimrod! You guys were right, I was wrong. So there! :-)
So, in a large network, one using a hierarchy of switch speeds to
construct the mesh (as seems the most workable real-world design), you cannot
in fact support per-flow state all the way through the network, but need to
aggregate flows.
Note that you don't have to have a *lot* of aggregation as you go on up - just enough to keep the unit cost of per-flow memory declining at an appropriate rate.
E.g. (using made-up numbers) if the faster memory for your higher-
speed switch is 5 times as expensive, and your faster switch is otherwise 3
times more cost-effective per unit traffic flow, you only need to aggregate by
a factor of 15 to match i) the decrease in per-flow unit costs for memory for
per-flow state, and ii) the overall decrease in per-flow unit costs elsewhere
in the switch.
To sum all this up, the issue with growth of state, as routers get
larger (and therefore faster), is not simply the *amount* of memory, but also
the *unit cost* of that memory. If the amount needed is proportional to the
amount of traffic, but the speed (a.k.a. unit cost :-) is larger, you have a
negative economy of scale in your routers.
There's an interesting corollary (one I don't have time to explore here) to this notion that you *have* to have aggregation of flows in the network. My initial analysis was all in terms of unicast flows, not multicast. However, the same argument holds for multicast flows.
If we have many small, topologically-widespread, multicast groups, we
are going to have a problem with multicast state growth (for things like
routing information *as well as* RSVP) in the core routers. So, we are going
to have to figure out how to aggregate multicast flows (i.e. groups, since all
groups have per-group state in current multicast routing) before multicast
will fully scale.
Back to JNC's home page