
# Introduction #

Naming is and has always been a controversial and subjective topic. This is particularly true in the area of content-centric networking (CCN) [3]. What are the naming strategies and conventions used to assign meaning to content? How do we discover and learn names? What do the names reveal about the content to which they point? Are names absolute or relative? Do they change over time?

I am discouraged by the apparent readiness with which researchers consider this to be a solved problem. I often hear people say things like "just name the data" or "just ask for [the data] with this name." Naming is simple, right? I'm not so sure. These simple answers completely ignore nearly every one of the problems I just listed. In this post, I will attempt to make sense of the current state of naming as it pertains to CCN. I will discuss how we might be able to do better for CCN and related architectures in which named data is a first-class citizen in a distributed system.

# What's in a Name? #

By and large, URIs are the standard way by which applications and users identify data and services (i.e., resources) in the Internet. As the name implies, a URI is something which identifies a resource. It can take two forms: URLs and URNs (links and names). A URL is an identifier in that it specifies a way by which to access a resource. In this case, that access strategy is encoded in a link. Conversely, a URN identifies a resource by referring to it by name in a particular namespace. The access mechanism is not explicitly delineated in a URN.
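
To make the URL/URN distinction concrete, here is a minimal sketch using the standard `java.net.URI` class. The class name `UriForms` and the example URN `urn:isbn:0451450523` are my own choices for illustration; only the GitHub URL appears elsewhere in this post.

```java
import java.net.URI;

// Illustrative only: UriForms and the example URN are assumptions made for this sketch.
final class UriForms {
    // Describe how java.net.URI sees an identifier: as a locator with a host
    // and a path, or as an opaque name inside a namespace.
    static String describe(String text) {
        URI uri = URI.create(text);
        if (uri.isOpaque()) {
            // A URN has no host to contact, only a scheme and a name within it.
            return "name in scheme '" + uri.getScheme() + "': " + uri.getSchemeSpecificPart();
        }
        // A URL carries enough information to go and fetch the resource.
        return "locator: host=" + uri.getHost() + ", path=" + uri.getPath();
    }

    public static void main(String[] args) {
        System.out.println(describe("https://github.com/chris-wood"));
        System.out.println(describe("urn:isbn:0451450523"));
    }
}
```

The URL yields a host and a path, which is exactly the access strategy a browser needs. The URN yields nothing a client could connect to, so some separate resolution step is required before the resource can be retrieved.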

But URIs are not the only form of identifiers that we can use to represent data. Let's consider some alternative networking architectures and their approaches to naming. URIs are primarily used to bind meaningful names to content in both IPv4 and IPv6 with DNS as the resolution service that translates between these names and usable addresses. The GS1 [1] (Global Language of Business) uses random data and additional metadata to name products. SWE [2] (Sensor Web Enablement) uses unique identifiers based on random data to mark individual sensors. Many ICN architectures like CCN and NDN [4] use hierarchical URIs (URNs, really) to mark data. In these architectures, cryptographic hash digests are also used as static identifiers for data. MobilityFirst [5], another future Internet architecture, uses GUIDs and hash digests to mark data and entities in the network.

Collectively, there are many options from which to choose for identifiers. To help develop a taxonomy for these naming schemes and conventions, Bauer et al. compiled a catalog of naming, addressing, and discovery schemes useful or applicable to the IoT [6]. Given the scale at which data and devices are expected to flood the network, as well as the anticipation of multi-homed or mobile entities, this report is immensely useful when considering the naming problems that plague ICN. I recommend you read this paper if you can spare some cycles. The remainder of this post describes some of the issues identified in this report.

# What are the Problems? #

Naming is hard because of the difference (or lack thereof) between locations and identifiers. This has important implications for mobility, name-to-entity lifetime, and even privacy. I'll describe these problems in more detail below.

## Mobility #

URNs are good for identifying data within a namespace while URLs are good for specifying how to get data. Web applications are drawn to URLs because they allow a browser to programmatically retrieve the resource in question. Unfortunately, URLs have the (sometimes nasty) side effect of binding a resource to a single location. For example, the URL

https://github.com/chris-wood


binds the content of my main page to the GitHub domain. Your browser will use DNS to resolve the GitHub name (within the com TLD) to a specific IP address and will then proceed to fetch the data in question from that host. This is perfectly fine as long as my profile information stays put at the host (or behind the host). But what if the data were generated by something that was mobile? What if the resource in question were a picture served by someone's mobile device? (Ignore the fact that addressing a user's mobile device is a terrible idea for now.)

NDN and CCN adopt the idea of location-agnostic names. Specifically, content names, and thus the names of the requests issued to fetch that content, are independent of where the data actually originated. Applications are assigned a unique part of the global namespace under which to publish their data, e.g., /twitter. These identifiers help mobile consumers because content is still returned along the reverse path of the request [7]. If a consumer moves, it simply retransmits any interest that timed out, as shown below.

To deal with mobile producers, forwarding hints (locators) are provided to help the network route requests to the actual location of the producer [1]. Resolution services such as DNS (or NDNS [8]) store these locators and distribute forwarding hints and cryptographic keys. One way to avoid these locator hints is for the network FIBs to continually update as data "anchors" (producers) move throughout the network. This, however, is problematic if many producers are mobile; routing signalling would consume far too much bandwidth and the time to convergence would likely be slow. See [9] for a comparison of state-of-the-art routing protocols with respect to routing traffic.

Alternatives for handling producer mobility are rendezvous services or points (RPs) [10]. RPs help consumer interests reach a mobile producer (MP) in one of two ways: the RP can redirect interests from consumers to the MP (chasing) or the RP can store data produced by the MP, acting as a data depot (storing). Chasing can use hints provided by the RP (as described above) or traces to the MP in FIBs or PITs between the RP and MP. Regardless of what's done, the key point is that NDN and CCN names alone are insufficient for handling mobility. And that's a problem. While the projects are well intentioned in that they try to separate naming from addressing, practical problems such as mobility mash them together again.

MobilityFirst, a different future Internet architecture, effectively separates naming from addressing by using GUIDs to identify content. This idea is based on LISP (Locator/ID Separation Protocol) [11]. LISP splits the device identity (Endpoint Identifier) and location (Routing Locator) into two namespaces. This separation enables, among other things, endpoint mobility under a common location or routing locator, which allows routing to scale more easily to handle mobile nodes and lets mobility be handled at the edge.

MobilityFirst [5] supports two types of routing: routing based on (1) GUIDs and (2) network addresses. Even though these are treated separately, MobilityFirst provides a mapping between the two. A Name Assignment Service assigns public keys, and GUIDs are built from these keys and a unique sequence number. GUIDs are mapped to all locations in the network where the actual content can reside. A Global Name Resolution Service is used to map GUIDs to the nearest possible location, and the network supports routing on the result. Thus, in a way, this hybrid routing scheme resembles that of NDN with hints.
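
The identifier/locator split can be sketched as a lookup table from flat identifiers to a set of current locations. The code below is a toy model of that idea and not the MobilityFirst or LISP API: `ToyResolver`, its method names, the `guid-1` and `net-A` style strings, and the integer "cost" used to pick the nearest location are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy stand-in for a global name resolution service. Every name here is hypothetical.
final class ToyResolver {
    // GUID -> (network address -> cost of reaching that address from the requester).
    private final Map<String, Map<String, Integer>> locations = new HashMap<>();

    // Record that whatever guid identifies is currently reachable at address.
    void register(String guid, String address, int cost) {
        locations.computeIfAbsent(guid, g -> new HashMap<>()).put(address, cost);
    }

    // A mobile entity leaves an address. Its GUID does not change.
    void unregister(String guid, String address) {
        Map<String, Integer> known = locations.get(guid);
        if (known != null) {
            known.remove(address);
        }
    }

    // Return the cheapest known address for guid, if any address is known.
    Optional<String> resolveNearest(String guid) {
        Map<String, Integer> known = locations.getOrDefault(guid, new HashMap<>());
        return known.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        ToyResolver resolver = new ToyResolver();
        resolver.register("guid-1", "net-A", 5);
        resolver.register("guid-1", "net-B", 2);
        System.out.println(resolver.resolveNearest("guid-1")); // Optional[net-B]
        resolver.unregister("guid-1", "net-B");                // the entity moved away
        System.out.println(resolver.resolveNearest("guid-1")); // Optional[net-A]
    }
}
```

The point of the sketch is that mobility only touches the table: the identifier a consumer asks for stays fixed while the addresses behind it come and go.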

## Issues of Scope with Temporal and Immutable Bindings #

Let's move away from the routing and mobility issues that are rooted in naming and turn our attention to the semantics of names themselves. A name is a binding between one representation of something and some resource or data. More often than not, this binding changes over time. In imperative programming languages, for instance, the names of mutable variables remain static but the underlying data changes with regularity. Consider the following for loop.

    for (int i = 0; i < 5; i++) {
        System.out.println(names[i]);
    }


During every iteration of the loop, the name i is bound to a different value. This type of temporal binding certainly has its uses, but we're now living in a world that is increasingly dominated by multithreaded, multicore, and distributed computations where immutability is the key to stable, performant, and scalable systems. As observed by Helland in [13], immutability has crept its way into nearly every layer of the computation stack, from the lower FS layers with immutable files to the application layer with append-only logs and application-generated data sets. In that same article, Helland gives examples of how names may be rebound to point to different immutable blocks. In this way, the binding between a name and data can change even though the underlying data itself is static. This seems problematic to me. Generally speaking, data is accessed via a name of some sort, be it a CCN interest name, variable name, pointer address, or HD block number. If the name for some data is not available, or the name has been rebound to a different piece of data, does the name represent the same thing? Or, as Pat Helland put it, if the binding between some name and the underlying data changes, can we still consider the data to be immutable [13]?

This is a difficult question to answer. But, in an attempt to do so, consider the following. We typically represent a naming scheme as an injective function where the domain is the set of names and the range is the set of values that are "alive" or "accessible." As bindings change, it may be the case that there is a data element for which there does not exist a name that maps to the data. These "nameless data elements" should only exist for those data items that are no longer needed or go out of scope. To be specific, let's revisit the for loop example above. Within the scope of that for loop, the name i is bound to 5 different values: 0, 1, 2, 3, 4. After each iteration of the for loop, the previous value to which i pointed is no longer needed -- it's out of scope. In this case, it's okay for that value to be lost to the nameless abyss.
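
Here is a small sketch of that idea: a naming scheme modeled as a map from names to values, plus a way to list the values that no name reaches any more. The class `Bindings` and its methods are hypothetical names I made up for this illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a naming scheme as a set of name-to-value bindings.
final class Bindings {
    private final Map<String, Integer> byName = new HashMap<>();
    // Every value that has ever been bound, in the order it first appeared.
    private final Set<Integer> allValues = new LinkedHashSet<>();

    // Bind name to value, replacing any earlier binding for the same name.
    void bind(String name, int value) {
        byName.put(name, value);
        allValues.add(value);
    }

    // Values that were bound at some point but that no name maps to now.
    Set<Integer> nameless() {
        Set<Integer> result = new LinkedHashSet<>(allValues);
        result.removeAll(byName.values());
        return result;
    }

    public static void main(String[] args) {
        Bindings scope = new Bindings();
        // Mirror the for loop above: the name "i" is rebound five times.
        for (int i = 0; i < 5; i++) {
            scope.bind("i", i);
        }
        System.out.println(scope.nameless()); // [0, 1, 2, 3]
    }
}
```

After the loop, only the value 4 is still reachable through the name i. In a program that is harmless, because the earlier values are out of scope; the trouble starts when nobody can tell whether a nameless value is still needed.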

The problem is that the scope of a data item is not always clear or under our control. Consider the name of some content object in CCN. How does a consumer know the scope of this name? How does a router know that the binding has changed and should therefore discard its cached copy of the content? Given a content object which binds the name "/foo/bar/baz" to some data, how does a router know that it should delete the content object when this binding changes?

Put simply: they can't. CCN does not provide a rich enough naming scheme to express the scope of the data a name represents. Yes, a producer can specify the "expiry time" of a content object. But that does not necessarily correspond to the lifetime of the data encapsulated by the content object. The producer is responsible for tuning the lifetime of a content object to the expected lifetime of the binding from the name to the underlying data. If the name of the content object instead pointed to a single piece of data and the binding did not change, e.g., "/foo/bar/baz/v10," then this is no longer a problem. Consumers would ask for a specific version of some content, i.e., some name that is never rebound to another data item, and would retrieve that for which they asked. But if we go down that route, then we are left with another problem. Namely, how does a consumer learn the latest version of content they are interested in?

One approach would be to use two names for content: (1) a name that always points to the latest version of some content (i.e., the latest binding), and (2) a sequence of names that point to predecessor values (until they go out of scope). The producer is free to change the bindings at their discretion so long as they ensure that the lifetime of data in the network matches the rate at which the latest binding is updated. This is depicted below.

Alternatively, to encode all versions of the data, we could store the data in a persistent data structure like a fat tree [14]. This would allow each version to be recorded as the bindings changed in time without having to worry about scope.

There's one problem here, though. What about data that changes at sporadic intervals or based on events? Breaking news, for example, is just that: new content that arrives unexpectedly. If an application (stupidly) decides to name its news data under the name "/bad/news/latest," how can a consumer be sure that they have the latest data? A far superior approach would be to not cache the head of the data but to only cache specific, immutable versions of the data and then refer to these static data elements via redirection. For example, the data named "/bad/news/latest" could always produce a list of names of the latest stories, where these names are fixed identities and do not change over time (i.e., with infinite scope).
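
A rough sketch of that redirection scheme, under my own assumptions: the class `VersionedNamespace`, its methods, and the "/v1", "/v2" version suffixes are hypothetical and not part of any CCN or NDN library. Each story is published once under a name that is never rebound, and a request for the "latest" name returns a list of those fixed names instead of the data itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical producer-side namespace: immutable versioned names plus a "latest" index.
final class VersionedNamespace {
    private final String prefix;
    // Versioned name -> data. Entries are added once and never rebound.
    private final Map<String, String> immutable = new HashMap<>();
    // Versioned names in publication order.
    private final List<String> history = new ArrayList<>();

    VersionedNamespace(String prefix) {
        this.prefix = prefix;
    }

    // Publish data under a fresh versioned name and return that name.
    String publish(String data) {
        String name = prefix + "/v" + (history.size() + 1);
        immutable.put(name, data);
        history.add(name);
        return name;
    }

    // Answer a request for prefix + "/latest": up to count fixed names, newest first.
    List<String> latest(int count) {
        List<String> names = new ArrayList<>();
        for (int i = history.size() - 1; i >= 0 && names.size() < count; i--) {
            names.add(history.get(i));
        }
        return names;
    }

    // Fetch by immutable name. The result is safe to cache: the binding never changes.
    String fetch(String name) {
        return immutable.get(name);
    }

    public static void main(String[] args) {
        VersionedNamespace news = new VersionedNamespace("/bad/news");
        news.publish("story one");
        news.publish("story two");
        List<String> head = news.latest(1);
        System.out.println(head);                     // [/bad/news/v2]
        System.out.println(news.fetch(head.get(0)));  // story two
    }
}
```

In this arrangement only the small index behind "/latest" has a short lifetime. Everything it points to can sit in router caches indefinitely without ever going stale.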

## Privacy with Service-Centric and Data-Centric Names #

As the name implies, CCN is all about naming specific data items and transferring them around the network. Requests for data express the name in full. However, these requests are routed based on some subset of the name depending on the configuration of network routers' FIB entries. Thus, there must exist some minimal prefix that is necessary to route an interest from a consumer to an authoritative producer. The rest of the name, i.e., those components which are not included in any longest-prefix match (LPM) when indexing into a FIB on the path from the consumer to the producer, can be elided or encrypted. Currently, this portion is included in cleartext in order to allow the producer application to act on it and return the right data. However, this is not a requirement for communication; it merely serves to help the producer. To attain some amount of privacy, this suffix could (should) be encrypted so that eavesdroppers cannot identify or classify what data is crossing a link.

Let's assume that we do encrypt the suffixes of a name such that the only part that's left in the clear is the minimal routable prefix. Is that enough to ensure privacy? Maybe. The information that's leaked by that name is profoundly influenced by the application namespace and the routing protocol.

To support this claim, consider an email application which makes a user's inbox available under the following namespace:

/edu/uci/ics/email/woodc1/inbox/


If the email application ran on a host which advertised the prefix /edu/uci/ics/email, this would be the minimal routable prefix. The suffix /woodc1/inbox can be encrypted to hide the fact that it refers to my inbox. However, an eavesdropper still learns that the request is for some UCI student's email data. The minimal routable prefix in this case only exposes the service from which data can be fetched. This is no coincidence: more general service-centric names leak less information through requests. (Consider what would happen if the email application was running on my personal machine with the minimal routable prefix /edu/uci/ics/woodc1/. What is leaked then?)
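
To illustrate the split between the routable prefix and the suffix that could be encrypted, here is a minimal sketch of a longest-prefix match over name components. It looks at a single FIB only, whereas the real minimal routable prefix depends on every FIB along the path. The class `RoutablePrefix`, its methods, and the extra FIB entries /edu and /edu/uci are assumptions for illustration, and the encryption step itself is left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: find which part of a name one FIB needs in the clear.
final class RoutablePrefix {
    // Return the longest FIB prefix that matches name on component boundaries,
    // or the empty string when nothing matches.
    static String longestMatch(List<String> fibPrefixes, String name) {
        String best = "";
        for (String prefix : fibPrefixes) {
            // "/edu/uci" must not match "/edu/ucixyz", so require a "/" boundary.
            boolean matches = name.equals(prefix) || name.startsWith(prefix + "/");
            if (matches && prefix.length() > best.length()) {
                best = prefix;
            }
        }
        return best;
    }

    // Everything after the routable prefix: the part a consumer could encrypt.
    static String suffix(String name, String prefix) {
        return name.substring(prefix.length());
    }

    public static void main(String[] args) {
        List<String> fib = Arrays.asList("/edu", "/edu/uci", "/edu/uci/ics/email");
        String name = "/edu/uci/ics/email/woodc1/inbox";
        String prefix = longestMatch(fib, name);
        System.out.println("cleartext: " + prefix);               // cleartext: /edu/uci/ics/email
        System.out.println("hideable:  " + suffix(name, prefix)); // hideable:  /woodc1/inbox
    }
}
```

Swapping the last FIB entry for /edu/uci/ics/woodc1 and moving the service under that prefix makes the cleartext part name a person instead of a service, which is the leak the parenthetical question above points at.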

The type of name (data- vs service-centric) is critically important to the privacy of a request. However, a complete discussion of this design space is outside the scope of this small post. So, for now, I'll simply say that the solution does not seem simple, especially after we muddied the waters by insisting on treating names as both locators and identifiers.

# Wrapping Up #

Make no mistake: naming is hard to get right. In the CCN case, this difficulty does not stem from the desire to make names readable or understandable. It is a direct result of the fact that application namespaces and data names impact many seemingly unrelated problems such as routing, mobility, security, and privacy. Of course, these problems should not come as much of a shock given that CCN does away with services like DNS.

Part of my ongoing research is to help decouple names from the inherent security and privacy issues discussed above. There are other more qualified individuals working on the remaining problems. Hopefully, we make progress going forward.