EuroView Deliverable D11.1 - Framework Document No. 2 "Implementing Directory Services"

Implementing an Organisational Directory Service

2. About Directories

A directory is fundamental to office communication. Before you can post a letter, make a phone call or send an e-mail you need to know the number or address of the person you're contacting. Without an efficient directory, determining such information can become a time-consuming and expensive operation - looking through a phone book, ringing the operator or in some cases sending a message to an e-mail administrator and waiting for a response. Electronic directories bridge this gap by making all contact information readily available through a single, user friendly, manageable and efficient information service. In doing so they facilitate and enhance the use of all communication processes.

Directories already exist in various shapes and forms. Telephone data is handled by paper phone books. Many e-mail systems contain an integrated electronic address book. The directory aims to rationalise such mechanisms by acting as the main repository for all communication data. A person who needs a telephone number looks it up in the directory. Mailing list applications can use it to verify a person's e-mail address. The directory provides a focus for addressing needs by making the processes of gathering, storing and retrieving this information a central and co-ordinated activity.

The directory possesses a number of key features which help it go about its task:

Range of application. The directory can store many different kinds of information - anything from e-mail and network addresses to sound and image files. The data model is extensible so new types of data can be represented.
Searchability. Information can be located on the basis of common and known criteria. Being able to search the database in a user friendly way is an important aspect of the directory.
Accuracy. Ease of data management is central to the directory, so the directory should (or at least can) always contain the most up to date information.
Information sharing. Directories can be connected together in order to share data. This means that you can access the directories of other organizations, as well as making yours available to them.

Whilst it was initially conceived as a method of locating electronic addressing information, the properties outlined above mean that the directory is flexible enough that it can provide access to most forms of widely distributed information. Thus it lends itself to the support of many functions. Here are a few examples:

White Pages. Locate personal phone and e-mail information.
Yellow Pages. Locate contact information related to services.
Public key management. The directory can be used to store and distribute public keys. This infrastructure can then be used to support secure applications, primarily messaging.
E-mail routing. The directory is beginning to be used for the storage of e-mail routing configuration.
Network address directory. A directory of networked services, e.g. printers.

The services listed all require some form of directory. Some of them are currently supported by existing directories (in whatever form). Others are not provided for in any co-ordinated way. The electronic directory aims to rationalise existing directory services by providing a "one stop" access point for all forms of contact information. If you want to find someone's e-mail address you can find it in the directory. If you need to find the phone number of a service, it's there. The directory is extensible and thus capable of supporting future applications that have a directory requirement.

2.1 Benefits to Your Organization

The current trend in both the private and public sector is towards accountability. Citizens are demanding higher and higher levels of service and, importantly, more and better information about the services they are using. Organizations will have to take steps in order to live up to these expectations and maintain competitiveness. One of these measures will be ensuring that the communication with the consumer is as effective as possible.

Corporate telecommunications and networking strategy will be central to this. In order for IT systems to function well, a corresponding information framework is required - without access to addressing information proper use of the communication system cannot take place. Similarly, the more efficient the information framework, the more efficiently communication takes place. The quicker a phone number is found, the quicker the call is made and the service rendered.

There are several benefits to making addressing information readily available via the directory. Firstly, internal communication is improved, resulting in productivity gains. Secondly, the way in which citizens and clients communicate with the organization is affected - your services are easier to reach because contact points can be discovered in a structured, user friendly and convenient manner.

The gains are more than just practical. Publication of useful information is good for public relations, because it signifies that your organization is open to communication with the citizen, business and government. In this way the directory gives your organization a network presence and accessibility that can span international borders. The importance of this cannot be overstated given the ever increasing scale and diversity of the global networked community.

2.2 How Do They Work?

Electronic directories are distributed databases. This means that parts of the database are held on a number of connected machines. Although the directory isn't held in its entirety on one machine, access to it is seamless, so users are unaware of the underlying distribution of data. All a person using the service sees is a single database. This applies to every aspect of user access - reading and searching the directory as well as modifying its contents. This property of the directory is reflected in several aspects of its design. The database, for example, has a tree structure which is easily divisible into sub-components and so is suitable for distribution across directory servers (though more of this later).

This section looks at the underlying mechanisms behind electronic directories and explains some of the terminology. It is a summary and is intended to provide enough background for the reader to continue with the rest of the document, but without getting bogged down in detail. The majority of concepts apply equally to X.500 directories and those based on the LDAP protocol.

2.2.1 The Hierarchical Database

The directory database is organized into a tree structure where each node in the tree represents an entry in the database. Entries then correspond to some real world object, e.g. a person, department, organization or country. Similarly, the structure of the tree follows the real world relationships between the objects represented - people work within departments, departments are divisions of organizations, and so on.

The structure of the directory database is usually referred to as the DIT (Directory Information Tree). Figure 2.1 illustrates two common DIT structures. In the first, countries occupy the first level of the tree. Each country then "owns" a set of organizations, with organizations then containing departments and these in turn containing entries describing people and roles. In the second, the structure reflects the Internet domain naming scheme, with the full organizational domain (in this case "dti.gov.uk") then containing personal and role entries.

Figure 2.1 Example DIT structures

A hierarchical data structure was adopted for a number of reasons:

Many real world relationships can be represented by a hierarchical model, e.g. management, geographical or Internet domain based structures.
Hierarchies are simple and understandable. Making the database comprehensible to people is an important aspect of the directory.
The database can be cleanly divided (into subtrees). This is a key factor in the database distribution mechanism and also in the data management and ownership scheme. In Figure 2.1, for example, the subtree rooted at `Ministry for Social Affairs' could be regarded as a distinct part of the database, with its own data management and security policies.
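
The tree structure and its division into subtrees can be sketched in a few lines of Python. This is an illustration only, not how any particular directory server stores its data. The `Entry` class is hypothetical, and every entry name other than `Ministry for Social Affairs' is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One node of the DIT; `rdn` is the entry's own naming value, e.g. 'c=GB'."""
    rdn: str
    children: dict = field(default_factory=dict)

    def add(self, rdn):
        """Create (or return the existing) child entry named `rdn`."""
        return self.children.setdefault(rdn, Entry(rdn))

    def subtree(self):
        """Yield this entry and every entry beneath it, depth first."""
        yield self
        for child in self.children.values():
            yield from child.subtree()


# Hypothetical DIT following the country / organization / department pattern.
root = Entry("")
gb = root.add("c=GB")
ministry = gb.add("o=Ministry for Social Affairs")
pensions = ministry.add("ou=Pensions")
pensions.add("cn=Jane Example")
ministry.add("ou=Housing")
gb.add("o=Another Organization")

# The ministry's subtree can be handled as a unit in its own right,
# e.g. placed on its own server or given its own management policy.
ministry_subtree = [entry.rdn for entry in ministry.subtree()]
```

Here `ministry_subtree` contains the ministry entry and everything below it, and nothing from the other organization.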

2.2.2 Database Entries

A directory entry can be thought of as a database record consisting of a set of fields. The fields, in directory jargon, are known as the attributes of the entry. Some of the attributes of a personal entry could be the person's name or their telephone number. Attributes can have one or more values - a person may have more than one phone number for example.

Entries can represent different things, e.g. not just people, so each type of entry will contain different kinds of information. Every personal entry has an attribute for the person's surname, but the surname attribute would not apply to an entry for a room, which would have attributes for its room number and/or room name.

The real world object that an entry represents is described by its object class attribute. Every entry has one or more object class values that indicate what that entry represents, and what further properties it possesses.

As an example, entries representing people contain the `person' object class value. This object class specifies that the entry must contain values for the `commonName' and `surname' attributes (note that the quoted attribute names are the labels defined by the X.500 directory standard - other directory protocols may use different names). It can also optionally contain values for other attributes such as `telephoneNumber'. By applying further object classes to an entry more information can be added to it. In order for a personal entry to contain a World Wide Web reference (via the `labeledURI' attribute) it must contain the `labeledURIObject' object class. A few basic object classes are listed in Table 2.1.

                      Person           OrganizationalRole  Organizational Unit
Mandatory Attributes  commonName       commonName          organizationalUnitName
                      surname
Optional Attributes   telephoneNumber  roleOccupant
                      description      description
                      seeAlso

Table 2.1 Example Object Classes
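
The way object classes govern the contents of an entry can be sketched as a small checking routine. The attribute lists below are those of Table 2.1; the `check_entry` function, the dictionary layout and the sample entries are assumptions made for illustration, and a real directory server performs considerably more checking than this.

```python
# Mandatory ("must") and optional ("may") attributes per object class,
# taken from Table 2.1. labeledURIObject is added from the text above.
OBJECT_CLASSES = {
    "person": {
        "must": {"commonName", "surname"},
        "may": {"telephoneNumber", "description", "seeAlso"},
    },
    "organizationalRole": {
        "must": {"commonName"},
        "may": {"roleOccupant", "description"},
    },
    "organizationalUnit": {"must": {"organizationalUnitName"}, "may": set()},
    "labeledURIObject": {"must": set(), "may": {"labeledURI"}},
}


def check_entry(entry):
    """Return a list of schema problems for `entry` (empty if it conforms).

    `entry` maps attribute names to lists of values; its `objectClass`
    attribute lists the object classes the entry belongs to.
    """
    problems = []
    allowed = {"objectClass"}
    for oc in entry.get("objectClass", []):
        rules = OBJECT_CLASSES.get(oc)
        if rules is None:
            problems.append(f"unknown object class: {oc}")
            continue
        allowed |= rules["must"] | rules["may"]
        for attr in sorted(rules["must"]):
            if not entry.get(attr):
                problems.append(f"missing mandatory attribute: {attr}")
    for attr in sorted(entry):
        if attr not in allowed:
            problems.append(f"attribute not permitted: {attr}")
    return problems


# Hypothetical sample entries.
peter = {
    "objectClass": ["person"],
    "commonName": ["Peter Pineapple"],
    "surname": ["Pineapple"],
    "telephoneNumber": ["555 0100", "555 0101"],  # multi-valued attribute
}
with_uri = dict(peter, labeledURI=["http://www.example.org/ Home page"])
fixed = dict(with_uri, objectClass=["person", "labeledURIObject"])
incomplete = {"objectClass": ["person"], "commonName": ["Room 101"]}
```

With these definitions `check_entry(peter)` reports no problems, `check_entry(with_uri)` rejects the `labeledURI` attribute until the `labeledURIObject` class is added (as in `fixed`), and `check_entry(incomplete)` reports the missing surname.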

2.2.3 Directory Names

Entries in the directory are identified by their entry name. The entry name consists of one or more of the attribute values from the entry. The attribute values used to name an entry are referred to as its distinguished values. An organizational entry, for example, is generally named by one particular value of its `organizationName' attribute. An entry representing the European Commission may contain the following values for the `organizationName' attribute:

o= European Commission
o= EC

The entry for the European Commission would then be named by just one of these values - the distinguished value.

Figure 2.2 Directory Names

Each entry in the DIT has a unique name. The name of an entry is formed by concatenating the distinguished values of all entries up to the top of the tree, beginning with the entry itself. Looking at Figure 2.2, the name for the `Pensions' entry is then:

ou=Pensions; o=Social Services; c=GB

The name of an entry is referred to as its distinguished name (or DN for short). The DN of an entry is the key used to reference the information that it contains.
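
The naming rule can be written down as a short sketch. The function names are hypothetical, and the "; " separator simply follows the notation used in this section. The splitting in `parent_name` is naive and assumes that no distinguished value itself contains "; ".

```python
def distinguished_name(path_from_root):
    """Build a DN from the distinguished values on the path to an entry.

    `path_from_root` lists the values from the top of the tree down to the
    entry; the DN lists them in the opposite order, entry first.
    """
    return "; ".join(reversed(path_from_root))


def parent_name(dn):
    """Drop the entry's own distinguished value to obtain its parent's DN."""
    _, _, rest = dn.partition("; ")
    return rest


pensions_dn = distinguished_name(["c=GB", "o=Social Services", "ou=Pensions"])
# pensions_dn is "ou=Pensions; o=Social Services; c=GB"
```

Applying `parent_name` to `pensions_dn` gives "o=Social Services; c=GB", the DN of the organization that contains the `Pensions' entry.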

2.2.4 Querying the Directory

Users can read entries from the directory and search it. A read is performed by supplying the directory name of the entry in question, together with a list of the attributes required. The search operation is used for friendly look up of directory entries, i.e. it is the means of locating an entry if you don't know its directory name. The search operation works by applying a filter across a set of directory entries. A filter consists of one or more attribute value pairs which are compared against the contents of an entry to see if they match. The match operation can be performed using contained substrings (so the string "smith" would match "Joe Smith" or "Barry Blacksmith") or using an approximate match algorithm which compares the sounds of words (so "shilton" might match "sheldon").

Figure 2.3 Example DIT

The search operation can be applied to a single entry, to all immediate children of a given entry or to all entries in a subtree. In the example DIT shown in Figure 2.3, a subtree search using "commonName=Peter*" (i.e. a leading substring) would find all entries whose common name begins with "Peter". Similarly, a filter "cn=*apple" (cn being the usual abbreviation of commonName) would find all entries whose common name ends with the string "apple". Combining the two filters would result in a single match, the entry "cn=Peter Pineapple", because it begins with the string "Peter" and ends with the string "apple".
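
The search scopes and substring filters described above can be sketched as follows. This is a simplified model, not an implementation of any directory protocol: the `ENTRIES` table, the `matches` and `search` functions and all entry names except "Peter Pineapple" are invented for illustration, matching is assumed to ignore case, and approximate (sound-based) matching is left out.

```python
import re

# Hypothetical DIT: each key is the path from the root to an entry.
ENTRIES = {
    ("c=GB",): {},
    ("c=GB", "o=Example Org"): {},
    ("c=GB", "o=Example Org", "cn=Peter Pineapple"): {"cn": ["Peter Pineapple"]},
    ("c=GB", "o=Example Org", "cn=Peter Pear"): {"cn": ["Peter Pear"]},
    ("c=GB", "o=Example Org", "cn=Alice Apple"): {"cn": ["Alice Apple"]},
    ("c=GB", "o=Example Org", "ou=Sales"): {},
    ("c=GB", "o=Example Org", "ou=Sales", "cn=Peter Plum"): {"cn": ["Peter Plum"]},
}


def matches(pattern, value):
    """Compare `value` with `pattern`, where '*' stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, re.IGNORECASE | re.DOTALL) is not None


def search(base, scope, filters):
    """Return the names of entries at or below `base` that satisfy every filter.

    `scope` is "base" (the entry itself), "one" (immediate children) or
    "sub" (the whole subtree). `filters` is a list of (attribute, pattern)
    pairs, all of which must match some value of the named attribute.
    """
    results = []
    for path, attrs in ENTRIES.items():
        if path[:len(base)] != base:
            continue  # not in the subtree rooted at `base`
        depth = len(path) - len(base)
        if (scope == "base" and depth != 0) or (scope == "one" and depth != 1):
            continue
        if all(any(matches(pattern, value) for value in attrs.get(attr, []))
               for attr, pattern in filters):
            results.append(path[-1])
    return results


top = ("c=GB",)
peters = search(top, "sub", [("cn", "Peter*")])
apples = search(top, "sub", [("cn", "*apple")])
both = search(top, "sub", [("cn", "Peter*"), ("cn", "*apple")])
one_level = search(("c=GB", "o=Example Org"), "one", [("cn", "Peter*")])
```

In this model `both` contains only "cn=Peter Pineapple", while `one_level` omits "cn=Peter Plum" because that entry lies two levels below the organization.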

Figure 2.4 User Access to the Distributed Database

2.2.5 The Distributed Database

Users access the directory by using a client application to connect to a local directory server (servers are referred to as Directory System Agents, or DSAs for short). The directory is then queried as a whole using a single session. In the common case a user will query a local DSA containing local information, e.g. the directory of the organization they work for. When a user requests directory data not held locally, the connected server will pass the request to a DSA deemed to be in a better position to satisfy the request. The result is then chained back to the user via the server they are connected to (this is illustrated in Figure 2.4). Other models are possible. In another common mode the user interface connects directly to any server that contains the information it needs.
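
The chaining model can be sketched as below. The `DSA` class here is a toy: it assumes each server holds exactly one subtree and knows its peers directly, whereas real DSAs locate one another through knowledge information held in the directory itself. All names and numbers are invented.

```python
class DSA:
    """A toy directory server responsible for the entries under one subtree."""

    def __init__(self, name, context, entries):
        self.name = name
        self.context = context  # DN of the subtree this server holds
        self.entries = entries  # maps DN -> attributes
        self.peers = []         # other DSAs this server can chain to

    def holds(self, dn):
        """True if `dn` lies within this server's subtree."""
        return dn == self.context or dn.endswith("; " + self.context)

    def read(self, dn, trail=None):
        """Return (attributes, trail); `trail` lists the servers visited."""
        trail = (trail or []) + [self.name]
        if self.holds(dn):
            return self.entries.get(dn), trail
        for peer in self.peers:
            if peer.holds(dn):
                # Chain the request; the result comes back through this server.
                return peer.read(dn, trail)
        return None, trail


local = DSA("local", "o=Social Services; c=GB",
            {"ou=Pensions; o=Social Services; c=GB": {"telephoneNumber": ["555 0100"]}})
remote = DSA("remote", "o=Example Org; c=FR",
             {"cn=Peter Pineapple; o=Example Org; c=FR": {"telephoneNumber": ["555 0199"]}})
local.peers.append(remote)

# The user only ever talks to `local`, whichever server holds the entry.
attrs, trail = local.read("cn=Peter Pineapple; o=Example Org; c=FR")
```

The user's request is answered with the remote entry's attributes, and `trail` records that it passed through the local server and then the remote one.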

2.2.6 Replicating Data

Sometimes directory servers will be linked by low bandwidth connections. This results in slow access to data not held in the local server and so service quality is compromised. One way round this is to replicate data across servers. Copying regularly accessed remote data to a local access point means that less network usage is required to get at that data and also helps to balance load across servers. In this way data replication can be regarded as an optimisation of directory performance.

The replication mechanism is also used as a method of providing service resilience. Here all data in a service server is copied to a shadow server. If one server goes down, then service access can be routed via the backup.
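
A minimal sketch of shadowing for resilience follows. The `Server` class and the `shadow` and `read` functions are hypothetical; real replication is incremental and governed by agreements between servers, whereas this example simply copies everything.

```python
import copy


class Server:
    """A toy directory server holding a DN -> attributes table."""

    def __init__(self, name, data=None):
        self.name = name
        self.data = data or {}
        self.up = True


def shadow(master, replica):
    """Copy every entry held by `master` to `replica`."""
    replica.data = copy.deepcopy(master.data)


def read(dn, servers):
    """Answer from the first server in `servers` that is running."""
    for server in servers:
        if server.up:
            return server.name, server.data.get(dn)
    raise RuntimeError("no directory server available")


dn = "ou=Pensions; o=Social Services; c=GB"
master = Server("master", {dn: {"telephoneNumber": ["555 0100"]}})
backup = Server("backup")
shadow(master, backup)

before = read(dn, [master, backup])  # answered by the master
master.up = False                    # simulate a failure
after = read(dn, [master, backup])   # answered by the shadow copy
```

The same data is returned before and after the failure; only the server that answers changes.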

2.2.7 Restricted Access

When a user connects to the directory they can identify themselves as a given entry in the directory (usually their personal entry), supplying a password or digital signature to verify this. A directory user's identity can be used to restrict the information they are allowed to access from the directory. Directory access control can be used to stop people from viewing or updating the directory. Restrictions can be applied to whole entries or to individual attributes within them, so a user could be allowed to, for example, modify the `description' attribute in their own entry but not any other.
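
An attribute-level policy of the kind just described can be sketched as two small checks. The policy itself (users may change only `description' in their own entry, and passwords are never readable) is an assumption chosen for illustration, as are the function names and the sample DNs; real directories express such rules as access control information stored with the data.

```python
SELF_WRITABLE = {"description"}  # attributes users may change in their own entry
HIDDEN = {"userPassword"}        # attributes never returned by a read


def can_read(user_dn, target_dn, attribute):
    """Anyone, including anonymous users (user_dn=None), may read non-hidden attributes."""
    return attribute not in HIDDEN


def can_modify(user_dn, target_dn, attribute):
    """An authenticated user may modify only permitted attributes of their own entry."""
    return (user_dn is not None
            and user_dn == target_dn
            and attribute in SELF_WRITABLE)


peter = "cn=Peter Pineapple; o=Example Org; c=GB"
alice = "cn=Alice Apple; o=Example Org; c=GB"

own_description = can_modify(peter, peter, "description")      # True
other_description = can_modify(peter, alice, "description")    # False
own_phone = can_modify(peter, peter, "telephoneNumber")        # False
```

Note that both checks depend on `user_dn` being trustworthy, which is why the strength of the authentication step matters.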

The value of an access control system is entirely dependent on the type of authentication used to identify directory users. The most common form of authentication in use today is `unprotected simple' authentication. Here a user's identity (a directory name) and password are passed to the directory at connect time in unencrypted form. Obviously, this does not offer total protection (password snooping is possible). If access controls are used to restrict dissemination of highly sensitive data then authentication based on encrypted methods is best used, especially when off-site connection to the directory is permitted.
