Implementing An Organizational Directory Service

6. The Service Design

Remember that although users will consult the directory relatively infrequently, an unavailable or inefficient service will effectively hold up wider activities: a user cannot send a message to a new recipient until the destination address has been found and retrieved from the directory. The goal of the service design is therefore to define a system that responds quickly to user requests and offers high (ideally total) availability.

How well any directory request is handled will largely depend on the bandwidth between the calling user and the directory server holding the target data. For small organizations a single directory server will handle the service adequately, and all users will connect to that server. In certain situations more than one server will be required, the most obvious example being an organization with several sites that lack strong network interconnections. Here low bandwidth connections would limit the usability of the directory if a single machine at a central site were used to service all requests. A simple solution in this case is to install a server at each site and implement a data replication scheme so that each server holds a copy of the entire directory database. In this way every local user has fast access to a site-local access point, with no need to resort to inter-site networking when resolving directory requests.
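As an illustration of the site-local arrangement, the sketch below configures a client to prefer its local replica and fall back to a remote copy only when the local server is unreachable. It assumes the ldap3 Python library; the hostnames, search base and filter are invented for the example.

    # Prefer the site-local replica; fall back to the remote master only
    # if the local server is down. Assumes `pip install ldap3'; all
    # hostnames and directory names here are hypothetical.
    from ldap3 import Connection, Server, ServerPool, FIRST

    pool = ServerPool(
        [Server("ldap.local-site.example.org"),   # full local copy
         Server("ldap.central.example.org")],     # remote master copy
        FIRST,           # always try servers in listed order
        active=True,     # keep retrying servers that were unreachable
        exhaust=False,
    )

    conn = Connection(pool, auto_bind=True)       # anonymous bind
    conn.search("o=Example", "(cn=J Bloggs)", attributes=["mail"])
    for entry in conn.entries:
        print(entry.mail)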

This approach is something of a simplification, and solutions will vary with local requirements. In particular, the way data is distributed and managed across servers will affect both the connection map and the replication strategy. In many systems the entire database will be mastered on a central server, with other servers simply copying this data (as in the example above). Other circumstances may require that parts of the overall database be mastered across a number of servers: an organization with several geographically distributed offices might well master local data on a local server so that updates can be made at source. In this case a more complex connection scheme will be required to achieve efficient user access.
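The essence of such a scheme is a mapping from portions of the directory tree to the servers that master them. The plain-Python sketch below (with invented office names and hosts) resolves a name to its mastering server by longest-suffix match; a real X.500 or LDAP deployment would express the same knowledge through naming contexts and referrals.

    # Map each partition of the directory tree to its mastering server.
    # All names and hosts are invented for illustration.
    MASTERS = {
        "ou=London,o=Example": "dir1.london.example.org",
        "ou=Paris,o=Example":  "dir2.paris.example.org",
        "o=Example":           "dir0.hq.example.org",   # everything else
    }

    def master_for(dn):
        """Return the server mastering the longest suffix matching dn."""
        for suffix in sorted(MASTERS, key=len, reverse=True):
            if dn.endswith(suffix):
                return MASTERS[suffix]
        raise LookupError("no server masters " + dn)

    print(master_for("cn=J Bloggs,ou=Paris,o=Example"))
    # -> dir2.paris.example.org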

As well as providing quick response, the service should be resilient to failure. Even if a single directory server can provide adequate service on its own, consider deploying a backup in case of failure. Likewise, when data is mastered across several servers, think about replicating data between them so that any database element is available from more than one server: if the mastering server fails, a copy of the data will still be available elsewhere. Bear in mind that replication is not `free' - some implementations may well slow down while the database is undergoing change. For this reason, data replication should be used judiciously, i.e. as an optimisation when service would otherwise be too slow, or to provide backup copies.
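The cost can be made concrete with a little arithmetic. In the sketch below (plain Python, invented timings), every change applied to a master must also be pushed to each replica, so update cost grows linearly with the number of copies; this is the overhead that judicious replication keeps in check.

    # Rough model of replication overhead: each change costs one local
    # write plus one push per replica. All timings are invented.
    def update_cost_ms(write_ms, replicas, push_ms):
        """Total time to apply one change and propagate it everywhere."""
        return write_ms + replicas * push_ms

    print(update_cost_ms(5.0, replicas=0, push_ms=20.0))  # 5.0 - no copies
    print(update_cost_ms(5.0, replicas=9, push_ms=20.0))  # 185.0 - 9 copies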

The following sections describe some example solutions that should help you choose an appropriate architecture. Note that the more complex distribution schemes are specific to systems based on the X.500 architecture. Whilst LDAP directories possess some facilities for distributing data, these are not currently extensive enough to provide a large scale distributed service.

6.1 Central Master

The database is managed in its entirety on a central directory server. If other servers are required to serve remote sites then they receive shadow copies of the entire database from the central server. Figure 6.1 depicts this model.

Figure 6.1 Centrally Mastered Data

Shadow updates should take place on a regular basis. If the database as a whole is small and changes are infrequent, updates should be triggered by each modification to the master database. If the database is large and the rate of change greater, the update operation should take place at scheduled times (e.g. during periods of low network activity) in order to avoid undue load on the connections between the master and its shadows. With the scheduled approach the data held by shadow servers will `lag' the master copy between updates, as changes to the master database are only reflected after a shadow update operation has occurred.
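The two triggering strategies can be sketched as follows in plain Python. Here push_to_shadows stands in for whatever replication protocol the directory software actually provides, and the server names are invented.

    # On-change versus scheduled shadow updates. `push_to_shadows' is a
    # stand-in for the real replication protocol; hosts are invented.
    SHADOWS = ["dir.site-a.example.org", "dir.site-b.example.org"]

    def push_to_shadows(change):
        for host in SHADOWS:
            print("pushing", change, "to", host)

    # Strategy 1: small, rarely-modified database. Every modification
    # to the master triggers an immediate shadow update.
    def apply_on_change(change):
        print("master applies", change)
        push_to_shadows(change)

    # Strategy 2: large or busy database. Changes accumulate on the
    # master and are pushed in one batch at a quiet time, so shadows
    # `lag' the master between pushes.
    pending = []

    def apply_batched(change):
        print("master applies", change)
        pending.append(change)

    def scheduled_push():          # run e.g. nightly, outside peak hours
        while pending:
            push_to_shadows(pending.pop(0))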

Figure 6.2 Distribution Server Architecture with no Replication

6.2 Data Distributed Across Servers

A more complex approach is required if the database is not mastered on a single machine. In this case the solution will depend on the capacity of the intermediate network connections. If all servers are strongly interconnected, requests can be chained between them, as illustrated in Figure 6.2. Where connections are weak, replication between some or all servers will be necessary to ensure adequate performance.
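Chaining amounts to a server answering from its own partition when it can and forwarding the request to the responsible server otherwise, as the plain-Python sketch below illustrates (server names invented; a real directory would discover the owning server from its own knowledge references rather than being told it).

    # Chaining between strongly connected servers: answer locally if
    # possible, otherwise forward the request to the mastering server.
    class DirectoryServer:
        def __init__(self, name, data, peers):
            self.name = name      # e.g. "server1"
            self.data = data      # entries mastered on this server
            self.peers = peers    # name -> DirectoryServer

        def lookup(self, dn, owner):
            """Resolve dn, where `owner' masters the entry."""
            if owner == self.name:
                return self.data[dn]
            # Chain across the (assumed fast) inter-server link.
            return self.peers[owner].lookup(dn, owner)

    s1 = DirectoryServer("server1", {"cn=A": "a@example.org"}, {})
    s2 = DirectoryServer("server2", {"cn=B": "b@example.org"},
                         {"server1": s1})
    print(s2.lookup("cn=A", owner="server1"))   # chained to server1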

Looking at the example in Figure 6.2, if server 2 has low bandwidth connections to servers 1, 3 and 4, then the data mastered on those servers should be replicated to server 2. This guarantees that users connected via server 2 have fast access to the entire database. Similarly, servers 1, 3 and 4 should replicate the data mastered on server 2 in order to serve their own users.
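This decision reduces to a simple rule: replicate across any link too slow to chain over comfortably. The sketch below applies that rule to an invented table of link capacities matching the example (server 2 weakly connected to servers 1, 3 and 4); the threshold is an assumption, not a recommendation.

    # Decide which pairs of servers need replication agreements, given
    # (invented) inter-server link capacities in kbit/s.
    LINK_KBPS = {
        ("server1", "server2"): 9.6,
        ("server1", "server3"): 2048,
        ("server1", "server4"): 2048,
        ("server2", "server3"): 9.6,
        ("server2", "server4"): 9.6,
        ("server3", "server4"): 2048,
    }
    THRESHOLD_KBPS = 64   # assumed minimum acceptable for chaining

    replicate = [pair for pair, kbps in LINK_KBPS.items()
                 if kbps < THRESHOLD_KBPS]
    print(replicate)
    # -> [('server1', 'server2'), ('server2', 'server3'),
    #     ('server2', 'server4')]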

6.3 Widely Distributed Data

In very large and complex organizations data may be scattered across a great number of machines. Whilst replication will probably be required to optimise access across the database, care will have to be taken to avoid overloading: replicating every server's data to every other server, for example, would generate an unmanageable volume of update traffic.

Figure 6.3 Replication Around a Backbone

In practice a hybrid scheme involving shadow `centres' will have to be devised. Figure 6.3 depicts a replication scheme built around a server `backbone'. Here a set of `central' machines is interlinked to form the backbone, and each backbone server copies data from the peripheral master servers attached to it. Between them, the backbone servers then hold a copy of the full database. In the worst case, a user request arriving via a peripheral `data master server' need only travel as far as the backbone for a response.
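A toy version of the scheme, in plain Python with invented names: each peripheral master shadows its changes to the backbone, whose contents are (by assumption) replicated across all backbone servers, so any query not answered locally is resolved one hop away.

    # Toy backbone scheme. `backbone_db' models the database copy that
    # all interlinked backbone servers share; names are invented.
    backbone_db = {}

    class PeripheralMaster:
        def __init__(self, name):
            self.name, self.local = name, {}

        def add_entry(self, dn, value):
            self.local[dn] = value
            backbone_db[dn] = value   # shadow the change to the backbone

        def lookup(self, dn):
            # Local data first; otherwise one hop to the backbone.
            return self.local.get(dn, backbone_db.get(dn))

    london = PeripheralMaster("london")
    paris = PeripheralMaster("paris")
    london.add_entry("cn=A,ou=London,o=Example", "a@example.org")
    paris.add_entry("cn=B,ou=Paris,o=Example", "b@example.org")
    print(london.lookup("cn=B,ou=Paris,o=Example"))  # via the backbone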