Implementing An Organizational Directory Service

10. Maintaining the Service

Ensuring the integrity of the database is of prime importance. If information is invalid or out of date, the service loses a great deal of its value. Usage levels will drop and effort will have to be invested in order to bring the service back up to scratch and raise confidence levels. However, be aware that despite the best efforts errors will always occur and that your directory will rarely be completely correct.

Whatever you do in the initial phases, long term maintenance costs will probably outweigh this effort. Much can be done to reduce maintenance overhead by considering the issue at an early stage. Ensure that the directory database isn't too complex and doesn't contain too much data. An information `heavy' database will be inherently difficult to maintain. Locality information, such as building names and room numbers, may seem useful but in the long run will also lead to extra overhead, for example when a department relocates. Making management processes simple and efficient is also likely to make them more rigorous - the less work there is to do, the less chance there is of things going wrong. There is then a trade-off to be made between the richness of the directory and the required level of maintenance effort.

Reducing dependency on master sources of data will also go a long way to help. If the directory is loaded with data from a number of sources, try to reduce the number of these as the data merging process is itself likely to result in errors. Problems lie mainly in the correlation of data entries between databases. Suppose, for example, that the e-mail and human resources databases are used as master sources but different name forms are used in each. The address `Joe.Smith@soap.com' may actually correspond to the database entry for `Joseph David Smith'. Cases such as these (and there may well be many) demonstrate why it may be difficult to relate the entries as there is not necessarily a simple and reliable rule for tying them together.

Make estimates of how much data change is likely to occur. The database will require update for many reasons:

Staff arrival and departure.
Internal staff movement, e.g. transfer or promotion.
Relocation of groups and individuals.
Change of addressing information, e.g. phone numbers, e-mail addresses, etc.
Personal name changes.

The rate of expected change coupled with the required tolerance to short term errors will dictate levels of maintenance effort. If tolerance is low and anticipated database changes are regular then maintenance procedures should be defined well before the directory is in operation. The following sections explore some of the specific tasks that may be undertaken, giving consideration to why these might be appropriate in a maintenance scheme.

10.1 Automatic Synchronisation with Master Sources

The directory is likely to make use of a number of master sources of data. The personnel database is one example, the e-mail address system another. The directory could be kept in step with these sources of information in several ways. The choice will largely depend on the nature of the master databases, and on how many of them there are. If the directory takes data from a single source (unlikely but possible) then directory update might be best fulfilled by periodic download. If multiple sources are involved then some merging of data will be necessary.

Data merging is problematic because it is often difficult to correlate information across databases. There are two main reasons for this:

Inconsistent naming. People may have different given names across databases, e.g. `Joe Smith' and `Joseph Smith' may be the same person.
Non-unique naming. Two people in the same organization may have the same name. This is made worse if they're in the same department.
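
Both problems can be seen in a minimal sketch of name-based correlation. The matching rule used here (first initial plus surname) is purely an assumption chosen for illustration, as are the function names; it is not a recommended rule, and it assumes every name is non-empty. The point is that any such rule will leave some names ambiguous or unmatched.

```python
from collections import defaultdict

def normalise(name):
    """Reduce a personal name to a crude (first initial, surname) key.

    Hypothetical rule for illustration only; assumes a non-empty name.
    """
    parts = name.replace(".", " ").lower().split()
    return (parts[0][0], parts[-1])

def correlate(source_a, source_b):
    """Pair names from source_a with names from source_b.

    Returns (matched, ambiguous, unmatched): names with exactly one
    candidate, names with several candidates, and names with none.
    """
    index = defaultdict(list)
    for name in source_b:
        index[normalise(name)].append(name)

    matched, ambiguous, unmatched = {}, {}, []
    for name in source_a:
        candidates = index.get(normalise(name), [])
        if len(candidates) == 1:
            matched[name] = candidates[0]
        elif candidates:
            ambiguous[name] = candidates   # needs manual resolution
        else:
            unmatched.append(name)
    return matched, ambiguous, unmatched
```

With `Joe Smith' in one source and both `Joseph David Smith' and `John Smith' in the other, the entry is reported as ambiguous rather than silently paired with the wrong person.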

Obviously, the greatest problems will occur when a number of entire databases are merged and piped into the directory (the more names there are the greater the probability of name clashes and ambiguities occurring). For this reason any update processes that require data merging should be incremental, i.e. changes to the source data (in whatever form it is held) should be followed by a corresponding change to the directory (hence synchronisation).
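
As a rough sketch of incremental update, the directory can be modelled as a mapping from an entry key to its attributes, with each change at the source delivered as a small change record. The record layout `(operation, key, attributes)' and the function name are assumptions made for this example and are not taken from any particular directory product.

```python
def apply_changes(directory, changes):
    """Apply change records to the directory in order.

    directory: dict mapping an entry key to a dict of attributes.
    changes:   iterable of (operation, key, attributes) tuples, where
               operation is "add", "modify" or "delete" (hypothetical format).
    Returns the records that could not be applied, for manual follow-up.
    """
    rejected = []
    for operation, key, attributes in changes:
        if operation == "add" and key not in directory:
            directory[key] = dict(attributes)
        elif operation == "modify" and key in directory:
            directory[key].update(attributes)
        elif operation == "delete" and key in directory:
            del directory[key]
        else:
            # Unknown operation, duplicate add, or change to a missing entry.
            rejected.append((operation, key, attributes))
    return rejected
```

Because each record touches one entry, a record that cannot be applied is isolated and easy to inspect, unlike a clash buried in a bulk merge.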

Ideally the synchronisation process will be automated without any need for manual intervention. The only reliable way to achieve this given problems such as the above is to tag data across all databases with a unique identifier. This will make large scale data merging a much easier task by ensuring exact correlation between entries in all source databases. This, however, requires that all master databases are capable of holding unique keys in the first place. The chances are some manual input will be required to resolve inconsistent data found during synchronisation.
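
A minimal sketch of merging on a unique identifier follows. It assumes each master source can be exported as a mapping from a shared identifier (a staff number, say) to a set of attributes; the identifier, the attribute names and the function name are all hypothetical. Correlation becomes exact, but conflicting values for the same attribute are still collected for manual resolution.

```python
def merge_by_id(*sources):
    """Merge several sources keyed by the same unique identifier.

    Each source is a dict: identifier -> dict of attributes.
    The first value seen for an attribute is kept; a later, different
    value is recorded as a conflict instead of overwriting it.
    Returns (merged, conflicts).
    """
    merged = {}
    conflicts = []
    for source in sources:
        for uid, attributes in source.items():
            entry = merged.setdefault(uid, {})
            for attr, value in attributes.items():
                if attr in entry and entry[attr] != value:
                    conflicts.append((uid, attr, entry[attr], value))
                else:
                    entry[attr] = value
    return merged, conflicts
```

The order in which sources are passed decides which value wins, so the most trusted source should come first.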

10.2 Manual Update at Source

One way of reducing the need for synchronisation is by implementing administrative procedures for directory update. Here, whenever staff movements occur the registration process will include a change to the directory database. Whenever new staff enrol the registration process will involve an addition to the directory. When staff are promoted or leave the equivalent change should be made. By and large, rolling small scale updates into everyday procedure will eliminate the need for merging data from master sources.

`Registration at source' will most likely begin in the personnel department, where all staff movement is recorded anyway. Alternatively the IT department may be the focus, especially when it is customary for all staff to be given access to e-mail facilities, as this will mean the e-mail database will contain all staff. In either case some merging of data will be required unless unified registration procedures can be defined. Generally speaking, if the personnel department is used as the trigger for addition of new staff to the directory then their e-mail addresses will have to be incorporated at a later date. Similarly, systems administration could add new computing users to the directory, but other information, such as job titles, fax numbers, etc., would have to be added separately.

Registration may well be a stepwise process:

Personnel adds new staff to the directory.
The IT department registers new staff, but only if they already have a directory entry.
The new staff member's local manager updates the directory to contain localised information, such as room numbers, secretaries, phone numbers, etc.

For other changes, e.g. staff transfer or promotion, like procedures will have to be encoded into internal policy.
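
The stepwise registration described above might be enforced along the following lines. This is a sketch under stated assumptions: the class, its method names and the use of a staff identifier as the entry key are invented for illustration, and a real directory would apply such rules through its own access controls and management tools.

```python
class Directory:
    """Toy in-memory directory enforcing a stepwise registration policy."""

    def __init__(self):
        self.entries = {}  # staff identifier -> dict of attributes

    def personnel_add(self, staff_id, name):
        """Step 1: personnel creates the entry."""
        if staff_id in self.entries:
            raise ValueError("entry already exists: " + staff_id)
        self.entries[staff_id] = {"name": name}

    def it_register(self, staff_id, mail):
        """Step 2: IT adds an e-mail address, but only to an existing entry."""
        if staff_id not in self.entries:
            raise LookupError("no directory entry for " + staff_id)
        self.entries[staff_id]["mail"] = mail

    def manager_update(self, staff_id, **local_info):
        """Step 3: the local manager adds localised information."""
        if staff_id not in self.entries:
            raise LookupError("no directory entry for " + staff_id)
        self.entries[staff_id].update(local_info)
```

Refusing the later steps when no entry exists keeps personnel as the single point at which new entries are created.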

Obviously the `update at source' concept will have to be adhered to rigorously or else errors will creep in and require later rectification. In practice management responsibility will have to be assigned to specific staff. This may mean the recasting of existing roles or require the uptake of new staff members acting as dedicated directory administrators. In this case all registration procedures will involve notification of change to the directory maintenance group, who should make the relevant changes to the database.

10.3 Integrity Checking

Mistakes will arise in the database, no matter what procedures are used for update. The most significant ones involve `dangling pointers'. Take for example a role entry that contains a `roleOccupant' attribute pointing to a personal entry. If that person leaves and their directory entry is deleted then, without due care, the `roleOccupant' pointer in the role entry would be left pointing to a non-existent entry. The same problem will arise for `seeAlso', `secretary' and, to a certain extent, `labeledURI' attributes.

Directory entry pointers (`seeAlso', `roleOccupant' and `secretary') should be handled by data management tools, i.e. every time an entry is deleted or renamed, a search should be performed on the database to ensure that no pointers to that entry remain. In cases where management tools don't support this, mechanisms for `spring cleaning' the directory should be installed. In general these can be performed automatically (which is just as well because checking manually would be a very time consuming and tedious process!) using a process like the following:

Perform a subtree search for all entries containing, for example, `seeAlso' attributes.
For all `seeAlso' attributes found perform a read on the DN pointer value.
If any invalid DN pointers are found either delete them or flag them for attention.
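
The three steps above can be sketched as follows. The directory is modelled here as an in-memory mapping from DN to multi-valued attributes, which is an assumption made to keep the example self-contained; against a live directory the loop would be a subtree search and the membership test a read on each DN. DNs are compared as exact strings, whereas a real check would need to normalise case and spacing first.

```python
POINTER_ATTRS = ("seeAlso", "roleOccupant", "secretary")

def find_dangling(directory, pointer_attrs=POINTER_ATTRS):
    """Return (entry DN, attribute, target DN) for every pointer whose
    target DN has no entry in the directory.

    directory: dict mapping DN -> dict of attribute name -> list of values.
    """
    dangling = []
    for dn, attributes in directory.items():
        for attr in pointer_attrs:
            for target in attributes.get(attr, []):
                if target not in directory:   # the "read" on the pointer fails
                    dangling.append((dn, attr, target))
    return dangling

def remove_dangling(directory, dangling):
    """Delete the pointer values reported by find_dangling."""
    for dn, attr, target in dangling:
        values = directory[dn][attr]
        values.remove(target)
        if not values:
            del directory[dn][attr]   # drop attributes left with no values
```

Keeping detection and removal separate allows the report to be reviewed, or simply flagged for attention, before anything is deleted.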

A similar method could be used to check World Wide Web references contained in `labeledURI' attributes.
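
A corresponding sketch for Web references is given below, using the same in-memory model of the directory. It assumes each `labeledURI' value is a URI optionally followed by white space and a label, and that a reference counts as valid if an HTTP HEAD request succeeds; both the function names and that notion of validity are assumptions for illustration. The probe is passed in as a parameter so that a different test can be substituted.

```python
import urllib.error
import urllib.request

def http_probe(uri, timeout=10):
    """Return True if a HEAD request for the URI succeeds."""
    try:
        request = urllib.request.Request(uri, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # Unreachable host, HTTP error status, or malformed URI.
        return False

def check_uris(directory, probe=http_probe):
    """Return (entry DN, URI) for every labeledURI value that fails the probe.

    directory: dict mapping DN -> dict of attribute name -> list of values.
    """
    broken = []
    for dn, attributes in directory.items():
        for value in attributes.get("labeledURI", []):
            parts = value.split(None, 1)   # URI, then optional label
            if parts and not probe(parts[0]):
                broken.append((dn, parts[0]))
    return broken
```

A failed probe may only mean that the server was temporarily unreachable, so flagging such references for attention is safer than deleting them outright.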

Whilst such errors will build up over time, unless management procedures are badly out of step there should be relatively few mistakes and so database-wide integrity checks will not need to be performed on a regular basis. Monthly sweeps should be more than adequate on all except the largest and most volatile databases.