Ensuring the integrity of the database is of prime importance. If information is invalid or out of date, the service loses a great deal of its value. Usage levels will drop and effort will have to be invested in order to bring the service back up to scratch and raise confidence levels. However, be aware that despite the best efforts, errors will always occur and that your directory will rarely be completely correct.
Whatever you do in the initial phases, long term maintenance costs will probably outweigh this effort. Much can be done to reduce maintenance overhead by considering the issue at an early stage. Ensure that the directory database isn't too complex and doesn't contain too much data. An information `heavy' database will be inherently difficult to maintain. Locality information, such as building names and room numbers, may seem useful but in the long run will also lead to extra overhead, for example when a department relocates. Making management processes simple and efficient is also likely to make them more rigorous - the less work there is to do, the less chance there is of things going wrong. There is then a trade-off to be made between the richness of the directory and the required level of maintenance effort.
Reducing dependency on master sources of data will also go a long way to help. If the directory is loaded with data from a number of sources, try to reduce the number of these as the data merging process is itself likely to result in errors. Problems lie mainly in the correlation of data entries between databases. Suppose, for example, that the e-mail and human resources databases are used as master sources but different name forms are used in each. The address `Joe.Smith@soap.com' may actually correspond to the database entry for `Joseph David Smith'. Cases such as these (and there may well be many) demonstrate why it may be difficult to relate the entries as there is not necessarily a simple and reliable rule for tying them together.
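The correlation problem can be illustrated with a small sketch. The matching rule below (`naive_match') is purely hypothetical and is not taken from any particular product; it simply shows how an obvious-looking rule fails on the example above.

```python
def local_part_to_name(address):
    """Split the local part of an address into lower-case tokens,
    e.g. 'Joe.Smith@soap.com' -> ('joe', 'smith')."""
    local = address.split("@", 1)[0]
    return tuple(part.lower() for part in local.split("."))


def naive_match(address, full_name):
    """Hypothetical rule: the address must be <first name>.<last name>.

    Assumes full_name contains at least one token.
    """
    tokens = full_name.lower().split()
    return local_part_to_name(address) == (tokens[0], tokens[-1])


# The e-mail system uses a short name form, so the rule fails.
print(naive_match("Joe.Smith@soap.com", "Joseph David Smith"))
# It only succeeds when both sources happen to use the same name form.
print(naive_match("Joseph.Smith@soap.com", "Joseph David Smith"))
```

Adding special cases for nicknames, initials and middle names makes such a rule progressively more complicated without ever making it fully reliable.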
Make estimates of how much data change is likely to occur. The database will require update for many reasons:
10.1 Automatic Synchronisation with Master Sources
The directory is likely to make use of a number of master sources of data. The personnel database is one example, the e-mail address system another. The directory could be kept in step with these sources of information in several ways. The choice will largely depend on the nature of the master databases, and on how many of them there are. If the directory takes data from a single source (unlikely but possible) then directory update might be best fulfilled by periodic download. If multiple sources are involved then some merging of data will be necessary.
Data merging is problematic because it is often difficult to correlate information across databases. There are two main reasons for this:
Ideally the synchronisation process will be automated without any need for manual intervention. Given problems such as the above, the only reliable way to achieve this is to tag data across all databases with a unique identifier. This will make large scale data merging a much easier task by ensuring exact correlation between entries in all source databases. This, however, requires that all master databases are capable of holding unique keys in the first place. Even then, the chances are that some manual input will be required to resolve inconsistent data found during synchronisation.
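A minimal sketch of merging on a unique identifier follows. It assumes each master source can export its records as simple attribute/value mappings and that every record carries a shared key; the key name `staff_id', the source names and the sample values are all illustrative assumptions. Attributes on which sources disagree are not overwritten but reported for manual resolution.

```python
def merge_by_key(sources, key="staff_id"):
    """Merge records from several sources, joined on a shared unique key.

    sources maps a source name to a list of record dicts.
    Returns (merged, conflicts). The first value seen for an attribute
    is kept; later disagreeing values are listed in conflicts.
    """
    merged = {}
    conflicts = []
    for source_name, records in sources.items():
        for record in records:
            uid = record[key]
            entry = merged.setdefault(uid, {key: uid})
            for attr, value in record.items():
                if attr == key:
                    continue
                if attr in entry and entry[attr] != value:
                    # Inconsistent data: leave for manual resolution.
                    conflicts.append((uid, attr, entry[attr], value, source_name))
                else:
                    entry[attr] = value
    return merged, conflicts


# Hypothetical exports from two master sources.
sources = {
    "personnel": [
        {"staff_id": "1001", "name": "Joseph David Smith", "phone": "1234"},
    ],
    "email": [
        {"staff_id": "1001", "mail": "Joe.Smith@soap.com", "phone": "5678"},
    ],
}

merged, conflicts = merge_by_key(sources)
print(merged["1001"])
print(conflicts)
```

Here the two records are tied together exactly, despite the differing name forms, and the only item left for a human to resolve is the conflicting telephone number.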
One way of reducing the need for synchronisation is by implementing administrative procedures for directory update. Here, whenever staff movements occur the registration process will include a change to the directory database. Whenever new staff enrol the registration process will involve an addition to the directory. When staff are promoted or leave the equivalent change should be made. By and large, rolling small scale updates into everyday procedure will eliminate the need for merging data from master sources.
`Registration at source' will most likely begin in the personnel department, where all staff movement is recorded anyway. Alternatively the IT department may be the focus, especially when it is customary for all staff to be given access to e-mail facilities, as this will mean the e-mail database will contain all staff. In either case some merging of data will be required unless unified registration procedures can be defined. Generally speaking, if the personnel department is used as the trigger for addition of new staff to the directory then their e-mail addresses will have to be incorporated at a later date. Similarly, systems administration could add new computing users to the directory, but other information, such as job titles, fax numbers, etc., would have to be added separately.
Registration may well be a stepwise process:
Obviously the `update at source' concept will have to be adhered to rigorously or else errors will creep in and require later rectification. In practice management responsibility will have to be assigned to specific staff. This may mean the recasting of existing roles or require the uptake of new staff members acting as dedicated directory administrators. In this case all registration procedures will involve notification of change to the directory maintenance group, who should make the relevant changes to the database.
Mistakes will arise in the database, no matter what procedures are used for update. The most significant ones involve `dangling pointers'. Take for example a role entry that contains a `roleOccupant' attribute pointing to a personal entry. If that person leaves and their directory entry is deleted then, without due care, the `roleOccupant' pointer in the role entry would be left pointing to a non-existent entry. The same problem will arise for `seeAlso', `secretary' and, to a certain extent, `labeledURI' attributes.
Directory entry pointers (`seeAlso', `roleOccupant' and `secretary') should be handled by data management tools, i.e. every time an entry is deleted or renamed, a search of the database should be performed to ensure that no pointers to that entry remain. In cases where management tools don't support this, mechanisms for `spring cleaning' the directory should be installed. In general these can be run automatically (which is just as well because checking manually would be a very time consuming and tedious process!) using a process like the following:
Whilst such errors will build up over time, unless management procedures are badly out of step there should be relatively few mistakes and so database-wide integrity checks will not need to be performed on a regular basis. Monthly sweeps should be more than adequate on all except the largest and most volatile databases.
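As an illustration only, a periodic sweep for dangling entry pointers might look like the sketch below. It assumes the directory contents have been exported into an in-memory mapping from entry name to attributes; the function name, the data layout and the sample entries are hypothetical. Names are compared as exact strings, which is a simplification, since real directory names usually need normalising before comparison.

```python
# Attributes whose values are expected to name other directory entries.
POINTER_ATTRIBUTES = ("roleOccupant", "seeAlso", "secretary")


def find_dangling_pointers(entries):
    """Return (entry, attribute, target) for every pointer whose target
    does not exist in the directory.

    entries maps an entry name to a dict of attribute -> list of values.
    """
    dangling = []
    for name, attrs in entries.items():
        for attr in POINTER_ATTRIBUTES:
            for target in attrs.get(attr, []):
                if target not in entries:
                    dangling.append((name, attr, target))
    return dangling


# Hypothetical directory export: the role occupant has left and
# their personal entry has been deleted.
entries = {
    "cn=Joseph Smith,o=Soap": {"secretary": ["cn=Ann Jones,o=Soap"]},
    "cn=Ann Jones,o=Soap": {},
    "cn=Postmaster,o=Soap": {"roleOccupant": ["cn=Former Employee,o=Soap"]},
}

for entry, attr, target in find_dangling_pointers(entries):
    print(f"{entry}: {attr} points to missing entry {target}")
```

The sweep only reports problems. Deciding whether to remove a stale pointer or to redirect it to a new entry is left to the directory administrators. `labeledURI' values are not covered, as checking them would involve retrieving external resources.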