April 2003
Adjudicated double entry
The Medical Research Council (MRC) of the UK maintains a research center in The Gambia, which employs approximately 800 staff and runs infectious disease programs relevant to the region.
This work involves extensive data collection, and the quality of the data is vital to meaningful research. Currently, field-based data collection is carried out using paper questionnaires.
The structure of data management within the MRC Laboratories, The Gambia, is shown in Figure 1. To ensure data integrity, data are double entered by data entry clerks and adjudicated by data supervisors, with further checking by data managers if necessary.
Figure 1: Data Management Structure
Major criticisms of double data entry are:
Bespoke comparisons required for each project
Inefficient as all fields are double entered
Difficult to implement on 'live' systems
Error correction can introduce new errors.
The data management and statistics unit of the MRC Laboratories has developed a generic double data entry program that addresses these issues. The package is a Visual Basic (DAO) application in Access 2000. As the MRC Laboratories have standardized on Access for database applications, the package assumes that the databases to be verified are Access 2000 databases. However, the code could be modified to connect to any ODBC source.
The package is generic and verifies any two Access databases that have identical table structures, table names and field names. The program is split into an error detection section and an error correction section. The error detection screen, as it appears after the two database names have been entered, is shown in Figure 2. The verification process can be run immediately, or exclusion criteria can first be defined by clicking the exclusions button.
Figure 2: Error detection screen
The exclusions button allows whole tables or individual fields to be excluded from the verification as shown in Figure 3.
Figure 3: Exclusion criteria
To accommodate large numbers of exclusions, the Exclusions form can read exclusion information from text files and save it.
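The idea of loading exclusions from a text file can be sketched as follows. This is only an illustration in Python, not the package's Visual Basic code. The file layout is an assumption (the article does not describe it): one entry per line, where a bare table name excludes the whole table and `Table.Field` excludes a single field. The names `parse_exclusions`, `is_excluded`, `Audit` and `Comments` are hypothetical.

```python
def parse_exclusions(text):
    """Parse exclusion text into (excluded_tables, excluded_fields).

    Assumed format (hypothetical): one entry per line; 'Table' excludes a
    whole table, 'Table.Field' excludes one field; blank lines and lines
    starting with '#' are ignored.
    """
    tables, fields = set(), set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "." in line:
            table, field = line.split(".", 1)
            fields.add((table.strip(), field.strip()))
        else:
            tables.add(line)
    return tables, fields


def is_excluded(table, field, tables, fields):
    """True if the field is excluded directly or via its whole table."""
    return table in tables or (table, field) in fields


# Example: exclude a whole (hypothetical) Audit table and one PostVacc field.
tables, fields = parse_exclusions("Audit\nPostVacc.Comments\n")
print(is_excluded("PostVacc", "Comments", tables, fields))
```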
After a verification has been run, the errors are reported in the right-hand panel of the screen in Figure 2, as shown in Figure 4. Records in the two databases are matched using the primary/foreign keys of the relevant tables, and individual fields are then compared.
Two types of discrepancy are reported:
Records in one database, but not the other
Errors in fields for records with matching primary keys
Figure 4: Error reporting
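The two kinds of discrepancy can be illustrated with a minimal Python sketch. It is not the package's implementation: each table is modelled as a dictionary keyed by primary key, standing in for an Access table, and the function name `compare_tables` and the fields `Temp` and `Swelling` are hypothetical. Both tables are assumed to have the same fields, as the package requires.

```python
def compare_tables(table_a, table_b, excluded_fields=()):
    """Compare two copies of a table keyed by primary key.

    Each table maps a primary-key tuple to a dict of field -> value.
    Returns (keys only in A, keys only in B, field-level discrepancies).
    """
    only_in_a = sorted(set(table_a) - set(table_b))
    only_in_b = sorted(set(table_b) - set(table_a))
    field_errors = []
    for key in sorted(set(table_a) & set(table_b)):
        rec_a, rec_b = table_a[key], table_b[key]
        for field in rec_a:
            if field in excluded_fields:
                continue
            if rec_a[field] != rec_b.get(field):
                field_errors.append((key, field, rec_a[field], rec_b.get(field)))
    return only_in_a, only_in_b, field_errors


# Keys are (SN, Dose); the field names and values are invented for illustration.
first = {(101, 1): {"Temp": 37.2, "Swelling": "N"},
         (101, 2): {"Temp": 38.0, "Swelling": "Y"},
         (102, 1): {"Temp": 36.9, "Swelling": "N"}}
second = {(101, 1): {"Temp": 37.2, "Swelling": "N"},
          (101, 2): {"Temp": 38.6, "Swelling": "Y"},
          (103, 1): {"Temp": 37.0, "Swelling": "N"}}

only_a, only_b, errors = compare_tables(first, second)
print(only_a, only_b, errors)
```

Here record (102, 1) exists only in the first copy, (103, 1) only in the second, and (101, 2) has a mismatch in `Temp`.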
Tables without matching records are selected from the first two scroll boxes in Figure 4, and on selection their primary keys are reported. The double entry scroll box shows all the tables that have double entry errors, i.e. records with matching primary keys but discrepancies between the two entries for the same field. For this example, the double entry scroll list contains the tables shown in Figure 5.
Clicking on PostVacc, for example, displays a printable representation of all the errors in this table, in the format shown in Figure 6.
Figure 5: Table selection for double entry errors
Figure 6: Errors for PostVacc
SN and Dose are the unique record identifiers. The name of the field containing the discrepancy is given in the final column, and the actual entries in the two databases are given in the entry1 and entry2 columns. If necessary, field descriptions from the tables can also be printed out. The relevant paper questionnaires can then be retrieved for adjudication by the data supervisor.
Many of the research projects at the MRC Laboratories have a longitudinal element, combining epidemiological and laboratory data and requiring continuous data entry over two or more years. For any subset of variables (defined by exclusions), the primary keys of records that have identical entries in both databases can be output at this stage using the output option in Figure 4. These lists can be used to produce validated intermediate queries during the lifecycle of a project.
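Listing the validated records for a subset of variables might look like the Python sketch below. It is an illustration only, with tables modelled as dictionaries keyed by primary key. The function name `validated_keys` and the field names are hypothetical.

```python
def validated_keys(table_a, table_b, fields):
    """Primary keys present in both tables whose entries agree on every
    field in `fields` (the subset of variables left after exclusions)."""
    return sorted(
        key
        for key in set(table_a) & set(table_b)
        if all(table_a[key].get(f) == table_b[key].get(f) for f in fields)
    )


# Keys are (SN, Dose); the field names and values are invented for illustration.
first = {(101, 1): {"Temp": 37.2, "Swelling": "N"},
         (101, 2): {"Temp": 38.0, "Swelling": "Y"}}
second = {(101, 1): {"Temp": 37.2, "Swelling": "N"},
          (101, 2): {"Temp": 38.6, "Swelling": "Y"}}

print(validated_keys(first, second, ["Temp"]))
print(validated_keys(first, second, ["Swelling"]))
```

A record can therefore be validated for one subset of variables while still awaiting adjudication for another, which is what makes intermediate analyses possible before all discrepancies are resolved.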
Once the relevant paper forms have been retrieved and the incorrect fields have been identified using the information in Figure 5, the errors can be corrected by clicking the correct errors button in Figure 4. The resulting form, shown in Figure 7, allows the correction of any double entry errors.
The controls labelled Entry A and Entry B are based on a query and automatically pick up the correct data type, input mask and any scroll lists that were defined in the two databases. These controls are live connections to the first and second databases, so corrections take effect immediately. The error count is decremented only when Entry A and Entry B are identical, and correction continues until the number of discrepancies is zero. This eliminates the risk of introducing further errors whilst performing error corrections.
At the end of a session, an error report can be produced that lists the distribution of the errors between the two databases. This report has helped to promote constructive competition amongst data entry staff.
Figure 7: Data correction screen
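The rule that the error count falls only when both entries agree can be sketched in Python as follows. This is a simplified model, not the package's form logic: the class `CorrectionSession` is hypothetical, and attributing one error to a database each time one of its entries is changed is an assumption about how the end-of-session report counts errors.

```python
from collections import Counter


class CorrectionSession:
    """Tracks outstanding discrepancies while corrections are made."""

    def __init__(self, discrepancies):
        # discrepancies maps (primary_key, field) -> [entry_a, entry_b]
        self.pending = {item: list(pair) for item, pair in discrepancies.items()}
        # Assumption: each changed entry counts as one error for that database.
        self.errors_by_database = Counter()

    @property
    def error_count(self):
        return len(self.pending)

    def correct(self, item, database, value):
        """Overwrite Entry A or Entry B; the discrepancy is resolved only
        if the two entries are identical afterwards."""
        index = {"A": 0, "B": 1}[database]
        entries = self.pending[item]
        if entries[index] != value:
            entries[index] = value
            self.errors_by_database[database] += 1
        if entries[0] == entries[1]:
            del self.pending[item]
        return self.error_count


# One discrepancy in a hypothetical Temp field for record (SN=101, Dose=2).
session = CorrectionSession({((101, 2), "Temp"): [38.0, 38.6]})
item = ((101, 2), "Temp")
after_first = session.correct(item, "A", 38.5)   # entries still differ
after_second = session.correct(item, "B", 38.5)  # entries now agree
print(after_first, after_second, dict(session.errors_by_database))
```

In this model, a correction typed into only one database leaves the discrepancy open, so the record cannot leave the list until both entries match.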
For practical use, the data correction form should move quickly through database records. From experience at the MRC, a 450MHz Pentium III with 64MB RAM is the minimum requirement. For network use, the most efficient arrangement is to run the double entry package as a standalone application and connect to the databases being verified via the network.
Pauline Kaye
Study data auditor
David Jeffries
The authors would like to acknowledge the staff shown in Figure 1, who have helped to implement this package at the MRC Laboratories, The Gambia.
Fully functional copies of the code and documentation can be obtained from pkaye@mrc.gm or djeffries@mrc.gm