Computers and Software

Computer Database Management for Historical Research and Writing

David L. Clark, April 1991

When Ronald Reagan was governor of California, he once suggested that all faculty members should be required to account for how they spent their time, in order to determine if they were spending it in the most productive manner possible. The proposal was met with outcries of indignation and was quickly dropped.

Today, as I look through my boxes of three-by-five note cards filled with cryptic abbreviations, which with each passing year become less readable, and as I see other library researchers filling out similar note cards, I wonder if it wasn't such a bad idea after all. We rarely talk with anyone about how we do what we do and how we could make our time more productive. That topic will be the subject of this article.

When writing a chapter or article, the researcher generally begins with a stack of three-by-five note cards spread over the desk. The researcher then works his or her way through the cards, translating the notes into first-draft text. The cards are often difficult to decipher, either because they are handwritten or because they include abbreviations whose exact meanings are hard to recall several weeks or months after the notes were written.

Computer database management can make the writing process much easier and more efficient. If the information is stored in a computer database, you can begin writing the chapter or article from a first draft. The History Database program, which we are using at the Regional History Center of the University of Southern California, will bring draft text from the database to a word processor complete with entries for footnotes, section headings, an index, a table of contents, and a bibliography. The History Database program was created for research, writing, and cataloging with historical materials and for recording information to assist in the preservation of historic sites. The program provides simplified data entry, editing, and searching facilities for historians, researchers, archivists, museum curators, librarians, preservationists, and others whose entries include textual descriptions. It also helps a beginning computer user make use of a historical database in the same way that a reference librarian helps a freshman make use of a library. The History Database program is presently used by a variety of historical researchers and organizations.

What, then, happens to the stage of moving from note cards to text? There are no note cards, so the stage of deciphering and retyping them is eliminated. During research, information is typed directly into a data entry form and stored in the computer database on a portable computer. Default choices eliminate much of the typing: the program picks the most likely entries from information stored previously and offers them to you to accept or change.

The last field on the form is a long DESCRIPTION field. Here you type the material to be used later as text. Type this in first draft, manuscript form, with complete sentences, rather than in cryptic, note-taking style. You can, of course, type only brief notes when you do not need a more complete description. If the information already exists in machine-readable form, you can send it directly to the database without retyping. The program includes all of the features common to word processors for entering and editing text.

When you are ready to write the chapter or article, commands prompt the database program to send the information to your word processor. The History Database program also writes footnotes from the bibliographic information contained in the database. Most of the drudgery of writing has been eliminated. Rather than wasting time retyping the same information from notes to text or organizing footnotes, you can instead devote your time to the more productive task of editing the material into a final, polished form.
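
To make the footnote step concrete, here is a minimal Python sketch of how a program might assemble draft text and numbered notes from database records. It is not the History Database implementation: the `Record` fields (`author`, `title`, `place`, `publisher`, `year`, `page`) are hypothetical, and the note format is simply modeled on the citation style used later in this article.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical field layout for one research record.
    description: str  # first-draft text typed into the DESCRIPTION field
    author: str
    title: str
    place: str
    publisher: str
    year: int
    page: int


def footnote(rec: Record) -> str:
    """Format a note as: Author, Title (Place: Publisher, Year), page N."""
    return f"{rec.author}, {rec.title} ({rec.place}: {rec.publisher}, {rec.year}), page {rec.page}."


def export_draft(records: list[Record]) -> tuple[str, list[str]]:
    """Return draft text with a numbered note marker after each passage,
    together with the list of numbered notes built from bibliographic fields."""
    paragraphs: list[str] = []
    notes: list[str] = []
    for n, rec in enumerate(records, start=1):
        paragraphs.append(f"{rec.description} [{n}]")
        notes.append(f"{n}. {footnote(rec)}")
    return "\n\n".join(paragraphs), notes
```

The point of the sketch is that the notes are generated from fields already entered during research rather than retyped; the draft text and notes would then be handed to the word processor for editing.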

Having information available in database form also greatly facilitates the research process. During research, information is most often collected in isolated bits and pieces. Computer database management helps you explore the links between many scattered pieces of information and recognize patterns that would otherwise go undetected. The information is instantly available for further investigation and use. For example, before you conduct an oral history interview, you can print the information related to that individual to use as a basis for your interview questions.

The researcher spends the greatest portion of his or her time reading through primary source materials and recording notes and excerpts. Even one lone researcher will develop a considerable inventory of information. Yet most of the information gathered will not find its way into the eventually published work. The unpublished material still has value, and, particularly in the case of historical materials, the information does not lose its value over time. Research on the French Revolution does not go out of date five years later.

The combination of computer database management with methods borrowed from library and archival cataloging practice will allow a researcher to use information collected in the past in the same way that a business uses its inventory. The data will be there to recall immediately whenever you have a question. You can take advantage of each new project to add to and deepen your understanding of the information collected previously. The total body of information you have collected can be treated as a coherent whole.

An example of such reuse occurred when I presented a paper to the Modern Language Association on "Raymond Chandler's Los Angeles." In the course of writing a history of Los Angeles I had collected many Raymond Chandler quotes such as "Cops never say goodbye. They're always hoping to meet you again in the line-up." But I had gathered the research for that history on note cards, before using a computer, and the notes were organized by subject rather than by author. I had to sift through six long drawers of cards to comb out the Chandler quotations. Had the information been stored in a computer database, I could have searched at will by author, by subject, by the names of individuals and organizations featured in the materials, by the time period when the materials were created, or by many other criteria. For example, I could have asked History Database the types of questions listed below:



History Database also offers a "simple search" utility: to conduct a search, you merely pick choices from a menu, with no need to type a command. The "search by example" method presents a blank data entry form, into which you type the names or subjects that you want. When you are entering or editing data, you can also flag records for different purposes and retrieve them as a group later. Because you cannot always anticipate, when you record a piece of information, all the ways in which you may eventually want to use it, History Database allows you to use your information in ways you never originally imagined.
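
The "search by example" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program's own code: records are dictionaries keyed by field name (the names AUTHOR and SUBJECTS here are hypothetical), multi-valued fields are lists, and each filled-in entry on the form is treated as a case-insensitive fragment that must appear in the corresponding field. Blank entries are ignored, just as blank spaces on a paper form would be.

```python
def field_values(value) -> list[str]:
    """Normalize a field to a list of strings; multi-valued fields are lists."""
    if value is None:
        return []
    if isinstance(value, list):
        return [str(v) for v in value]
    return [str(value)]


def search_by_example(records: list[dict], form: dict) -> list[dict]:
    """Return the records that match every non-blank entry on the example form.

    An entry matches when it occurs, ignoring case, inside the record's value
    for that field, or inside any single value of a multi-valued field.
    """
    criteria = {field: text.strip().lower() for field, text in form.items() if text.strip()}
    return [
        rec for rec in records
        if all(
            any(text in v.lower() for v in field_values(rec.get(field)))
            for field, text in criteria.items()
        )
    ]
```

With the Chandler example above, a form with only the (hypothetical) AUTHOR entry filled in as "chandler" would pull every Chandler quotation regardless of how the records were originally filed by subject.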

New research and cataloging methods hold promise for researchers and subject specialists not only for improving traditional scholarly work but also for additional work in consulting, cataloging, and public history. The adoption by the library and archival professions, and soon the museum profession, of uniform descriptive standards such as the MARC (Machine-Readable Cataloging) format and the Library of Congress Subject Headings opens tremendous new employment opportunities for the historical profession. A historian specializing in a given subject, who has also learned to use computer database management and standardized cataloging tools, will be able to apply the same knowledge and understanding to the holdings of any institution. Database management plays a key role in this process. The History Database program makes the subject specialist's time more productive by using default choices to eliminate the repetitive drudgery normally associated with cataloging, and by supplying special cataloging codes from authority files so that they need not be entered by hand.
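
As an illustration of how an authority file can supply cataloging codes so that the subject specialist does not have to type them, here is a small Python sketch. The authority entries below are illustrative assumptions, not an excerpt from any authority file used by History Database; the tags follow the MARC convention of 600 for a personal-name subject and 651 for a geographic subject. Terms the file does not cover are set aside for a cataloger's decision rather than guessed.

```python
# Hypothetical authority file: normalized variant form -> (MARC tag, authorized heading).
AUTHORITY = {
    "los angeles": ("651", "Los Angeles (Calif.)"),
    "l.a.": ("651", "Los Angeles (Calif.)"),
    "chandler, raymond": ("600", "Chandler, Raymond, 1888-1959"),
}


def catalog_subjects(terms: list[str]) -> tuple[list[tuple[str, str]], list[str]]:
    """Look up each term the cataloger typed.

    Returns (coded, unresolved): coded holds (tag, authorized heading) pairs
    supplied from the authority file; unresolved lists terms the file does
    not cover.
    """
    coded: list[tuple[str, str]] = []
    unresolved: list[str] = []
    for term in terms:
        entry = AUTHORITY.get(term.strip().lower())
        if entry is None:
            unresolved.append(term)
        elif entry not in coded:  # two variant forms can map to one heading
            coded.append(entry)
    return coded, unresolved
```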

As mentioned above, data entry and editing are performed by filling out and changing information on the data entry form, similar to filling out a form on paper. In contrast to a paper form, however, when adding new data, the History Database program will fill in much of the information for you, based upon information stored previously. The result is a drastic reduction in the amount of typing that you have to do.

When the program anticipates your answer, it will present you with a default choice constructed from information already held by the computer, such as the date in the computer's clock, your name, the name of the project that you are working on, and data which you entered previously for the same project or on the same subject. If you do nothing to change it, the default choice is what you get. If you do not want to accept the default choice, you edit the field, changing the information in the field or adding new information.

Most data entry tasks involve a great deal of repetition. If you are examining a poster from a political campaign, there is a good chance that you have previously entered data on election materials from the same campaign. To repeat all of that information manually is to use the computer as a glorified typewriter. The History Database program automatically recycles the elements that can be lifted from previously entered data.
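
The default-choice mechanism described in the last two paragraphs can be sketched as follows. This is a minimal Python illustration under assumptions, not History Database's actual logic: the field names DATE, ENTERED_BY, PROJECT, and CAMPAIGN are hypothetical, and "information entered previously" is simplified to the most recent record from the same project. Whatever the user does not edit is kept as the default.

```python
import copy
import datetime
from typing import Optional


def default_form(previous: list[dict], user: str, project: str,
                 today: Optional[datetime.date] = None) -> dict:
    """Build a new data entry form filled with default choices.

    Defaults come from the computer's clock, the user's name, the current
    project, and the most recent record entered for the same project
    (hypothetical field names). DESCRIPTION starts blank because it
    differs for every item.
    """
    if today is None:
        today = datetime.date.today()
    form = {"DATE": today.isoformat(), "ENTERED_BY": user, "PROJECT": project}
    same_project = [r for r in previous if r.get("PROJECT") == project]
    if same_project:
        for field, value in same_project[-1].items():
            if field not in form and field != "DESCRIPTION":
                # Deep copy so editing the new form never alters the old record.
                form[field] = copy.deepcopy(value)
    form["DESCRIPTION"] = ""
    return form


def accept(form: dict, edits: dict) -> dict:
    """Keep every default the user leaves alone and apply the user's edits."""
    return {**form, **edits}
```

For a second poster from the same campaign, only the DESCRIPTION would normally need typing; the campaign and subject fields arrive already filled in.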

Another area in which the History Database program represents a substantial improvement for the purpose of historical research is its use of variable-length fields. The amount of information that we record about historical materials varies widely from one item to the next. Yet most database programs use fixed-length fields: for each field you must stipulate a fixed length in advance, before you enter any data. If you later need more space than you allocated, you are out of luck. The History Database program uses variable-length fields, into which you can type as much information as you wish. The data entry form changes dynamically on the screen to accommodate the amount of information that you add. These variable-length fields are far superior to the memo fields offered by dBASE and other programs, which are variable in length but cannot be searched. History Database can search every field by content and by keyword, including the long DESCRIPTION field.
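
The difference between a fixed-length field and a searchable variable-length field can be shown with a short Python sketch. It is an illustration only: `fixed_length` mimics what a field of predetermined width retains, and `keyword_search` treats every field, including a long DESCRIPTION, as searchable text split into whole words. The word-splitting rule is an assumption, not a description of History Database's keyword method.

```python
import re


def fixed_length(value: str, width: int) -> str:
    """What a fixed-length field can hold: text past `width` is cut off."""
    return value[:width]


def keywords(value) -> set[str]:
    """Split a field (a string, or a list of values) into lowercase words."""
    text = " ".join(map(str, value)) if isinstance(value, list) else str(value)
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def keyword_search(records: list[dict], word: str) -> list[dict]:
    """Return records in which `word` occurs as a whole word in any field,
    including long variable-length fields such as DESCRIPTION."""
    word = word.lower()
    return [rec for rec in records if any(word in keywords(v) for v in rec.values())]
```

A keyword near the end of a long description survives in a variable-length field and is found by the search, whereas a 40-character fixed field would have silently dropped it.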

Some free-text programs, such as InMagic, provide variable-length fields, but the fields can be searched only after indexing, and the index files may take up to three times more space than the data. Changes in the text require rebuilding indexes, a process that can be quite time-consuming. The complexity of the indexes also makes them subject to damage. Since the data can be reached only through the indexes, if you lose your indexes you lose your data. The figure on the space taken up by indexing is based on personal experience, the InMagic manual, and Carol Tenopir and Gerald Lundeen, Managing Your Information (New York, NY: Neal-Schuman Publishers, 1988), page 189. In regard to losing data, Tenopir and Lundeen noted that InMagic experiences a "problem with files getting corrupted for unknown reasons" (page 191). Free-text programs are intended for loading and searching computer files taken from outside sources, such as wire service reports, rather than for original cataloging and research.

The greatest disadvantage of free-text programs is that they lack the standard database management facilities for changing, correcting, updating, and restructuring information, and for searching on information combined from different files, such as when collection-level and item-level data are combined. You should be able to search for items donated by a particular person, even though the name of the donor was placed only in the collection-level record, and not repeated in all of the records which describe individual items in the collection. In the History Database program, all of the fields can be searched and manipulated with an extensive range of database capabilities. Indexes can be created and maintained automatically to speed up searching on fields such as the SUBJECTS field, which are searched most frequently, but indexing is not necessary for searching.
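
The donor example can be sketched in Python as a simple combination of collection-level and item-level records. The field names COLLECTION_ID, DONOR, and SUBJECTS are hypothetical, and this is not History Database's query method. The sketch also shows the paragraph's last point: an optional index on SUBJECTS gives the same answers as a plain scan of the records, only faster, so searching never depends on the index.

```python
from collections import defaultdict


def items_donated_by(collections: list[dict], items: list[dict], donor: str) -> list[dict]:
    """Find items whose donor is named only on the parent collection record."""
    donor_collections = {c["COLLECTION_ID"] for c in collections if c.get("DONOR") == donor}
    return [item for item in items if item.get("COLLECTION_ID") in donor_collections]


def scan_subject(items: list[dict], subject: str) -> list[dict]:
    """Search without an index: examine every record's SUBJECTS values."""
    return [item for item in items if subject in item.get("SUBJECTS", [])]


def build_subject_index(items: list[dict]) -> dict[str, list[int]]:
    """Optional index mapping each subject heading to record positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos, item in enumerate(items):
        for subject in item.get("SUBJECTS", []):
            index[subject].append(pos)
    return index


def lookup_subject(items: list[dict], index: dict[str, list[int]], subject: str) -> list[dict]:
    """Indexed search: same answer as scan_subject, without a full scan."""
    return [items[pos] for pos in index.get(subject, [])]
```

If the index were lost or damaged, `scan_subject` would still reach every record, which is the contrast the article draws with index-dependent free-text programs.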

The History Database program also contains a set of utilities which help to maintain the consistency and accuracy of a database. For example, what happens if you misspell a name ten thousand times? Will you have to comb through the database to make ten thousand individual corrections? With History Database, you can make the correction through the use of global search and replace facilities, which will make a change to a selected field across the entire database. The History Database program offers twenty-five separate varieties of global search and replace operations, which go beyond the correction of spelling errors to changing the word order or format of data such as names and dates, moving data from one field to another, removing unwanted material, and automatically inserting new information needed to update old records. If you wish, you can choose to inspect and then approve or cancel each change, rather than having all the changes made at once. After you have inspected a few, you can opt to have all of the remaining changes made automatically. You can also decide to intervene personally and edit the change.
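
Here is a minimal Python sketch of a field-restricted global search and replace with the optional inspection step described above. It is one simple variety, not a reproduction of the program's twenty-five; the `confirm` callback and its answers ("yes", "no", "all", or a hand-edited value) are a hypothetical interface standing in for the on-screen prompt.

```python
def global_replace(records: list[dict], field: str, old: str, new: str, confirm=None) -> int:
    """Replace `old` with `new` in one field across every record.

    If `confirm` is given, it is called as confirm(record, before, after) and
    returns "yes", "no", "all" (approve this and every remaining change), or
    any other string, which is taken as the user's hand-edited value.
    Returns the number of records changed.
    """
    changed = 0
    ask = confirm is not None
    for rec in records:
        before = rec.get(field)
        if not isinstance(before, str) or old not in before:
            continue  # field absent, not text, or nothing to change
        after = before.replace(old, new)
        if ask:
            answer = confirm(rec, before, after)
            if answer == "no":
                continue
            if answer == "all":
                ask = False  # stop asking; make the remaining changes automatically
            elif answer != "yes":
                after = answer  # the user intervened and edited the change
        rec[field] = after
        changed += 1
    return changed
```

Because the change is confined to one named field, the same misspelling in, say, a TITLE field is left alone unless you ask for it.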

Those who already use word processors should note that the global search and replace carried out by a word processor is not the same as that needed for database maintenance, because the two hold information in fundamentally different ways. A word processor does not separate data into fields, so it cannot restrict a change to a single field. Nor can it apply conditions, such as limiting a change according to information held elsewhere in the record. With History Database, for example, you can specify that a change should take place only if the record concerns a given time period, deals with a particular industry, or meets other conditions that separate the records you want to change from those you wish to leave untouched.
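
A conditional replace of this kind might look like the following Python sketch, under the assumption that the condition is expressed as a predicate over the whole record. The field names YEAR, INDUSTRY, and DESCRIPTION and the example condition are hypothetical; the point is that the test can consult fields other than the one being changed, which a word processor working on undivided text cannot do.

```python
from typing import Callable


def conditional_replace(records: list[dict], field: str, old: str, new: str,
                        where: Callable[[dict], bool]) -> int:
    """Replace `old` with `new` in one field, only in records for which the
    predicate `where(record)` is true. Returns the number of records changed."""
    changed = 0
    for rec in records:
        value = rec.get(field)
        if isinstance(value, str) and old in value and where(rec):
            rec[field] = value.replace(old, new)
            changed += 1
    return changed


def oil_in_1920s(rec: dict) -> bool:
    """Example condition (hypothetical fields): oil-industry records from the 1920s."""
    return rec.get("INDUSTRY") == "Oil" and 1920 <= rec.get("YEAR", 0) <= 1929
```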

The History Database program is again distinguished from most other database programs in that it will separate individual values within a field. Examples of values that should be treated as separate entities are the names of the individuals who appear in a photograph or who are mentioned in a document, and individual subject headings entered into the SUBJECTS field. The History Database program will store the name of each person and each subject heading as a separate value or sub-field. The result is that you have better control over your data for searching and for global search and replace. Controlled change is essential for maintaining the consistency and accuracy of a database and for recycling the data later for other purposes.
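
Why separate values matter can be sketched in Python, assuming a multi-valued field is stored as a list (a hypothetical NAMES field here) rather than as one string. Replacing or searching by whole value touches only the exact name, whereas a substring operation on a single combined string would also alter any longer name that happens to contain it.

```python
def replace_value(records: list[dict], field: str, old: str, new: str) -> int:
    """Replace `old` only where it is an entire value in a multi-valued field.
    Returns the number of records changed."""
    changed = 0
    for rec in records:
        values = rec.get(field, [])
        if old in values:
            rec[field] = [new if v == old else v for v in values]
            changed += 1
    return changed


def has_value(rec: dict, field: str, value: str) -> bool:
    """Exact-value search: true only if `value` is one of the field's values."""
    return value in rec.get(field, [])
```

For comparison, if the same names were kept as the single string "Smith, Joan; Smithson, Mary", a plain replacement of "Smith" with "Smyth" would also turn Smithson into Smythson.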

Historians today should seize the new opportunities created by the computerization of research and by the creation of standard formats for description. Personal and portable computers have made the new methods applicable to research carried out at the most remote locations and on the most limited budgets. In addition, the History Database program has adapted computer database management for use with historical materials. The program provides simplified methods to give computer novices access to the industrial-strength database management power needed to control the mountains of information that research and cataloging produce, and to make swift and efficient inquiries of research conducted in previous years.

The Regional History Center of the University of Southern California and the Los Angeles City Historical Society are beginning a History Computerization Project to encourage the application of computer database management and standardized methods of description to historical materials and to create a regional information network to facilitate the exchange of information between researchers, librarians, archivists, museum curators, and historical societies. The project will offer a series of short courses on the use of database management for history. The course textbook is Computer Database Management for Research, Writing, and Cataloging, by David L. Clark, published by McGraw-Hill. For more information on the project or History Database program, please contact David L. Clark, History Computerization Project, 24851 Piuma Road, Malibu, CA 90265, phone (818) 888-9371.