Computers and Software
A Case Study in Utilizing Computer Technology: The Atlas of Historical County Boundaries
John H. Long, March 1992
It is difficult to imagine designing a history research project today without a major role for a computer. Marvelous as computer technology is, however, it is hardly a panacea. Success with a computer depends on the answers to two basic questions: Which tasks should be computerized? What are the right equipment and programs to accomplish those tasks? While there obviously is no single answer to either of those questions, some considerations such as suitability and cost are common to nearly every undertaking. What follows is one example of how such practical issues can be resolved.
In our project, the Atlas of Historical County Boundaries, a small staff must handle large amounts of data extracted from numerous and varied sources. Our goal is to describe, document, map, and measure all changes in the shapes, sizes, and locations of U.S. counties from their colonial origins to 1990. The sources, the information, and, therefore, the work organize themselves naturally by state, and our publisher, the Academic Reference Division of Simon and Schuster, will, with few exceptions, publish the results in a separate volume for each state. (Exceptions will be very small states, like Delaware and Rhode Island, which will be combined with larger neighboring states.) Despite the apparently neat and easy organization by state, there is a high incidence of overlap or redundancy that arises from the nature of our subject matter; altering the configuration of any county necessarily involves one or more adjacent counties, and changing a state line affects at least one neighboring state, plus counties in both states. Sorting data and filtering some of them out of a whole mass are among the jobs computers do far faster and more reliably than people, so there never was any question about whether the Atlas project would employ computers.
There was little difficulty in identifying the tasks for which we would need a computer. Correspondence, proposals and reports, and final versions of the textual matter for each volume (e.g., the tables of colonial, territorial, state, and federal censuses; the introductions; the chronologies; the methods essays; and the bibliographies) obviously would be handled best through wordprocessing. Building the bibliography (now around 2,000 items), including annotations to identify the states to which each work relates, and compiling data about each boundary change (e.g., date, description of the event, references to sources) would require a database program. We also chose to employ computer technology to measure the areas of the counties in their different configurations. As will be explained later, other tasks, like working up the budget and drawing the maps, were not computerized, even though it was certainly possible to do so.
Hardware selection involves many factors, but one consideration applies every time: suitability. The best analogue may be choosing a car: a subcompact hatchback will hardly be up to hauling a Little League team and its gear, and there is no advantage in being able to cruise the interstate at 100 miles per hour when every trip will be a short errand in the city; in any case, consider the potential difficulty of maintaining a Chevrolet in a community where everyone else, including the local mechanic, drives a Ford. Some people argue for the wisdom of buying the most up-to-date equipment, while others believe it is more important to understand that, given the speed at which microcomputer technology changes, the highest-priced, latest technology of today will be inexpensive and unremarkable tomorrow. Once you have a good idea of what you will want your computer to do, try to find some equipment reviews written in plain English, ask associates about their experiences, and consult the people at your institution's computer center. Suitability also includes your comfort and convenience, so be sure to arrange a "test drive" of the keyboard and the image on the monitor.
The Newberry Library, our headquarters, is, in computerese, a "DOS shop," meaning that all desktop computers there are IBM Personal Computers or machines that employ the same disk operating system ("DOS" being the acronym for that particular system) as IBMs do. We have two IBM-standard microcomputers for the Atlas project. Each has a hard disk that provides plenty of storage for our programs and data and the convenience of not having to shuffle floppy disks when working at the machines. Although neither model is particularly advanced, one of them has sufficient power to cope easily with tracing boundaries and calculating county areas.
We also have a digitizing tablet (a Genius Tablet, model GT-1212A), the least expensive one available at the time we purchased it, connected to the more powerful computer. A digitizer converts things like lines into digital form. The digitizing tablet functions, in effect, as a sheet of electronic graph paper; when we lay a map on the tablet and use the attached electronic stylus to trace a boundary line, the digitizer records the path of the stylus as a series of x and y coordinates, numbers that the computer can easily manipulate. Through a graphics program called Generic CADD, Level 3, we have the computer use those coordinates to calculate the areas enclosed by the lines. This process of digitizing and calculating is neither difficult nor complicated in its execution, and, compared to manual alternatives, it permits us to determine county areas more quickly and accurately.
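For readers curious about how a list of traced coordinates can yield an area, the following is a minimal sketch using the standard "shoelace" formula for a closed polygon. It is offered only as an illustration: the text does not say which method Generic CADD uses internally, and the sample boundary, the function name polygon_area, and the map scale are all hypothetical.

```python
def polygon_area(points):
    """Return the area enclosed by a closed boundary given as (x, y) points.

    Uses the shoelace formula; the last point is joined back to the first.
    """
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the boundary
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical traced boundary: a 4-by-3 rectangle measured in tablet inches.
boundary = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]

# Assumed map scale (not from the article): 1 inch on the map = 10 miles.
MILES_PER_INCH = 10.0

area_sq_in = polygon_area(boundary)
area_sq_mi = area_sq_in * MILES_PER_INCH ** 2
print(area_sq_in, area_sq_mi)
```

A real county outline would simply have many more points; the more closely the stylus follows the line, the better the polygon approximates the true shape.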
The wordprocessing program we use on the Atlas project is not only the most popular in the country but also the official choice at the Newberry, namely WordPerfect. WordPerfect and a number of other wordprocessing programs can do more than we could imagine; therefore, as in the choice of hardware, the institutional standard was the determinant. Every program has a distinct style, often termed its "look and feel," but neither that style nor a user's personal taste can be given much weight in this situation because having a common program for every department is chiefly a matter of developing expertise and promoting common knowledge. So-called "full-featured" programs, of which WordPerfect is a leading example, are so complicated and capable of so many different operations that few people ever master them completely, and fewer still ever use every feature. Such programs commonly take time to learn, and even experienced users can have difficulty when they try something new. Institution-wide adoption of the program creates a pool of knowledgeable users who can help a neophyte and whose collective experience probably covers every problem that will be encountered.
The database program we chose to handle the bibliography and the data on boundary changes is called askSam. All programs organize the information we pump into the computer into separate, distinctly named files, and some programs further subdivide the files into records and the records into fields. A wordprocessing program like WordPerfect does not go beyond making each document a separate file, while a database program, designed to sort collections of data and to select some from the rest, could not work without subdivisions. For example, a data file might be a bibliography in which each entry (e.g., book or article) would be a record and each element of an entry (e.g., author or title) would be a field. Whereas conventional database programs limit the size of each field and record and require that the name and length of each field be fixed (e.g., four spaces for a field called "year") when the file is initially set up, askSam allows different approaches to defining fields and permits both records and fields to be practically unlimited in size. That flexibility and the remarkable speed with which the program finds and sorts records are what made askSam our choice.
We employ named fields in all of our askSam files because when entries or records are as similar in structure and content as ours are, named fields facilitate both data entry and data retrieval. We keep those fields few and broad, however, so as not to be more specific and narrow than necessary. For data on county changes we create a separate file for each state and handle each file as a chronology with three named fields in each record: "Date," "Ref" (which holds the source citations), and "Event." In contrast to the state-by-state treatment of county data, we gather all sources, regardless of the state(s) to which they apply, into a single, consolidated bibliography. We set up four fields for each record in the bibliography: "Entry" is where the full citation is entered; "NL#" is the call number of any source found in the Newberry's collections; "Range" takes the abbreviations of all states to which the work pertains; the last field, "Comment," is a catch-all. We adopted the broadly inclusive "Entry" in preference to a large number of tighter fields (e.g., author's last name, title, place of publication) because our materials range from books to maps to manuscripts and we must mix the works of known authors with materials for which we have only a title. Creating a single alphabetical list that contains both author entries and title entries is either impossible or extremely awkward with the conventional approach.
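The bibliography layout described above can be pictured with a short sketch. It is not askSam syntax; it only models, under stated assumptions, how one broad "Entry" field plus a "Range" field allows the sources for a single state to be pulled out and alphabetized in one list, whether an entry begins with an author or a title. The records and the function name sources_for_state are invented placeholders.

```python
# Hypothetical bibliography records using the four field names from the text.
# The citations are placeholders, not real works.
bibliography = [
    {"Entry": "Sample Author. A Placeholder History of Two States.",
     "NL#": "", "Range": ["NY", "NJ"], "Comment": ""},
    {"Entry": "Placeholder Map of a Single State.",
     "NL#": "", "Range": ["NY"], "Comment": "title entry; no known author"},
    {"Entry": "Another Author. Placeholder Laws of a Third State.",
     "NL#": "", "Range": ["DE"], "Comment": ""},
]


def sources_for_state(records, state):
    """Select records whose Range includes the state, sorted by Entry."""
    matches = [r for r in records if state in r["Range"]]
    # One inclusive Entry field lets author and title entries sort together.
    return sorted(matches, key=lambda r: r["Entry"].lower())


for record in sources_for_state(bibliography, "NY"):
    print(record["Entry"])
```

Because the sort key is the whole "Entry" text, the title-only map and the authored history fall into a single alphabetical sequence without any special handling.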
Work in the bibliography is further eased by askSam's unlimited records and fields, because the length of an "Entry" could range from two lines to four or five. In a chronology, where four words may do for one "Event" entry (e.g., BRONX created from WESTCHESTER) and four lines may not be enough for the next, not having to worry about a limit on field length is more than mere convenience. Two other virtues of askSam are, first, the remarkable speed with which it sorts records and finds those containing the word(s) or number(s) specified in a query and, second, the fact that no key words need be identified or flagged in advance for those high-speed searches to work.
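The kind of query described here, finding every record that contains given words without flagging key words in advance, can be sketched as a plain scan over free-length records. This is only an illustration of the idea and makes no claim about how askSam achieves its speed. The records are hypothetical, with placeholder dates and sources; only the BRONX event wording comes from the text.

```python
# Hypothetical chronology records with the three field names from the text.
# Field lengths vary freely; dates and references are placeholders.
chronology = [
    {"Date": "(date 1)", "Ref": "(source 1)",
     "Event": "BRONX created from WESTCHESTER"},
    {"Date": "(date 2)", "Ref": "(source 2)",
     "Event": "ALPHA gained from BETA when the line between them was "
              "redefined to follow a river; a longer placeholder "
              "description that runs to several lines."},
]


def find_records(records, *terms):
    """Return records whose combined field text contains every term.

    No key words are declared beforehand; each query scans the full text.
    """
    wanted = [t.lower() for t in terms]
    hits = []
    for record in records:
        text = " ".join(record.values()).lower()
        if all(term in text for term in wanted):
            hits.append(record)
    return hits


for record in find_records(chronology, "westchester"):
    print(record["Date"], record["Event"])
```

A scan like this is adequate for a small file; a program built for large collections would presumably rely on something faster, but the user's view of the query is the same.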
Powerful, quick, and convenient as it is, askSam is not perfect. Compared to entering data and finding it again—processes assisted by helpful menus of commands—extracting a report (i.e., a formatted printout or screen display) is not as easy. We get around this problem simply by not trying to produce finished-looking reports in askSam. Instead, we transfer information from our askSam files into WordPerfect, where the tools for formatting and editing are more powerful and much easier to use.
And what of those tasks for which we could use a computer but choose not to do so? Two kinds of costs keep us from computerizing our map making. The first is dollar cost. Data files of the base maps we use are available, but they sell for much more money than even specially printed versions of the same maps. More important, the equipment (computers, monitors, digitizers, printer, etc.) and the software required to do the mapping and to produce high quality final copy would take many more thousands of dollars than we have. The other cost is time—the time it would take to set up the programs and procedures and for everyone to become adept with such a high-powered graphics facility would probably cut significantly into the research without saving much in the production of final copy. The same arguments apply to the making of budgets, although on a smaller scale and with the added factor of infrequent use. Our budgets are relatively uncomplicated, new ones need be constructed only every two years, and the Newberry's Business Office efficiently handles the day-to-day accounting. It hardly seems worthwhile to buy and learn a spreadsheet program to accomplish a task that is otherwise so little trouble.
The Atlas of Historical County Boundaries could hardly proceed without the benefit of contemporary computer technology; the ability to enter every bit of information (e.g., bibliographic entry or description of a boundary event) only once, and so to control errors, is by itself enough to justify its application. Suitability and cost have been our guides, and, although the details may be debatable, it is hard to go wrong with that approach.
Software Used for the Atlas of Historical County Boundaries
askSam, v. 5.0
P.O. Box 1428
119 S. Washington
Perry, FL 32347
Prices: $395 (retail); $99.95 (educational discount)
Generic CADD 5.0 (replacement for Level 3)
Generic Software, Inc.
11911 North Creek Parkway S.
Bothell, WA 98011
Prices: $495 (retail); call for educational discount
WordPerfect, v. 5.1
1555 N. Technology Way
Orem, UT 84057
Prices: $495 (retail); $135 (educational discount from selected dealers; call for list of dealers)
—John H. Long is an editor of the Atlas of Historical County Boundaries.