Teaching Quantitative History with a Database
Lisa Rosner, October 1990
Prosopography, or collective biography, has been described by Lawrence Stone, among others, as ideally suited to introducing students to historical research. In a recent seminar I taught students to use computers to create a prosopographical database, with excellent results. I had wanted to try this for some time, since I use computer databases extensively in my own research and believed that building one would be an excellent learning tool. It was, for me as well as for my students. Not only did the database answer interesting questions, but, more importantly, creating it raised a number of fundamental issues of historical research and interpretation.
I organized the seminar around a problem from my own research: the connection between medical education and medical careers in eighteenth- and nineteenth-century Britain. The focus of the seminar was the creation of a joint database, though students also wrote individual research papers on related topics. My intention was to use the seminar to analyze the education and backgrounds of the Fellows of the Royal College of Physicians of London in this period. They were a particularly prominent group of physicians, with the further advantage of being well documented in the Dictionary of National Biography and William Munk's Roll of the Royal College of Physicians of London. There were 179 Fellows to divide among 7 students, or 25 or 26 Fellows apiece, which seemed enough for each student to get worthwhile results.
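The split behind "25 or 26" is simple arithmetic: 179 = 7 × 25 + 4, so four students take 26 Fellows and three take 25. A minimal Python sketch of that division, assuming the roster is just an ordered list of names; `split_roster` and the placeholder names are hypothetical illustrations, not the assignment method actually used in the course:

```python
def split_roster(names, n_students):
    """Divide an ordered roster into n_students contiguous groups
    whose sizes differ by at most one."""
    base, extra = divmod(len(names), n_students)  # 179, 7 -> (25, 4)
    groups, start = [], 0
    for i in range(n_students):
        # The first `extra` students each take one additional name.
        size = base + (1 if i < extra else 0)
        groups.append(names[start:start + size])
        start += size
    return groups


# Placeholder names only, not the actual Fellows.
fellows = [f"Fellow {i}" for i in range(1, 180)]
groups = split_roster(fellows, 7)
print([len(g) for g in groups])
```

Any assignment in which the group sizes differ by at most one would serve equally well; contiguous slices simply keep each student's list in roster order.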
My three main goals for the course were for students to experience carrying out original research; to learn quantitative methods in history by actually developing the data to be quantified; and to become proficient enough, at least at creating a computer database, to build one on their own in the future.
My first step was to consult with campus computer center personnel about available facilities. I decided to use Ashton-Tate's dBASE III+ for the course because it is comparatively easy to use for creating a database, widely available on campus, and amply supported by the computer center. The head of user support at the computer center gave me the "how-to" guides she kept for dBASE and asked for copies of my student instructions so that her staff could help students if necessary.
The next step took place in the seminar itself. I spent the first few weeks setting the context for the database because I wanted students to think of quantitative analysis as a way of answering historical questions rather than as an end in itself. That had been a problem for students learning quantitative methods in statistics courses: they learned all the mathematical formulas but not how to relate them to historical issues. Furthermore, I knew from experience that research for a database is hard work, and I assumed that students would do it more carefully if they had questions they really wanted answered. In this case, our questions concerned the factors that helped physicians establish successful careers. Was it family background? If so, what kind? Was a physician more likely to become a Fellow if his father was or had been a Fellow? Or was it education that made a difference? Fellowship was, for all intents and purposes, restricted to graduates of Oxford and Cambridge Universities; however, many Fellows had also studied for several years at Edinburgh University's renowned medical school. Where, therefore, was the best place to study for a successful medical career? To begin to answer these questions, we read a variety of secondary and primary sources and discussed the arguments proposed by each. At that point I presented the idea of a database. Why not gather as much information as we could on Fellows and see what we could find out?
The students nodded dutifully at this and continued nodding after I handed each of them a list of twenty-five or twenty-six Fellows with a group of questions to try to answer about each, such as his birthdate, early education, father's occupation, and year of becoming a Fellow. Each student, I explained, would be creating his or her own database, which would then be combined with the others into one large database. Each Fellow would make a separate record in the database, and the students as a group would have to decide how the information could be organized into variables, which dBASE calls "fields." Each individual database had to have the same fields as the others, or they could not be combined at the end to give meaningful results.
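The rule that every student's database must share the agreed-upon fields can be checked mechanically before anything is combined. A minimal Python sketch, assuming each student's database is represented as a pair of (field names, list of record dictionaries); this illustrates the idea and is not how dBASE itself merges files, and `schema_problems` and `combine` are hypothetical names:

```python
def schema_problems(agreed_fields, databases):
    """For each database whose fields differ from the agreed list,
    report which agreed fields are missing and which are extra."""
    reference = set(agreed_fields)
    problems = {}
    for owner, (fields, _records) in enumerate(databases):
        missing = sorted(reference - set(fields))
        extra = sorted(set(fields) - reference)
        if missing or extra:
            problems[owner] = {"missing": missing, "extra": extra}
    return problems


def combine(agreed_fields, databases):
    """Concatenate all records into one list, refusing if any database
    does not use exactly the agreed fields."""
    problems = schema_problems(agreed_fields, databases)
    if problems:
        raise ValueError(f"cannot combine, mismatched fields: {problems}")
    combined = []
    for _fields, records in databases:
        combined.extend(records)
    return combined
```

Field order is ignored here, since each record is a dictionary keyed by field name; only the set of field names has to match.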
Once the students had begun their research, the third step was to draw up a list of fields. Field names could be no more than ten characters long, and we had to choose names we could use and remember, such as YOF (year of fellowship) or F_OCC (father's occupation). Students also had to devise a way of standardizing the information they found in their sources so that the computer could tabulate it without introducing distortions through oversimplification. For example, a Fellow's father might be variously described in biographical dictionaries as a doctor, physician, MD, or Fellow of the Royal College. Entering the information into the Father's Occupation (F_OCC) field exactly in those words would result in an unnecessary proliferation of categories, but subsuming all of them under the category MEDICINE would make it impossible to use the computer to answer one of our most important questions: did Fellows' sons frequently become Fellows in turn? The problem was the basic one of interpreting primary sources, made more acute by the possibilities and limitations of the computer.
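Both decisions in this paragraph, legal field names and a standard vocabulary for each field, can be written down as a small "code book." A minimal Python sketch follows. It assumes the dBASE III+ naming rule is at most ten characters, starting with a letter and using only letters, digits, and underscores (worth confirming against the dBASE manual), and the code values and function names are hypothetical. Mapping "doctor" to PHYSICIAN is itself an interpretive decision, since the word did not always mean a medical man.

```python
import re

# Assumed dBASE III+ rule: a letter followed by up to nine letters, digits, or underscores.
FIELD_NAME = re.compile(r"[A-Z][A-Z0-9_]{0,9}")


def valid_field_name(name):
    """True if the name satisfies the assumed ten-character rule."""
    return FIELD_NAME.fullmatch(name.upper()) is not None


# Hypothetical code book for F_OCC. Fathers who were Fellows keep their own
# code, so the class can still ask whether Fellows' sons became Fellows.
F_OCC_CODES = {
    "fellow of the royal college": "FRCP",
    "physician": "PHYSICIAN",
    "doctor": "PHYSICIAN",
    "md": "PHYSICIAN",
}


def code_father_occupation(raw):
    """Map a source's wording to a standard code. Unknown wording is
    flagged as UNCODED for the class to decide, not silently lumped."""
    key = raw.strip().lower().replace(".", "")
    return F_OCC_CODES.get(key, "UNCODED")
```

Returning UNCODED rather than guessing keeps every new description visible, so the group can extend the code book deliberately instead of letting categories proliferate.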
Once students had collected information, the class periods we spent hammering out the variables were not exactly tumultuous, but they were at least a change from the dutiful nodding of earlier classes. For the first time I saw in my students signs of the creative frustration that always accompanies my own wrestling with this problem when I use computers. I made it clear that I would not simply tell them what variables to create or what values to use in interpreting data, since I had not seen the sources myself. They had to come to some agreement or the database would not work. They eventually did so outside of class by dividing the list of questions among themselves, with each student responsible for creating a certain number of variables. We ended up with twenty-seven in all, one of which was a COMMENTS variable for information that did not fit any other category. Encouraged by this sign of cooperation and initiative, I moved to the next step: teaching my students how to use dBASE itself.
My goal was to teach students just enough to get started entering data, rather than to give them a full course in dBASE. I had written a short manual that I hoped was completely "user-friendly" and had given a copy to each student as well as to the user-support staff. Unfortunately, only the two best students found it easy to follow; like many people who use software frequently, I had neglected to explain many small points. Still, with individual coaching, students managed to create their databases and began entering data. By the end of the week, they had entered all their data and handed me their disks.
At this point, my class schedule fell apart. Unaware of any problems students had encountered with the data entry, I had intended to spend the next class period teaching advanced dBASE techniques for tabulation of data. On seeing the disks, however, I promptly changed my mind. There were too many problems with the research and interpretation of data for us to move on to the next step in computer instruction and from that point on I made those problems the focus of the course.
The problems were of three types, as I explained to the class the next time we met. The first was simply insufficient research. Many of the students had left out several pieces of easily obtainable information, such as the year that physicians became Fellows. Some had omitted a few physicians entirely. I had some sympathy for this: finding twenty-seven separate pieces of information about twenty-five different people was a difficult and often tedious task. Each student had done a reasonable job with his or her own database and, presumably, had concluded that just a few missing variables did not matter. Cumulatively, though, the effect was disastrous. I pointed out that their omissions not only made it impossible to answer any of the questions we had raised, but also cast doubt on the integrity of the rest of their research. The omitted data had to be found for the project to work.
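A completeness check of the kind that might have caught these omissions before the disks were handed in can be sketched in a few lines of Python. It assumes records are dictionaries keyed by field name and treats None or an empty string as a missing entry; `blank_counts` and `incomplete_records` are hypothetical helpers, not dBASE commands:

```python
def blank_counts(records, fields):
    """Number of records with no entry in each listed field."""
    return {f: sum(1 for r in records if r.get(f) in (None, "")) for f in fields}


def incomplete_records(records, fields):
    """Names of Fellows whose record still has at least one blank listed field."""
    return [r.get("NAME", "?") for r in records
            if any(r.get(f) in (None, "") for f in fields)]
```

Fields that may legitimately be empty, such as COMMENTS, would be left out of the `fields` list so that they do not inflate the counts.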
The second problem, like the first, came from students' forgetting, or perhaps not really believing, that their research was intended for anyone other than themselves. One student, for example, had abbreviated place names so that only she could tell what they were. Another had done an excellent job of printing out her own data, but had created her database using different variables from the rest of the class, so it could not be joined to the others. They had, I said, behaved like soloists rather than an orchestra, as though their individual contributions were more important than the joint final product. They had to remember that data entry was not the end of the project but only the beginning and that the point of the project was not to feed the data into the computer but rather to extract information.
The third problem was the most intriguing. Of the twenty-seven variables, the one students clearly preferred was COMMENTS, because it saved them from having to decide, for example, which category to use for Father's Occupation (F_OCC), or whether a Fellow was in Private Practice (PVTPRAC) or was known for his medical writing (abbreviated to Medical Author, field name MEDAUT). In many cases, I found that students had left those variables blank but included COMMENTS which clearly indicated that, in the above example, a Fellow had been in private practice or was known for his books on medicine. Discussion of this raised an issue that had worried many students. How were they supposed to decide whether a Fellow was in private practice? Their main primary sources were biographical dictionaries, which never simply said, "Dr. So-and-So was in private practice and was also a medical author." Students in the seminar were afraid of being wrong and, as one said, preferred to leave other variables blank and put all the information in COMMENTS.
Difficulty in interpreting sources is a fundamental problem of historical research, and I had naturally encountered it in other history courses. The advantage of the database was that it made clear how necessary interpretation was. I asked students who they expected to decide whether a Fellow was in private practice. Me? The computer? Some future quantitative historian who had no idea how the information was compiled? The essential task of the historian is to interpret the past on the basis of available sources; we could not cut that out of the course and still call it a history seminar. Given that fact, surely it was best for the person who had actually seen the sources to be the one to interpret them.
As a result of the discussion, even the most worried of my students could see that the decision not to interpret was in fact an interpretation. We had set up fields for variables like Private Practice (PVTPRAC) so that they either had to be filled in or left blank, and no one wanting to use the database in the future would be able to tell the difference between a variable left blank because a Fellow was not in private practice and one left blank because a student had not done sufficient research or was afraid to make a decision. Putting the information in the COMMENTS field was no solution, because sooner or later someone was going to have to interpret what all those COMMENTS meant. Let it be sooner, rather than later, I suggested, and returned the disks for revision.
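One way to remove the ambiguity is to require an explicit value in every yes/no field, so that a blank can only mean "no decision recorded yet." A minimal Python sketch, assuming PVTPRAC is coded Y or N; that coding and the helper names `tally` and `undecided` are illustrative assumptions, not the scheme the class adopted:

```python
from collections import Counter

ALLOWED = {"Y", "N"}


def tally(records, field):
    """Count Y and N entries; anything else (empty, missing, or an
    unexpected value) is counted as BLANK, i.e. no decision recorded."""
    counts = Counter()
    for r in records:
        value = (r.get(field) or "").strip().upper()
        counts[value if value in ALLOWED else "BLANK"] += 1
    return counts


def undecided(records, field):
    """Names of Fellows whose record should go back for a decision."""
    return [r.get("NAME", "?") for r in records
            if (r.get(field) or "").strip().upper() not in ALLOWED]
```

With an explicit N available, a BLANK count greater than zero signals unfinished work rather than a silent "not in private practice."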
Discussing the problems, as it turned out, was not enough, for many of the gaps and inconsistencies were still there when I collected the disks a second time. In the next class I did what I should have done earlier: I showed students, on the computer, what the combined database actually looked like and, using dBASE commands such as DISPLAY, LIST, COUNT, and AVERAGE, demonstrated why it would not work. We could not find out even a simple piece of information like the average age at which physicians became Fellows, because students had still not collected all the data. We could not determine whether Fellows' sons were in fact more likely to become Fellows, because students had not followed the standard categories in entering Father's Occupation (F_OCC). These examples had a definite effect: at last the database became a tangible reality to the students. For the first time they could literally see what they were creating together, and what they had to do to make it work.
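The two failed queries can be imitated in Python to show why missing and nonstandard entries matter: each function reports how many records it had to skip. This is a sketch, not a reproduction of the dBASE AVERAGE or COUNT commands. It assumes a birth-year field named BIRTHYR (a hypothetical name; the article does not list it) and assumes fathers who were Fellows were coded FRCP (also hypothetical). Age is computed as YOF minus birth year, which can be off by one because exact birthdays are ignored.

```python
def average_age_at_fellowship(records):
    """Mean of YOF - BIRTHYR over records that have both years,
    together with the number of records skipped."""
    ages = [r["YOF"] - r["BIRTHYR"] for r in records
            if r.get("YOF") is not None and r.get("BIRTHYR") is not None]
    skipped = len(records) - len(ages)
    return (sum(ages) / len(ages) if ages else None), skipped


def share_with_fellow_fathers(records, fellow_code="FRCP"):
    """Fraction of filled-in F_OCC entries equal to fellow_code,
    together with the number of records left blank."""
    coded = [r for r in records if r.get("F_OCC")]
    skipped = len(records) - len(coded)
    if not coded:
        return None, skipped
    hits = sum(1 for r in coded if r["F_OCC"] == fellow_code)
    return hits / len(coded), skipped
```

Note that a father entered as "physician" instead of the agreed code would silently count as a non-Fellow, which is exactly the failure described above; the skipped counts expose blanks, but only consistent coding fixes the second problem.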
I am glad to say there was a happy ending. The next time I collected disks, they did make a workable database collectively, and one that yielded interesting results. Since we had no more time for computer instruction, I tabulated the data myself rather than teach students how to do it. We then discussed the results in light of the questions raised in the course, such as the impact of fathers' occupations on Fellows' careers and the differences between Oxford and Cambridge graduates. I cannot claim that we answered all our questions about the connection between medical education and medical careers in Great Britain, but we certainly did shed light on the issue.
Next time I teach the course I will certainly do some things differently. I will leave more time for computer instruction and rewrite my manual so that it is more accessible to students. I will also be better prepared for the problems of research and interpretation of sources that students will encounter, and leave more time for them. In addition, I would like to include more explicit discussion of quantitative measures. Having created the database, students could usefully put the data into tables and calculate percentages. The class would also have been an excellent forum for discussing basic statistical concepts like mean, median, mode, correlation, and regression. And I would like to make more explicit connections between evaluation of documents and evaluation of quantitative evidence, perhaps by incorporating both into a final paper.
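The tables, percentages, and summary measures mentioned here are easy to demonstrate once the data are clean. A minimal sketch using Python's standard `statistics` module and `collections.Counter`, run on invented placeholder values rather than the seminar's results; `percentage_table` and `summarize` are hypothetical helper names:

```python
import statistics
from collections import Counter


def percentage_table(values):
    """(category, count, percent) rows, most frequent category first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.most_common()]


def summarize(numbers):
    """Mean, median, and mode of a list of numbers, e.g. ages at fellowship."""
    return {
        "mean": statistics.mean(numbers),
        "median": statistics.median(numbers),
        "mode": statistics.mode(numbers),
    }


# Invented placeholder values, not data from the seminar.
print(percentage_table(["FRCP", "CLERGY", "FRCP", "PHYSICIAN"]))
print(summarize([35, 37, 35, 40]))
```

Python 3.10 and later also provide `statistics.correlation` and `statistics.linear_regression`, which could carry the same exercise on to correlation and regression.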
All in all, I considered the seminar a success, which my students were relieved to hear. All said they thought they could use dBASE III+ again, with some help; one student, in fact, has decided to use it for her senior project, a database of Jesuit missionaries. All the students also believed they had a better understanding of quantitative methods in history, including how much work is involved. Most satisfying of all, they were pleased to think they had succeeded in producing something new and original. As one student said, the best part of it was that once we put the data together we could find out all sorts of things we didn't know before.
—Lisa Rosner is an assistant professor in the historical studies program, Stockton State College, Pomona, New Jersey, where she teaches the history of science and medicine, and early modern European history.