Publication Date

February 1, 1989

Perspectives Section



“I’ve gotta use words when I talk to you.”

T. S. Eliot’s Sweeney was none too pleased at this obligation, but most historians accept it as a matter of course. Words, written or spoken, are our stock in trade. As researchers and interpreters of the past, we use words to convey our conclusions to others; as educators, we encourage our students to choose words with care and express themselves with clarity and precision.

From time to time, though, some of us must use numbers along with or instead of words, and here the traditions and conventions of our craft provide less guidance. As recently as ten or fifteen years ago, some members of the historical profession looked upon quantitative history as an unwelcome intruder into the curriculum (“not really history at all,” as one colleague put it), an amalgam of sociology and econometrics which really belonged in some other department. Today, the proponents of quantitative methods are regarded with more respect and toleration by their colleagues. Their works have received some of the highest honors the profession can bestow, and whole journals have been created for disseminating the results of their research. Even so, some problems remain.


As a teacher and sometime practitioner of quantitative history I have often encountered a condition that can best be described by the clumsy term innumeracy. Despite the awkwardness, its meaning should be clear: If we describe someone who cannot read words as illiterate, there should be an equivalent term for one who cannot understand numbers. To be fair, I must introduce a further neologism and suggest that many of the individuals I have in mind—a substantial majority of the students I encounter, and a somewhat smaller majority of professional colleagues—are actually seminumerate. Able to read and write simple numbers, they have considerable difficulty explaining or interpreting numerical information. When they encounter a statistical table in a historical study, they are inclined to turn the page. Reading an author’s numerically based conclusions, they are more likely to “suspend their disbelief,” ask fewer questions, and offer less criticism than they would if the author had been using traditional, non-quantitative evidence.

Should this be a cause of concern? The historical profession, like every other branch of academia, has of late shown a tendency towards specialization and subdivision (perhaps, indeed, to subsubsubdivision). Why shouldn’t quantitative history follow the same path? Let the quantifiers remain a subdiscipline unto themselves and if, like the fabled Lowells and Cabots, they end up speaking only to one another or to God, will the rest of us be any poorer for that?

The answer, I am afraid, is yes. Regardless of specialization, historians of all lands and centuries are constantly coming up against problems that can only be understood through the use of numbers. To mention but a few examples, how can we hope to interpret elections, famines, or social movements without a clear understanding of numerical evidence? Can we assess the successes and failures of the New Deal, or the horrors of the Middle Passage, or the actual strength of a purported “moral majority” without examining statistics?

I do not mean to suggest that numerical evidence is in any sense superior to the written word, or that it can magically answer all the questions a historian might ask. On the contrary, no historian should ever forget J. H. Hexter’s warning, “Few statistics are more pathetic or less useful than the ones that render intelligible a course of events that did not happen.” Statistical sources must be scrutinized, criticized, and dissected with no less care than historians would give to memoirs, consular reports, or medieval sermons. My complaint is that at the moment too few historians are making this attempt. Should we, then, be enrolling ourselves or our students in remedial courses in statistics? Not a bad idea, but it may not solve all our problems. What is most needed, as I see it, is not mathematical proficiency but critical judgment. The problems that are likely to prove most vexing to historians, or for that matter to readers of this morning’s newspaper, are often arithmetical, requiring simple deductive logic or that old standby, common sense. Before we tackle chi-squares and multiple regressions, we need to learn to think about numbers.

I will go further and suggest that other behavioral scientists might share this problem, might even learn something of worth from historians. In these days of software packages and instant computing, numbers sometimes seem to take on a life of their own. Data can be entered and crunched, and a millisecond later we receive “output” in the form of scores, coefficients, and probability ratings. Unfortunately, such “results” are only as reliable as the original evidence, and statistics, like words, are human artifacts. The numbers that historians (or economists, sociologists, or the rest of the behavioral pack) must use are created and compiled by fallible human beings. They are not a reality in themselves, but a representation of reality, and frequently a poor one at that. To find out what they mean we must ask questions, compare one set of numbers with another, introduce hypotheses. We must know who compiled a particular set of statistics, and for what purpose. What is the possibility of error or deliberate fraud? What other conclusions, apart from those of the compiler, can be drawn from a given body of evidence? These are precisely the kinds of questions that historians have been asking about written sources since time immemorial; we must now learn to apply them to numerical evidence.

The questions seem easy and obvious, but not so much so that any beginner will instinctively ask them. To encourage students in this direction I began several years ago to include short statistical assignments in my introductory history courses. For example, a course on the Industrial Revolution, designed for first- and second-year undergraduates, began with a questionnaire asking students to predict the direction of change for a number of social and economic indicators in Britain: the number of workers in cottage industry, 1770–1830; the number of agricultural workers, 1750–1850; the number of draft animals; the size of the average household; the age at marriage; the rate of infant mortality. In most of the examples chosen, available historical evidence runs counter to students’ intuitive predictions. (The number of cottage weavers, for example, increased dramatically after the invention of mechanized spinning equipment, and the number of horses used in cartage increased with the growth of railroads.) At intervals throughout the following semester I introduced statistical tables for discussion, encouraging students to read them as critically as they would read other examples of historical evidence, such as parliamentary reports or workers’ memoirs. I encouraged them to ask not just “Why did things happen this way?” but “How do we know what happened? What do the numbers mean?”

In one such exercise I distributed two tables showing the occupational breakdown of the British workforce between 1801 and 1851—one giving the absolute numbers of workers in various categories and the other indicating their relative distribution. Students were quick to point out the rapid growth in mining, manufacturing, trade and transport, and the abrupt eclipse of agriculture, whose share of the workforce declined from 36 percent to 19 percent. They barely noticed, however, that the number of workers in agriculture continued to grow throughout this period, from an initial 1.7 million to roughly 2 million by 1851. Pointing out this apparent anomaly, I asked students to ponder the implications: Could industry be said to have grown at agriculture’s expense? How many British workers were forced off the land? What could account for the differences between absolute and relative magnitudes? Could the statistics be misleading? Is the size of the paid workforce an adequate measure of the agricultural population as a whole? Might the definition of an agricultural worker have changed during this period? Do available sources indicate whether family members who were employed on a casual or seasonal basis were counted in the totals? Not all of these questions could be answered from the data at hand, but by asking them the students acquired a different appreciation of the evidence and its problems.
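The arithmetic behind this apparent anomaly is worth making explicit. A minimal sketch, using only the rounded figures quoted above (the back-calculated workforce totals are approximations implied by those figures, not census values):

```python
# A sector's *share* of the workforce can fall sharply even while its
# *absolute* numbers grow, provided the total workforce grows faster.
# Figures are the rounded ones from the text: agriculture had roughly
# 1.7 million workers (36% of the workforce) in 1801 and roughly
# 2.0 million (19%) in 1851.

ag_1801, share_1801 = 1_700_000, 0.36
ag_1851, share_1851 = 2_000_000, 0.19

# Back-calculate the implied total workforce from each share.
total_1801 = ag_1801 / share_1801   # roughly 4.7 million
total_1851 = ag_1851 / share_1851   # roughly 10.5 million

print(f"Agriculture gained {ag_1851 - ag_1801:,} workers,")
print(f"yet its share fell from {share_1801:.0%} to {share_1851:.0%},")
print(f"because the total workforce more than doubled "
      f"({total_1801:,.0f} -> {total_1851:,.0f}).")
```

The point the students missed falls out immediately: industry did not shrink agriculture; it simply outgrew it.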

Statistical examples need not be confined to economically oriented topics. In a course on the Russian revolution I include a unit on the family backgrounds of party activists; in a modern Europe course I analyze voting trends in Nazi Germany. In each case I try to present the evidence, not in a predigested form with unambiguous conclusions, but as a problem for critical analysis. When preparing examination questions, I always include at least one statistical problem along with more traditional questions; students are asked to interpret a table or diagram, considering its shortcomings as well as its implications.


After a number of experiments along these lines, I have also introduced an undergraduate course entitled “History By Numbers,” the purpose of which is to demystify numbers and enable students to read them intelligently in historical context. The one-semester course concentrates on a series of problems and controversies from several different fields and periods of history: the “Storm over the Gentry” in Tudor and Stuart England; quantitative studies of the standard of living during the Industrial Revolution; Time on the Cross and its critics; Peter Laslett and Lutz Berkner on household composition in early modern Europe. (See Jack H. Hexter, “Storm over the Gentry,” in his Reappraisals in History (London, 1961); E. J. Hobsbawm, “The British Standard of Living, 1750-1850,” Economic History Review, 2nd series, Vol. X; R. M. Hartwell, “The Rising Standard of Living in England, 1800-1850,” ibid., Vol. XIII; P. Laslett, “The Structure of the Household in England over Three Centuries,” Population Studies 23 (1969); L. Berkner, “The Stem Family and the Developmental Cycle,” American Historical Review 77 (1972).) Through a series of exercises based on a single data set, students are introduced to the basic vocabulary and techniques of measurement: mean, median, measures of distribution, cross-tabulation. Behind all these tasks is a list of basic themes and questions:

  1. Where do numbers come from? Human fallibility has already been mentioned, and many readers will recall Disraeli’s dismissal of “lies, damned lies, and statistics.” A careful reader should certainly be on the lookout for bias in numbers, suspicious of data that fit too perfectly with their authors’ conclusions. A classic example is wartime casualty figures, which have a tendency, especially when prepared for public consumption, to overstate an adversary’s losses and understate one’s own. Deliberate exaggeration and misrepresentation are not, however, the only problems a student or researcher will encounter. Information may be gathered or presented in ways that impart a more subtle and unintentional bias to the results. Statistics on crime or public health, for example, will be affected by the number of police or inspectors who are reporting them, and by the public’s disposition to cooperate with authorities; any change in either of these may produce a spurious trend in the data. Historical reports on a country’s imports and exports may be distorted by changes in the incidence of smuggling, which in turn may vary with the level of excise taxes. Public opinion polls may be slanted toward one or another segment of a population—subscribers to a particular magazine or newspaper, households with telephones, individuals fluent in the interviewer’s language.
  2. Numbers don’t speak for themselves. You have to torture them to make them talk. A single set of statistics, manipulated with sufficient ingenuity, can be made to yield answers to many different questions, or even to offer different answers to the same question. A newspaper story last year reported that high school students’ use of cocaine had risen by “a mere 1 percent” between 1985 and 1986; a few days later a reader pointed out that this was actually a rise from 4.8 percent to 5.85 percent of the entire high school population, and that the rate of increase in the number of users was therefore more than 20 percent. Both figures were arithmetically correct, but the urgency of the problem was conveyed quite differently by the reader’s addendum. Researchers and readers must learn to live with ambiguity, and to think about the multiple meanings of their data. The problem is not “How to lie with statistics,” but how to recognize the multiple truths that statistical evidence may contain. In one of the assignments in my course, students are asked to compare tables from two censuses thirty years apart, showing the age-sex composition of one city’s population. Because of changes in in-migration, fertility, and mortality, the overall ratio of women to men in this population changed dramatically during these years, and students are asked to pinpoint the most important trends. The point of the assignment is to see how many different calculations can be made, and how many different conclusions can be supported, using the same body of data.
  3. You can’t always get what you want. Quantitative historians, like others in the profession, are often unable to find a source that directly addresses their topic. Sometimes the object of attention is too broad and vague to be measured (e.g., a standard of living, a revolutionary mood). Sometimes it is a subject the chroniclers of earlier centuries overlooked. In either case we wind up making substitutions, trying to answer our questions with evidence that was compiled for some other purpose. Eric Hobsbawm used statistics on the numbers of animals slaughtered at Smithfield, for example, as an indirect indicator of diet, and hence of well-being, of London’s population during the Industrial Revolution. Robert Fogel and Stanley Engerman used probate records of slaveowners’ estates as a basis for calculating slave mothers’ childbearing patterns. Such ingenuity is commendable but it can also lead a researcher astray. In our eagerness to announce what the data mean, we too easily forget what the sources really say. (If the probate records, for example, show a slave mother with children aged nine, seven, and five, does that mean the nine-year-old was the woman’s first-born? If Smithfield’s output of beef and sheep did not keep pace with London’s population, must we conclude that Londoners were eating less meat?) (R. Fogel and S. Engerman, Time on the Cross (Boston, 1974), I:129-138, II:114-115; cf. H. Gutman, Slavery and the Numbers Game (Urbana, 1975), 150-152.) Here too a careful critic will scrutinize statistics for ambiguity and possible contradictions.
  4. Even simple effects have multiple causes. The most common problem here is tunnel vision. Preoccupied with our own hypotheses, we may forget the complexity of the real world. When I ask students why, in many populations, the number of widowed females is conspicuously greater than the number of widowed males, the usual response is that women live longer: a correct answer but an incomplete one. Only with considerable prompting do the students remember that widowhood is defined not just by mortality but by remarriage, and hence by social mores. In most modern societies males have shown a greater propensity to remarry, often to women younger than themselves; as a result the ratio of widows to widowers is often greater than the ratio of elderly women to men. To use numbers intelligently we must try to imagine all the factors, including apparently extraneous ones, that could combine to produce an observed result.
  5. Beware of misleading aggregation. The critical reader will also recognize that multiple causes and trends are more easily forgotten in summary statistics. In particular, the easiest way of summarizing a body of numbers—the “average” or arithmetic mean—often hides more than it reveals. Per capita income can rise, for example, even though most people’s wages decline or remain unchanged. All it takes is a sufficiently large increase in a few individuals’ earnings.
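The last point in the list can be put in miniature. A minimal sketch, with invented wage figures chosen purely for illustration: the arithmetic mean rises while the typical (median) wage stands still.

```python
# Invented, illustrative wages for five workers: only the top earner's
# income changes, yet the "average" rises sharply.

wages_before = [10, 10, 10, 10, 100]
wages_after = [10, 10, 10, 10, 200]   # only the top earner gains

def mean(xs):
    # the arithmetic mean: total income divided by number of earners
    return sum(xs) / len(xs)

def median(xs):
    # the middle value when incomes are sorted (odd-length list)
    return sorted(xs)[len(xs) // 2]

print(mean(wages_before), mean(wages_after))      # 28.0 -> 48.0
print(median(wages_before), median(wages_after))  # 10 -> 10
```

Per capita income has jumped by seventy percent; four workers out of five have seen no change at all.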

Some of the most dramatic examples of misleading averages come from the realm of demography. Students are usually astonished to find the death rate (more properly, the Crude Death Rate) in economically less developed countries to be lower than in industrially advanced ones. Taiwan’s rate in 1966 was 5.36 deaths per 1,000 population, as compared to 11.23 per 1,000 in the United Kingdom. (N. Keyfitz & W. Flieger, Population: Facts and Methods of Demography (San Francisco, 1971), 382, 472.) One is tempted to conclude that Taiwanese were twice as healthy as Britons, or lived twice as long, but neither of these hypotheses can be sustained. The essential point is that one number cannot adequately describe an entire population’s mortality. Taiwan turns out to have had higher per capita death rates than the United Kingdom in every separate age group. Because of its higher birth rate and rapidly growing population, however, roughly 60 percent of Taiwan’s population was between ages 5 and 35, as compared to 40 percent in the U.K., and young Taiwanese do have lower death rates than elderly Britons. The two nations’ overall death rates are a reflection, not just of health or longevity, but of the age distribution of their populations.
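The mechanism behind the Taiwan-U.K. comparison can be made concrete. The sketch below uses invented, simplified numbers (three age bands rather than the full Keyfitz-Flieger tables) to show how a “younger” population can post a lower crude death rate even though every one of its age groups dies at a higher rate.

```python
# Each entry is (age-specific death rate per 1,000, share of population).
# The "young" country has the higher rate in every age band, but most of
# its people sit in the low-mortality bands.

young_country = [(3.0, 0.60), (8.0, 0.30), (60.0, 0.10)]  # mostly young
old_country = [(2.0, 0.35), (6.0, 0.35), (50.0, 0.30)]    # mostly old

def crude_rate(pop):
    # The crude death rate is simply the population-weighted mean of
    # the age-specific rates.
    return sum(rate * share for rate, share in pop)

print(f"young country: {crude_rate(young_country):.1f} per 1,000")  # 10.2
print(f"old country:   {crude_rate(old_country):.1f} per 1,000")    # 17.8
```

Band by band, the “young” country is the deadlier place to be; in the aggregate it looks the healthier. The single summary number is answering a question about age structure as much as a question about mortality.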


What should one conclude from this litany of fallacies and potential errors? Some members of the profession will maintain that numerically minded colleagues are worshipping false idols and forsaking the time-honored principles of historical investigation. My own response is more temperate: I regard quantitative research as intellectually sound and rigorous despite the hazards. If the sources are never fully adequate and the conclusions never fully “proven,” this hardly sets quantifiers apart from fellow historians or from any other students of the human condition.

At the same time, in arguing for a fuller integration of quantitative sources and methods into the history curriculum, I have a broader agenda in mind. I know that few of my students will go on to study history in a systematic way, but I also believe that quantitative reasoning should not be regarded as an esoteric skill. A day does not go by without some journalist or politician bidding for attention or support on the strength of statistical claims or calculations. (As I write these words, residents of my city are being told that freight trains carrying hazardous materials through populated areas are statistically safer if they travel at higher speeds.) I would like my students, as citizens in a democratic polity, to be able to evaluate these claims. Here, surely, is a case where a clearer understanding of the past can promote clearer thinking in the present.

R. E. Johnson is an associate professor in the history department at Erindale College, University of Toronto, and specializes in Russian history.