Computers and Software

Historians and the Web: A Beginner's Guide

Andrew McMichael, Roy Rosenzweig, and Michael O'Malley, January 1996

A funny thing happened on the way to writing this article. We decided to start with some history and some definitions and began to assemble this material from traditional sources—the dozens of manuals and guides to the World Wide Web and the hundreds of newspaper and magazine articles that explain it to general and specialized audiences. But then we realized that it might make more sense to use the Web itself. We pointed our "Web browser" (Netscape 2.0) to what is technically called a Uniform Resource Locator or URL (really just the address). This URL takes us to the so-called "home page" of the World Wide Web Consortium, which is physically located at MIT. (A "home page" is the "front door" through which people enter a collection of information—or links to information—that someone has placed on the Web.) By clicking our mouse on the phrase "About the World Wide Web," we immediately found ourselves in a hypertextual universe of definitions, histories, concepts, time lines, and bibliographies (and even an online seminar) much richer than we had developed from our original sources and much richer than we can present here. We urge you to visit.

What the World Wide Web Is and How It Works

The Web is really "many things," including a concept, a set of protocols, a body of available software, and a web of information. The fundamental Web concept is of "a seamless world in which ALL information, from any source, can be accessed in a consistent and simple way." At its basic level, the Web is just a set of linked computers. Computers can display pictures as well as text, and play movies and sound. But until recently, there were only limited ways to handle anything but text on the Internet. The World Wide Web solves this problem. It lets you send and receive multimedia in an intuitive, user-friendly environment. Moreover, it avoids the hierarchical structure inherent in systems like "gopher," which requires users to follow a long series of submenus to reach each piece of information. On the Web, users view information graphically and can move from one set of information to another (even if located on a computer thousands of miles away) by pointing and clicking a mouse.

Although this hypertext and multimedia vision has many earlier roots, the most immediate source for the Web was Tim Berners-Lee, a young computer scientist at the European Particle Physics Laboratory (CERN) in Geneva, who—way back in the 1980s—was looking for a way for physicists to share information easily. But to move from a small number of physicists sharing data to a global information system, you need, first of all, a worldwide communications network. As everyone who uses e-mail or even reads the newspaper knows, that communications network—the Internet—gradually fell into place in the 1970s and 1980s. Still, if you want those interconnected computers to be able to do more than exchange computer files and e-mail messages—to become a "seamless" and "multimedia" information world—you need more sophisticated ways for them to talk to each other. That's where the "protocols" and "software" come in and where the Web differs from purely linear and textual means of exchange. One protocol (URL) establishes consistent addressing across a vast number of different computers. Another, the Hypertext Transfer Protocol (HTTP), permits those computers to very quickly exchange a wide variety of different types of data, including pictures, sounds, and films. Suddenly, the purely textual world of e-mail and gopher is enlivened with pictures, sounds, and movies. And a third, HyperText Markup Language (HTML), is the lingua franca of the Web, which not only allows documents to be displayed on different systems but also makes links to other documents and resources. It means that you can click on a highlighted word in one History HTML page that resides on a computer in Fairfax, Virginia and instantly retrieve a page in Hong Kong.

To "read" the resources residing on the Web you need software, particularly a "Web browser." Although physicists and some other scientists had begun using the Web in 1990 and 1991, it did not move into the mainstream until 1993, when the National Center for Supercomputing Applications (NCSA) at the University of Illinois released Mosaic, an easy-to-use graphical browser that ran on most standard computers. A year later, in the rapid pattern of commercialization (with initial funding most often provided by the Defense Department) that has characterized the computer industry, some of the designers of Mosaic created the Netscape Communications Corporation, whose browser "Netscape" seems to be emerging as the de facto standard for the Web.

But this sophisticated hardware and software infrastructure is essentially worthless without some information to share. Almost overnight, however, a vast body of information has begun to come "on-line"—first posted by individual enthusiasts and educational organizations and now increasingly by large corporations. In mid-1993, there were only 130 Web "servers" (computers holding "home pages" and associated documents); two years later there were 22,000. By the time you read this, there will surely be two or three times as many.

Historical Sources on the Web

The explosion in Web sites has brought with it an explosion in materials relevant to historians. These range from political documents like the Declaration of Independence to famous speeches, to entire books, like John Locke's Second Treatise of Government or Horatio Alger's Ragged Dick. With few exceptions, copyright law restricts primary sources to documents in the public domain. The number of primary sources available is also largely limited by the money and human labor required to enter them into digital form. But the range of sources increases every week. Some notable sites for primary sources include the Electronic Text Center at UVA, with a large and surprising collection of texts in modern English, and the Mississippi State University site on American History.

Other sites organized around specific subjects include the excellent "Anti-Imperialism in the United States", the multidisciplinary "Eighteenth Century Studies Site" at University of Pennsylvania, and "The Labyrinth", located at Georgetown and dedicated to medieval studies. The Library of Congress, through its "American Memory" project, has a wonderful collection of Civil War and FSA photographs as well as a few early silent films. On-line databases include the eclectic Social Sciences Data Collection and Edward Ayers' much more focused "Valley of the Shadow" project at University of Virginia, which includes an impressive range of sources: searchable census data, Army rosters, maps, newspapers and diaries on everyday life during the Civil War in Staunton, Virginia and Chambersburg, Pennsylvania. But though these sites offer a good view of the kinds of primary sources becoming available, extensive primary research in history is not yet possible, except perhaps in more contemporary subjects. For example, the full texts of the Congressional Record for the 103rd and 104th Congress are on the Web. You can submit a query and find out where that subject appeared in Congressional debate. But at this point, the Web's best uses are probably in the classroom.

Web-based document collections could easily replace the traditional photocopied packet of supplementary readings, with the added attraction of providing sound, graphic images, and even film. For example, a Web-based syllabus for a freshman survey course might include links to some of the assigned texts or supplementary readings (either items already on the Web or newly scanned in), plus links to additional recommended readings, maps and other graphic images. The section on the Civil War might offer connections to the "Valley of the Shadow" project or to "Letters from an Iowa Soldier in the Civil War", maintained at University of California at Santa Cruz. The syllabus might allow students to query the professor or other students directly by posting a message to the Web page containing the syllabus, much like a computerized bulletin board.

These links could take students not just to the document itself but to a whole different intellectual context. Clicking on a picture of John Locke might take the students to a Libertarian Party Web site where the Second Treatise has been digitized. Clicking on the phrase "Bacon's Rebellion" might take the students to George Welling's From Revolution to Reconstruction, an experimental survey text being assembled at the University of Groningen in the Netherlands. Welling took a public-domain textbook, An Outline of American History, scanned it, and put it on the Web. (This volume was first issued by the United States Information Agency in 1949 under the editorship of Francis Whitney and has been periodically revised over the years.) His team added a wide range of primary documents, so that students reading about the Declaration of Independence can choose links to the document itself, to Jefferson's first draft, to biographies of relevant figures, and to a bibliography of secondary texts on the Declaration. Welling's ongoing project has many gaps and welcomes contributions. In what Welling describes as an experiment in "collective authoring," the original text is gradually being broadened and enriched by a wide range of contributions from different scholars and students. Eventually, it might provide an alternative to the traditional survey textbook, a way to combine primary documents with a range of interpretations.

We should note, however, that while many surprising and useful things lurk somewhere on the Web, much of what has appeared is either stodgily "canonical"—heavily biased towards male political figures—or impressionistically spotty. Social history, for example, hasn't yet made much impression on the Web. The number of primary sources available is also limited by copyright restrictions and by the money and labor required to enter them into digital form. These costs seem to be reflected in the conservative or libertarian character of much of what is currently available.

Secondary sources are considerably less plentiful, in part because of copyright and copying concerns. A few sites (e.g., University of Rochester and the Voice of the Shuttle page of the University of California, Santa Barbara) include student papers. Central Connecticut State University includes primary and secondary sources on world history and "the struggle for social progress." In addition, a number of scholarly journals have begun to appear on line, either in full text (e.g., Essays in History) or just with their tables of contents, as with the American Historical Review. Project Muse at Johns Hopkins University has begun a more ambitious effort to put the full text of a number of scholarly journals on the Web. At the moment, however, many scholars are reluctant to "publish" their work on the Web when it would impair their chances for print publication, and a number of journals are worried that on-line publishing will diminish subscriptions.

So far the most common history sites are assemblages of primary documents and links to other historical resources. The Naval Ocean Systems Center has compiled a list of museums and archives from around the world. There is a relatively comprehensive guide to online museums. Perhaps the largest and most extensive of these archive centers is the University of Kansas, which maintains an extremely comprehensive database of historical resources organized both alphabetically and by era. Carnegie Mellon University has a list on History and Historiography; Miami University of Ohio keeps a list of Archives and Archivists. Although many sites seem to contain nothing original, gathering all of these sources together in one place is quite valuable, consolidating sources into one easily searchable site.

Some pages focus on particular topics—most commonly wars, at the moment. See, for example, the Korean War Project, "50 Years Ago", and the "Cybrary of the Holocaust". Louisiana State University maintains an extensive database on the Civil War, as does the University of Tennessee. Because of the popularity of the Civil War, many pages are maintained by hobbyists. Personal accounts of the Vietnam War are also beginning to appear on the Web, and there is even a site devoted to research on the French and Indian War.

How to Use the Web

Searching the Web for information is facilitated by programs called search engines. Like an electronic card catalog in a library, these programs allow the historian to enter a keyword and click on a button to begin the search. There is a searchable database of Web sites at Yahoo containing a catalog of over 30,000 Web pages. Many programs are capable of searching only one part of the Internet: they are restricted to titles, certain archive sites, or parts of libraries. Rather than require a researcher to wander aimlessly, unsure of how to search or which engine to use, the University of Geneva has combined several search engines onto one page, and these are condensed into an easy format at George Mason University's Instructional Development Office. Search methods, however, are limited by the content of the databases. For instance, the Yahoo index is not a complete list of all Web sites, only those entered by the maintainer of the Yahoo list. Interlinks is a powerful search engine, but is restricted to areas the author feels are worth searching. As with any library or archives, historians must search with caution, and understand that the Web has vast resources which are found only through patient exploration.

The Web provides not only a resource for doing history in the sense of research, writing, and teaching; it can also facilitate our professional work and communication. Already, there have been some tentative first steps. The Crossroads project at Georgetown University is setting up an extensive Web site for American Studies and the American Studies Association, and the Center for History and New Media at George Mason University is working with the AHA on launching its home page. A few dozen history departments (at least 45 as of July 1995) have set up their own home pages. (For a comprehensive listing go to History Departments Around the World.) These pages include such things as course guides and syllabi, departmental requirements, biographies (and pictures) of faculty members, information on graduate students, lists of doctoral and M.A. theses in progress, application information, departmental histories, lists of doctoral degree recipients, and departmental news of hirings, promotions, awards, and faculty work in progress as well as myriad links to historical and teaching resources. Browsing the Web thus becomes a way for students to consider possible graduate programs or for faculty members to find out what their colleagues around the world are researching or teaching.

Many historians can gain access to the Web through their school's Internet connection. The easiest connections generally are made from an on-campus terminal. Remote (dial-in) access requires either a direct SLIP/PPP account or a program that emulates SLIP like TIA—The Internet Adapter. (SLIP, which stands for Serial Line Internet Protocol, offers a more direct connection to the Internet than a conventional e-mail account.) Where these are unavailable, you can visit the Web without graphics through a program called Lynx. Although using the Web without graphics may seem counterintuitive, Lynx allows the user to retrieve information and save text and pictures to a file at a much greater speed. You will probably want to talk to your campus computer center for help in getting Web access. If Web access is not available on campus, you can turn to one of several commercial services—either a direct Internet provider like Delphi, Netcom, or Digex or one of the commercial services like America Online, CompuServe, or Prodigy. Setting up your own Web site or home page requires some knowledge of navigating the Internet, but not much. (See "Creating Your Own Web Page.") The first requirement is getting space on a campus computer, preferably a mainframe or a large workstation.

Some Limitations

It is always easy to get swept up with the latest techno-enthusiasm. Newt Gingrich to the contrary, the Web isn't going to solve any fundamental social problems or even more narrowly professional problems like inadequate support for scholarship and teaching. Moreover, it has some intrinsic problems of its own that may hamper or at least limit its usefulness in the next few years.

One obvious issue is getting access. We face the real prospect of a society of information "haves" and "have-nots." Although the cost of computer equipment has dropped dramatically, ownership of computers is still sharply skewed by economic status. Moreover, you may also need to pay for access to the computer networks on which the Web resides. Even those with free access through their schools may soon find themselves faced with online charges. You can always purchase Web access from one of the commercial vendors, but that means that your monthly Internet bill will start to rival or exceed your monthly phone or cable bill. Will we be able to require that our students do research on the Web when they must pay by the hour?

A related threat comes from the rapid commercialization of the Web. Already, vendors are charging for access to particularly desirable databases, and that practice is likely to grow. Those who "own" information, moreover, are likely to more firmly assert their property rights (e.g., copyright) in an arena where such matters have been loosely regulated in the recent past. The alternative to paying for information could be watching advertisements while you undertake your search. The brief "free" era of the Web is ending.

Another issue is speed. The Web's pictures, sound, and films are a wonderful advance over the "flat" world of e-mail, but that multimedia environment comes at a price. For those using the Web via dial-in modems rather than direct wire connections at their universities, sound and movies are, at the moment, novelties rather than practically useful. With a 14.4 kbps modem, it takes two and a half minutes to download a 51-word sound clip in which computer guru Donald Norman says that one of the "bothersome" things about the Web is that "it's slow. Very slow."

Third, the delightful anarchy of the Web also comes at a price. In a world in which anyone can "write" and "publish" their own history in a matter of minutes, the conventional controls on the quality of that history break down. If you use WebCrawler to look for material on FDR, you will get 49 "hits." But how can you—or more seriously your students—distinguish the good from the bad? Will they be submitting papers based on the American Freedom Coalition home page, which suggests that all the "problems we as a nation face today ... can be traced back to a single point of origin: The National Emergency in Banking Relief Act of 1933 inacted [sic] by Congress and signed into law by Franklin D. Roosevelt"? After all, one of the most popular features of the Web—Mirsky's Worst of the Web—celebrates mediocrity.

To be sure, the problem of distinguishing good and bad sources of information exists in other media as well. But it may be that the difficulties are even more severe in the on-line world. On the one hand, it is considerably easier to publish on the Web than in a magazine or book. On the other hand, the reliance of many students on computers to execute tasks from completing a calculus problem to cooking toast may give the machines an exaggerated air of authority. It is possible that a new generation of students views electronic information as more reliable than information from print and video sources.

In some ways, the promise of the Web is simply the flip side of these problems. Although access and speed may be inadequate, the Web also has enormous advantages over traditional research: high school students in Montana or graduate students in Hungary can tap into resources at the Library of Congress or the University of Virginia that simply would not be available to them in any other way. When more information—and more reliable and useful information—comes online, the advantages will multiply even further. Will there be any need for the classroom documents collection when thousands of primary documents are only a few clicks away?

And the anarchy of the Web opens up the possibility for a more democratic and less hierarchical information world—if we manage to resist the corporate colonization of cyberspace. At least at the moment, this is an arena in which students can carve their own distinctive pathways through vast bodies of information. And it is also an arena where everyone can quite literally become his or her own historian—writing and publishing the history of their family or community for whoever will listen. Michael Frisch has persuasively argued that historians need to learn how to "share authority" with their audiences; the Web may be one place where that will actually happen.

Creating Your Own Web Page

Mark-up Language
Graphics
HTML Editors
Shareware Converters

Establishing a "home page" takes some knowledge of how to navigate around the Internet, but not much. It's also extremely easy, given a small investment of time. The first requirement is getting space on a campus computer, preferably a mainframe or large workstation that runs HTTP software. This is where your Web pages will physically reside. Space on campus computers may be tight, but many universities now set aside space for Web pages. Once you've found appropriate computer space, create your home page by editing a new file and naming it "index.html." Index.html is the most common name for a home page, and it tells the computer that the file is the primary page for a directory. All the pages you add to your Web site should be saved with the suffix ".html", which tells the Web browser that the document is in "Hypertext Markup Language."

Mark-up Language

The Web uses this standardized "mark-up" language to tell "Web browsers" (programs like "Netscape" or "Mosaic" or "Lynx") what to do. HTML uses a set of standard codes to format text. For example, to italicize text, you "tag" it as follows:

<I>This will be italicized</I>

A Web browser like Netscape reads these tags, and displays: This will be italicized

For boldface, the tags look like this:

<B>BOLD</B>

The characters "<" and ">" mark the beginning and end of each HTML instruction, or "tag"; the slash ("/") before the second "I" or "B" in the examples above signals that the instruction is ending.
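
Tags can also be combined. As a small illustration (the sentence is invented for this example), text that should be both bold and italic is wrapped in both pairs of tags, with the inner pair closed before the outer pair:

```html
<B><I>This will be bold and italic</I></B>
```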

To make a "link"—that is, to get the "Web browser" to open another file, you use the tags <A> and </A>. The link back to one of the authors' home pages looks like this:

<A href="index.html">Back to Page One</A>

What follows the close bracket is what will be available for the reader to click on. In the example above, the file name is "index.html", and the reader sees "Back to Page One" either underlined, in a different color than the rest of the text, or both. The Web browser interprets clicking on "Back to Page One" as an instruction to open the file "index.html." Alternatively, the link might give the URL of another Web site as the file name (for example, Voyager Company), in which case clicking would take you to another site altogether. There are several dozen other HTML tags for formatting text and graphics.
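
Putting these pieces together, a complete home page might look something like the sketch below. It is only an illustration: the course title, the file name "syllabus.html", and the address "http://www.example.edu/" are placeholders invented for this example, not real pages. The HTML, HEAD, TITLE, BODY, H1, and P tags are a few of the other standard tags; they mark the page as a whole, its title, a heading, and paragraphs.

```html
<HTML>
<HEAD>
<TITLE>History 101 Home Page</TITLE>
</HEAD>
<BODY>
<!-- Hypothetical course page; every name and address here is a placeholder -->
<H1>History 101</H1>
<P>Welcome to the <I>home page</I> for <B>History 101</B>.</P>
<!-- A link to another file in the same directory -->
<P><A href="syllabus.html">Course Syllabus</A></P>
<!-- A link to another site altogether, given as a full URL -->
<P><A href="http://www.example.edu/">A Web site somewhere else</A></P>
</BODY>
</HTML>
```

Saved as "index.html", a file like this would serve as the front door to the directory it sits in.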

Graphics

Graphics are a bit more tricky. You create an image yourself, using a graphics program like Adobe Photoshop. Or you can scan an existing image, perhaps a photograph of Victoria Woodhull, into digital form. Save the image in the "GIF" format. Transfer your "GIF" file to the computer where your files will reside. When you want to call up the image, you make a link to it in your HTML text. In effect, you tell the Web browser to "display this file." A simple graphics link looks like this:

<IMG SRC="woodhull.gif">

This tag will place your image of Woodhull on your page each time the page is accessed. You can also make a link to a picture on someone else's page. In that case, the file name would simply be that image's URL.
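
As a rough sketch, the two cases look like this. The address in the second tag is a made-up placeholder, not a real image. The ALT attribute is an addition not covered above: it supplies a short description that a text-only browser such as Lynx can show in place of the picture.

```html
<!-- An image stored alongside your own pages -->
<IMG SRC="woodhull.gif" ALT="Photograph of Victoria Woodhull">

<!-- An image on someone else's computer, named by its full URL (hypothetical address) -->
<IMG SRC="http://www.example.edu/images/woodhull.gif" ALT="Photograph of Victoria Woodhull">
```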

Be warned, however, that images can come at a high cost, both in terms of physical space on the mainframe computer and in terms of time. Large files, meaning images saved at high resolution, can be agonizingly slow to appear. Frequently they're not worth the wait. In writing your Web pages, it's often better to post a smaller, low resolution version of the larger graphic. Such a link looks like this:

<A href="bigpicture.gif"><IMG HEIGHT=50 WIDTH=50 BORDER=1 SRC="smallpicture.gif"></A>

In an ordinary text link, the text between the <A> and </A> tags is what the viewer sees to click on. So too in a "nested" graphic link like this. The viewer sees a small image, of designated height and width (HEIGHT=50 WIDTH=50). The thin border (BORDER=1) shows the viewer that the image is a link. Alternatively, simply give a description, for example, "Click here to see O'Sullivan's photograph." That way, your page won't be slowed down by spurious images.
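
Written out on separate lines, the two approaches might look like the sketch below. The file names are the same illustrative ones used above, and the dimensions are given with HEIGHT and WIDTH, the attribute names HTML defines for the size of an image.

```html
<!-- A small "thumbnail" that links to the full-size picture -->
<A href="bigpicture.gif"><IMG SRC="smallpicture.gif" HEIGHT=50 WIDTH=50 BORDER=1></A>

<!-- Or no thumbnail at all: a plain text link to the same picture -->
<A href="bigpicture.gif">Click here to see O'Sullivan's photograph</A>
```

In either case the large file is fetched only when a reader chooses to click, so the page itself stays quick to load.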

Fortunately, with most Web browsers you can easily see how a particular page was written. In Netscape, for example, you choose "Source" from the "View" menu. This shows exactly how the file you're reading was marked up—all the HTML tags and file names. The "Source" command offers one extremely easy way to learn HTML by example.

HTML Editors

There are also tools to make writing HTML easier. "HTML Editors", many either freeware or shareware, let you compose HTML files easily and preview them as you go. One typical HTML editor for the Macintosh is called "HTML Web Weaver." "Web Weaver" simplifies the mark up. You can write as you would in a word processor. To make something italic, simply highlight it and choose "Italic" from a menu. The editor pastes in the appropriate tags. To make a link, simply highlight the words you want people to click on, choose "anchor," and it does the rest. Most HTML editors allow you to preview your work as you go, without being linked to the Net. That is, you can write all your documents, preview them, making sure the links work and the graphics are right, then send the whole set of files to the campus mainframe and the appropriate directory. Voila, the site is on the Web.

Shareware Converters

There are many different HTML editors available. You can get them through the Internet software archives (e.g., mac.archive.umich.edu; sumex-aim.stanford.edu; ftp.hawaii.edu; wuarchive.wustl.edu) with an FTP program like "Fetch." There are also programs that can convert existing word processor files into HTML format. A shareware program called "RTF-HTML," for example, converts files saved in Microsoft's "RTF" (interchange) format into HTML files. There are a number of these converters as well, also available through software archives. In addition, the new version of WordPerfect (3.5) will include HTML tools, and this is likely to become a common feature of high-end word processing programs.