Digitization of the Yale Daily News Historical Archive

The following chapter excerpt is from the third section of Digitization in the Real World, "The Digital Campus: Digitization in Universities and Their Libraries." Download the entire chapter for free (PDF) or purchase online at Amazon.com.


Authors

Kathleen Bauer, Ian Bogus, Karen Kupiec (Yale University Library)
Jennifer Weintraub (UCLA Library)

Abstract

The Yale Daily News is Yale University’s independent, student-run newspaper. Founded in 1878, it is the oldest continuously published daily newspaper at a United States university. From the first print volumes until the digital edition began in 2000, the entire printed run consists of 122 volumes and approximately 100,000 pages. In 2007, Yale University Library was asked to create a pilot project to digitize and make available an initial set of ten years of the newspaper with a $50,000 start-up budget. In this article, we discuss how the project began and the issues that arose during the process related to copyright, interface design, workflow, quality control, and fundraising. This project helped Yale University Library, a large, strongly hierarchical institution, create workflows that allow its staff to develop new skills and work across traditional departmental boundaries. Library staff who have traditionally performed tasks related to our print collections or to smaller digital projects have developed new skills and methods for workflow, metadata creation, and quality control in a large-scale digital project.

Introduction

A newspaper digitization project is one that every library, public or academic, can undertake. It is often not hard to get the rights to a local or small paper, and an academic or public library has a built-in audience for this type of project. Local researchers will love having it online, and genealogists from further afield will bless you. And yet newspaper digitization, although it has recently come into its own, has been somewhat difficult for libraries. Newspapers are crucial to research, providing detailed local and international accounts of events, yet these important primary sources are printed on poor-quality paper that lasts a relatively short time. Newspapers are also hard to digitize: they are published daily, with hundreds of issues a year; each issue is divided into sections and then into individual articles; the pages are oversized and delicate; and they contain thousands of words and pictures that require careful quality assurance. In addition, they have unusual layouts, and articles are often split across two or more nonconsecutive pages. There can be numerous contributing authors, syndicated cartoons, advertisements, supplements, and even the occasional joke issue.

Fortunately, newspaper digitization is not new. Many organizations have taken on newspaper digitization, and the major national and regional newspapers are now available for licensing by libraries. While many projects digitize newspapers from microfilm, an increasing number begin with the original paper. One important clearinghouse for information and best practices in newspaper digitization is the National Digital Newspaper Program, or NDNP (The Library of Congress, 2009). This program, a joint effort of the Library of Congress and the National Endowment for the Humanities, uses grant funding to support proper newspaper digitization, research into newspaper digitization, and access to the digitized papers through a central resource.

The Yale Daily News, Yale University’s student-run newspaper, is 132 years old and is the oldest continuously published daily newspaper at a United States university. In 2007, Yale University Library (YUL) was asked to create a pilot project to digitize the newspaper with an initial $50,000 start-up budget provided by YUL and the Yale Daily News’s parent foundation, the Oldest College Daily Foundation. In this article, we discuss how the project began and the issues that arose during the process related to copyright, interface design, workflow, quality control, and fundraising.

This project helped YUL, a large, strongly hierarchical institution, create workflows that allow its staff to develop new skills and work across traditional departmental boundaries. Staff across the Library who have traditionally performed tasks related to our print collections or to smaller digital projects have developed new skills and methods for workflow, metadata creation, and quality control in a large-scale digital project.

The Yale Daily News (YDN) is staffed and produced by student volunteers. The paper is not owned by Yale University, and its student reporters and editors are advised by the independent Oldest College Daily Foundation (OCD). OCD is composed of former YDN staffers and Yale graduates. In 2005, OCD came to YUL with an idea for a project to digitize the Yale Daily News archive and provide access on the Internet. OCD recognized that the proposal was complex, especially since it did not own a complete run of the newspaper. It asked YUL to become a partner: OCD owned the rights to the content, while YUL had the expertise and the means to make it accessible. To start, OCD and YUL each contributed $25,000 to finance a pilot project. YUL decided that the pilot would not digitize anything for which a digital edition already existed (the YDN has been available online since 2000). Even with that exclusion, we still had to select a small portion of roughly 120 years of print issues, a fraction of the approximately 100,000 pages in the entire run, for our initial digitization pilot.

This type of partnership, between an external group that owns the copyright and the campus library, can benefit both parties. It is a good way for library staff to gain experience with a complex digitization project and with digital collection building. It provides useful material for fundraising for technology projects. It enables the library to provide a useful resource to the campus community. And it allows both the newspaper and the library to offer, free of charge, an online resource with research value built from material that may not have a large commercial market.

Several basic principles guided the development of the Yale Daily News Historical Archive. Open or commonly used standards for our digital files were important in case the content needed to be migrated to new interface software in the future. We wanted to digitize each newspaper in its entirety, preserving the historical context provided by editorials, cartoons, and advertisements. Therefore, it was important that we capture the images on each page, not only the text. Another key principle was our requirement that the Yale Daily News be freely available on the Internet. Finally, we wanted the newspaper to be fully searchable and browsable, with advanced search features such as byline and title searches. These principles are similar to those set out by NDNP and other newspaper digitization projects.

