We were honored to hear from four web archiving experts in a webinar on Tuesday, April 21. Traci Mark, Studio Manager at METRO, moderated a discussion on documenting the present moment with Mark Graham from Internet Archive, Nicole Greenhouse from New York University Libraries, Alex Thurman from Columbia University, and Gary Price from Library Journal.
Each panelist presented the work they are doing in the context of the present Covid-19 crisis. Gary Price, Co-Founder and Editor, Library Journal’s infoDOCKET, led with an explanation of why this work matters: information changes rapidly in a complex and evolving crisis. Web pages are prone to disappearing. Even relatively stable pages can have their sentences, words, and even graphs edited after publication.
Price shared his belief that “this is a teachable moment for preserving our documents.” He advocates for working with citation organizations and K-12 schools to change citation practices so that references point to archived documents rather than live web pages. In his presentation, Price showcased Archive-It, the Wayback Machine’s Save Page Now, Webrecorder, and other tools. His presentation notes are available at https://bit.ly/webarchive20.
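For readers who want to try citing archived copies, here is a minimal sketch of looking up the closest archived snapshot of a page. It assumes the Wayback Machine's public availability endpoint at `https://archive.org/wayback/available` and the JSON shape shown in the sample; the function names are hypothetical, and the sketch only builds the query and parses a response, leaving the actual network request to you.

```python
import json
from urllib.parse import urlencode

# Assumption: the Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the lookup URL for a page, optionally near a YYYYMMDD timestamp."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived URL from a response body, or None if no snapshot exists."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Illustrative response body (made-up values, assumed shape).
sample = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20200421000000/http://example.com/",
            "timestamp": "20200421000000",
            "status": "200",
        }
    }
})

print(availability_query("example.com", "20200421"))
print(closest_snapshot(sample))
```

The archived URL returned by `closest_snapshot` is the one to put in a citation, since it points at a fixed capture rather than a page that may change or vanish.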
Nicole Greenhouse, Web Archivist at NYU, started archiving web pages in response to Covid-19 during the week of March 9. She began with daily crawls and moved to weekly crawls to suit the speed at which information was changing. Her mission at present is to document NYC activist and labor response to the crisis, as well as preserve pages that demonstrate the impact of Covid-19 on NYU’s campuses around the world. Among the critical information Greenhouse is preserving are the obituaries for New York City’s transit workers as part of the collecting priorities at the Tamiment Library.
Greenhouse’s sound approach to preserving web pages includes attention to quality assurance. Together with colleagues, she develops a document that sets the scope of the collections she creates and maintains, taking care to avoid capturing documents that could put people at risk.
Alex Thurman, Web Resources Collection Coordinator at Columbia University Libraries, is working on a collaboration between the International Internet Preservation Consortium (IIPC), of which Columbia is a part, and the Internet Archive. The group is working to collect web content related to the ongoing Covid-19 outbreak. To date, over 6,000 resources have been nominated by IIPC members, and over 1,400 through the public nomination form. About 4,500 of those resources have been archived. The bulk of the resources nominated so far consists of news media articles; national, state, and local government web content devoted to Covid-19; academic public health research; statistical websites tracking and visualizing the number of cases worldwide or in particular countries; and medical journal articles.
Mark Graham, Director of the Wayback Machine at Internet Archive, shared practical tools for preserving web-based materials. In addition to providing the tools web archivists rely on to do their work, Internet Archive’s contributions during the Covid-19 crisis have included the launch of the National Emergency Library (NEL). The NEL makes DRM-enabled copies of ebooks available without the restriction of a waitlist. Graham expects the NEL to be available through the end of June, or perhaps through the end of the crisis.
Graham’s case for preserving web-based materials included the example of a government website whose recommendations for Covid-19 treatment were updated based on potentially misleading information. Meanwhile, political groups are beginning to form on Facebook to advocate for “liberating” states from social distancing measures. Preserving these materials helps us understand the evolving nature of Covid-19 information (and disinformation), a critical need for this epoch, in which lives are on the line.
The webinar wrapped up with each presenter sharing a practical tip about archiving online materials. Their advice included:
- Just get started. Find information about which you are passionate, and give the tools a try.
- Have a collection development policy in place. The internet is a huge place; it’s important to narrow the scope. Capture things that are relevant to your user base.
- Focus on quality assurance. Make sure the materials you are capturing are legible to researchers and others after you’ve done the work of identifying and crawling pages.
- Start by defining what you’re going to collect. Look to other institutions to see what others are collecting (here’s a master list of Covid-19-related archives) so that your efforts contribute something unique.
- Consider the use cases for what you’re saving; a healthy archive is an archive that’s being used.
- If you see something, save something. The average life expectancy of a web page is 100 days, and changes can happen as often as hourly. Save often.
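The "save often" tip can be sketched in a few lines of code. This is an illustration rather than any panelist's workflow: it assumes the Save Page Now address pattern `https://web.archive.org/save/<url>` and the Wayback Machine's 14-digit `YYYYMMDDhhmmss` timestamps, and the helper names and the one-day default interval are hypothetical choices. It makes no network requests itself.

```python
from datetime import datetime, timedelta

# Assumption: visiting this prefix followed by a page address asks
# Save Page Now to capture that page.
SAVE_PREFIX = "https://web.archive.org/save/"


def save_page_now_url(url):
    """Build the address that requests a fresh capture of `url`."""
    return SAVE_PREFIX + url


def capture_is_due(last_timestamp, now, max_age=timedelta(days=1)):
    """Decide whether a page should be captured again.

    last_timestamp is a 14-digit Wayback-style string, or None if the
    page has never been captured.
    """
    if last_timestamp is None:
        return True
    last = datetime.strptime(last_timestamp, "%Y%m%d%H%M%S")
    return now - last >= max_age


now = datetime(2020, 4, 21, 12, 0, 0)
if capture_is_due("20200420000000", now):
    print(save_page_now_url("https://example.com/"))
```

For fast-moving pages, a shorter `max_age` (for example, `timedelta(hours=1)`) would match the hourly changes mentioned above.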
Many thanks to our presenters for sharing their work with us. Please join us at one of our future events; a full listing can be found at metro.org/events.