Pickling the Web at the GC

Photo © Jennifer Peebles, used under a Creative Commons Attribution-NonCommercial license.

Stephen Klein, Digital Services Librarian for the Graduate Center Library, has recently published an article in the Journal of Interactive Technology & Pedagogy in which he discusses why the Graduate Center Library has embraced a dual approach to web archiving, using both Archive-It and Webrecorder.

Stephen explains that Archive-It is good for scalability and access, because it uses an automated crawler to ingest a website and then makes it available via the Wayback Machine. However, Archive-It has major limitations: the copy of the site it ingests often does not reproduce the in-browser experience, so searches, embedded media, timelines, etc. are typically lost.

Webrecorder complements Archive-It because the recorder is able to execute parts of a website that an automated crawler, such as Archive-It's, may not be able to, so content and the in-browser experience are less likely to be lost. However, Webrecorder does not scale well, because users must manually visit each page and click on each JavaScript-based ‘object’ on a page for the action to be executed and for Webrecorder to record it. Furthermore, Webrecorder does not have a simple way to access and view the saved archive file (WARC).

Here is the full article if you would like to learn more.

Are you a GC student with a digital project that you would like to submit as part of your graduate work? Please visit our digital submissions page to learn about our recommended best practices for web development, to learn about the library’s recommended approaches to preservation based on your digital project type, or, if your project is almost complete, to make an appointment to ‘deposit’ your digital work.

About the Author

Digital Services Librarian