HTRC Extracted Features Dataset

Wed, Jul 8, 2020, 10:00 am to 11:30 am
Location: Virtual
Sponsor(s): Princeton Research Data Service

This virtual four-workshop series will allow attendees to gain experience with tools and data from the HathiTrust Research Center (HTRC). The Research Center facilitates text and data mining uses of the HathiTrust corpus. HathiTrust is a partnership of research libraries that maintains a digital library of 17.3 million items digitized at the partner libraries. HTRC tools and data range from off-the-shelf options to more advanced offerings for experienced scholars.

The workshops will be held via Zoom and will include a mix of hands-on activities, discussion, and presentation. We will use breakout rooms to support the hands-on activities. You will not be required to install any software to participate in the workshops. The workshops are open to faculty, graduate students, postdoctoral researchers, librarians, and other academic staff.

Librarians who attend all four workshops will be invited to join a cohort of other librarians who are teaching with and about the Research Center. This cohort has access to additional support from HTRC, further training opportunities, and a community of their peers who are interested in HTRC. 

In this second of four workshops, we will introduce you to the Extracted Features data model and the kinds of research it enables. HTRC recently released an updated version of the Extracted Features dataset (v.2.0) that includes 17+ million files, with each file representing a volume in the HathiTrust Digital Library. The Extracted Features files contain metadata about the volumes, as well as tokens (words), parts of speech, and their per-page counts. The dataset can be used for text analysis projects whose algorithms take the words and word counts in a volume as input, such as topic modeling or certain kinds of machine learning projects. This session will include a hands-on activity using the dataset.
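
For a rough sense of what per-page counts look like in practice, here is a minimal Python sketch. It is not the workshop's hands-on activity and does not use HTRC's tools. The `volume` dictionary, its field names (`metadata`, `pages`, `seq`, `tokens`), and the function `volume_word_counts` are hypothetical and simplified for illustration; the actual Extracted Features files follow their own schema. The sketch sums each token's per-page, per-part-of-speech counts into volume-level word counts, the kind of input a topic model might expect.

```python
from collections import Counter

# Hypothetical, simplified stand-in for one volume's file.
# Field names are illustrative assumptions, not the dataset's real schema.
volume = {
    "metadata": {"title": "Example Volume"},
    "pages": [
        {"seq": 1, "tokens": {"whale": {"NN": 2}, "the": {"DT": 3}}},
        {"seq": 2, "tokens": {"whale": {"NN": 1}, "sea": {"NN": 4}, "the": {"DT": 1}}},
    ],
}


def volume_word_counts(vol):
    """Sum per-page token counts, across parts of speech, into one Counter."""
    totals = Counter()
    for page in vol["pages"]:
        for token, pos_counts in page["tokens"].items():
            # Lowercase so that case variants of a word are counted together.
            totals[token.lower()] += sum(pos_counts.values())
    return totals


counts = volume_word_counts(volume)
print(counts.most_common(3))
```

Because the counts are kept per page, the same loop could instead build one Counter per page when an analysis needs page-level rather than volume-level features.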

Co-sponsored by the Center for Digital Humanities and the Princeton Research Data Service

To request disability-related accommodations for this event, please contact pulcomm@princeton.edu at least 3 working days in advance. 

Accessibility

To request accommodations for this or any event, please contact the organizer or James M. Van Wyck at jvanwyck@princeton.edu at least 3 working days prior to the event.