Bancroft to Explore Text Analysis as Aid in Analyzing, Processing, and Providing Access to Text-based Archival Collections

Mary W. Elings, Head of Digital Collections, The Bancroft Library

The Bancroft Library recently began testing a theory discussed at the Radcliffe Workshop on Technology & Archival Processing held at Harvard’s Radcliffe College in early April 2014. The theory suggested that archives can use text analysis tools and topic modelling (a type of statistical model for discovering the abstract “topics” that occur in a collection of documents) to help analyze, process, and describe text-based archival collections, as well as to improve access to them.

To help us test this theory, the Bancroft welcomed summer intern Janine Heiser from the UC Berkeley School of Information. Over the summer, supported by an ISchool Summer Non-profit Internship Grant, Ms. Heiser worked with digitized analog archival materials to answer specific research questions and define use cases that will help us determine whether text analysis and topic modelling are viable technologies to aid us in our archival work. Based on that work, the Bancroft has recently awarded Ms. Heiser an Archival Technologies Fellowship for 2015 so that she can continue to develop and test what she began in the summer.

During her summer internship, Ms. Heiser created a web-based application called “ArchExtract” that extracts topics and named entities (people, places, subjects, dates, etc.) from a given collection. This application implements and extends various natural language processing software tools such as MALLET and the Stanford CoreNLP toolkit. To test and refine this web application, Ms. Heiser used collections with an existing catalog record and/or finding aid, namely the John Muir Correspondence collection, which was digitized in 2009.

For a given collection, an archivist can compare the topics and named entities that ArchExtract outputs to the topics found in the extant descriptive information, looking at the similarities and differences between the two in order to verify ArchExtract’s accuracy. After the accuracy has been evaluated, ArchExtract can be improved and/or refined.
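The comparison described above can be sketched as simple set arithmetic. The following is a minimal illustration and not ArchExtract's actual code: the function names `normalize` and `compare_terms`, the normalization step, and the sample terms are all assumptions made for the example.

```python
def normalize(term):
    """Lowercase a term and collapse whitespace so trivial variants match."""
    return " ".join(term.lower().split())


def compare_terms(extracted, described):
    """Compare tool output against terms from an existing finding aid.

    Both arguments are iterables of strings. Returns a dict holding the
    shared terms, the terms unique to each side, and two simple ratios.
    """
    tool = {normalize(t) for t in extracted}
    aid = {normalize(t) for t in described}
    shared = tool & aid
    return {
        "shared": sorted(shared),
        "only_in_tool": sorted(tool - aid),
        "only_in_finding_aid": sorted(aid - tool),
        # Share of the tool's terms that the finding aid confirms.
        "precision": len(shared) / len(tool) if tool else 0.0,
        # Share of the finding aid's terms that the tool recovered.
        "recall": len(shared) / len(aid) if aid else 0.0,
    }


# Hypothetical sample data, invented for illustration only.
extracted_terms = ["Yosemite Valley", "Sierra Club", "glaciers", "Alaska"]
finding_aid_terms = ["sierra club", "Yosemite  Valley", "Conservation"]

report = compare_terms(extracted_terms, finding_aid_terms)
print(report["shared"])
print(report["only_in_tool"])
print(report["only_in_finding_aid"])
```

Terms that appear only in the tool's output may be errors or may be genuine content that the existing description missed, so an archivist would still need to review both lists by hand.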

Ms. Heiser will also work with collections that have minimal or no extant description in order to explore this theory further as we continue to test the tool. Working with Bancroft archivists, Ms. Heiser will determine if the web application is successful, where it falls short, and what the next steps might be in exploring this and other text analysis tools to aid in processing collections.

The hope is that automated text analysis will give libraries and archives a way to readily identify the major topics found in a collection, and potentially the named entities found in the text and their frequency, thus giving archivists a good understanding of the scope and content of a collection before it is processed. This could help in identifying processing priorities and funding opportunities, and ultimately help users identify what is found in the collection.

Ms. Heiser is a second-year master’s student at the UC Berkeley School of Information, where she is learning the theory and practice of storing, retrieving, and analyzing digital information in a variety of contexts and is currently taking coursework in natural language processing with Marti Hearst. Prior to the ISchool, Ms. Heiser worked at several companies where she helped develop database systems and software for political parties, nonprofit organizations, and an online music distributor. In her free time, she likes to go running and hiking around the Bay Area. Ms. Heiser was also one of our participants in the #HackFSM hackathon! She was awarded an ISchool Summer Non-profit Internship Grant to support her work at Bancroft this summer and has been awarded an Archival Technologies Fellowship at Bancroft for 2015.


#HackFSM Whitepaper is out: “#HackFSM: Bootstrapping a Library Hackathon in Eight Short Weeks”

The Bancroft Library and Research IT have just published a whitepaper on the #HackFSM hackathon: “#HackFSM: Bootstrapping a Library Hackathon in Eight Short Weeks.”

Abstract:

This white paper describes the process of organizing #HackFSM, a digital humanities hackathon around the Free Speech Movement digital archive, jointly organized by Research IT and The Bancroft Library at UC Berkeley. The paper includes numerous appendices and templates of use for organizations that wish to hold a similar event.

Publication download:  HackFSM_bootstrapping_library_hackathon.pdf

Citation:

Dombrowski, Quinn, Mary Elings, Steve Masover, and Camille Villa. “#HackFSM: Bootstrapping a Library Hackathon in Eight Short Weeks”. Research IT at Berk. Published October 3, 2014.

From: http://research-it.berkeley.edu/publications/hackfsm-bootstrapping-library-hackathon-eight-short-weeks


Bancroft hosts #HackFSM, the first interdisciplinary hackathon at UC Berkeley

By Charlie Macquarie and Mary Elings, Bancroft Digital Collections

In April, The Bancroft Library and the UC Berkeley Digital Humanities Working Group organized #HackFSM, a digital humanities hackathon using the data of the Free Speech Movement digital collections at Berkeley. In preparation for the fiftieth anniversary of the FSM at Berkeley coming up in fall 2014, the event was an opportunity to engage the UC Berkeley community around the materials and history of the movement and align that conversation with the movement’s legacy of open discourse and access to information in new ways for the digital age.

This was the first interdisciplinary, digital humanities hackathon on the Berkeley campus. All participants had to be current UC Berkeley students and had to be members of a team of between two and four participants. Each team was required to include at least one humanist and one programmer (defined by their program of study).

The teams were tasked with creating a compelling web-based user interface for the materials from the FSM digital archive, one of Bancroft’s early digital initiatives. The hackathon teams were provided access to the collections data through an Apache Solr-indexed API which was put together by the UC Berkeley Library Systems Office.
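As a rough illustration of what querying a Solr-indexed API can look like, the sketch below builds a search URL using Solr's standard `q`, `rows`, and `wt` request parameters. The endpoint address, the function name `build_query_url`, and the sample query are assumptions made for this example. The actual #HackFSM API address and its key-based authentication are not described here, so both are left out.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real #HackFSM API address is not given here.
BASE_URL = "https://example.org/solr/fsm/select"


def build_query_url(text, rows=10, base_url=BASE_URL):
    """Build a Solr-style search URL for a free-text query."""
    params = {
        "q": text,      # the search expression
        "rows": rows,   # maximum number of records to return
        "wt": "json",   # ask for a JSON response
    }
    return base_url + "?" + urlencode(params)


url = build_query_url("Mario Savio", rows=5)
print(url)
```

A team's web interface would send a request like this and then render the returned records, layering its own search and browsing features on top.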

The event kicked off on April 1 when teams gathered or were formed and received API keys to the data. We also had a speaker who framed the time period historically for the participants. The closing event on April 12 offered each team time to present their project and then judges deliberated and announced the winners.

The #HackFSM hackathon was different from traditional hackathons in several ways. First, we extended the traditional compressed 24-to-48-hour hackathon format to 12 days. This was intended to give teams more time to explore the data and develop their projects more fully.

The expanded timeframe also allowed more opportunity for collaboration between members of each team and was intended to increase participation by students, particularly women, who were not necessarily part of the hackathon community or who shied away from the typical compressed format. The interdisciplinary teams also had to fulfill another requirement of the hackathon: that the web application they designed would enable a researcher to answer a humanities research question. The teams therefore had to learn to communicate across their disciplines, which ended up being very successful.

Teams had access to mentors (academic and industry) throughout the 12 days. At the final event, projects were judged by two panels. One panel assessed the usability, appearance, and value of the interface from a humanist standpoint, and another reviewed the quality of the code and the deployability of the tool from a technical point of view. Additionally, each team’s project had to comply with the campus policies for web accessibility and security. Compliance with these policies was verified by running automated testing tools on each contestant site.

After presentations were completed, first place was awarded to the team of Alice Liu, Craig Hiller, Kevin Casey, and Cassie Xiong, and second place went to Olivia Benowitz, Nicholas Chang, Jason Khoe, and Edwin Lao. The winning team’s website has been deployed at http://hackfsm.lib.berkeley.edu/. Collectively, we were surprised and pleased by the high quality of all the projects, both visually and functionally.

Overall, The Bancroft felt the hackathon was a very valuable experience and one we hope to build upon in the near future. It was a highly collaborative and engaging event, both for the students and for us. The event required reaching out across campus and our community, to students, IT, and administrators. The students also felt the interdisciplinary nature of the event was positive for them. They had to learn to talk to one another, teach one another, and build something together. The students also told us they were excited about our materials, and that the challenge we presented and the opportunity to see their site hosted by the library were sufficient reward for participating (but the prizes were also cool).

We look forward to engaging more of the community around our collections and supporting digital humanities efforts in the future. They say that imitation is the sincerest form of flattery; The Phoebe Hearst Museum of Anthropology, a fellow UCB institution, has just announced its first hackathon. That is great news.

Mary W. Elings, Head of Digital Collections

Charlie Macquarie, Digital Collections Assistant

(This text is excerpted and derived from an article written for the Society of California Archivists Newsletter, Summer 2014.)


Digital Humanities and the Library

The topic of Digital Humanities (and Social Sciences Computing) has been a ubiquitous one at recent conferences, and this is no less true of the 53rd annual RBMS “Futures” Preconference in San Diego that took place June 19-22, 2012. The opening plenary, “Use,” on Digital Humanities featured two well-known practitioners in this field, Bethany Nowviskie of the University of Virginia and Matthew G. Kirschenbaum of the University of Maryland. For those of us who have been working in the digital library and digital collection realm for many years, Bethany’s discussion of the origins and long history of digital humanities was no surprise. Digitized library and special collection materials have been the source content used by digital humanists and digital librarians to carry out their work since the late 1980s. As a speaker at one of the ACH-ALLC programs in 1999, I was exposed to the digital tools and technologies being used to support research and scholarly exploration in what was then called linguistic and humanities computing. This work encompassed not only textual materials, but also still images, moving images, databases, and geographic materials: the stuff upon which current digital humanities and social sciences efforts are still based. What I learned then, and what the plenary speakers confirmed at this conference, is that this work has been and continues to be collaborative and interdisciplinary. Long-established humanities computing centers at the Universities of Virginia and Maryland have supported this work for years, and they have had a natural partner in the library. Over the years, humanities computing centers have continued to evolve, often set within or supported by the library, and the field that is now known as Digital Humanities has gained prominence. The fact that this plenary opened the conference indicates that this topic is an important one to our community.

As scholars’ work is increasingly focused on digital materials, either digitized from physical collections or born-digital, we are seeing more demand for digital content and tools to carry out digital analysis, visualization, and computational processing, among other activities. Perhaps this is due to the maturation of the field of humanities computing, or the availability of more digital source content, or the rise of a new generation of digital native researchers. Whatever the reason, the role of the library (and the archive and the museum, for that matter) is central to this work. The library is an obvious source of digital materials for these scholars to work with, as was pointed out by both speakers.

Libraries can play a central role in providing access to this content through traditional activities, such as cataloging of digital materials, supporting digitization initiatives, and acquisition of digital content, as well as taking on new activities, such as supporting technology solutions (like digital tools), providing digital lab workspaces, and facilitating bulk access to data and content through mechanisms such as APIs. Just as we have built and facilitated access to analog research materials, we need to turn our attention to building and supporting use of digital research collections.

As Bethany stressed in her talk, we need more digital content for these scholars to work with and use. Digital humanities centers can partner with libraries to increase the scale of digitized materials in special collections or can give us tools to work with born-digital archives from pre-acquisition assessment through access to users, such as the tools being developed by Matthew’s “BitCurator” project. By providing more content and taking the “magic” out of working with digital content, we can facilitate greater use. Unlike physical materials, as Bethany pointed out, digital materials require use in order to remain viable, so the more we use digital materials, the longer they will last. She referred to this as “tactical preservation,” saying that our digital materials should be “bright keys,” in that the more they are used the brighter they become. By increasing use (making it easier to access and work with digital materials), we can ensure digital “futures” for our collections, whether physical or born-digital.

The collaborative nature of Digital Humanities projects (and centers) brings together researchers, technologists, tools, and content. These “places” may take various forms, but in almost all cases, the library and the historical content it collects and preserves play a central role as the “stuff” of which digital humanities research and scholarly production is made. With its historical role in collecting and providing access to research materials, its support of teaching and learning, and its long affinity with using technology for knowledge discovery, the library is well-positioned to support this work and become an even more active partner in the digital humanities and social sciences computing.

Mary W. Elings
Head of Digital Collections
(This text is excerpted and derived from an article written for RBM: A Journal of Rare Books, Manuscripts, and Cultural Heritage.)


Russell Means, November 10, 1939 – October 22, 2012

Russell Means, seen here with Dennis Banks (and William Kunstler in the background) at a press conference regarding the Patricia Hearst kidnapping at the San Francisco Airport Hilton on February 19, 1974, from the Bancroft Library’s Fang Family San Francisco Examiner photograph archive negative files.

Born on the Pine Ridge Reservation in South Dakota, Russell Means moved with his family to the San Francisco Bay Area when he was three, in 1942. Means graduated from San Leandro High School, and after stints in college and working on Indian reservations around the United States, he went on to become a leader in the American Indian Movement.

An Oglala Sioux, Means fought for the rights of indigenous people around the world, urging President Reagan to support the Miskito people in Nicaragua during the rise of the Sandinista government, and staging occupations at Mount Rushmore and the site of the Battle of Wounded Knee to raise awareness of Indian treaties and claims to land that the U.S. government neglected.

Means was a charismatic and divisive public figure, seeking the Libertarian nomination for the 1988 presidential election and appearing in dozens of films, including a starring role in The Last of the Mohicans. Means died of cancer at his home on the Pine Ridge Reservation on October 22, 2012.


Fiat Lux Redux: Ansel Adams and Clark Kerr Online Exhibit

The Bancroft Library is pleased to present the online companion exhibit to Fiat Lux Redux: Ansel Adams and Clark Kerr, which opened in The Bancroft Gallery on September 27, 2012. The online exhibit features photographs of the University of California system in the 1960s by legendary photographer Ansel Adams. These photographs, commissioned by former UC President Clark Kerr and published in the 1967 book Fiat Lux, which celebrated the educational system’s centennial, offer a rarely seen look at the evolution of the renowned University of California system through the eye of a master photographer best known for his iconic California landscapes. Fiat Lux was intended not as a document of the University as it was, but rather as a portrait of the University as it would be. The Fiat Lux project was a massive endeavor, producing 605 fine prints and over 6,700 negatives, far more than the 1,000 images stipulated in Adams’s contract. After Adams’s lifetime devotion to Yosemite, Fiat Lux was probably the biggest single project of his life. The online exhibit also showcases related archival materials about the controversial Kerr himself, and the evolution of his ideas and ideals.

Visit the companion online exhibit:
http://www.lib.berkeley.edu/omeka/exhibits/show/fiat-lux/


©1967, the Regents of the University of California, by permission of The Bancroft Library.

Transmission or reproduction of materials protected by copyright beyond that allowed by fair use requires the written permission of the copyright owners. All requests for permission to publish must be submitted in writing to the Head of Public Services.


Amusements of Yesteryear


A zoo in the Mission? Water slides in the Richmond? As early as the 1890s, there was no shortage of places to seek thrills and fun in San Francisco, though almost no trace of these attractions exists today. Here are a few of the spots where San Franciscans used to go to have fun:

  • The original and once-exclusive home of the It’s-It ice cream sandwich, Playland, also known as Chutes at the Beach, was an amusement park at Ocean Beach that operated from the 1910s to 1972. Visitors could enjoy rides like the Big Dipper, the Aeroplane Swing, and the Ship of Joy, as well as a 68-horse carousel, a fun house with a Laughing Sal, game booths, and an enormous camera obscura (which still exists today near the Cliff House).
  • If you weren’t in the mood for sugar and adrenaline, you could visit another seaside institution not far from Playland, the Sutro Baths. Operating from 1896 to 1966, the Baths were a gigantic indoor pool complex with six salt water pools ranging from ice-cold to 80 degrees. The Baths were less a lap pool than a place to play in the salt water: you could enter the pools through slides, by swinging on trapezes or rings, or by jumping off one of the many diving boards. Non-swimmers and spectators could watch from the stadium-style seating.
  • Over in the Mission in the late 1870s, you might spend a sunny weekend day exploring Woodward’s Gardens, located on a four-acre plot of land near Mission and 15th streets. For 25 cents you could take in live animal attractions, including bears, lions, monkeys, wolves, and kangaroos, as well as the extensive collection of taxidermy animals (seen in the slideshow above) arranged in curious groupings not found in nature. As if that weren’t enough, there was an extensive aquarium, four art museums, an art gallery, a roller-skating rink, hot air balloon rides, and various live performances, including acrobatics and other feats.
  • Finally, if thrills were what you sought, you could visit any of the Chutes locations that cropped up around the city in the early 1900s. For a dime you could take an elevator to the top of a tower, where eight-person boats waited to plunge you at breakneck speed to the man-made lake at the bottom.

1-3: Chutes at the Beach (a.k.a. Playland at the Beach)

4-6: Sutro Baths

7-9: Various Chutes

10-17: Woodward’s Gardens
