From the Blogosphere
Jon Orwant of Google Books
He says it’s not controversial that patrons are accessing more info online
By: David Weinberger
Mar. 29, 2010 01:06 PM
Jon Orwant is an Engineering Manager at Google, with Google Books under him. He used to be CTO at O’Reilly, and was educated at MIT Media Lab. He’s giving a talk to Harvard’s librarians about his perspective on how libraries might change, a topic he says puts him out on a limb. Title of his talk: “Deriving the library from first principles.” If we were to start from scratch, would they look like today’s? He says no.
Part I: Trends.
At MIT, Nicholas Negroponte contended in the early 1990s that telephones would switch from wired to wireless, and televisions would go from wired to wireless. “It seems obvious in retrospect.” At that time, Jon was doing his work using a Connection Machine, which consisted of 64K little computers. The wet-bar size device he shows provided a whopping 5gb of storage. The Media Lab lost its advantage of being able to provide high end computers since computing power has become widespread. So, Media Lab had to reinvent itself, to provide value as a physical location.
Is there an analogy to the Negroponte switch of telephone and TV, Jon asks? We used to use the library to search for books and talk about them at home. In the future, we’ll use our computer to search for books, and talk about them at our libraries.
What is the mission of libraries, he asks. Se;ect and preserve info, or disseminate it. Might libraries redefine themselves? But this depends on the type of library.
1. University libraries. U of Michigan moved its academic press into the library system, even though the press is the money-making arm.
2. Research libraries. Harvard’s Countway Medical Library incorporates a lab into it, the Center for Bioinformatics. This puts domain expertise and search experts together. And they put in the Warren Anatomical Museum (AKA Harvard’s Freak Museum). Maybe libraries should replicate this, adopting information-driven departments. The ideal learning environment might be a great professor’s office. That 1:1 instruction isn’t generally tenable, but why is it that the higher the level of education, the fewer books are in the learning environment? I.e., kindergarten classes are filled with books, but grad student classrooms have few.
3. Public libraries. They tend to be big open rooms, which is why you have to be quiet in them. What if the architecture were a series of smaller, specialized rooms? Henry Jenkins said about newspapers, Jon says, that it’s strange that hundreds of reporters cover the Superbowl, all writing basically the same story; newspapers should differentiate by geography. Might this notion of specialization apply to libraries, reflecting community interests at a more granular level. Too often, public libraries focus on lowest common denominator, but suppose unusual book collections could rotate like exhibits in museums, with local research experts giving advice and talks. [Turn public libraries into public non-degree based universities?]
Part 2: Software architecture
He talks a bit about the scanning tech, which tries to correct for the inner curve of spines, keeps marginalia while removing dirt, doing OCR, etc. At O’Reilly, the job was to synthesize the elements; at Google, the job is to analyze them. They’re trying to recognize frontispieces, index pages, etc. He gives as a sample of the problem of recognizing italics: “Copyright is way too long to strike the balance between benefits to the author and the public. The entire raison d’etre of copyright is to strike a balance between benefits to the author and the public. Thus, the optimal copyright term is c(x) = 14(n + 1).” In each of these, italics indicates a different semantic point. Google is trying to algorithmically catch the author’s intent.
Physical proximity is good for low-latency apps, local caching, high-bandwidth communication, and immersive environments. So, maybe we’ll see books as applications (e.g., good for physics text that lets you play with problems, maybe not so useful for Plato), real-time video connections to others reading the same book, snazzy visualizations, presentation of lots of data in parallel (reviews, related books, commentary, and annotations).”
“We’ll be paying a lot more attention to annotations” as a culture. He shows a scan of a Chinese book that includes a fold-out piece that contains an annotation; that page is not a single rectangle. “What could we do with persistent annotations?” What could we do with annotations that have not gone through the peer review process? What if undergrads were able to annotate books in ways that their comments persisted for decades? Not everyone would choose to do this, he notes.
We can do new types of research now. If you want to know whether the past tense of “sneak” is, 50 yrs ago people would have said “snuck” but in 50 years it’ll be “sneaked.” You can see that there is a trend toward regularization of verbs (i.e., not irregular verbs) over the time, which you can see by examining the corpus of books Google makes available to researchers. Or, you can look at triplets of words and ask what are the distinctive trigrams. E.g., It was: oxide of lead, vexation of spirit, a striking proof. Now: lesbian and gay, the power elite, the poor countries. Steve Pinker is going to use the corpus to test the “Great man” theory. E.g., when Newton and Leibniz both invented the calculus, was the calculus in the air? Do a calculus word cloud in multiple languages and test against the word configurations of the time. The usage of phrases “World War I” and “The Great War” cross around 1938, but there were some people calling it “WWI” in 1932, which is a good way to discover a new book (wouldn’t you want to read the person who foresaw WWII?). This sort of research is one of the benefits of the Google Books settlement, he says. (He also says that he was both a plaintiff and defendant in the case because as an author, his book was scanned without authorization.)
The images of all the world’s books are about 100 petabytes. If you put terminals in libraries so anyone can access out of print books. You can let patrons print on demand. “Does that have an impact on collections” and budgets? Once that makes economic sense, then every library will “have” every single book.
How can we design a library for serendipity? The fact that books look different is appealing, Jon says. Maybe a library should buy lots and lots of different e-readers, in different form factors. The library could display info-rich electronic spines (graphics of spines) [Jon doesn't know that this is an idea the Harvard Law Library, with whom I'm working, is working on]. We could each have our own virtual rooms and bookshelves, with books that come through various analytics, including books that people I trust are reading. We could also generalize this by having the bookshelves change if more than one person in the room; maybe the topics get broader to find shared interests. We could have bookshelves for a community in general. Analytics of multifactor classification (subject, tone, bias, scholarliness, etc.) can increase “deep” serendipity.
Q: One of the concerns in the research and univ libraries is the ability to return to the evidence you’ve cited. Having many manifestations (= editions, etc.) lets scholars return. We need permanent ways of getting back to evidence at a particular time. E.g., Census Dept. makes corrections, which means people who ran analyses of the data get different answers afterward.
Q: Centralized vs. distributed repository models?
Latest Cloud Developer Stories
Subscribe to the World's Most Powerful Newsletters
Subscribe to Our Rss Feeds & Get Your SYS-CON News Live!
SYS-CON Featured Whitepapers
Most Read This Week