~200,000 ingested this month
~50,000 ingested this week
~8,000 ingested in the last 24 hours
In collaboration with our UW Library staff team members, xDD negotiates agreements with publishers that allow programmatic downloading and mining of published content.
All documents are securely stored on an access-controlled server at the heart of our digital library infrastructure (xDD team members and our collaborators do not have access to original content via our infrastructure). UW-Madison's Center for High Throughput Computing supplies the computational power for processing documents using NLP, OCR, and other software tools useful for TDM tasks, which also allows for deploying new tools quickly against all existing documents.
Our app-template allows collaborators to quickly bootstrap TDM applications that use the NLP and OCR output and easily identify potentially relevant documents. Development is done with samples of documents, but applications operating on the full document set can be run on the xDD infrastructure.
A question that can be answered by mining the scientific literature. 1.5 million documents from 8 publishers are currently available.
Find the application template on GitHub
Use words of interest to identify relevant documents. A subset of the literature that contains these words is then generated for testing purposes.
Identify an output schema that can be used to answer the original question, and write an application to parse the input into the desired structure. Python, Postgres, R, and associated modules are currently supported.
Commit your application to an xDD infrastructure repository on GitHub to run it on our infrastructure and generate results.
Download and analyze results, identify strengths and weaknesses. We will provide bibliographical information about all relevant documents.
Troubleshoot the application, resubmit, and generate new results. We will continue to grow the dataset as more matching documents are fetched.
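The workflow above (select documents by words of interest, then parse them into an output schema) can be illustrated with a minimal Python sketch. This is not the xDD application template: the `Mention` schema, the function names, and the input layout (a dictionary mapping a document ID to a list of sentence strings) are all hypothetical, chosen only to show the shape of such an application. The real NLP and OCR output format may differ.

```python
import re
from dataclasses import dataclass


@dataclass
class Mention:
    """Hypothetical output row: one term occurrence in one sentence."""
    doc_id: str
    sentence_index: int
    term: str
    sentence: str


def compile_terms(terms):
    """Build one case-insensitive, whole-word pattern covering all terms."""
    # Longest terms first so multi-word terms win over shorter prefixes.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in ordered)
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)


def select_documents(docs, pattern):
    """Keep only documents with at least one sentence containing a term."""
    return {
        doc_id: sentences
        for doc_id, sentences in docs.items()
        if any(pattern.search(s) for s in sentences)
    }


def extract_mentions(docs, pattern):
    """Parse the selected documents into rows of the assumed schema."""
    rows = []
    for doc_id, sentences in docs.items():
        for index, sentence in enumerate(sentences):
            for match in pattern.finditer(sentence):
                rows.append(Mention(doc_id, index, match.group(1).lower(), sentence))
    return rows


# Toy input standing in for a development sample (assumed layout).
docs = {
    "doc-a": ["The stromatolite beds are well exposed.", "No fossils were found."],
    "doc-b": ["Sampling followed standard methods."],
}

pattern = compile_terms(["stromatolite", "fossils"])
subset = select_documents(docs, pattern)
mentions = extract_mentions(subset, pattern)

for mention in mentions:
    print(mention.doc_id, mention.sentence_index, mention.term)
```

Keeping the selection step separate from the extraction step mirrors the split described above: the same term list can define the test subset during development and later be applied unchanged to the full document set.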
Whether you're a publisher interested in contributing your content to our infrastructure, a scientist interested in collaboration, or just curious to know more, let us know!
The xDD team is based at the University of Wisconsin-Madison and is made up of domain experts in both the Geosciences and Computer Sciences, librarians, infrastructure developers, and undergraduate, graduate, and postdoctoral researchers.
Chris Ré's team is focused on the DeepDive platform for knowledge base creation, and ensuring the datasets produced by the UW-Madison infrastructure team are DeepDive and Snorkel-ready.
Past and present
University of Victoria
Senior Research Scientist
Arizona Geological Survey
Undergrad App Builder
University of Massachusetts Amherst
Postdoctoral App Builder