1,600,000 documents
30,000 ingested this month
1,432 ingested this week
123 ingested in the last 24 hours
In collaboration with UW Library staff, GeoDeepDive negotiates agreements with publishers that allow programmatic downloading and mining of published content.
All documents are stored securely on an access-controlled server at the heart of our digital library infrastructure (neither GeoDeepDive team members nor our collaborators have access to the original content through our infrastructure). UW-Madison's Center for High Throughput Computing supplies the computing power for processing documents with NLP, OCR, and other software tools useful for text and data mining (TDM). This also makes it possible to deploy new tools quickly against all existing documents.
Our app-template lets collaborators quickly bootstrap TDM applications that use the NLP and OCR output and easily identify potentially relevant documents. Development is done on samples of documents, but applications can then be run against the full document set on the GeoDeepDive infrastructure.
Start with a question that can be answered by mining the scientific literature. Currently, 1.5 million documents from 8 publishers are available.
Find the application template on GitHub.
Use words of interest to identify relevant documents. A subset of the literature that contains these words is then generated for testing purposes.
Identify an output schema that can be used to answer the original question, and write an application to parse the input into the desired structure. Python, Postgres, R, and associated modules are currently supported.
Commit your application to a GeoDeepDive infrastructure repository on GitHub to run it on our infrastructure and generate results.
Download and analyze the results to identify strengths and weaknesses. We will provide bibliographic information about all relevant documents.
Troubleshoot the application, resubmit it, and generate new results. We will continue to grow the dataset as more matching documents are fetched.
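The document-selection and output-schema steps above can be sketched in Python, one of the supported languages. This is a minimal illustration under stated assumptions, not the app-template's actual code: the in-memory `DOCUMENTS` dictionary, the `TERMS` list, and the `Mention` schema are all hypothetical, and real applications work from the NLP and OCR output rather than raw strings.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for a document sample: doc_id -> plain text.
# (Assumption: the real app-template reads NLP/OCR output, not raw strings.)
DOCUMENTS = {
    "doc1": "Stromatolites were found in the Cambrian limestone.",
    "doc2": "We measured zircon ages in granite samples.",
    "doc3": "Cambrian stromatolites appear in several formations.",
}

# Hypothetical words of interest.
TERMS = ["stromatolite", "cambrian"]


def term_pattern(terms):
    """Case-insensitive regex matching any term at the start of a word."""
    return re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")", re.IGNORECASE)


def select_documents(docs, terms):
    """Return sorted ids of documents that mention at least one term."""
    pattern = term_pattern(terms)
    return sorted(doc_id for doc_id, text in docs.items() if pattern.search(text))


@dataclass
class Mention:
    """Hypothetical output schema: one row per term found in a sentence."""
    doc_id: str
    term: str
    sentence: str


def extract_mentions(docs, doc_ids, terms):
    """Parse the selected documents into rows of the output schema."""
    pattern = term_pattern(terms)
    rows = []
    for doc_id in doc_ids:
        # Naive sentence split on terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", docs[doc_id]):
            found = {m.group(1).lower() for m in pattern.finditer(sentence)}
            for term in sorted(found):
                rows.append(Mention(doc_id, term, sentence))
    return rows
```

For example, `select_documents(DOCUMENTS, TERMS)` picks out `doc1` and `doc3`, and `extract_mentions` turns each matching sentence into `Mention` rows that could be loaded into Postgres or analyzed further. Keeping selection and extraction separate mirrors the workflow above: the term filter defines the test subset, while the schema decides what the application actually reports.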
Whether you're a publisher interested in contributing your content to our infrastructure, a scientist interested in collaboration, or just curious to know more, let us know!
contact@geodeepdive.org
The GeoDeepDive team is based at the University of Wisconsin-Madison and is made up of domain experts in both the geosciences and computer sciences, librarians, infrastructure developers, and undergraduate, graduate, and postdoctoral researchers.
Postdoctoral Researcher
Geoscience
Chris Ré's team is focused on the DeepDive platform for knowledge base creation and on ensuring that the datasets produced by the UW-Madison infrastructure team are DeepDive- and Snorkel-ready.
Past and present
Postdoctoral App Builder
Carnegie Institute
Undergrad App Builder
Princeton
Physical Scientist
USGS
Software Developer
USGS