
Industry Documents Library API and Data Set

API

The Industry Documents Digital Library uses Solr to index the document corpus. Users who want to access the data programmatically can query the Industry Documents Solr server directly. This makes it possible to export documents to another system, run search queries, and process search results in code. Results can be returned in these formats: xml, json, python, ruby, php, and csv.

Download documentation
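
As a rough illustration of querying a Solr server programmatically, the sketch below builds a search URL from the standard Solr request parameters `q` (query), `wt` (response format), `rows`, and `start`. The endpoint in `SOLR_SELECT_URL` is a hypothetical placeholder, and the function names are invented for this example; the actual server address, core name, and searchable fields are described in the downloadable documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholder: replace with the endpoint given in the IDL documentation.
SOLR_SELECT_URL = "https://solr.example.org/solr/collection/select"

# Response formats listed on this page, passed to Solr as the `wt` parameter.
SUPPORTED_FORMATS = {"xml", "json", "python", "ruby", "php", "csv"}


def build_query_url(query, fmt="json", rows=10, start=0, base_url=SOLR_SELECT_URL):
    """Return a Solr select URL for `query` in the requested response format."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # q, wt, rows, and start are standard Solr query parameters.
    params = {"q": query, "wt": fmt, "rows": rows, "start": start}
    return f"{base_url}?{urlencode(params)}"


def fetch_json(query, rows=10, start=0, base_url=SOLR_SELECT_URL):
    """Run the query and parse the JSON response (requires a reachable server)."""
    url = build_query_url(query, fmt="json", rows=rows, start=start, base_url=base_url)
    with urlopen(url, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; no request is sent to the placeholder host.
    print(build_query_url("tobacco advertising", fmt="csv", rows=5))
```

Paging through a large result set would typically mean calling `fetch_json` repeatedly while increasing `start` by `rows` each time.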

Data Set

For researchers who would prefer to work with Industry Documents Library (IDL) metadata and OCR text within their own database systems, IDL has made these files available for free download via the link below. Please consult the included readme file for instructions. Note that the IDL website always provides access to the most current data, because the website receives a new release each month. In contrast, due to time constraints, the downloadable IDL dataset will be updated only twice a year.

These files are provided on a do-it-yourself basis. IDL is unable to provide individual technical support for downloading the files or for setting up your own database to ingest them. We do welcome feedback; please contact IDL directly via email or phone.

https://www.industrydocumentslibrary.ucsf.edu/dataset