
Frequently Asked Questions

Please contact us at industrydocuments@ucsf.edu if you are interested in large downloads of PDFs. For batch downloads of records/metadata, you can query our Solr index directly using an API. We also provide data sets containing metadata and OCR text for our entire corpus. Please see Industry Documents Library API and Data Set for more information and documentation.
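For batch metadata queries, a request URL can be assembled with standard Solr select parameters (q, rows, start, wt, fl). This is a minimal sketch, assuming the endpoint accepts those standard parameters. The endpoint URL below is a placeholder, not the library's real address, and `build_query_url` and `page_offsets` are hypothetical helpers. The dd: field in the usage example comes from the date-query section of this FAQ.

```python
from urllib.parse import urlencode
from typing import List, Optional

# Placeholder endpoint (assumption): see "Industry Documents Library API"
# for the real URL and any required parameters.
BASE_URL = "https://example.org/solr/select"


def build_query_url(q: str, rows: int = 100, start: int = 0,
                    fields: Optional[List[str]] = None) -> str:
    """Build a Solr select URL from standard Solr query parameters."""
    params = {"q": q, "rows": rows, "start": start, "wt": "json"}
    if fields:
        # fl restricts which stored fields are returned for each record.
        params["fl"] = ",".join(fields)
    # urlencode percent-encodes ':', '[', ']' and ',' and turns spaces into '+'.
    return f"{BASE_URL}?{urlencode(params)}"


def page_offsets(total: int, rows: int) -> List[int]:
    """Return the 'start' offsets needed to page through `total` results."""
    return list(range(0, total, rows))


# Example: all records dated in 1981, 50 per page.
print(build_query_url("dd:[19810101 TO 19811231]", rows=50))
```

Paging with `start` and `rows` keeps each response small; for very large exports, the downloadable data sets are likely the better route.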
You can use these materials for a non-commercial project if it falls under ‘Fair Use.’ Please see Copyright and Fair Use for more information.
Date queries use the format YYYYMMDD:

- The year (YYYY) must be between 1760 and the current year.
- The month (MM) must be between 01 and 12 (note the leading zero).
- The day (DD) must be between 01 and 31 (note the leading zero).

Examples of valid date queries: dd:19810123 or dd:[19810123 TO 20101111]
Examples of invalid date queries: dd:00000000 or dd:[00000000 TO 19899999] (year 0000 is before 1760, and month 99 and day 99 are out of range)
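The three date rules can be checked before a query is sent. This is a minimal sketch that applies only the rules as stated above; in particular, it bounds the day to 01-31 without checking days per month, and it does not claim to mirror the site's own validation. `is_valid_date` and `date_query` are hypothetical helper names.

```python
import datetime
from typing import Optional

MIN_YEAR = 1760  # earliest year allowed by the date rules


def is_valid_date(token: str) -> bool:
    """Return True if `token` is an 8-digit YYYYMMDD value meeting the rules."""
    if len(token) != 8 or not (token.isascii() and token.isdigit()):
        return False
    year, month, day = int(token[:4]), int(token[4:6]), int(token[6:])
    current_year = datetime.date.today().year
    # The stated rules bound DD to 01-31 only; days per month are not checked.
    return MIN_YEAR <= year <= current_year and 1 <= month <= 12 and 1 <= day <= 31


def date_query(start: str, end: Optional[str] = None) -> str:
    """Build a single-date (dd:YYYYMMDD) or range (dd:[A TO B]) query."""
    tokens = [start] if end is None else [start, end]
    for token in tokens:
        if not is_valid_date(token):
            raise ValueError(f"invalid date: {token!r}")
    if end is None:
        return f"dd:{start}"
    return f"dd:[{start} TO {end}]"


print(date_query("19810123"))              # dd:19810123
print(date_query("19810123", "20101111"))  # dd:[19810123 TO 20101111]
```

Calling `date_query("00000000")` raises `ValueError`, matching the invalid examples above.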
Fuzzy search is very useful in searching for names or terms when you are not sure how they are spelled in the documents or you think a word might be misspelled in the text.

Append the ~ operator to the end of a term, for example teen~.
A fuzzy search like teen~ finds words spelled similarly to teen. Similarity is measured by "edit distance": the number of single-character edits needed to turn one word into the other. An edit is an insertion (teens), a deletion (ten), or a substitution (teem).

You can specify the maximum edit distance. For instance, teen~1 returns only words that are at most 1 edit away from teen.
For example, cigarettes~ will return documents where the term is spelled cigaretes or cigarretes; cigarettes~1 is a little more conservative and matches only variants one edit away, such as cigaretes.

If you do not specify a number, the system searches for teen~0.5, which returns words that are about 50% similar to teen (in this case, up to 2 edits away).
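To make "edit distance" concrete, the sketch below computes plain Levenshtein distance (insertions, deletions, substitutions, as defined above) and filters a word list by a maximum number of edits. It is an illustration only: the search engine's own fuzzy matching may differ (for example, by also counting transpositions), and this sketch does not reproduce how fractional values like ~0.5 are interpreted. `edit_distance` and `fuzzy_matches` are hypothetical names.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(term: str, vocabulary: Iterable[str], max_edits: int) -> List[str]:
    """Return the words within `max_edits` edits of `term`, like term~max_edits."""
    return [word for word in vocabulary if edit_distance(term, word) <= max_edits]


print(fuzzy_matches("teen", ["teens", "ten", "teem", "tent", "green"], 1))
print(edit_distance("cigarettes", "cigaretes"), edit_distance("cigarettes", "cigarretes"))
```

Here tent and green are each 2 edits from teen, so a limit of 1 excludes them; likewise cigaretes is 1 edit from cigarettes while cigarretes is 2.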
Potential duplicates are identified when a document matches another in the following fields:
collection, title, documentdate, pages, availability
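One way to read the duplicate rule is as grouping records whose five listed fields are all equal. This is a minimal sketch under that assumption (exact equality on every field); records are modeled as dictionaries with a hypothetical "id" key, and the library's actual matching may normalize values differently.

```python
from collections import defaultdict
from typing import Dict, List

# The five fields listed above; "id" is a hypothetical record identifier.
DUPLICATE_FIELDS = ("collection", "title", "documentdate", "pages", "availability")


def potential_duplicates(records: List[Dict]) -> List[List[str]]:
    """Group record ids whose duplicate-check fields are all equal."""
    groups = defaultdict(list)
    for record in records:
        # A missing field becomes None, so two records missing it still match.
        key = tuple(record.get(field) for field in DUPLICATE_FIELDS)
        groups[key].append(record["id"])
    # Only groups with more than one member are potential duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]
```

Because every field is part of the grouping key, a difference in any one of them (for example, page count) keeps two records apart.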
"More Like This" returns documents that are similar to the currently viewed document.
This feature contains two types of documents:
  • The public version of a "restricted" document (if it has a public counterpart).
  • Recommendations based on matches in title and author, with slightly more weight given to title.
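The recommendation idea (title and author matches, with title weighted slightly higher) can be illustrated with a simple weighted score. This sketch swaps in token-overlap (Jaccard) similarity with hypothetical weights of 0.6 for title and 0.4 for author; it is not the library's actual "More Like This" scoring, and the "id", "title", and "author" keys are assumptions.

```python
from typing import Dict, List, Set

TITLE_WEIGHT = 0.6   # hypothetical: title counts slightly more
AUTHOR_WEIGHT = 0.4  # hypothetical


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared tokens divided by all distinct tokens (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(doc: Dict, other: Dict) -> float:
    """Weighted title and author overlap between two records."""
    return (TITLE_WEIGHT * jaccard(tokens(doc["title"]), tokens(other["title"]))
            + AUTHOR_WEIGHT * jaccard(tokens(doc["author"]), tokens(other["author"])))


def more_like_this(doc: Dict, candidates: List[Dict], k: int = 5) -> List[str]:
    """Return up to k candidate ids with a nonzero score, best first."""
    scored = [(similarity(doc, c), c["id"]) for c in candidates if c["id"] != doc["id"]]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for score, doc_id in scored[:k] if score > 0]
```

With these weights, a record that shares most of the title outranks one that shares only the author, which reflects the slight preference for title described above.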
"Previous/Next Bates" allows you to view documents in order of Bates number, a sequential number stamped on most litigation documents.
"Browse" allows you to view documents in the order they were ingested into the archive as part of a contextual set.