Frequently Asked Questions
Will old field codes like au: and ti: still work?
Yes, all two-letter field codes are still valid. They are automatically converted to the new field codes when typed into the search box.
From the main search page, users can now search across more than one archive at once. Choose industry collections under "Search Options."
Users can search for per:"glantz, s" and get “Glantz, S”, “glantz, sa” and “Glantz, Stanton”.
Users may get more results because the index now includes imperfectly recognized OCR text such as “glan.tz”.
Each document has been assigned an ID number, which forms the basis of its permanent URL, for example https://www.industrydocumentslibrary.ucsf.edu/industry/docs/abcd1234. Documents that were in the previous version of the Legacy Tobacco Documents Library and the Drug Industry Documents Archive retain their TIDs, and you can still search for a document by its TID (for example, tid:abc00d99).
Can I use documents or media clips in my project?
You can use these materials for a non-commercial project if the use falls under ‘Fair Use.’ Please see Copyright and Fair Use for more information.
Do you make your entire Data Set available? Is there an API?
You can query our Solr index directly using an API. We also provide data sets containing metadata and OCR text for our entire corpus. Please see Industry Documents Library API and Data Set for more information and documentation.
What is the format for a date query?
The date query can be expressed as YYYYMMDD.
- The year (YYYY) must be between 1760 and the current year.
- The month (MM) must be between 01 and 12 (note the leading zero).
- The day (DD) must be between 01 and 31 (note the leading zero).
An example of a valid date query: dd:19810123 or dd:[19810123 TO 20101111]
An example of an invalid date query: dd:00000000 or dd:[00000000 TO 19899999]
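The rules above can be sketched as a small validator. This is only an illustration in Python, not the library's own implementation: the function names is_valid_date and is_valid_dd_query are hypothetical, and the sketch checks only the ranges stated here, so it does not reject impossible calendar dates such as 19810231.

```python
import re
from datetime import date

# A single date is exactly eight ASCII digits: YYYYMMDD.
DATE_RE = re.compile(r"[0-9]{8}")
# A range is written as [YYYYMMDD TO YYYYMMDD].
RANGE_RE = re.compile(r"\[([0-9]{8}) TO ([0-9]{8})\]")


def is_valid_date(value, current_year=None):
    """Check one YYYYMMDD value against the ranges given in this FAQ."""
    if not DATE_RE.fullmatch(value):
        return False
    if current_year is None:
        current_year = date.today().year
    year, month, day = int(value[:4]), int(value[4:6]), int(value[6:])
    return 1760 <= year <= current_year and 1 <= month <= 12 and 1 <= day <= 31


def is_valid_dd_query(query, current_year=None):
    """Check a dd: query that holds either a single date or a date range."""
    if not query.startswith("dd:"):
        return False
    body = query[3:]
    range_match = RANGE_RE.fullmatch(body)
    if range_match:
        return all(is_valid_date(part, current_year) for part in range_match.groups())
    return is_valid_date(body, current_year)


if __name__ == "__main__":
    for q in ("dd:19810123", "dd:[19810123 TO 20101111]",
              "dd:00000000", "dd:[00000000 TO 19899999]"):
        print(q, is_valid_dd_query(q))
```

Running the script prints True for the two valid examples and False for the two invalid ones.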
How do I search for variant spellings of names or terms (fuzzy search)?
Fuzzy search is very useful in searching for names or terms when you are not sure how they are spelled in the documents or you think a word might be misspelled in the text.
Add the ~ operator to the end of a term, for example teen~
A fuzzy search like teen~ finds words that are spelled similarly to teen. Similarity is measured by "edit distance": the number of single-character edits needed to turn one word into the other. A single edit is an insertion (teens), a deletion (ten), or a substitution (teem).
You can specify the maximum edit distance you want. For instance, teen~1 will only return words that are at most 1 edit away from teen.
cigarettes~ will return documents where the term is spelled as cigaretes or cigarretes. The more conservative cigarettes~1 will still return cigaretes (one deletion), but not cigarretes, which is 2 edits away.
If you do not specify a number, the system searches for teen~0.5, which returns words that are about 50% similar to teen (in this case, up to 2 edits away).
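To make "edit distance" concrete, here is a minimal Python sketch of the standard Levenshtein distance, which counts insertions, deletions, and substitutions. It is an illustration only: the function names edit_distance and fuzzy_matches are hypothetical, and the search engine's own fuzzy matching may differ in detail (for example, in how it treats swapped adjacent letters).

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(term, candidates, max_edits=2):
    """Return the candidates within max_edits of term, like term~max_edits."""
    return [word for word in candidates if edit_distance(term, word) <= max_edits]


if __name__ == "__main__":
    words = ["teens", "ten", "teem", "tent", "train"]
    print(fuzzy_matches("teen", words, max_edits=1))
    print(edit_distance("cigarettes", "cigaretes"))
```

With max_edits=1, the first call keeps teens, ten, and teem, each of which is one edit away from teen, and drops tent and train.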
How do you identify "Potential Duplicates"?
Potential duplicates are identified when a document matches another in the following fields:
collection, title, documentdate, pages, availability
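As a rough illustration of this kind of matching, the Python sketch below groups records whose values agree on all five fields. It is not the library's actual implementation: the record layout, the "id" key, the sample values, and the function name find_potential_duplicates are hypothetical, and it assumes that every listed field must match exactly.

```python
from collections import defaultdict

# The fields named in this FAQ as the basis for duplicate detection.
DUPLICATE_FIELDS = ("collection", "title", "documentdate", "pages", "availability")


def find_potential_duplicates(records):
    """Group record IDs whose values are identical in every duplicate field.

    Each record is a dict; the "id" key is a hypothetical identifier.
    Returns a list of ID groups that contain more than one record.
    """
    groups = defaultdict(list)
    for record in records:
        key = tuple(record.get(field) for field in DUPLICATE_FIELDS)
        groups[key].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    sample = [
        {"id": "a1", "collection": "X", "title": "Memo", "documentdate": "19810123",
         "pages": 2, "availability": "public"},
        {"id": "b2", "collection": "X", "title": "Memo", "documentdate": "19810123",
         "pages": 2, "availability": "public"},
        {"id": "c3", "collection": "X", "title": "Memo", "documentdate": "19810123",
         "pages": 3, "availability": "public"},
    ]
    print(find_potential_duplicates(sample))
```

In the sample, the first two records agree on every field and are grouped together, while the third differs in its page count and is left out.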
What is the "More Like This" feature when viewing a document?
"More Like This" returns documents that are similar to the currently viewed document.
This feature returns two types of documents:
- The public version of a "restricted" document (if it has a public counterpart).
- Recommendations based upon matches in title and author with a slightly higher weight put on title.
What is the "Previous/Next Bates" feature when viewing a document?
"Previous/Next Bates" allows you to view documents in order of Bates number, a sequential number stamped on most litigation documents.
What is the "Browse" feature when viewing a document?
"Browse" allows you to view the documents in the order they were ingested into the archive as a part of a contextual set.