Frequently Asked Questions
Why did your name change from Legacy Tobacco Documents Library?
The archive was named after Legacy for Health, which provided funding for its creation and ongoing maintenance. Legacy’s name became Truth Initiative in September 2015. We updated the archive’s name to align with this change.
Will old field codes like au: and ddi: still work?
Yes, all of the old field codes are still valid. When you type one into the search box, it is automatically converted to the corresponding new field code.
- Users can search for per:"glantz, s" and get “Glantz, S”, “glantz, sa”, and “Glantz, Stanton”.
- Users may get more results because we’ve engineered the indexing to include OCR text, including OCR errors such as “glan.tz”.
- Each document has been assigned an ID that forms the basis of its permanent URL, for example https://www.industrydocumentslibrary.ucsf.edu/tobacco/docs/abcd1234. Documents that were in the previous version of the archive (formerly known as the Legacy Tobacco Documents Library) retain their TIDs, and you can still search for a document by its TID (for example, tid:abc00d99).
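The behaviors in the list above can be sketched in Python. This is an illustration only: it assumes that per: matching behaves like a case-insensitive prefix match on the indexed name, which the archive does not document and may implement differently. The URL prefix and the tid: query form come from the examples above; the helper names permanent_url, tid_query, and matches_person are hypothetical.

```python
from urllib.parse import quote

# Permanent-URL prefix, taken from the example URL above.
DOC_BASE_URL = "https://www.industrydocumentslibrary.ucsf.edu/tobacco/docs/"


def permanent_url(doc_id: str) -> str:
    """Build a document's permanent URL from its ID, e.g. 'abcd1234'."""
    return DOC_BASE_URL + quote(doc_id)


def tid_query(tid: str) -> str:
    """Build a search for a legacy TID, e.g. 'abc00d99'."""
    return f"tid:{tid}"


def matches_person(query: str, indexed_name: str) -> bool:
    """ASSUMPTION: model per:"..." as a case-insensitive prefix match.
    The archive's real name indexing may work differently."""
    return indexed_name.casefold().startswith(query.casefold())


names = ["Glantz, S", "glantz, sa", "Glantz, Stanton", "Glass, S"]
print(permanent_url("abcd1234"))
print(tid_query("abc00d99"))
print([n for n in names if matches_person("glantz, s", n)])
# prints ['Glantz, S', 'glantz, sa', 'Glantz, Stanton']
```

Under this assumption, a short name like "glantz, s" acts as a prefix, which is why it also picks up the fuller forms of the same name while excluding "Glass, S".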
Why did I find my name on a document in your library?
If you have found your name in our archive, it is because a tobacco company turned over a document with your name on it during litigation. The US tobacco companies were sued in the 1990s, and the resulting Master Settlement Agreement (MSA) of 1998 ordered the companies to make all corporate documents from the lawsuits available in a publicly accessible depository and on public websites. These internal files contain a variety of documents, including consumer letters, department memos, advertisements, newspaper clippings, and reports. The UCSF Truth Tobacco Industry Documents archive gathers all of these company documents into one central location to support research and education about this large industry and its effects on public health and public policy.
Can you remove a document with my personal information?
We are a library/archive and therefore cannot remove or substantially edit any documents we did not create. We do take the issue of sensitive personal information very seriously, and we proactively black out any sensitive personal information found in the documents. The information we are required to protect is listed in California SB 1386 and includes the following:
- Social Security number
- Driver’s license number or California identification card number
- Financial account number, credit or debit card number, in combination with any required security code, access code, or password that would permit access to an individual’s financial account
- Medical information
- Health insurance information
Please contact us if you have found personal information on a document.
Why do some documents have a lock icon?
The lock icon identifies documents with restricted access. When a company withholds a document on grounds of privilege or confidential trade secrets, no document image is attached to the index record: our Library receives only the information about the document, not the PDF itself. We may have a public version of the restricted document. To check, open the record for the restricted document and click "More Like This" in the upper "Browse" area; any public versions of the document will be displayed in the carousel area. For more information, see Privileged and Confidential Documents.
Why is some text blocked out?
Some documents have "redactions": a black or white box, or black highlighting, that makes the original text unreadable. Sometimes these are redactions of personally identifiable information that has been withheld from public view based on a purported privacy concern. If you come across a personal confidential redaction that you would like to see in unredacted form, you can contact us to request that we inquire whether the redaction can be lifted.
Can I use documents or media clips in my project?
You can use these materials for a non-commercial project if it falls under ‘Fair Use.’ Please see Copyright and Fair Use for more information.
What is the Master Settlement Agreement?
The Master Settlement Agreement of 1998 was a multibillion-dollar settlement of dozens of lawsuits brought by a majority of U.S. states against the tobacco industry. Among its many requirements, the settlement mandated that the tobacco companies release their internal company documents to the public by submitting them to a depository in Minnesota and by creating and maintaining document websites. The Truth Tobacco Industry Documents archive preserves and maintains electronic versions of these released documents, making them widely available to researchers and the general public. More information on the MSA, including the full text of the Agreement, is available.
Do you make your entire Data Set available? Is there an API?
You can query our Solr index directly using an API. We also provide data sets containing metadata and OCR text for our entire corpus. Please see Industry Documents Library API and Data Set for more information and documentation.
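As a rough sketch of querying a Solr index from Python, the code below builds a request URL and reads a response body. It assumes the API accepts Solr's standard select parameters (q, rows, start, wt) and returns Solr's standard JSON response shape. SOLR_SELECT_URL is a placeholder, not the real address, and the sample response is made up; consult the API documentation for the actual endpoint and parameters.

```python
import json
from urllib.parse import urlencode

# PLACEHOLDER endpoint: take the real one from the API documentation.
SOLR_SELECT_URL = "https://example.org/solr/select"


def build_query_url(query: str, rows: int = 10, start: int = 0) -> str:
    """Build a Solr-style select URL. q, rows, start and wt are standard
    Solr parameters; that this API accepts them as-is is an assumption."""
    params = {"q": query, "rows": rows, "start": start, "wt": "json"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


def extract_docs(response_text: str) -> tuple[int, list[dict]]:
    """Return (numFound, docs) from a standard Solr JSON response body."""
    body = json.loads(response_text)["response"]
    return body["numFound"], body["docs"]


print(build_query_url("dd:[19810123 TO 20101111]", rows=5))
# Made-up response body in Solr's standard shape.
sample = '{"response": {"numFound": 1, "start": 0, "docs": [{"id": "abcd1234"}]}}'
print(extract_docs(sample))  # prints (1, [{'id': 'abcd1234'}])
```

To run a live query, you could pass the URL to urllib.request.urlopen and give the decoded body to extract_docs. Paging through a large result set would then mean increasing start by rows on each request.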
What is the format for a date query?
The date query is expressed as YYYYMMDD.
- The year (YYYY) must be between 1760 and the current year.
- The month (MM) must be between 01 and 12 (note the leading zero).
- The day (DD) must be between 01 and 31 (note the leading zero).
Examples of valid date queries: dd:19810123 or dd:[19810123 TO 20101111]
Examples of invalid date queries: dd:00000000 or dd:[00000000 TO 19899999] (year 0000 is before 1760, and month 99 does not exist)
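The rules above can be checked mechanically. This Python sketch validates a single dd:YYYYMMDD date or a dd:[YYYYMMDD TO YYYYMMDD] range against exactly those rules. Like the rules, it does not check month lengths, so a date such as 19810231 passes. The function names are hypothetical, and the search system's own validation may differ.

```python
import re
from datetime import date

# dd:YYYYMMDD, or dd:[YYYYMMDD TO YYYYMMDD] for a range.
DATE_QUERY = re.compile(r"dd:(?:(\d{8})|\[(\d{8}) TO (\d{8})\])")


def valid_date(value: str) -> bool:
    """Check one YYYYMMDD string against the FAQ's rules.
    Like the rules, this does not check month lengths."""
    year, month, day = int(value[:4]), int(value[4:6]), int(value[6:])
    return (1760 <= year <= date.today().year
            and 1 <= month <= 12
            and 1 <= day <= 31)


def valid_date_query(query: str) -> bool:
    """True for a well-formed single-date or range query with valid dates."""
    match = DATE_QUERY.fullmatch(query)
    if match is None:
        return False
    single, start, end = match.groups()
    dates = [single] if single else [start, end]
    return all(valid_date(d) for d in dates)


for q in ["dd:19810123", "dd:[19810123 TO 20101111]",
          "dd:00000000", "dd:[00000000 TO 19899999]"]:
    print(q, valid_date_query(q))
# prints True, True, False, False for the four queries in order
```

Requiring exactly eight digits is what enforces the leading zeros: a month written as 1 instead of 01 leaves only seven digits, so the query fails to match.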
How do I search for variant spellings of names or terms (fuzzy search)?
Fuzzy search is very useful in searching for names or terms when you are not sure how they are spelled in the documents or you think a word might be misspelled in the text.
Add the ~ operator to the end of a term, for example teen~.
A fuzzy search like teen~ searches for words that are spelled similarly to teen. Similarity is measured by "edit distance": the number of single-character edits needed to turn one word into another, where each edit is an insertion (teens), a deletion (ten), or a substitution (teem).
You can specify how much edit distance you want. For instance, teen~1 will only return words that are at most 1 edit away from teen.
cigarettes~ will return documents where the term is spelled cigaretes (1 edit away) or cigarretes (2 edits away). cigarettes~1 is more conservative and returns only terms at most 1 edit away, such as cigaretes.
If you do not specify a number, the system searches for teen~0.5, which returns words that are about 50% similar to teen (in this case, up to 2 edits away).
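To make edit distance concrete, here is a Python sketch of the standard Levenshtein distance (insertions, deletions, and substitutions, each costing 1), plus a filter that mimics a term~N search. The search engine's own fuzzy matching may differ in details, for example by also counting two swapped adjacent letters as a single edit. The conversion from a similarity like 0.5 to an edit budget, (1 - similarity) times the term length, is an assumption chosen because it reproduces the teen~0.5 example above; it is not taken from the archive's documentation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def fuzzy_matches(term: str, candidates: list[str], max_edits: int) -> list[str]:
    """Keep candidates within max_edits of term, like a term~max_edits search."""
    return [c for c in candidates if edit_distance(term, c) <= max_edits]


def edits_for_similarity(term: str, similarity: float) -> int:
    """ASSUMPTION: turn a similarity such as 0.5 into an edit budget of
    (1 - similarity) * len(term); for teen~0.5 this gives 2 edits."""
    return int((1 - similarity) * len(term))


print(fuzzy_matches("teen", ["teens", "ten", "teem", "tea", "green"], 1))
# prints ['teens', 'ten', 'teem']
print(edit_distance("cigarettes", "cigaretes"))   # prints 1
print(edit_distance("cigarettes", "cigarretes"))  # prints 2
print(edits_for_similarity("teen", 0.5))          # prints 2
```

The cigarettes lines show why the explicit ~1 is more conservative: cigarretes needs one insertion and one deletion, so it falls outside a 1-edit budget.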
How do you identify "Potential Duplicates"?
Potential duplicates are identified when a document matches another in the following fields:
collection, title, documentdate, pages, availability
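A minimal Python sketch of this check, assuming that two records are potential duplicates only when all five fields match exactly, and that each record is a dict keyed by those field names. The real comparison may normalize values (for example, letter case or whitespace) before matching, and the sample records are made up.

```python
from collections import defaultdict

# Fields listed in the FAQ. ASSUMPTION: all five must match exactly.
DUPLICATE_FIELDS = ("collection", "title", "documentdate", "pages", "availability")


def potential_duplicates(records: list[dict]) -> list[list[dict]]:
    """Group records that share every DUPLICATE_FIELDS value and return
    only the groups with two or more members."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec.get(field) for field in DUPLICATE_FIELDS)
        groups[key].append(rec)
    return [group for group in groups.values() if len(group) > 1]


# Made-up sample records for illustration.
base = {"collection": "X", "title": "Memo", "documentdate": "19810123",
        "pages": 2, "availability": "public"}
records = [{**base, "id": "a1"}, {**base, "id": "b2"}, {**base, "id": "c3", "pages": 3}]
print([[r["id"] for r in g] for g in potential_duplicates(records)])  # prints [['a1', 'b2']]
```

Grouping on a tuple key finds all duplicate sets in a single pass, rather than comparing every pair of documents.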
What is the "More Like This" feature when viewing a document?
"More Like This" returns documents that are similar to the currently viewed document.
This feature contains two types of documents:
- The public version of a "restricted" document (if it has a public counterpart).
- Recommendations based on matches in title and author, with slightly more weight given to title.
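The recommendation half of this feature can be approximated as a weighted similarity score. The Python sketch below is an illustration, not the archive's implementation: it scores word overlap (Jaccard similarity) separately for title and author and weights title slightly higher. The weights 1.2 and 1.0 are hypothetical, since the FAQ says only that title gets slightly more weight, and the sample records are made up.

```python
import re

# HYPOTHETICAL weights: the FAQ says only that title is weighted slightly higher.
TITLE_WEIGHT = 1.2
AUTHOR_WEIGHT = 1.0


def words(text: str) -> set[str]:
    """Lowercased word tokens of a field value."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct words two field values have in common (0.0 to 1.0)."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def score(doc: dict, other: dict) -> float:
    """Weighted title + author similarity between two records."""
    return (TITLE_WEIGHT * jaccard(doc["title"], other["title"])
            + AUTHOR_WEIGHT * jaccard(doc["author"], other["author"]))


def more_like_this(doc: dict, corpus: list[dict], k: int = 5) -> list[dict]:
    """Return up to k other records with a nonzero score, best first."""
    scored = [(score(doc, o), o) for o in corpus if o is not doc]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [o for _, o in scored[:k]]


viewed = {"id": "v0", "title": "Smoking and Health Report", "author": "Glantz, S"}
corpus = [
    viewed,
    {"id": "a1", "title": "Smoking and Health Report", "author": "Doe, J"},
    {"id": "b2", "title": "Annual Budget", "author": "Glantz, S"},
    {"id": "c3", "title": "Annual Budget", "author": "Doe, J"},
]
print([d["id"] for d in more_like_this(viewed, corpus)])  # prints ['a1', 'b2']
```

With these weights, a document that shares only a title outranks one that shares only an author, which is the effect the FAQ describes.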