
Save Your Threads

High-fidelity capture of Twitter threads as sealed PDFs.

Request a capture.

Submitting this form will open a new tab, in which your request will be processed.

The capture process should take about a minute; when it finishes, the sealed PDF will be ready to download.

What is this?

This site is an experiment by the Harvard Library Innovation Lab to let you download signed PDFs of Twitter URLs. Here's an example PDF we made from this tweet.

Why make a PDF archiving tool for Twitter?

There are lots of screenshots of Twitter threads going around. Some are real, some are fake. You can't tell who made them, or when they were made.

PDFs let us apply document signatures and timestamps so anyone can check, in the future, that a PDF you download with this site really came from the Harvard Library Innovation Lab and hasn't been edited.

PDFs also let us bundle in additional media as attachments. Each signed PDF currently includes all images on the page (so you can see full-size versions of images that are cropped in the PDF view), the primary video on the page if there is one, and a list of all the links in the thread along with their actual destinations.
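Building a list of links like this comes down to collecting every anchor in the captured page and recording where each one really points. The page doesn't say how this tool does it. The sketch below is one possible approach using Python's standard-library `html.parser`. It relies on a hypothetical `resolve` callback that stands in for the step that follows redirects, such as Twitter's `t.co` short links, to their final destinations.

```python
from html.parser import HTMLParser
from typing import Callable, Optional


class LinkCollector(HTMLParser):
    """Collects (visible text, href) pairs for every <a href> in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors with no href (or an empty one)
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only keep text inside an open link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def link_manifest(html: str, resolve: Callable[[str], str] = lambda url: url) -> list[dict]:
    """Return one entry per link: its text, the href as written, and where it resolves.

    `resolve` is a hypothetical hook; by default it returns the URL unchanged.
    """
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [
        {"text": text, "href": href, "destination": resolve(href)}
        for text, href in parser.links
    ]
```

Keeping `resolve` as a parameter lets the parsing run offline; a real resolver would make network requests and follow HTTP redirects.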

Why not make a PDF archiving tool for Twitter?

Not everything on Twitter wants to be archived! On Twitter all kinds of conversations happen at different levels of privacy in the same public space. Some tweets want to be quiet; some want to be forgotten; some are by public figures or have public impact or sentimental value and want to be kept around. Please think carefully about what you choose to preserve.

Library nerd note: societies create much more data than they can save. "Thinking carefully about what you choose to preserve" is part of the practice of archiving. By doing it, you're helping to form our shared cultural memory.

Why do you ask the reason for archiving?

At the Library Innovation Lab, we build experiments like this to explore what's most important to save in the cultural record and how we can save it. Your answer will help us understand whether this tool is serving its purpose, who it's helping, and what other features it might need. Feel free to provide as much or as little detail as you want about who you are and what you're trying to accomplish. Giving the same answer each time is fine.

How do you make these PDFs (and why does it take so long)?

Twitter captures are made using open source web archiving software we're developing at the Library Innovation Lab for eventual use in our project. The software uses a headless Chrome browser to render the page as it would appear to a reader. For this experiment, we're also running custom JavaScript in the headless browser to remove Twitter's interface elements and make the content easier to read.

Captures can take up to a minute because the browser scrolls through the page to load resources from the entire Twitter thread.
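Pages like a Twitter thread load more content as the reader scrolls, so a capture tool has to keep scrolling until nothing new appears. One common approach is to scroll to the bottom, wait, and stop once the page height stops growing. The sketch below shows that general technique and is an assumption, not a description of the lab's code. `get_height`, `scroll_to`, and `wait` are hypothetical callbacks. A headless-browser driver might implement them by reading `document.body.scrollHeight`, calling `window.scrollTo`, and sleeping briefly.

```python
from typing import Callable


def scroll_to_end(
    get_height: Callable[[], int],     # hypothetical: current page height in pixels
    scroll_to: Callable[[int], None],  # hypothetical: scroll the viewport to a y offset
    wait: Callable[[], None],          # hypothetical: pause so lazy content can load
    max_rounds: int = 50,
) -> int:
    """Scroll until the page height stops growing; return the number of scroll rounds."""
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll_to(last_height)  # jump to the current bottom of the page
        wait()                  # give lazily loaded content time to arrive
        new_height = get_height()
        if new_height == last_height:
            return rounds       # nothing new loaded: assume we reached the end
        last_height = new_height
    return max_rounds           # give up rather than scroll forever
```

The `max_rounds` cap keeps an endlessly growing feed from stalling the capture indefinitely.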

Why didn't my requested URL capture correctly?

If a page fails to capture correctly after a few attempts, let us know. Also note that some PDF viewers will truncate very large PDFs, so you may need to try a different viewer if the top or bottom of the page is cut off.

Deactivating the "unfold thread" option might also give better results in certain cases.

How do I check that a PDF came from you?

See our Signature Verification Page.

Does a signature on a PDF web archive mean it's real?

Well … no. Library folks like to talk about "authenticity" and "provenance". A signature on a PDF tells you its provenance: you can prove that you really got the PDF from us, and that we couldn't have created it after a certain date. You'll then have to decide whether you trust our claim that the PDF we gave you represents a real page we saw on Twitter (and that no one else has messed with our servers). If someone else gives you a signed PDF, they're giving you a different provenance chain, and you can trace that back to decide who you're being asked to trust.

Tech nerd note: This whole trust step is needed because of something called repudiability: HTTPS web transactions are deliberately designed to be repudiable, meaning a third party has no way to verify after the fact that they ever really happened. Signed HTTP exchanges are one proposal that may eventually let websites choose to publish verifiable content instead, but they aren't here yet. So for now, you're left deciding whether "" is an intermediary you want to choose to trust.
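To see why HTTPS traffic is repudiable, look at how its records are authenticated. After the handshake, client and server share symmetric session keys, so anything one side can authenticate, the other side could have authenticated too. The sketch below is a simplified illustration, not TLS itself. HMAC-SHA256 from Python's standard library stands in for TLS record protection (modern TLS uses AEAD ciphers), and the key and messages are made up.

```python
import hashlib
import hmac
import secrets


def tag(key: bytes, message: bytes) -> bytes:
    """Authenticate a message with HMAC-SHA256 under a shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, mac: bytes) -> bool:
    """Constant-time check that `mac` is a valid tag for `message`."""
    return hmac.compare_digest(tag(key, message), mac)


# Illustrative only: after the handshake, server and client hold the same key.
session_key = secrets.token_bytes(32)

# A record the server really sent...
real = b"record actually sent by the server"
real_tag = tag(session_key, real)

# ...and a record the client invented afterwards with the very same key.
forged = b"record made up later by the client"
forged_tag = tag(session_key, forged)

# Both verify, so the key holder cannot prove which one the server produced.
print(verify(session_key, real, real_tag), verify(session_key, forged, forged_tag))
# prints: True True
```

A digital signature works differently. It is made with a private key that only the signer holds, which is what lets a signed PDF point back to the party that produced it.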

Is the code for this site available?

Yes! Code for this site is published under an open license on GitHub. We encourage you to run your own instance of the server — remembering that if you run it, you'll be the one asserting provenance of the resulting PDF files.

What is your privacy policy?

We may log requested Twitter URLs and may store cached copies of delivered archives. We also log cryptographic hashes of all PDF files delivered, in case there is a later question of authenticity. We store normal server request logs.
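A logged hash lets anyone later confirm that a particular file is byte-for-byte identical to one that was delivered. The page does not say which hash function is used or how the log is stored. The sketch below assumes SHA-256 and treats the log as a hypothetical in-memory set of hex digests.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large PDFs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def was_delivered(path: Path, logged_hashes: set[str]) -> bool:
    """True if this exact file's digest appears in the (hypothetical) delivery log."""
    return sha256_of_file(path) in logged_hashes
```

Changing even one byte of the PDF produces a different digest, so an edited copy would not match any logged entry.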