
Save Your Threads

High-fidelity capture of Twitter threads as sealed PDFs.

Request a capture.

Submitting this form will open a new tab, in which your request will be processed.

The capture process should take around a minute, at the end of which the resulting sealed PDF will be ready to be downloaded.

What is this?

This site is an experiment by the Harvard Library Innovation Lab to let you download signed PDFs of Twitter URLs. Here's an example PDF we made from this tweet.

Who can use it?

To use our website you'll need to contact us for an API key. We're currently only able to share a limited number with people like journalists, internet scholars, and archivists. But you can also use our open source software to stand up an archive server of your own, and share it with your friends.

Why make a PDF archiving tool for Twitter?

There are lots of screenshots of Twitter threads going around. Some are real, some are fake. You can't tell who made them, or when they were made.

PDFs let us apply document signatures and timestamps so anyone can check, in the future, that a PDF you download with this site really came from the Harvard Library Innovation Lab and hasn't been edited.

PDFs also let us bundle in additional media as attachments. Each signed PDF currently includes all images in the page (so you can see full-size images that are cropped in the PDF view), the primary video on the page, if any, and a list of all the t.co links in the thread with their actual destinations.
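
To make the t.co link list concrete, here is a rough sketch of how such a list could be built. It is not the code this site runs: the function names, the regular expression, and the sample text are all assumptions made for illustration. The first function finds unique t.co links in a piece of text; the second follows redirects to find where a link actually leads, and needs network access.

```python
import re
import urllib.request

# Assumption: t.co short links look like http(s)://t.co/<letters and digits>.
TCO_PATTERN = re.compile(r"https?://t\.co/[A-Za-z0-9]+")


def find_tco_links(text):
    """Return the unique t.co links in text, in order of first appearance."""
    return list(dict.fromkeys(TCO_PATTERN.findall(text)))


def resolve_destination(short_url, timeout=10):
    """Follow HTTP redirects and return the final URL (requires network access)."""
    with urllib.request.urlopen(short_url, timeout=timeout) as response:
        return response.geturl()


# Hypothetical sample text; these short links are made up.
sample = "Read https://t.co/abc123 and https://t.co/XyZ789, then https://t.co/abc123 again."
links = find_tco_links(sample)
print(links)
```

Calling resolve_destination on each link would then give the pairs of short link and destination. A real capture might handle this differently, for example by reading the destinations from the rendered page instead of making extra requests.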

Why not make a PDF archiving tool for Twitter?

Not everything on Twitter wants to be archived! On Twitter all kinds of conversations happen at different levels of privacy in the same public space. Some tweets want to be quiet; some want to be forgotten; some are by public figures or have public impact or sentimental value and want to be kept around. Please think carefully about what you choose to preserve.

Library nerd note: societies create much more data than they can save. "Thinking carefully about what you choose to preserve" is part of the practice of archiving. By doing it, you're helping to form our shared cultural memory.

How do you make these PDFs (and why does it take so long)?

Twitter captures are made using open source web archiving software we're developing at the Library Innovation Lab for eventual use in our Perma.cc project. The software uses a headless Chrome browser to render the page as it would appear to a reader. For this experiment, we're also running custom JavaScript in the headless browser to remove Twitter UI and make the content easier to read.

Captures can take as long as a minute, because we scroll to load resources from the entire Twitter thread.

Why didn't my requested URL capture correctly?

If a page fails to capture correctly after a few attempts, let us know. Also note that some PDF viewers will truncate very large PDFs, so you may need to try a different viewer if the top and bottom are hidden.

Deactivating the "unfold thread" option might also give better results in certain cases.

How do I check that a PDF came from you?

See our Signature Verification Page.

Does a signature on a PDF web archive mean it's real?

Well … no. Library folks like to talk about "authenticity" and "provenance". A signature on a PDF tells you its provenance: you can prove that you really got the PDF from us, and that we couldn't have created it after a certain date. You'll then have to decide whether you trust our claim that the PDF we gave you represents a real page we saw on Twitter (and that no one else has messed with our servers). If someone else gives you a signed PDF, they're giving you a different provenance chain, and you can trace that back to decide who you're being asked to trust.

Tech nerd note: This whole trust step is needed because of something called repudiability. HTTPS web transactions are deliberately designed to be repudiable, meaning a third party has no way to tell after the fact whether they ever really happened. Signed HTTP exchanges are one proposal that may eventually let websites choose to publish verifiable content instead, but they aren't here yet. So for now, you're left deciding whether "social.perma.cc" is an intermediary you want to trust.

What is your privacy policy?

We may log requested Twitter URLs and may store cached copies of delivered archives. We also log cryptographic hashes of all PDF files delivered, in case there is a later question of authenticity. We store normal server request logs.
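
For the curious, here is a minimal sketch of how a logged hash could be used to check a PDF later. We don't say above which hash algorithm is used, so SHA-256 is an assumption here, and the function names are hypothetical. The idea is simply that a file with even one changed byte will produce a different hash from the one that was logged.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_logged_hash(path, logged_hex):
    """Return True if the file's digest equals a previously logged hex digest."""
    # Normalize whitespace and case before comparing the two hex strings.
    return hmac.compare_digest(sha256_of_file(path), logged_hex.strip().lower())
```

A matching hash only shows that the file is byte-for-byte the one that was delivered. It says nothing about who holds the log, so the trust questions discussed earlier still apply.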