Replace Text in a PDF File with Python
I recently googled “search and replace text in pdf with python”, and every solution I found was hacky and overly complicated. I assumed someone must have already implemented an elegant solution for such a seemingly trivial task, but it was hard to find.
Luckily, after trying many of the search results, I found the code snippet I needed to replace some text in a PDF file with Python.
The first step is to uncompress your PDF file:
sudo apt install pdftk # Google the installation steps for `pdftk` if you use a different package manager
pdftk original.pdf output uncompressed.pdf uncompress
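If you would rather run this step from the same Python script, here is a minimal sketch that calls the same pdftk command through the standard library. The function names build_uncompress_command and uncompress_pdf are my own, not part of pdftk or any library, and the sketch assumes pdftk is already installed and on your PATH.

```python
import shutil
import subprocess


def build_uncompress_command(src, dst):
    """Return the argument list for `pdftk SRC output DST uncompress`."""
    return ["pdftk", str(src), "output", str(dst), "uncompress"]


def uncompress_pdf(src, dst):
    """Run pdftk to write an uncompressed copy of src to dst."""
    if shutil.which("pdftk") is None:
        raise FileNotFoundError("pdftk was not found on PATH")
    # check=True raises CalledProcessError if pdftk exits with an error
    subprocess.run(build_uncompress_command(src, dst), check=True)
```

Calling uncompress_pdf("original.pdf", "uncompressed.pdf") should then be equivalent to the shell command above.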
The second step is to use PyMuPDF (pip install PyMuPDF) to replace your text:
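import fitz  # PyMuPDF

text_to_replace = "TEST"
replacement = "REPLACED"

doc = fitz.open("uncompressed.pdf")
for page in doc:
    # search_for returns one rectangle per occurrence of the text on this page
    matches = page.search_for(text_to_replace)
    for rect in matches:
        # Mark the area for redaction and supply the text to draw in its place
        page.add_redact_annot(rect, text=replacement)
    # Remove the original text and write the replacements for this page
    page.apply_redactions()
doc.save("output.pdf")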
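It doesn’t do a perfect replacement: the original text is redacted and the new text is drawn into the same rectangle, so the font and size may not match the surrounding text. Still, it’s a quick solution for those looking for a code snippet to start with. The package maintainer seems to be pretty active on GitHub, so check the PyMuPDF documentation if you need to customize the code.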
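As a starting point for customizing it, here is a rough sketch of the same loop wrapped in reusable functions that also report how many occurrences were replaced. The names replace_text_in_doc and replace_text_in_file are hypothetical helpers of my own, not PyMuPDF functions. The only PyMuPDF calls are the ones already used above (fitz.open, search_for, add_redact_annot, apply_redactions, save) plus close.

```python
def replace_text_in_doc(doc, needle, replacement):
    """Redact every occurrence of needle in doc and draw replacement instead.

    Returns the number of occurrences that were found and redacted.
    """
    total = 0
    for page in doc:
        hits = page.search_for(needle)
        for rect in hits:
            page.add_redact_annot(rect, text=replacement)
        if hits:
            # Apply once per page, after all annotations have been added
            page.apply_redactions()
        total += len(hits)
    return total


def replace_text_in_file(src, dst, needle, replacement):
    """Open src, replace needle with replacement, and save the result to dst."""
    import fitz  # PyMuPDF; imported here so the helper above does not need it

    doc = fitz.open(src)
    try:
        count = replace_text_in_doc(doc, needle, replacement)
        doc.save(dst)
    finally:
        doc.close()
    return count
```

For example, replace_text_in_file("uncompressed.pdf", "output.pdf", "TEST", "REPLACED") returns the number of matches it replaced, and a return value of 0 is a hint that the search text was not found as written.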