Locking and Unlocking .pdf files

Here, I’ll show you how to set the same password on every .pdf file in a directory at once using Python, instead of setting a password on each PDF by hand. This comes in handy for folders containing hundreds of .pdf documents. The full code for setting passwords can be found here, and the code for unlocking files is available here.

We will be using the PyPDF2 and os modules. To start off, we’ll get the user to input their password and root directory. Then, we’ll use the os.walk() function to iterate over each file in that directory and its sub-directories.

import os
import PyPDF2

for (root, dirs, files) in os.walk(path):
    for filename in files:
        if filename.endswith('.pdf'):
            # open the file in read-binary mode and wrap it in a reader
            pdfFile = open(root + '/' + filename, 'rb')
            pdfReader = PyPDF2.PdfFileReader(pdfFile)

Above, we unpack the result of each ‘walk’ as a three-element tuple: the current directory, its sub-directories, and its files. We then go through each file in the current directory, and use an if statement to check whether the file has a .pdf extension. If it does, we open the file in read-binary mode and initialise a PdfFileReader object for it.
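As a rough sketch of the traversal on its own, the helper below collects the path of every .pdf file under a root directory. The name find_pdfs is made up for this example and is not part of the original programs, and it joins paths with os.path.join rather than '/', which is an assumption about portability.

```python
import os

def find_pdfs(path):
    """Return the paths of all .pdf files under `path`, including sub-directories."""
    pdf_paths = []
    # os.walk yields a (root, dirs, files) tuple for every directory it visits
    for root, dirs, files in os.walk(path):
        for filename in files:
            if filename.endswith('.pdf'):
                pdf_paths.append(os.path.join(root, filename))
    # sorted so the result does not depend on the order the OS lists files in
    return sorted(pdf_paths)
```

Calling find_pdfs on the root directory gives a list you can loop over, or simply print to check which files would be touched before running anything destructive.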

Now, to make an encrypted pdf using PyPDF2, we’ll read one file and copy its contents into a second file, then encrypt the second file and delete the first. Do note that opening the first file in write-binary mode won’t delete it: that erases the data inside the file, but not the file itself. To delete files, we can use the os.remove() function.
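The difference between emptying a file and deleting it can be checked with a small experiment. This is only a sketch using a throwaway temporary file; the demo_* and *_after_* names are invented for the example.

```python
import os
import tempfile

# create a throwaway file with some content (the path is just for this demo)
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'demo.pdf')
with open(demo_path, 'wb') as f:
    f.write(b'some bytes')

# re-opening in 'wb' mode truncates the file: it still exists, but is now empty
open(demo_path, 'wb').close()
exists_after_wb = os.path.exists(demo_path)
size_after_wb = os.path.getsize(demo_path)

# os.remove() is what actually deletes the file
os.remove(demo_path)
exists_after_remove = os.path.exists(demo_path)

# tidy up the temporary directory
os.rmdir(demo_dir)
```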

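# inside the "if filename.endswith('.pdf'):" block

if not pdfReader.isEncrypted:
    # copy every page of the original into a new writer object
    pdfWriter = PyPDF2.PdfFileWriter()
    for pageNum in range(pdfReader.numPages):
        pdfWriter.addPage(pdfReader.getPage(pageNum))

    pdfWriter.encrypt(password)

    # 'name_decrypted.pdf' and 'name.pdf' both become 'name_encrypted.pdf'
    if filename.endswith('decrypted.pdf'):
        resultPdf = open(root + '/' + filename[:-13] + 'encrypted.pdf', 'wb')
    else:
        resultPdf = open(root + '/' + filename[:-4] + '_encrypted.pdf', 'wb')

    pdfWriter.write(resultPdf)
    # close both files before deleting the original
    resultPdf.close()
    pdfFile.close()
    os.remove(root + '/' + filename)
else:
    # already encrypted: nothing to do except release the file
    pdfFile.close()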

First, we check whether the file in question is already encrypted. If not, we initialise a PdfFileWriter object and copy the contents of the file over to it. We encrypt the new file with the user-supplied password and save it with a suffix of ‘_encrypted.pdf’. Then, os.remove() is used to delete the original file.
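The naming rule can also be pulled out into a small helper, which makes it easier to check on its own. This is a sketch rather than the code from the full program: encrypted_name is a hypothetical function, and it assumes the underscore suffixes ‘_encrypted.pdf’ and ‘_decrypted.pdf’ described above. Using len() avoids hard-coding the slice lengths.

```python
def encrypted_name(filename):
    """Map 'report.pdf' or 'report_decrypted.pdf' to 'report_encrypted.pdf'."""
    if filename.endswith('_decrypted.pdf'):
        # drop the old suffix so the two suffixes never pile up
        stem = filename[:-len('_decrypted.pdf')]
    else:
        # plain file: drop only the '.pdf' extension
        stem = filename[:-len('.pdf')]
    return stem + '_encrypted.pdf'
```

The same idea works in the other direction for unlocking, with the two suffixes swapped.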

Unlocking files follows the same method. Using a Python program also saves a lot of time when, say, unlocking hundreds of files. Once again, we use os.walk() to traverse our root directory. This time, we check whether the file is encrypted, and proceed only if it is.

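# inside the "if pdfReader.isEncrypted:" check

# decrypt() returns 0 on failure, 1 for the user password, 2 for the owner password
if pdfReader.decrypt(password) != 0:
    # copy every page of the unlocked file into a new writer object
    pdfWriter = PyPDF2.PdfFileWriter()
    for pageNum in range(pdfReader.numPages):
        pdfWriter.addPage(pdfReader.getPage(pageNum))

    # 'name_encrypted.pdf' and 'name.pdf' both become 'name_decrypted.pdf'
    if filename.endswith('encrypted.pdf'):
        resultPdf = open(root + '/' + filename[:-13] + 'decrypted.pdf', 'wb')
    else:
        resultPdf = open(root + '/' + filename[:-4] + '_decrypted.pdf', 'wb')
    pdfWriter.write(resultPdf)
    # close both files before deleting the encrypted original
    resultPdf.close()
    pdfFile.close()
    os.remove(root + '/' + filename)
else:
    print('password incorrect for {}'.format(root + '/' + filename))
    pdfFile.close()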

If the inputted password works for a file, the file is unlocked and its contents are copied into a new file with no password. Then the encrypted file is deleted, and the new file replaces it. If the password is incorrect, the program prints the path of the file.
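One possible refinement, sketched below, is to collect the paths that could not be unlocked and report them together at the end of the run. The names unlock_all and try_unlock are hypothetical: try_unlock stands in for the per-file logic above and is assumed to return True when the password worked and False otherwise.

```python
import os

def unlock_all(path, password, try_unlock):
    """Call try_unlock(pdf_path, password) on every .pdf under `path`.

    Returns the sorted list of paths for which try_unlock returned False.
    """
    failed = []
    for root, dirs, files in os.walk(path):
        for filename in files:
            if filename.endswith('.pdf'):
                pdf_path = os.path.join(root, filename)
                # remember the file instead of printing straight away
                if not try_unlock(pdf_path, password):
                    failed.append(pdf_path)
    return sorted(failed)
```

Printing the returned list after the walk finishes gives a single summary of the files that still need a different password.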

In practice, these programs save a lot of manual work. For example, if there is a website with hundreds of free resources (many of which are password-protected PDFs), you can download all these files and use the program above to unlock them all in one run. Using a brute-force attack (coming soon), we can also download hundreds of files and try to hack them using a spin-off of the program above.