{"id":123164,"date":"2022-03-22T20:07:28","date_gmt":"2022-03-22T20:07:28","guid":{"rendered":"https:\/\/blog.finxter.com\/?p=256441"},"modified":"2022-03-22T20:07:28","modified_gmt":"2022-03-22T20:07:28","slug":"how-to-compress-pdf-files-using-python","status":"publish","type":"post","link":"https:\/\/sickgaming.net\/blog\/2022\/03\/22\/how-to-compress-pdf-files-using-python\/","title":{"rendered":"How to Compress PDF Files Using Python?"},"content":{"rendered":"<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\">\n<div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"How to Compress PDF Files Using Python?\" width=\"780\" height=\"439\" src=\"https:\/\/www.youtube.com\/embed\/c4mlg-_jS-g?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe>\n<\/div>\n<\/figure>\n<h2>Problem Formulation<\/h2>\n<p>Suppose you have a PDF file, but it\u2019s too large and you\u2019d like to compress it (perhaps you want to reduce its size to allow for faster transfer over the internet, or perhaps to save storage space).\u00a0 <\/p>\n<p>Even more challenging, suppose you have multiple PDF files you\u2019d like to compress.\u00a0 <\/p>\n<p>Multiple online options exist, but these typically limit the number of files that can be processed at a time.\u00a0 Uploading the originals and then downloading the results also takes extra time.\u00a0 And you may not be comfortable sharing your files with an online service.<\/p>\n<p>Fortunately, we can use Python to address all these concerns.\u00a0 But before we learn how to do this, let\u2019s first learn a little bit about PDF files.<\/p>\n<h2>About Compressing PDF Files<\/h2>\n<p>According to Dov Isaacs, former Adobe Principal Scientist (see his discussion <a 
href=\"https:\/\/community.adobe.com\/t5\/acrobat-discussions\/compressing-pdf\/td-p\/10950834\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>), PDF documents are already substantially compressed.\u00a0 <\/p>\n<p>The text and vector graphics portions of the documents are already internally zip-compressed, so there is little opportunity for improvement there.\u00a0 <\/p>\n<p>Instead, any reduction in file size comes from compressing the image portions of a PDF document, possibly at the cost of image quality.\u00a0 <\/p>\n<p>So compression may be achievable, but the user must decide how much image quality loss is acceptable in exchange for a smaller file.<\/p>\n<h2>Setup<\/h2>\n<p>A programmer going by the handle <em>Theeko74<\/em> has written a Python script called \u201c<code>pdf_compressor.py<\/code>\u201d. This script is a wrapper around <code>ghostscript<\/code>, which does the actual work of compressing PDF files.\u00a0 <\/p>\n<p>This script is offered under the MIT license and is free to use as the user wishes.<\/p>\n<p class=\"has-global-color-8-background-color has-background\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/13.1.0\/72x72\/1f4a1.png\" alt=\"\ud83d\udca1\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\" \/> <strong>Hint<\/strong>: make sure you have <code>ghostscript<\/code> installed on your computer. 
To install <code>ghostscript<\/code>, follow <a rel=\"noreferrer noopener\" href=\"https:\/\/web.mit.edu\/ghostscript\/www\/Install.htm\" data-type=\"URL\" data-id=\"https:\/\/web.mit.edu\/ghostscript\/www\/Install.htm\" target=\"_blank\">this detailed guide<\/a> and come back afterward.<\/p>\n<p>Now download <code>pdf_compressor.py<\/code> from GitHub <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/theeko74\/pdfc\/blob\/master\/pdf_compressor.py\" target=\"_blank\">here<\/a>.<\/p>\n<ul>\n<li>URL: <a href=\"https:\/\/github.com\/theeko74\/pdfc\/blob\/master\/pdf_compressor.py\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/theeko74\/pdfc\/blob\/master\/pdf_compressor.py<\/a><\/li>\n<\/ul>\n<p>Ultimately, we will write a Python script to perform the compression.\u00a0 <\/p>\n<p>First, we create a directory to hold the script and use our preferred editor or <a href=\"https:\/\/blog.finxter.com\/best-python-ide\/\" data-type=\"post\" data-id=\"8106\" target=\"_blank\" rel=\"noreferrer noopener\">IDE<\/a> to create the script file.\u00a0 This example uses the Linux command line to make the directory and <code><a href=\"https:\/\/blog.finxter.com\/how-to-edit-a-text-file-in-windows-powershell\/\" data-type=\"post\" data-id=\"236823\" target=\"_blank\" rel=\"noreferrer noopener\">vim<\/a><\/code> to create the script \u201c<code>bpdfc.py<\/code>\u201d; use whichever tools you prefer:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">$ mkdir batchPDFcomp\n$ cd batchPDFcomp\n$ vim bpdfc.py<\/pre>\n<p>We won\u2019t write out the script just yet &#8211; we\u2019ll show it a little later in this article.<\/p>\n<p>When we do write the script, within it we\u2019ll import 
\u201c<code>pdf_compressor.py<\/code>\u201d as a <a href=\"https:\/\/blog.finxter.com\/python-how-to-import-modules-from-another-folder\/\" data-type=\"post\" data-id=\"19786\" target=\"_blank\" rel=\"noreferrer noopener\">module<\/a>.\u00a0 <\/p>\n<p>To prepare for this, we create a subdirectory, <code>pdfc<\/code>, inside our script directory.\u00a0 <\/p>\n<p>We also need to copy <code>pdf_compressor.py<\/code> into that subdirectory and create a file <code><a href=\"https:\/\/blog.finxter.com\/python-init\/\" data-type=\"post\" data-id=\"5133\" target=\"_blank\" rel=\"noreferrer noopener\">__init__.py<\/a><\/code> within the same subdirectory (the file can be empty; those are double underscores on each side of \u2018<code>init<\/code>\u2019):<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">$ mkdir pdfc\n$ cp ~\/Downloads\/pdf_compressor.py pdfc\/\n$ cd pdfc\n$ vim __init__.py<\/pre>\n<p>What we have done here is create a local package <code>pdfc<\/code> containing a module <code>pdf_compressor.py<\/code>.\u00a0 <\/p>\n<p class=\"has-global-color-8-background-color has-background\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/13.1.0\/72x72\/1f4a1.png\" alt=\"\ud83d\udca1\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\" \/> <strong>Note<\/strong>: The presence of the file <code>__init__.py<\/code> tells Python to treat that directory as a package, so the modules inside it can be imported.<\/p>\n<p>Now we are ready to write our script.<\/p>\n<h2>The PDF Compression Python Script<\/h2>\n<p>Here is our script:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" 
data-enlighter-group=\"\"># Import the compress function from the local pdfc package\nfrom pdfc.pdf_compressor import compress\n# Compress the input PDF into a new file using the strongest level (4: screen)\ncompress('Finxter_WorldsMostDensePythonCheatSheet.pdf', 'Finxter_WorldsMostDensePythonCheatSheet_compr.pdf', power=4)<\/pre>\n<p>As you can see, it\u2019s a very short script.\u00a0 <\/p>\n<p>First, we import the \u201c<code>compress<\/code>\u201d function from the \u201c<code>pdf_compressor<\/code>\u201d module.\u00a0 <\/p>\n<p>Then we call the \u201c<code>compress<\/code>\u201d function.\u00a0 The function takes three arguments: the input file path, the output file path, and a \u2018<code>power<\/code>\u2019 argument that sets the compression level as follows, from <strong><em>least<\/em><\/strong> compression to <strong><em>most <\/em><\/strong>(according to the documentation in the script):<\/p>\n<p>Compression levels:<\/p>\n<ul>\n<li><code>0: default<\/code><\/li>\n<li><code>1: prepress<\/code><\/li>\n<li><code>2: printer<\/code><\/li>\n<li><code>3: ebook<\/code><\/li>\n<li><code>4: screen<\/code><\/li>\n<\/ul>\n<h2>Running the Script<\/h2>\n<p>Now we can run our script:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">$ python bpdfc.py\nCompress PDF...\nCompression by 51%.\nFinal file size is 0.2MB\nDone.\n$ <\/pre>\n<p>We have only compressed one PDF document in this example, but by modifying the script to loop over several PDF documents, you can compress multiple files in one run.\u00a0 <\/p>\n<p>However, we leave that as an exercise for the reader!<\/p>\n<p>We hope you have found this article useful. 
Thank you for reading, and we wish you happy coding!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Problem Formulation Suppose you have a PDF file, but it\u2019s too large and you\u2019d like to compress it (perhaps you want to reduce its size to allow for faster transfer over the internet, or perhaps to save storage space).\u00a0 Even more challenging, suppose you have multiple PDF files you\u2019d like to compress.\u00a0 Multiple online options [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[857],"tags":[73,468,528],"class_list":["post-123164","post","type-post","status-publish","format-standard","hentry","category-python-tut","tag-programming","tag-python","tag-tutorial"],"_links":{"self":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/123164","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/comments?post=123164"}],"version-history":[{"count":0,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/123164\/revisions"}],"wp:attachment":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/media?parent=123164"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/categories?post=123164"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/tags?post=123164"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}