[Tut] How to Compress PDF Files Using Python?


<div><figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio">
<div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="How to Compress PDF Files Using Python?" width="780" height="439" src="https://www.youtube.com/embed/c4mlg-_jS-g?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>
</figure>
<h2>Problem Formulation</h2>
<p>Suppose you have a PDF file, but it’s too large and you’d like to compress it (perhaps you want to reduce its size to allow for faster transfer over the internet, or perhaps to save storage space).  </p>
<p>Even more challenging, suppose you have multiple PDF files you’d like to compress.  </p>
<p>Multiple online options exist, but these typically limit how many files can be processed at a time.  Uploading the originals and then downloading the results also takes extra time.  And you may not be comfortable sharing your files with an online service.</p>
<p>Fortunately, we can use Python to address all these concerns.  But before we learn how to do this, let’s first learn a little bit about PDF files.</p>
<h2>About Compressing PDF Files</h2>
<p>According to Dov Isaacs, former Adobe Principal Scientist (see his discussion <a href="https://community.adobe.com/t5/acrobat-discussions/compressing-pdf/td-p/10950834" target="_blank" rel="noreferrer noopener">here</a>), PDF documents are already substantially compressed.  </p>
<p>The text and vector graphics portions of the documents are already internally zip-compressed, so there is little opportunity for improvement there.  </p>
<p>Instead, any file compression improvements are achieved through compression of image portions of PDF documents, along with potential loss of image quality.  </p>
<p>So further compression may be achievable, but the user must decide how much image quality loss is acceptable in exchange for a smaller file.</p>
<h2>Setup</h2>
<p>A programmer going by the handle <em>Theeko74</em> has written a Python script called “<code>pdf_compressor.py</code>”. This script is a wrapper around <code>ghostscript</code>, which does the actual work of compressing PDF files.  </p>
<p>This script is offered under the MIT license and is free to use as the user wishes.</p>
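<p>To make the wrapper idea concrete, here is a minimal sketch of the kind of Ghostscript command line such a script can assemble. The function name <code>build_gs_command</code> and its parameters are hypothetical and are not taken from <code>pdf_compressor.py</code>. The flags are standard Ghostscript options, and the executable name <code>gs</code> is an assumption that holds on most Linux and macOS installs.</p>

```python
def build_gs_command(input_path, output_path, preset="/screen", gs_binary="gs"):
    """Return the argument list for a Ghostscript PDF-to-PDF rewrite.

    Hypothetical helper for illustration; it only builds the command and
    does not run Ghostscript.
    """
    return [
        gs_binary,                      # assumed executable name
        "-sDEVICE=pdfwrite",            # write the result as a PDF
        "-dCompatibilityLevel=1.4",     # target PDF version
        f"-dPDFSETTINGS={preset}",      # quality preset, e.g. /screen or /ebook
        "-dNOPAUSE",                    # do not pause between pages
        "-dQUIET",                      # suppress routine messages
        "-dBATCH",                      # exit when processing is finished
        f"-sOutputFile={output_path}",  # where to write the compressed file
        input_path,                     # the PDF to read
    ]


if __name__ == "__main__":
    print(" ".join(build_gs_command("in.pdf", "out.pdf")))
```

<p>Once Ghostscript is installed, the returned list can be passed to <code>subprocess.run(cmd, check=True)</code> to perform the conversion.</p>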
<p class="has-global-color-8-background-color has-background">💡 <strong>Hint</strong>: Make sure you have <code>ghostscript</code> installed on your computer. To install <code>ghostscript</code>, follow <a rel="noreferrer noopener" href="https://web.mit.edu/ghostscript/www/Install.htm" data-type="URL" data-id="https://web.mit.edu/ghostscript/www/Install.htm" target="_blank">this detailed guide</a> and come back afterward.</p>
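<p>One way to confirm the installation from Python is to look for the Ghostscript executable on the <code>PATH</code>. This is a small sketch, not part of <code>pdf_compressor.py</code>: the helper name <code>find_ghostscript</code> is hypothetical, and the candidate names assume the usual <code>gs</code> on Linux and macOS and the console executables <code>gswin64c</code> or <code>gswin32c</code> on Windows.</p>

```python
import shutil

# Assumed executable names: "gs" on Linux/macOS, "gswin64c"/"gswin32c" on Windows.
CANDIDATES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript(candidates=CANDIDATES):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print(find_ghostscript() or "Ghostscript not found on PATH")
```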
<p>Now download <code>pdf_compressor.py</code> from GitHub <a rel="noreferrer noopener" href="https://github.com/theeko74/pdfc/blob/master/pdf_compressor.py" target="_blank">here</a>.</p>
<ul>
<li>URL: <a href="https://github.com/theeko74/pdfc/blob/master/pdf_compressor.py" target="_blank" rel="noreferrer noopener">https://github.com/theeko74/pdfc/blob/master/pdf_compressor.py</a></li>
</ul>
<p>Ultimately we will be writing a Python script to perform the compression.  </p>
<p>So we create a directory to hold the script, and then create the script itself with our preferred editor or <a href="https://blog.finxter.com/best-python-ide/" data-type="post" data-id="8106" target="_blank" rel="noreferrer noopener">IDE</a>. This example uses the Linux command line to make the directory and <code><a href="https://blog.finxter.com/how-to-edit-a-text-file-in-windows-powershell/" data-type="post" data-id="236823" target="_blank" rel="noreferrer noopener">vim</a></code> to create the script “<code>bpdfc.py</code>”; use whatever tools you prefer for both steps:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">$ mkdir batchPDFcomp
$ cd batchPDFcomp
$ vim bpdfc.py</pre>
<p>We won’t write out the script just yet – we’ll show some details for the script a little later in this article.</p>
<p>When we do write the script, within it we’ll import “<code>pdf_compressor.py</code>” as a <a href="https://blog.finxter.com/python-how-to-import-modules-from-another-folder/" data-type="post" data-id="19786" target="_blank" rel="noreferrer noopener">module</a>.  </p>
<p>To prepare for this we should create a subdirectory below our Python script directory.  </p>
<p>Also, we’ll need to copy <code>pdf_compressor.py</code> into that subdirectory, and we’ll need to create a file <code><a href="https://blog.finxter.com/python-init/" data-type="post" data-id="5133" target="_blank" rel="noreferrer noopener">__init__.py</a></code> within the same subdirectory (those are double underscores each side of ‘<code>init</code>’):</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">$ mkdir pdfc
$ cp ~/Downloads/pdf_compressor.py ~/batchPDFcomp/pdfc/
$ cd pdfc
$ vim __init__.py</pre>
<p>What we have done here is create a local package <code>pdfc</code> containing a module <code>pdf_compressor.py</code>.  </p>
<p class="has-global-color-8-background-color has-background">💡 <strong>Note</strong>: The presence of the file <code>__init__.py</code> tells Python that the directory is a package, and that it should look there for modules.</p>
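<p>The package mechanics can be tried out in isolation. The sketch below builds a throwaway package in a temporary directory and imports a module from it; the names <code>demo_pkg</code>, <code>demo_module</code>, and <code>greet</code> are made up for illustration and stand in for <code>pdfc</code>, <code>pdf_compressor</code>, and <code>compress</code>.</p>

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root):
    """Create root/demo_pkg with an empty __init__.py and one module."""
    pkg_dir = Path(root) / "demo_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")  # marks the directory as a package
    (pkg_dir / "demo_module.py").write_text(
        "def greet():\n    return 'hello from demo_pkg'\n"
    )
    return pkg_dir


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(tmp)
    sys.path.insert(0, tmp)        # make the parent directory importable
    importlib.invalidate_caches()  # the files were created after startup
    try:
        module = importlib.import_module("demo_pkg.demo_module")
        message = module.greet()
    finally:
        sys.path.remove(tmp)       # leave sys.path as we found it

print(message)
```

<p>Since Python 3.3 a directory without <code>__init__.py</code> can also be imported as a namespace package, but including the file makes <code>pdfc</code> a regular package, which is the layout this article sets up.</p>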
<p>Now we are ready to write our script.</p>
<h2>The PDF Compression Python Script</h2>
<p>Here is our script:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Import the compress function from the local pdfc package
from pdfc.pdf_compressor import compress
# Arguments: input file path, output file path, compression level (4: screen)
compress('Finxter_WorldsMostDensePythonCheatSheet.pdf', 'Finxter_WorldsMostDensePythonCheatSheet_compr.pdf', power=4)</pre>
<p>As you can see it’s a very short script.  </p>
<p>First we import the “<code>compress</code>” function from the “<code>pdf_compressor</code>” module.  </p>
<p>Then we call the “<code>compress</code>” function.  The function takes as arguments: the input file path, the output file path, and a ‘<code>power</code>’ argument that sets compression as follows, from <strong><em>least</em></strong> compression to <strong><em>most </em></strong>(according to the documentation in the script):</p>
<p>Compression levels:</p>
<ul>
<li><code>0: default</code></li>
<li><code>1: prepress</code></li>
<li><code>2: printer</code></li>
<li><code>3: ebook</code></li>
<li><code>4: screen</code></li>
</ul>
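<p>The levels listed above can be written down as a lookup table. This is a sketch only: the leading slash follows the form Ghostscript's <code>-dPDFSETTINGS</code> option expects, while the names <code>PRESETS</code> and <code>preset_for</code> are hypothetical and chosen here for illustration.</p>

```python
# Mirrors the list of compression levels above, least to most compression.
PRESETS = {
    0: "/default",
    1: "/prepress",
    2: "/printer",
    3: "/ebook",
    4: "/screen",
}


def preset_for(power):
    """Return the preset name for a power level, rejecting unknown levels."""
    try:
        return PRESETS[power]
    except KeyError:
        raise ValueError(
            f"power must be one of {sorted(PRESETS)}, got {power!r}"
        ) from None


if __name__ == "__main__":
    print(preset_for(4))
```

<p>Validating the level up front gives a clear error message instead of a failure later, when the external program is called.</p>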
<h2>Running the Script</h2>
<p>Now we can run our script:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">$ python bpdfc.py
Compress PDF...
Compression by 51%.
Final file size is 0.2MB
Done.
$ </pre>
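<p>A report like the one above can be derived from the two file sizes. The formula <code>pdf_compressor.py</code> actually uses is not shown in this article, so the sketch below assumes the percentage is the fraction of the original size that was saved and that a megabyte is 1,000,000 bytes. The function names and the byte counts in the example call are illustrative, not measurements.</p>

```python
import os


def compression_summary(original_bytes, compressed_bytes):
    """Format the size reduction, assuming ratio = 1 - compressed / original."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    ratio = 1 - compressed_bytes / original_bytes
    size_mb = compressed_bytes / 1_000_000  # assumed: 1 MB = 1,000,000 bytes
    return f"Compression by {ratio:.0%}. Final file size is {size_mb:.1f}MB"


def summarize_files(original_path, compressed_path):
    """Same report, reading the sizes of two files on disk."""
    return compression_summary(
        os.path.getsize(original_path), os.path.getsize(compressed_path)
    )


if __name__ == "__main__":
    # Illustrative byte counts only.
    print(compression_summary(400_000, 196_000))
```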
<p>We have only compressed one PDF document in this example, but by modifying the script to loop through multiple PDF documents one can compress multiple files at once.  </p>
<p>However, we leave that as an exercise for the reader!</p>
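<p>For readers who would like a starting point for that exercise, here is one possible sketch. It assumes all the PDFs sit in one folder and that the compression function accepts the same arguments as the <code>compress</code> call shown earlier (input path, output path, <code>power</code>). The helper name <code>compress_folder</code> and the <code>_compr</code> suffix for output files are choices made here for illustration.</p>

```python
from pathlib import Path


def compress_folder(src_dir, dst_dir, compress_func, power=4):
    """Call compress_func on every *.pdf in src_dir, writing into dst_dir.

    compress_func is expected to accept (input_path, output_path, power=...).
    Returns the list of output paths in the order they were processed.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(src_dir.glob("*.pdf")):
        out = dst_dir / f"{pdf.stem}_compr.pdf"
        compress_func(str(pdf), str(out), power=power)
        written.append(out)
    return written


# Possible usage with the package set up in this article
# ("originals" and "compressed" are hypothetical folder names):
#   from pdfc.pdf_compressor import compress
#   compress_folder("originals", "compressed", compress, power=4)
```

<p>Writing the results to a separate folder keeps a second run from picking up files that were already compressed.</p>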
<p>We hope you have found this article useful. Thank you for reading, and we wish you happy coding!</p>
</div>


https://www.sickgaming.net/blog/2022/03/...ng-python/