12-25-2020, 08:09 AM
6 Best Python NLP Libraries
<div><p>If you are a data scientist, or aspire to be one, investing your time in learning <strong>natural language processing (NLP)</strong> is an investment in your future. 2020 saw a surge of activity in natural language processing. In this blog post, you will discover six popular Python NLP libraries and their applications.</p>
<h2>Preprocessing Libraries</h2>
<p>Preprocessing is a crucial step in any machine learning pipeline. If you are building a language model, you usually have to clean the text before turning it into word vectors, for example by removing stop words and converting words to their root form.</p>
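<p>To make these steps concrete, here is a minimal pure-Python sketch of such a pipeline. It assumes a tiny hand-picked stop-word list and a deliberately crude suffix-stripping rule standing in for real stemming or lemmatization; the helper names <code>crude_stem</code>, <code>preprocess</code>, and <code>bag_of_words</code> are hypothetical, and libraries such as spaCy and NLTK provide far better versions of each step.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re
from collections import Counter

# Hypothetical, tiny stop-word list; real libraries ship much larger ones.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "of", "to", "in"}


def crude_stem(word):
    """Strip a few common suffixes: a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip the suffix if at least three letters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, split into alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(token) for token in tokens if token not in STOP_WORDS]


def bag_of_words(tokens, vocabulary):
    """Count each vocabulary word in tokens, giving a simple word-count vector."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


tokens = preprocess("The dogs are playing in the parks, and the dog played.")
vocabulary = sorted(set(tokens))
print(tokens)                            # ['dog', 'play', 'park', 'dog', 'play']
print(vocabulary)                        # ['dog', 'park', 'play']
print(bag_of_words(tokens, vocabulary))  # [2, 1, 2]
</pre>
<p>The crude rule maps both "playing" and "played" to "play", but it cannot handle irregular forms such as "mice"; that is where a proper lemmatizer like spaCy's helps.</p>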
<h3>#1 spaCy</h3>
<div class="wp-block-image">
<figure class="aligncenter size-large"><img loading="lazy" width="1024" height="342" src="https://blog.finxter.com/wp-content/uploads/2020/12/image-94-1024x342.png" alt="" class="wp-image-19455" srcset="https://blog.finxter.com/wp-content/uploads/2020/12/image-94.png 1024w, https://blog.finxter.com/wp-content/uplo...00x100.png 300w, https://blog.finxter.com/wp-content/uplo...68x257.png 768w, https://blog.finxter.com/wp-content/uplo...36x513.png 1536w, https://blog.finxter.com/wp-content/uplo...150x50.png 150w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
</div>
<p><a href="https://spacy.io/" target="_blank" rel="noreferrer noopener">spaCy</a> is a popular Python library for tokenization, sentence segmentation, lemmatization, and named entity recognition (it lemmatizes rather than stems). It is an industrial-strength library that can be used both for text preprocessing and for training deep learning text classifiers.</p>
<p>To get started with spaCy, let us look at named entity recognition (NER), an important task in natural language processing. NER extracts entities such as locations and organization names from text.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import spacy

# Download the model first: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')

sentences = ['Stockholm is a beautiful city', 'Mumbai is a vibrant city']

for sentence in sentences:
    doc = nlp(sentence)
    for entity in doc.ents:
        print(entity.text, entity.label_)
        print(spacy.explain(entity.label_))
</pre>
<p>The code above processes the two sentences and extracts the location mentioned in each one.</p>
<p>Let us now look at the output:</p>
<figure class="wp-block-image size-large"><img loading="lazy" width="617" height="59" src="https://blog.finxter.com/wp-content/uploads/2020/12/image-92.png" alt="" class="wp-image-19447" srcset="https://blog.finxter.com/wp-content/uploads/2020/12/image-92.png 617w, https://blog.finxter.com/wp-content/uplo...300x29.png 300w, https://blog.finxter.com/wp-content/uplo...150x14.png 150w" sizes="(max-width: 617px) 100vw, 617px" /></figure>
<p>As the output shows, the code extracted Stockholm and Mumbai and tagged both with the GPE (geopolitical entity) label, which covers countries, cities, and states.</p>
<h3>#2 NLTK</h3>
<p><a href="https://www.nltk.org/" target="_blank" rel="noreferrer noopener">NLTK</a> (Natural Language Toolkit) is another popular Python library for text preprocessing. It started as an academic project and quickly became popular among researchers and academics.</p>
<p>Let us see how to do part-of-speech (POS) tagging with NLTK. POS tagging labels each word in a sentence with its part of speech, such as noun, pronoun, adverb, or adjective. The example also passes the tagged words to NLTK's named entity chunker.</p>
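<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import nltk

# One-time model downloads (resource names may differ in newer NLTK releases)
for resource in ['punkt', 'averaged_perceptron_tagger', 'maxent_ne_chunker', 'words']:
    nltk.download(resource)

sentence = "Python is a beautiful programming language."
tokens = nltk.word_tokenize(sentence)   # split the sentence into word tokens
tagged = nltk.pos_tag(tokens)           # attach a POS tag to each token
entities = nltk.chunk.ne_chunk(tagged)  # group named entities into subtrees
print(entities)
</pre>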
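<p>Running the code prints the sentence as a tree in which every word carries its POS tag (NNP for proper noun, VBZ for verb, DT for determiner, JJ for adjective, NN for noun). Note that the chunker also labels Python as a GPE, which is a misclassification in this context:</p>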
<pre class="wp-block-preformatted"><code> (S (GPE Python/NNP) is/VBZ a/DT beautiful/JJ programming/NN language/NN ./.) </code></pre>
<h2>Applications</h2>
<p>A popular application of NLP is classifying documents into a given set of labels. Several Python libraries help you train deep learning models for tasks such as topic modeling, text summarization, and sentiment analysis. Let us look at some of the most popular ones.</p>
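<p>Before turning to the libraries, the sketch below shows the bare idea of document categorization with a tiny multinomial naive Bayes classifier in plain Python. This is a deliberately simple, non-deep-learning technique chosen only for illustration; the <code>NaiveBayes</code> class, the training sentences, and the labels are all hypothetical.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.vocabulary = set()
        for doc, label in zip(documents, labels):
            words = doc.lower().split()
            self.word_counts[label].update(words)
            self.vocabulary.update(words)
        return self

    def predict(self, document):
        words = document.lower().split()
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior P(label) plus the sum of log likelihoods P(word | label).
            score = math.log(count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                freq = self.word_counts[label][word] + 1  # add-one smoothing
                score += math.log(freq / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


docs = [
    "the team won the match",
    "a great goal in the final match",
    "the new phone has a fast processor",
    "this laptop has a bright screen",
]
labels = ["sports", "sports", "tech", "tech"]

model = NaiveBayes().fit(docs, labels)
print(model.predict("who won the final"))           # sports
print(model.predict("a phone with a fast screen"))  # tech
</pre>
<p>Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the add-one smoothing keeps unseen words such as "who" from zeroing out a label's score.</p>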
<p>Most deep learning NLP models build on pretrained language models through a process called transfer learning: a model is first trained on a huge corpus of documents and then fine-tuned for a specific task or domain. Popular libraries that help you use pretrained models and build industry-grade NLP applications include:</p>
<h3>#3 FARM</h3>
<p><a href="https://github.com/deepset-ai/FARM#what-is-it" target="_blank" rel="noreferrer noopener">FARM</a> (Framework for Adapting Representation Models) is an open-source package developed by deepset, a Berlin-based company. It makes transfer learning easier for developers by providing features such as experiment tracking, multitask learning, and parallelized document processing.</p>
<h3>#4 Flair </h3>
<p><a href="https://github.com/flairNLP/flair" target="_blank" rel="noreferrer noopener">Flair</a> is a popular PyTorch-based framework that helps developers build state-of-the-art NLP applications such as named entity recognition, part-of-speech tagging, word sense disambiguation, and text classification.</p>
<h3>#5 Transformers</h3>
<p><a href="https://huggingface.co/transformers/" target="_blank" rel="noreferrer noopener">Transformers</a> is a popular Python library that gives easy access to pretrained models and supports both PyTorch and TensorFlow. If you want to build an entire NLP pipeline on pretrained models for natural language understanding and generation tasks, Transformers will make your life easier.</p>
<h3>#6 Gensim</h3>
<div class="wp-block-image">
<figure class="aligncenter size-large"><a href="https://radimrehurek.com/gensim/index.html" target="_blank" rel="noopener noreferrer"><img loading="lazy" width="1024" height="389" src="https://blog.finxter.com/wp-content/uploads/2020/12/image-96-1024x389.png" alt="" class="wp-image-19459" srcset="https://blog.finxter.com/wp-content/uploads/2020/12/image-96.png 1024w, https://blog.finxter.com/wp-content/uplo...00x114.png 300w, https://blog.finxter.com/wp-content/uplo...68x292.png 768w, https://blog.finxter.com/wp-content/uplo...36x584.png 1536w, https://blog.finxter.com/wp-content/uplo...150x57.png 150w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>
</div>
<p><a href="https://radimrehurek.com/gensim/index.html" target="_blank" rel="noreferrer noopener">Gensim</a> is another popular Python library, widely used for topic modeling. It also provides an easy-to-use interface to algorithms such as word2vec, which can be used to find semantically similar words.</p>
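<p>Word2vec represents each word as a dense vector, and similar words are found by ranking the other words by cosine similarity to the query word's vector. The sketch below shows only that ranking step in plain Python, using made-up three-dimensional vectors as a hypothetical stand-in for a trained model; in Gensim, a trained <code>Word2Vec</code> model exposes this kind of lookup as <code>model.wv.most_similar()</code>.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import math

# Hypothetical toy embeddings; a trained word2vec model would learn these
# (usually with 100 or more dimensions) from a large corpus.
EMBEDDINGS = {
    "happy":  [0.9, 0.8, 0.1],
    "glad":   [0.85, 0.75, 0.2],
    "joyful": [0.8, 0.9, 0.15],
    "sad":    [-0.7, -0.6, 0.1],
    "car":    [0.1, -0.2, 0.95],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, topn=2):
    """Rank all other words by cosine similarity to word, highest first."""
    query = embeddings[word]
    scores = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


print([w for w, _ in most_similar("happy", EMBEDDINGS)])  # ['glad', 'joyful']
</pre>
<p>Cosine similarity ignores vector length and compares only direction, which is why it is the usual choice for comparing word embeddings. Keep in mind that words which appear in similar contexts, including some antonyms, can also score highly.</p>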
<p>The post <a href="https://blog.finxter.com/6-best-python-nlp-libraries/" target="_blank" rel="noopener noreferrer">6 Best Python NLP Libraries</a> first appeared on <a href="https://blog.finxter.com/" target="_blank" rel="noopener noreferrer">Finxter</a>.</p>
</div>
https://www.sickgaming.net/blog/2020/12/...libraries/