{"id":129863,"date":"2022-11-18T20:19:12","date_gmt":"2022-11-18T20:19:12","guid":{"rendered":"https:\/\/blog.finxter.com\/?p=903641"},"modified":"2022-11-18T20:19:12","modified_gmt":"2022-11-18T20:19:12","slug":"using-pytorch-to-build-a-working-neural-network","status":"publish","type":"post","link":"https:\/\/sickgaming.net\/blog\/2022\/11\/18\/using-pytorch-to-build-a-working-neural-network\/","title":{"rendered":"Using PyTorch to Build a Working Neural Network"},"content":{"rendered":"\n<div class=\"kk-star-ratings kksr-auto kksr-align-left kksr-valign-top\" data-payload='{&quot;align&quot;:&quot;left&quot;,&quot;id&quot;:&quot;903641&quot;,&quot;slug&quot;:&quot;default&quot;,&quot;valign&quot;:&quot;top&quot;,&quot;ignore&quot;:&quot;&quot;,&quot;reference&quot;:&quot;auto&quot;,&quot;class&quot;:&quot;&quot;,&quot;count&quot;:&quot;1&quot;,&quot;readonly&quot;:&quot;&quot;,&quot;score&quot;:&quot;5&quot;,&quot;best&quot;:&quot;5&quot;,&quot;gap&quot;:&quot;5&quot;,&quot;greet&quot;:&quot;Rate this post&quot;,&quot;legend&quot;:&quot;5\\\/5 - (1 vote)&quot;,&quot;size&quot;:&quot;24&quot;,&quot;width&quot;:&quot;142.5&quot;,&quot;_legend&quot;:&quot;{score}\\\/{best} - ({count} {votes})&quot;,&quot;font_factor&quot;:&quot;1.25&quot;}'>\n<div class=\"kksr-stars\">\n<div class=\"kksr-stars-inactive\">\n<div class=\"kksr-star\" data-star=\"1\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<div class=\"kksr-star\" data-star=\"2\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<div class=\"kksr-star\" data-star=\"3\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<div class=\"kksr-star\" data-star=\"4\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<div class=\"kksr-star\" data-star=\"5\" 
style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<div class=\"kksr-stars-active\" style=\"width: 142.5px;\">\n<div class=\"kksr-star\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<div class=\"kksr-star\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<div class=\"kksr-star\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<div class=\"kksr-star\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<div class=\"kksr-star\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<\/div>\n<div class=\"kksr-legend\" style=\"font-size: 19.2px;\"> 5\/5 &#8211; (1 vote) <\/div>\n<\/div>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"682\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-201-1024x682.png\" alt=\"\" class=\"wp-image-904029\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-201-1024x682.png 1024w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-201-300x200.png 300w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-201-768x512.png 768w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-201.png 1255w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n<p>In this article, we will use <a rel=\"noreferrer noopener\" href=\"https:\/\/blog.finxter.com\/pytorch-developer-income-and-opportunity\/\" data-type=\"post\" data-id=\"255891\" target=\"_blank\">PyTorch<\/a> to build a working neural network. 
Specifically, this network will be trained to recognize handwritten numerical digits using the famous MNIST dataset.<\/p>\n<figure class=\"wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube\"><a href=\"https:\/\/blog.finxter.com\/using-pytorch-to-build-a-working-neural-network\/\"><img decoding=\"async\" src=\"https:\/\/blog.finxter.com\/wp-content\/plugins\/wp-youtube-lyte\/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2Fe02w3bKhFe8%2Fhqdefault.jpg\" alt=\"YouTube Video\"><\/a><figcaption><\/figcaption><\/figure>\n<p>The code in this article borrows heavily from the PyTorch tutorial <a href=\"https:\/\/pytorch.org\/tutorials\/beginner\/basics\/intro.html#learn-the-basics\" target=\"_blank\" rel=\"noreferrer noopener\">&#8220;Learn the Basics&#8221;<\/a>. We do this for several reasons. <\/p>\n<ul>\n<li>First, that tutorial is pretty good at demonstrating the essentials for getting a working neural network. <\/li>\n<li>Second, just like importing libraries, it&#8217;s good to not reinvent the wheel when you don&#8217;t have to. 
<\/li>\n<li>Third, when building your own network, it is very helpful to start with something that is known to work, then modify it to your needs.<\/li>\n<\/ul>\n<h2>Knowledge Background<\/h2>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"682\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-202-1024x682.png\" alt=\"\" class=\"wp-image-904030\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-202-1024x682.png 1024w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-202-300x200.png 300w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-202-768x512.png 768w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-202.png 1255w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<p>This article assumes the reader has some necessary background:<\/p>\n<ol>\n<li>Familiarity with <a href=\"https:\/\/blog.finxter.com\/python-crash-course\/\" data-type=\"post\" data-id=\"3951\" target=\"_blank\" rel=\"noreferrer noopener\">Python<\/a>, and Python <a href=\"https:\/\/blog.finxter.com\/introduction-to-python-classes\/\" data-type=\"post\" data-id=\"30596\" target=\"_blank\" rel=\"noreferrer noopener\">object-oriented programming<\/a>.<\/li>\n<li>Familiarity with how neural networks work. See the Finxter article <a href=\"https:\/\/blog.finxter.com\/the-magic-of-neural-networks-how-they-work\/\" target=\"_blank\" rel=\"noreferrer noopener\">&#8220;The Magic of Neural Networks: History and Concepts&#8221;<\/a> to learn the basic ideas.<\/li>\n<li>Familiarity with how neural networks learn. See the Finxter article <a href=\"https:\/\/blog.finxter.com\/how-neural-networks-learn\/\" target=\"_blank\" rel=\"noreferrer noopener\">&#8220;How Neural Networks Learn&#8221;<\/a> to learn this subject.<\/li>\n<li>Familiarity with tensors. 
See the Finxter article <a href=\"https:\/\/blog.finxter.com\/tensors-the-vocabulary-of-neural-networks\/\" target=\"_blank\" rel=\"noreferrer noopener\">&#8220;Tensors: the Vocabulary of Neural Networks&#8221;<\/a> to learn this subject.<\/li>\n<li>Familiarity with <a href=\"https:\/\/blog.finxter.com\/matplotlib-full-guide\/\" data-type=\"post\" data-id=\"20151\" target=\"_blank\" rel=\"noreferrer noopener\">Matplotlib<\/a>. While this is not necessary to follow along, it is necessary if you want to view image data from your own datasets in the future (and you <em>will<\/em> want to be able to do this).<\/li>\n<\/ol>\n<p>You can run PyTorch on your own machine, or you can run it on publicly available computer systems. <\/p>\n<p>We will be running this exercise using Google Colab, which provides access to world-class computing capability, free of charge. <\/p>\n<p class=\"has-base-background-color has-background\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/14.0.0\/72x72\/1f30d.png\" alt=\"\ud83c\udf0d\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\" \/> <strong>Recommended<\/strong>: Other options for publicly available computing are shown in the Finxter article <a rel=\"noreferrer noopener\" href=\"https:\/\/blog.finxter.com\/survey-of-python-online-notebook-options\/\" target=\"_blank\">&#8220;Top 4 Jupyter Notebook Alternatives for Machine Learning&#8221;<\/a>.<\/p>\n<h2>Process Overview<\/h2>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"625\" height=\"938\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-203.png\" alt=\"\" class=\"wp-image-904033\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-203.png 625w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-203-200x300.png 200w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" 
\/><\/figure>\n<\/div>\n<p>This article will cover all the necessary steps to build and test a working neural network using the <a href=\"https:\/\/blog.finxter.com\/how-to-install-pytorch-on-pycharm\/\" data-type=\"post\" data-id=\"35142\" target=\"_blank\" rel=\"noreferrer noopener\">PyTorch library<\/a>. <\/p>\n<p>PyTorch provides a framework that makes building, training, and using <a rel=\"noreferrer noopener\" href=\"https:\/\/blog.finxter.com\/tutorial-how-to-create-your-first-neural-network-in-1-line-of-python-code\/\" data-type=\"post\" data-id=\"2463\" target=\"_blank\">neural networks<\/a> easier. Under the hood, it is written in the very fast <a rel=\"noreferrer noopener\" href=\"https:\/\/blog.finxter.com\/c-plus-plus-developer-income-and-opportunity\/\" data-type=\"post\" data-id=\"196896\" target=\"_blank\">C++<\/a> language, so those neural networks deliver world-class performance while you use the popular Python language as the interface to create them.<\/p>\n<p>Neural networks and the PyTorch library are rich subjects. So while we will cover all the necessary steps, each step will just scratch the surface of its respective subject. <\/p>\n<p>For example, we will get the image data from datasets built into the PyTorch library. However, you will eventually want to use neural networks on your own data, so you will need to learn how to build and work with your own datasets. <\/p>\n<p>So for each of these steps, you will want to learn more about each subject to become a proficient PyTorch user.<\/p>\n<p>Nevertheless, by the end of this article, you will have built your own working neural network, so you can be sure you will know how to do it! <\/p>\n<p>Further learning will enrich those abilities. 
Throughout the article, we will point out some of the other things you will eventually want to learn for each step.<\/p>\n<p>Here are the steps we will be taking:<\/p>\n<ol>\n<li>Import necessary libraries.<\/li>\n<li>Acquire the data.<\/li>\n<li>Review the data to understand it.<\/li>\n<li>Create data loaders for loading the data into the network.<\/li>\n<li>Design and create the neural network.<\/li>\n<li>Specify the loss measure and the optimizer algorithm.<\/li>\n<li>Specify the training and testing functions.<\/li>\n<li>Train and test the network using the specified functions.<\/li>\n<\/ol>\n<h2>Step 1: Import Necessary Libraries<a href=\"https:\/\/docs.google.com\/document\/d\/1ChXcbOjMg_yJBiWl8_GRcCDE_s7PprXAm3_wYWcYD2U\/edit#bookmark=id.3znysh7\"><\/a><\/h2>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"682\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-204-1024x682.png\" alt=\"\" class=\"wp-image-904037\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-204-1024x682.png 1024w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-204-300x200.png 300w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-204-768x512.png 768w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-204.png 1255w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n<p>Before we do anything, we will want to set up our runtime to use the GPU (again, assuming here you are using Colab). <\/p>\n<p>Click on <strong>&#8220;Runtime&#8221;<\/strong> in the top menu bar, and then choose <strong>&#8220;Change runtime type&#8221;<\/strong> from the dropdown. 
Then from the window that pops up choose <strong>&#8220;GPU&#8221;<\/strong> under <strong>&#8220;Hardware accelerator&#8221;<\/strong>, and then click <strong>&#8220;Save&#8221;<\/strong>.<\/p>\n<p>Next, we will need to import a number of libraries:<\/p>\n<ol>\n<li>We will import the <code>torch<\/code> library, making PyTorch available for use.<\/li>\n<li>From the <code>torch<\/code> module we will import the <code>nn<\/code> library, which is important for building the neural network.<\/li>\n<li>From the <code>torchvision<\/code> module we will import the <code>datasets<\/code> library, which will help provide the image datasets.<\/li>\n<li>From the <code>torch.utils.data<\/code> module we will import the <code>DataLoader<\/code> class. Data loaders help load data into the network.<\/li>\n<li>From the <code>torchvision.transforms<\/code> module we will import the <code>ToTensor<\/code> class. This converts the image data into tensors so that they are ready to be processed through the network.<\/li>\n<\/ol>\n<p>Here is the code importing the needed modules:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import torch\nfrom torch import nn\nfrom torchvision import datasets\nfrom torch.utils.data import DataLoader\nfrom torchvision.transforms import ToTensor<\/pre>\n<h2>Step 2: Acquire the Data<\/h2>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"682\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-205-1024x682.png\" alt=\"\" class=\"wp-image-904041\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-205-1024x682.png 1024w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-205-300x200.png 300w, 
https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-205-768x512.png 768w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-205.png 1255w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n<p>As mentioned before, in this exercise, we will be getting the MNIST data as available directly through PyTorch libraries. This is the quickest and easiest approach to getting the data.<\/p>\n<p>If you wanted to get the original datasets they are available at:<\/p>\n<p><a href=\"http:\/\/yann.lecun.com\/exdb\/mnist\/\" target=\"_blank\" rel=\"noreferrer noopener\">http:\/\/yann.lecun.com\/exdb\/mnist\/<\/a><\/p>\n<p>Even though we will get the data through the PyTorch libraries, it can still be helpful to review this page, as it provides some useful information about the dataset. (However we will provide everything you need to understand this dataset in the article).<\/p>\n<p class=\"has-global-color-8-background-color has-background\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/14.0.0\/72x72\/1f4a1.png\" alt=\"\ud83d\udca1\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\" \/> <strong>Note<\/strong>: Firefox has trouble accessing this page, for some reason requiring a login to access it. Either view it using another browser, or view it as recorded on the Internet Archive Wayback Machine.<\/p>\n<p>There are multiple datasets available through the PyTorch dataset libraries. 
Here are PyTorch webpages linking to <a href=\"https:\/\/pytorch.org\/vision\/stable\/datasets.html\" target=\"_blank\" rel=\"noreferrer noopener\">Image Datasets<\/a>, <a href=\"https:\/\/pytorch.org\/text\/stable\/datasets.html\" target=\"_blank\" rel=\"noreferrer noopener\">Text Datasets<\/a>, and <a href=\"https:\/\/pytorch.org\/audio\/stable\/datasets.html\" target=\"_blank\" rel=\"noreferrer noopener\">Audio Datasets<\/a>.<\/p>\n<p>To get data from a PyTorch dataset, we create an instance of the respective dataset class. Here is the format:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">dataset_instance = DatasetClass(parameters)<\/pre>\n<p>This creates a dataset object, and downloads the data. The data is then available by working with the dataset object.<\/p>\n<p>Here is the code to create our MNIST datasets:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Download MNIST training data, put it in a pytorch dataset\nmnist_data = datasets.MNIST(\n    root='mnist_nn',\n    train=True,\n    download=True,\n    transform=ToTensor()\n)<\/pre>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Download MNIST test data\nmnist_test_data = datasets.MNIST(\n    root='mnist_nn',\n    train=False,\n    download=True,\n    transform=ToTensor()\n)<\/pre>\n<p>You&#8217;ll use these parameters:<\/p>\n<ul>\n<li>The <code>root<\/code> parameter specifies the directory where the downloaded data will be placed. 
<\/li>\n<li>The <code>train<\/code> parameter determines whether training or testing data is downloaded. <\/li>\n<li>The <code>download=True<\/code> parameter confirms the data should be downloaded if it hasn&#8217;t been already. <\/li>\n<li>The <code>transform<\/code> parameter specifies a transformation to apply to the data; in this case, conversion into <a href=\"https:\/\/blog.finxter.com\/tensors-the-vocabulary-of-neural-networks\/\" data-type=\"post\" data-id=\"616223\" target=\"_blank\" rel=\"noreferrer noopener\">tensors<\/a>.<\/li>\n<\/ul>\n<p>Which parameters are available varies from dataset to dataset, as does how the data is structured, so refer to the dataset web pages mentioned above to review the details of what is available and needed.<\/p>\n<p>While this method of getting data is convenient and easy, remember that you will eventually want to work with your own data, so you will also want to learn how to create your own datasets.<\/p>\n<p>Also, not all datasets contain images of uniform size, so images may need to be cropped or stretched to fit the fixed number of input neurons. <\/p>\n<p>Other transformations can be helpful as well. <\/p>\n<p>For example, you can effectively expand your dataset by including sub-crops of your original images as additional images to train on. 
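<\/p>
<p>To make the sub-crop idea concrete, here is a pure-Python sketch (plain nested lists stand in for image tensors, and the <code>crop<\/code> helper is hypothetical, not a torchvision function):<\/p>

```python
# "Sub-crops" expand a dataset: each smaller window cut out of an
# image can serve as an additional training sample.
def crop(image, top, left, height, width):
    """Return a height x width window of a 2D grid, starting at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

# A 4x4 toy "image" of pixel values
img = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]

# Two different 2x2 sub-crops of the same image -> two training samples
crop_a = crop(img, 0, 0, 2, 2)  # [[0, 1], [4, 5]]
crop_b = crop(img, 1, 1, 2, 2)  # [[5, 6], [9, 10]]
```

<p>In practice you would let a library transform produce such crops (often at random positions) on the fly during training.<\/p>
<p>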
So data transformation is something else you will want to learn, since you might use it at this stage in the process.<\/p>\n<h2>Step 3: Review the Dataset<\/h2>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"682\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-206-1024x682.png\" alt=\"\" class=\"wp-image-904043\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-206-1024x682.png 1024w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-206-300x200.png 300w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-206-768x512.png 768w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-206.png 1255w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n<p>Now that we have downloaded the data and created a dataset, let&#8217;s review the dataset to understand its contents and structure.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">type(mnist_data)\n# torchvision.datasets.mnist.MNIST<\/pre>\n<p>The <code><a href=\"https:\/\/blog.finxter.com\/python-type\/\" data-type=\"post\" data-id=\"23967\" target=\"_blank\" rel=\"noreferrer noopener\">type()<\/a><\/code> function shows that our dataset is an object of the MNIST dataset class.<\/p>\n<p>Conveniently, PyTorch datasets have been designed to be indexed like lists. 
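<\/p>
<p>Being indexable like a list simply means the dataset class implements the <code>__len__()<\/code> and <code>__getitem__()<\/code> methods. Here is a minimal pure-Python stand-in (the <code>ToyDigits<\/code> class is a hypothetical illustration, not the real MNIST dataset class):<\/p>

```python
# A map-style dataset is just an object that supports len() and
# square-bracket indexing, returning one (image, label) pair per index.
class ToyDigits:
    def __init__(self, images, labels):
        self.images = images
        self.labels = labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Each item is an (image, label) tuple, mirroring the MNIST dataset
        return (self.images[idx], self.labels[idx])

ds = ToyDigits(images=["img0", "img1", "img2"], labels=[5, 0, 4])
# len(ds) -> 3; ds[0] -> ("img0", 5)
```

<p>The real MNIST dataset class works the same way, just with image tensors instead of strings.<\/p>
<p>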
Let&#8217;s take advantage of this and use the <code><a href=\"https:\/\/blog.finxter.com\/python-len\/\" data-type=\"post\" data-id=\"22386\" target=\"_blank\" rel=\"noreferrer noopener\">len()<\/a><\/code> function to learn something about our datasets:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">len(mnist_data)\n# 60000<\/pre>\n<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">len(mnist_test_data)\n# 10000<\/pre>\n<p>So our training dataset contains 60000 items, and our test dataset contains 10000 items, consistent with the number of images specified to be in each respective dataset.<\/p>\n<p>Let&#8217;s use the <code>type()<\/code> and <code>len()<\/code> functions to examine the first item in the training dataset:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">type(mnist_data[0])\n# tuple<\/pre>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">len(mnist_data[0])\n# 2<\/pre>\n<\/p>\n<p>So the items in the datasets are tuples containing 2 items.<\/p>\n<p>Let&#8217;s use the <code>type()<\/code> function to learn about the first item in the <a rel=\"noreferrer noopener\" href=\"https:\/\/blog.finxter.com\/the-ultimate-guide-to-python-tuples\/\" data-type=\"post\" data-id=\"12043\" 
target=\"_blank\">tuple<\/a>:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">type(mnist_data[0][0])\n# torch.Tensor<\/pre>\n<p>So the first item in the tuple is a tensor, likely some image data.<\/p>\n<p>Let&#8217;s examine the shape attribute of the tensor to understand its shape:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">mnist_data[0][0].shape\n# torch.Size([1, 28, 28])<\/pre>\n<p>This is consistent with the 28*28 pixel structure of the image data, plus one additional dimension containing the entire image data.<\/p>\n<p>Let&#8217;s examine the second item in the tuple:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">type(mnist_data[0][1])\n# int<\/pre>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">mnist_data[0][1]\n# 5<\/pre>\n<\/p>\n<p>So the second item is the integer <code>'5'<\/code>, apparently the label for an image of the digit <code>'5'<\/code>.<\/p>\n<p>Let&#8217;s use Matplotlib to view the image:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import matplotlib.pyplot as 
plt\nplt.imshow(mnist_data[0][0], cmap='gray')<\/pre>\n<p>Output (abridged):<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">TypeError                                 Traceback (most recent call last)\n&lt;ipython-input-14-3e7278364eac> in &lt;module>\n----> 1 plt.imshow(mnist_data[0][0], cmap='gray')\n\n...\n\nTypeError: Invalid shape (1, 28, 28) for image data\n<\/pre>\n<p>Oops, that extra one-element channel dimension is causing us problems. We can use the <code>squeeze()<\/code> method on the tensor to remove any one-element dimensions, returning a two-dimensional 28*28 tensor instead of the three-dimensional tensor we had before.<\/p>\n<p>Let&#8217;s try again:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">plt.imshow(mnist_data[0][0].squeeze(), cmap='gray')\n# &lt;matplotlib.image.AxesImage at 0x7f5b5e336150><\/pre>\n<p>Well, it&#8217;s a little sloppy, but that&#8217;s plausibly a number <code>'5'<\/code>. 
(This is reasonable to expect from a hand-written digit!)<\/p>\n<p>So it looks like each item in the dataset is a tuple containing an image (in tensor format) and its corresponding label.<\/p>\n<p>Let&#8217;s use Matplotlib to look at the first 10 images, and title each image with its corresponding label:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">fig, axs = plt.subplots(2, 5, figsize=(8, 5))\nfor a_row in range(2):\n    for a_col in range(5):\n        img_no = a_row*5 + a_col\n        img = mnist_data[img_no][0].squeeze()\n        img_tgt = mnist_data[img_no][1]\n        axs[a_row][a_col].imshow(img, cmap='gray')\n        axs[a_row][a_col].set_xticks([])\n        axs[a_row][a_col].set_yticks([])\n        axs[a_row][a_col].set_title(img_tgt, fontsize=20)\nplt.show()<\/pre>\n<p>So now we have a clear understanding of how our dataset is structured and what the data looks like. 
Much of this is explained in the dataset description page, but this kind of analysis is often very useful for getting a precise understanding of the dataset that might not be clear from the description.<\/p>\n<h2>Step 4: Create Dataloaders<\/h2>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"768\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-207-1024x768.png\" alt=\"\" class=\"wp-image-904045\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-207-1024x768.png 1024w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-207-300x225.png 300w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-207-768x576.png 768w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-207.png 1251w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n<p>Datasets make the data available for processing. <\/p>\n<p>However, typically, we will want to process using randomized mini-batches from the dataset. <\/p>\n<p>Data loaders make this easy. 
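<\/p>
<p>The core idea behind mini-batch loading can be sketched in a few lines of plain Python (a conceptual sketch only, using a plain list and a hypothetical <code>minibatches<\/code> helper, not the PyTorch <code>DataLoader<\/code> API):<\/p>

```python
import random

# Conceptual sketch of what a data loader does each epoch:
# shuffle the item indices, then hand out fixed-size batches.
def minibatches(data, batch_size, seed=None):
    indices = list(range(len(data)))
    rng = random.Random(seed)
    rng.shuffle(indices)  # randomize the order once per epoch
    for start in range(0, len(indices), batch_size):
        # Each batch is a list of items picked by the shuffled indices
        yield [data[i] for i in indices[start:start + batch_size]]

data = list(range(10))
batches = list(minibatches(data, batch_size=4, seed=0))
# 10 items with batch_size=4 gives batches of sizes 4, 4, and 2
```

<p>PyTorch&#8217;s <code>DataLoader<\/code> does this for us, batching the image tensors and their labels together.<\/p>
<p>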
Dataloaders are <a href=\"https:\/\/blog.finxter.com\/iterators-iterables-and-itertools\/\" data-type=\"post\" data-id=\"29507\" target=\"_blank\" rel=\"noreferrer noopener\">iterables<\/a>, and you&#8217;ll see later that every time you iterate a dataloader it returns a randomized minibatch from the dataset that can be processed through the neural network.<\/p>\n<p>Let&#8217;s create some dataloader objects from our datasets:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">batch_size = 100\n\nmnist_train_dl = DataLoader(mnist_data, batch_size=batch_size, shuffle=True)\nmnist_test_dl = DataLoader(mnist_test_data, batch_size=batch_size, shuffle=True)<\/pre>\n<p>So we have created two data loaders, one for the training dataset, and one for the test dataset. <\/p>\n<p>The <code>batch_size<\/code> parameter specifies the number of image\/label pairs in the minibatch that the dataloader will return for each iteration. The <code>shuffle<\/code> parameter determines whether or not the mini-batches are randomized.<\/p>\n<h2>Step 5: Design and Create the Neural Network<\/h2>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"625\" height=\"938\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-208.png\" alt=\"\" class=\"wp-image-904047\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-208.png 625w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-208-200x300.png 200w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/figure>\n<\/div>\n<h3>Check for GPU<\/h3>\n<p>We are about to design and create the neural network, but first, let&#8217;s check if a GPU is available. 
<\/p>\n<p>One of the advantages PyTorch has as a neural network framework is that it supports the use of a GPU. A GPU provides massively parallel processing that can greatly speed up computation. <\/p>\n<p>Depending on the problem, at least an order of magnitude faster processing can be achieved.<\/p>\n<p>Using a GPU with PyTorch is very easy. First, use the function <code>torch.cuda.is_available()<\/code> to test if a GPU is available and properly configured for use by PyTorch (PyTorch uses the CUDA framework to access the GPU).<\/p>\n<p>If a GPU is available, we will send the model and the data tensors to the GPU for processing.<\/p>\n<p>The following tests for availability of a GPU, then sets a variable <code>device<\/code> to either <code>'cpu'<\/code> or <code>'cuda'<\/code> depending on what is available.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\nprint(f\"Using {device} device\")\n# Using cuda device<\/pre>\n<h3>Create the Neural Network<\/h3>\n<p>Now let&#8217;s design and create the neural network. We do this by creating a class, which we have chosen to call <code>NeuralNet<\/code>, that is a subclass of <code>nn.Module<\/code>. 
<\/p>\n<p>Here is the code to specify and then create our neural network:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">class NeuralNet(nn.Module):\n    def __init__(self):\n        super().__init__()  # Required to properly initialize class, ensures inheritance of the parent __init__() method\n        self.flat_f = nn.Flatten()  # Creates function to smartly flatten tensor\n        self.neur_net = nn.Sequential(\n            nn.Linear(28*28, 512),\n            nn.ReLU(),\n            nn.Linear(512, 256),\n            nn.ReLU(),\n            nn.Linear(256, 10)\n        )\n\n    def forward(self, x):\n        x = self.flat_f(x)\n        logits = self.neur_net(x)\n        return logits\n\nmodel = NeuralNet().to(device)<\/pre>\n<p>There are a number of important details to review in this code.<\/p>\n<p>First, our neural network definition class <em>must<\/em> have two methods included: an <code><a href=\"https:\/\/blog.finxter.com\/python-init\/\" data-type=\"post\" data-id=\"5133\" target=\"_blank\" rel=\"noreferrer noopener\">__init__()<\/a><\/code> method, and a <code>forward()<\/code> method. <\/p>\n<p>Classes in Python routinely include an <code>__init__()<\/code> method to initialize variables and other things in the object that is created. The class must also include a <code>forward()<\/code> method, which tells PyTorch how to process the data during the forward pass of the data. <\/p>\n<p>Let&#8217;s go over each of these in more detail.<\/p>\n<h3>Creating the Model: __init__() Method<\/h3>\n<p>First, within the <code>__init__()<\/code> method note the <code>super().__init__()<\/code> command. When we create a subclass it inherits the parent class variables and methods. 
<\/p>\n<p>However, when we write an <code>__init__()<\/code> method in the subclass, that overrides inheritance of the <code>__init__()<\/code> method from the parent class. <\/p>\n<p>Yet there are features in the parent class&#8217; <code>__init__()<\/code> that our class needs to inherit. The <code>super().__init__()<\/code> command achieves this. In effect, it says <em>&#8220;include the parent class <code>__init__()<\/code> within our child class&#8221;<\/em>. <\/p>\n<p>To make a long story short, this is necessary to properly initialize our child class, by including some things needed from the parent <code>nn.Module<\/code> class.<\/p>\n<p>Next, note that we create a function from the <code>nn.Flatten()<\/code> class. Even though our data is a 28&#215;28 pixel two-dimensional image, the processing still works if we convert it into a one-dimensional vector, stacking the rows next to one another to form a 28&#215;28 = 784 element vector (in fact making this change is a common choice).<\/p>\n<p>The <code>flatten()<\/code> function achieves this. However, the standard <code>flatten()<\/code> (note the lower case <code>'f'<\/code>) function will flatten everything, turning a 100 image minibatch tensor of shape (100, 1, 28, 28) into a single vector of shape (78400). <\/p>\n<p>Instead, if we create a function from <code>nn.Flatten()<\/code> (note the upper case <code>'F'<\/code>), by default it flattens every dimension except the first (batch) dimension, resulting in a tensor of shape (100, 784), representing a list of 100 vectors of 784 elements. <\/p>\n<p class=\"has-global-color-8-background-color has-background\"><strong>Note<\/strong>: double-check to make sure your function is flattening properly. If not, <code>Flatten()<\/code> accepts parameters (<code>start_dim<\/code>, <code>end_dim<\/code>) that specify which dimensions to flatten. 
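The difference between the two flatten flavors is easy to check directly; this little sanity test is not part of the article's program:

```python
import torch
from torch import nn

batch = torch.rand(100, 1, 28, 28)  # a minibatch of 100 single-channel 28x28 images

flat_all = torch.flatten(batch)     # lower-case flatten: collapses every dimension
flat_batched = nn.Flatten()(batch)  # nn.Flatten defaults to start_dim=1, keeping the batch dimension

print(flat_all.shape)      # torch.Size([78400])
print(flat_batched.shape)  # torch.Size([100, 784])
```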
See documentation for details.<\/p>\n<p>The last thing we do in the <code>__init__()<\/code> method is specify the neural network structure using the <code>nn.Sequential()<\/code> function. <\/p>\n<p>Here we list the neural network layers in sequence from beginning to end. <\/p>\n<p>First, we list an input layer of 28&#215;28=784 neurons, connecting through linear (weights * input + bias) connections to 512 neurons. These 512 neurons then pass data through a non-linear ReLU <a href=\"https:\/\/blog.finxter.com\/bitcoin-price-forecast-with-lstm-based-architectures\/\" data-type=\"post\" data-id=\"782261\" target=\"_blank\" rel=\"noreferrer noopener\">activation function<\/a> layer. <\/p>\n<p>Those signals then go through another linear layer connecting 512 neurons to 256 neurons. These signals then go through another ReLU activation function layer. Finally, the signals go through a final linear layer connecting the 256 neurons to 10 final output neurons.<\/p>\n<p><code>'ReLU'<\/code> stands for <code>'Rectified Linear Unit'<\/code>. It is one of many non-linear activation functions which can be chosen. <\/p>\n<p>It is defined as:<\/p>\n<pre class=\"wp-block-preformatted\"><code>f(x) = x, if x>=0\nelse f(x) = 0<\/code><\/pre>\n<p>Here is a graph of the ReLU function:<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"376\" height=\"251\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-197.png\" alt=\"\" class=\"wp-image-903892\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-197.png 376w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-197-300x200.png 300w\" sizes=\"auto, (max-width: 376px) 100vw, 376px\" \/><\/figure>\n<\/div>\n<h3>Creating the Model: forward() Method<\/h3>\n<p>The second required method for our class is the <code>forward()<\/code> method. 
<\/p>\n<p>As mentioned, the <code>forward()<\/code> method tells <a href=\"https:\/\/blog.finxter.com\/tensorflow-vs-pytorch\/\" data-type=\"post\" data-id=\"692005\" target=\"_blank\" rel=\"noreferrer noopener\">PyTorch<\/a> how to process the data during the forward pass. Here we first flatten our tensor using the flatten function we defined previously under <code>__init__()<\/code>.<\/p>\n<p>Then we pass the tensor through the <code>self.neur_net()<\/code> function we defined previously using the <code>nn.Sequential()<\/code> function. Finally, the results are returned.<\/p>\n<p class=\"has-global-color-8-background-color has-background\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/14.0.0\/72x72\/1f4a1.png\" alt=\"\ud83d\udca1\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\" \/> <strong>Important point<\/strong>: the programmer does NOT call the <code>forward()<\/code> method directly in any classes or functions; it is just for PyTorch&#8217;s use. PyTorch expects such a method, so it must be written, but you will not invoke it directly in any subsequent code.<\/p>\n<p>Finally, we create the neural network (here named <code>'model'<\/code>) by creating an instance of our <code>NeuralNet()<\/code> class. 
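What you do instead is call the model instance itself; nn.Module's call machinery then runs forward() for you. A minimal sketch (TinyNet here is a made-up stand-in, not the article's network):

```python
import torch
from torch import nn

class TinyNet(nn.Module):  # illustrative stand-in for a real network
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 2)

    def forward(self, x):
        return self.layer(x)

net = TinyNet()
out = net(torch.rand(3, 4))  # call the instance; PyTorch dispatches to forward()
print(out.shape)  # torch.Size([3, 2])
```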
In addition, we move the model to the GPU (if available) by including the <code>.to(device)<\/code> method.<\/p>\n<p>Finally, we can choose to <a href=\"https:\/\/blog.finxter.com\/python-print\/\" data-type=\"post\" data-id=\"20731\" target=\"_blank\" rel=\"noreferrer noopener\">print<\/a> the model to examine the neural network object we have built:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">print(model)<\/pre>\n<p>Output:<\/p>\n<pre class=\"wp-block-preformatted\"><code>NeuralNet(\n  (flat_f): Flatten(start_dim=1, end_dim=-1)\n  (neur_net): Sequential(\n    (0): Linear(in_features=784, out_features=512, bias=True)\n    (1): ReLU()\n    (2): Linear(in_features=512, out_features=256, bias=True)\n    (3): ReLU()\n    (4): Linear(in_features=256, out_features=10, bias=True)\n  )\n)<\/code><\/pre>\n<h2>Step 6: Choose Loss Function and Optimizer<\/h2>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"768\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-209-1024x768.png\" alt=\"\" class=\"wp-image-904051\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-209-1024x768.png 1024w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-209-300x225.png 300w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-209-768x576.png 768w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-209.png 1250w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n<p>Next, we&#8217;ll need to specify our loss function and our optimizer algorithm.<\/p>\n<h3>Choosing Cross Entropy Loss<\/h3>\n<p>Recall 
the loss function measures how far the model&#8217;s guess is from the correct answer for a given input. Adjusting weights and biases to minimize loss is how neural networks learn (see the Finxter article <a rel=\"noreferrer noopener\" href=\"https:\/\/blog.finxter.com\/how-neural-networks-learn\/\" target=\"_blank\">&#8220;How Neural Networks Learn&#8221;<\/a> for details).<\/p>\n<p>Multiple loss functions are available, and it is worth learning about them, because the most suitable choice depends on the particular kind of problem you are solving.<\/p>\n<p>In this case, we are sorting images into multiple categories. <\/p>\n<p>One of the most suitable loss choices for this case is <em>cross-entropy loss<\/em>. Cross entropy is an idea taken from information theory: it is a measure of how many extra bits must be sent when a message is encoded with a suboptimal code.<\/p>\n<p>This is beyond the scope of this exercise, but we can understand its usefulness to our situation if we examine the calculation involved:<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-198-1024x72.png\" alt=\"\" class=\"wp-image-903917\" width=\"714\" height=\"50\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-198-1024x72.png 1024w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-198-300x21.png 300w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-198-768x54.png 768w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-198.png 1192w\" sizes=\"auto, (max-width: 714px) 100vw, 714px\" \/><\/figure>\n<\/div>\n<p>That is, for each category multiply the true probability <em>t<\/em> by the log of the model&#8217;s estimated probability <em>p<\/em>, add them all up, and negate the result. 
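A quick hand-rolled numeric check of that formula (the probabilities here are made up for illustration; note that PyTorch's <code>nn.CrossEntropyLoss</code> works on raw logits rather than on probabilities like these):

```python
import math

# One sample, 3 categories; the true category is index 1
t = [0.0, 1.0, 0.0]  # one-hot true distribution
p = [0.2, 0.7, 0.1]  # model's estimated probabilities

# Cross entropy: negate the sum of t * log(p) over the categories
loss = -sum(ti * math.log(pi) for ti, pi in zip(t, p))
print(round(loss, 4))  # 0.3567, i.e. -log(0.7)
```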
<\/p>\n<p>Of course, <em>t<\/em> is zero for each incorrect category, and 1 for the correct category. <\/p>\n<p>Consequently, for any given image, only the correct category contributes to the loss calculation, and the loss is the negative of the log of the model&#8217;s probability estimate for that category.<\/p>\n<p>Recall this is what the <code>log()<\/code> function looks like:<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"385\" height=\"261\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-199.png\" alt=\"\" class=\"wp-image-903924\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-199.png 385w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-199-300x203.png 300w\" sizes=\"auto, (max-width: 385px) 100vw, 385px\" \/><\/figure>\n<\/div>\n<p>Since we are dealing with a probability estimate, we are only interested in the interval <code>(0,1]<\/code>. Here is what the negative of the <code>log()<\/code> looks like over that interval:<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"396\" height=\"251\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-200.png\" alt=\"\" class=\"wp-image-903925\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-200.png 396w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-200-300x190.png 300w\" sizes=\"auto, (max-width: 396px) 100vw, 396px\" \/><\/figure>\n<\/div>\n<p>So the loss is very large when the network gives a low probability estimate (near zero) for the correct category, and the loss is lowest (near zero) when the network gives a high probability estimate (near 1.0) for the correct category.<\/p>\n<p>One practical note: our network outputs raw scores (logits) rather than probabilities; <code>nn.CrossEntropyLoss<\/code> applies a softmax internally to turn the logits into the probability estimates used in this calculation.<\/p>\n<p>Here is the code specifying cross entropy loss as the loss function:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" 
data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">loss_fn = nn.CrossEntropyLoss()<\/pre>\n<h3>Choosing Optimizer Algorithm<\/h3>\n<p>We also need to choose the optimizer algorithm. This is the method used to minimize the loss through training. Multiple different optimizers may be chosen, and you will want to learn about the various optimizers available. <\/p>\n<p>All are variations on gradient descent. <\/p>\n<p>For example, some include decay of the learning rate; others include momentum, which helps the optimization escape local minima.<\/p>\n<p>In our case, we will choose plain vanilla stochastic gradient descent (SGD). Here is the code specifying the optimizer and its learning rate:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">learning_rate = 1e-3\noptimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)<\/pre>\n<h2>Step 7: Specify Training and Testing Functions<\/h2>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"682\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-210-1024x682.png\" alt=\"\" class=\"wp-image-904055\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-210-1024x682.png 1024w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-210-300x200.png 300w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-210-768x512.png 768w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/11\/image-210.png 1255w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n<p>Now we define functions for training and testing the neural network.<\/p>\n<h3>Training 
Function<\/h3>\n<p>Here is the code specifying the training function:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def train_nn(dataloader, model, loss_fn, optimizer):\n    size = len(dataloader.dataset)\n    for batch, (X, y) in enumerate(dataloader):\n        X, y = X.to(device), y.to(device)\n\n        # For each image in batch X, compute prediction\n        pred = model(X)\n\n        # Compute average loss for the set of images in batch\n        loss = loss_fn(pred, y)\n\n        # Backpropagation\n        optimizer.zero_grad()  # Zero gradients\n        loss.backward()        # Compute gradients\n        optimizer.step()       # Update weights, biases according to gradients, factored by learning rate\n\n        if batch % 100 == 0:  # Report progress every 100 batches\n            loss, current = loss.item(), batch * len(X)\n            print(f\"loss: {loss:>7f} [{current:>5d}\/{size:>5d}]\")<\/pre>\n<p>We pass into the function the dataloader, model, loss function, and optimizer objects.<\/p>\n<p>The function then loops over minibatches from the dataloader.<\/p>\n<p>For each loop, a minibatch of the input images X and the labels y is retrieved and then moved to the GPU (if available). <\/p>\n<p>Then the neural network model calculates predictions from the input images X. These predictions and the correct labels y are used to calculate the loss (note this loss is a single number that is the average loss for the minibatch).<\/p>\n<p>Once the loss is calculated, the function can adjust weights and biases (backpropagate) in three code steps. <\/p>\n<p>First, gradient attributes are zeroed out using <code>optimizer.zero_grad()<\/code> (PyTorch defaults to accumulating gradient calculations, so they need to be zeroed out on each iteration of the loop, or else they&#8217;ll keep accumulating data). <\/p>\n<p>Then the gradients are calculated using <code>loss.backward()<\/code>. 
Finally, weights and biases are updated according to the gradients using <code>optimizer.step()<\/code>.<\/p>\n<p>In addition, a small section is included to report progress every 100 batches. This prints out the current loss, and how many images of the total images have been completed.<\/p>\n<h3>Testing Function<\/h3>\n<p>Here is the code specifying the testing function:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def test_loop(dataloader, model, loss_fn):\n    # After each epoch, test training results (report categorizing accuracy, loss)\n    size = len(dataloader.dataset)  # Number of image\/label pairs in dataset\n    num_batches = len(dataloader)\n    test_loss, correct = 0, 0  # Initialize variables tracking loss and accuracy during test loop\n\n    with torch.no_grad():  # Disable gradient tracking - reduces resource use and speeds up processing\n        for X, y in dataloader:\n            X, y = X.to(device), y.to(device)\n            pred = model(X)  # Get predictions from the neural network based on input minibatch X\n            test_loss += loss_fn(pred, y).item()  # Accumulate loss values during loop through dataset\n            correct += (pred.argmax(1) == y).type(torch.float).sum().item()  # Accumulate correct predictions during loop through dataset\n\n    test_loss \/= num_batches  # Calculate average loss\n    correct \/= size  # Calculate accuracy rate\n    print(f\"Test Error: \\n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \\n\")  # Report test results<\/pre>\n<p>This function tests the accuracy of the network using the test data. <\/p>\n<p>First, we pass in the testing data loader, the model, and the loss function (for testing loss). 
Then the function initializes several variables, especially <code>test_loss<\/code> and <code>correct<\/code>, for accumulating test results during the test loop.<\/p>\n<p>The function performs the next few steps within a <code>with torch.no_grad():<\/code> block. <\/p>\n<p>Here is why: PyTorch stores calculations from the forward pass for later use during the backpropagation gradient calculations. <\/p>\n<p>The <code>torch.no_grad()<\/code> context manager turns that off inside the block, since there will be only a forward pass during the testing. This saves resources and speeds up processing. You will want to do the same thing once you have a trained network that is used for classifying in production. <\/p>\n<p>After leaving the block, the calculation-storing feature automatically resumes.<\/p>\n<p class=\"has-global-color-8-background-color has-background\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/14.0.0\/72x72\/1f4a1.png\" alt=\"\ud83d\udca1\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\" \/> <strong>Note<\/strong>: be aware that storing calculations is turned on (<code>requires_grad=True<\/code>) because we are using Modules from the <code>nn<\/code> library (Linear, ReLU). Otherwise, PyTorch tensors default to <code>requires_grad=False<\/code>.<\/p>\n<p>Then the function uses a for loop to iterate through the minibatches of the test dataloader. For each iteration, the neural network model computes predictions from the minibatch of images. The loss is calculated for the minibatch, which is then accumulated in <code>test_loss<\/code>.<\/p>\n<p>Then the number of correct predictions for the minibatch is found as follows: first note that <code>pred<\/code> is a set of 10-element vectors, with each element a score indicating how strongly the network favors that index as the prediction. 
<\/p>\n<p>The <code>.argmax(1)<\/code> method returns the index of the largest estimate (the number 1 in the <code>argmax()<\/code> argument indicates which dimension to use for the operation). This list (tensor) of indices is compared to the list (tensor) of correct labels in <code>y<\/code>. <\/p>\n<p>This results in a list (tensor) containing <code>True<\/code> where there is a match, and <code>False<\/code> otherwise. The <code>.type(torch.float)<\/code> method converts these into floating point 1&#8217;s and 0&#8217;s. <\/p>\n<p>The <code>sum()<\/code> method adds all the elements together. Finally, the <code>.item()<\/code> method converts the totaled one-element tensor into a raw number (scalar). <\/p>\n<p>This gives the total number of correct predictions for that batch, which is added to the <code>correct<\/code> variable that accumulates the total number of correct predictions as the for loop iterates through the dataloader.<\/p>\n<h3>Train and Test the Network<\/h3>\n<p>Now that we have written enough code, we can write a small main program loop to train and test the network. We specify how many epochs we wish to run, then we loop through those epochs, training and testing the network for each one.<\/p>\n<p>Here is the code:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># The main program!\nepochs = 5\nfor t in range(epochs):\n    print(f\"Epoch {t+1}\\n-------------------------------\")\n    train_nn(mnist_train_dl, model, loss_fn, optimizer)\n    test_loop(mnist_test_dl, model, loss_fn)\nprint(\"Done!\")<\/pre>\n<p>Output:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">Epoch 1\n-------------------------------\nloss: 2.102096 [    0\/60000]\nloss: 2.119211 [10000\/60000]\nloss: 2.068424 [20000\/60000]\nloss: 2.056982 [30000\/60000]\nloss: 2.028877 [40000\/60000]\nloss: 1.995214 [50000\/60000]\nTest Error: \n Accuracy: 65.9%, Avg loss: 2.000194 \n\nEpoch 2\n-------------------------------\nloss: 2.018245 [    0\/60000]\nloss: 1.996478 [10000\/60000]\nloss: 1.969913 [20000\/60000]\nloss: 1.999372 [30000\/60000]\nloss: 1.944238 [40000\/60000]\nloss: 1.863184 [50000\/60000]\nTest Error: \n Accuracy: 67.8%, Avg loss: 1.866808 \n\nEpoch 3\n-------------------------------\nloss: 1.921477 [    0\/60000]\nloss: 1.891367 [10000\/60000]\nloss: 1.840778 [20000\/60000]\nloss: 1.751534 [30000\/60000]\nloss: 1.718531 [40000\/60000]\nloss: 1.800236 [50000\/60000]\nTest Error: \n Accuracy: 69.5%, Avg loss: 1.695623 \n\nEpoch 4\n-------------------------------\nloss: 1.692079 [    0\/60000]\nloss: 1.752511 [10000\/60000]\nloss: 1.600570 [20000\/60000]\nloss: 1.582768 [30000\/60000]\nloss: 1.532521 [40000\/60000]\nloss: 1.569566 [50000\/60000]\nTest Error: \n Accuracy: 71.9%, Avg loss: 1.498120 \n\nEpoch 5\n-------------------------------\nloss: 1.507337 [    0\/60000]\nloss: 1.515740 [10000\/60000]\nloss: 1.437465 [20000\/60000]\nloss: 1.424620 [30000\/60000]\nloss: 1.409456 [40000\/60000]\nloss: 1.385026 [50000\/60000]\nTest Error: \n Accuracy: 74.6%, Avg loss: 1.300192 \n\nDone!\n<\/pre>\n<p>After just 5 epochs, the accuracy isn&#8217;t very good yet, but we can see that things are moving in the right direction. 
<\/p>\n<p>Obviously, if we wanted to get good performance we would need to train for more epochs. Figuring out how much to train (being careful not to overfit!) is something a neural network engineer has to work out.<\/p>\n<h2>Reviewing the Big Picture<\/h2>\n<p>It may seem like we have gone over a lot, and we have, but if you step back and look at the big picture there isn&#8217;t a lot here. <\/p>\n<p>It may seem like a lot because we have reviewed everything in detail to make sure we convey full understanding. <\/p>\n<p>However, to gain some perspective, let&#8217;s show all the essential code, without all the extra description and explanation (note that we&#8217;re also skipping the code used earlier to review the dataset):<\/p>\n<h3>Import Necessary Libraries<\/h3>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import torch\nfrom torch import nn\nfrom torchvision import datasets\nfrom torch.utils.data import DataLoader\nfrom torchvision.transforms import ToTensor<\/pre>\n<h3>Acquire the Data<\/h3>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Download MNIST data, put it in a pytorch dataset\nmnist_data = datasets.MNIST(\n    root='mnist_nn',\n    train=True,\n    download=True,\n    transform=ToTensor()\n)<\/pre>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">mnist_test_data = datasets.MNIST(\n    root='mnist_nn',\n    train=False,\n    download=True,\n    transform=ToTensor()\n)<\/pre>\n<h3>Create Dataloaders<\/h3>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">batch_size = 100\nmnist_train_dl = DataLoader(mnist_data, batch_size=batch_size, shuffle=True)\nmnist_test_dl = DataLoader(mnist_test_data, batch_size=batch_size, shuffle=True)<\/pre>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n<h3>Check for GPU<\/h3>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\nprint(f\"Using {device} device\")\n# Using cuda device<\/pre>\n<h3>Design and Create the Neural Network<\/h3>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">class NeuralNet(nn.Module):\n    def __init__(self):\n        super().__init__()  # Required to properly initialize class\n        self.flat_f = nn.Flatten()  # Creates function to smartly flatten tensor\n        self.neur_net = nn.Sequential(\n            nn.Linear(28*28, 512),\n            nn.ReLU(),\n            nn.Linear(512, 256),\n            nn.ReLU(),\n            nn.Linear(256, 10)\n        )\n\n    def forward(self, x):\n        x = self.flat_f(x)\n        logits = self.neur_net(x)\n        return logits\n\nmodel = NeuralNet().to(device)<\/pre>\n<h3>Choose Loss Function and Optimizer<\/h3>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">loss_fn = nn.CrossEntropyLoss()\nlearning_rate = 1e-3\noptimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)<\/pre>\n<h3>Specify Training and Testing Functions<\/h3>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def train_nn(dataloader, model, loss_fn, optimizer):\n    size = len(dataloader.dataset)\n    for batch, (X, y) in enumerate(dataloader):\n        X, y = X.to(device), y.to(device)\n        pred = model(X)\n        loss = loss_fn(pred, y)\n        optimizer.zero_grad()\n        loss.backward()\n        optimizer.step()\n        if batch % 100 == 0:\n            loss, current = loss.item(), batch * len(X)\n            print(f\"loss: {loss:>7f} [{current:>5d}\/{size:>5d}]\")<\/pre>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def test_loop(dataloader, model, loss_fn):\n    size = len(dataloader.dataset)\n    num_batches = len(dataloader)\n    test_loss, correct = 0, 0\n    with torch.no_grad():\n        for X, y in dataloader:\n            X, y = X.to(device), y.to(device)\n            pred = model(X)\n            test_loss += loss_fn(pred, y).item()\n            correct += (pred.argmax(1) == y).type(torch.float).sum().item()\n    test_loss \/= num_batches\n    correct \/= size\n    print(f\"Test Error: \\n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \\n\")<\/pre>\n<h3>Train and Test the Network<\/h3>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># The main program!\nepochs = 5\nfor t in range(epochs):\n    print(f\"Epoch {t+1}\\n-------------------------------\")\n    train_nn(mnist_train_dl, model, loss_fn, optimizer)\n    test_loop(mnist_test_dl, model, loss_fn)\nprint(\"Done!\")<\/pre>\n<p>Really we have written just a few dozen lines of code, comparable in size to a program a hobbyist programmer might write. <\/p>\n<p>Yet we&#8217;ve built a world-class neural network that converts hand-written digits to numbers a computer can work with. That&#8217;s pretty amazing!<\/p>\n<p>Of course, this is all possible thanks to the efforts of the many engineers who wrote the many more lines of code within PyTorch. Thank you to all of you who have contributed to PyTorch! 
<\/p>\n<p>This is another example of achieving great things by standing on the shoulders of giants!<\/p>\n<h2>Saving and Reloading the Network<\/h2>\n<p>We have built, trained, and tested a neural network, and that&#8217;s great. But the real point of training a neural network is to put it to use, so we need to be able to save and reload the network for later use.<\/p>\n<p>Use the following code to save the weights and biases of your neural network (<strong>note<\/strong>: the common convention is to save these files with the extension <code>.pt<\/code> or <code>.pth<\/code>):<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">torch.save(network_name.state_dict(), 'filename.pth')<\/pre>\n<p>Since we named our network <code>model<\/code>, we would save it as follows:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">torch.save(model.state_dict(), 'model_weights.pth')<\/pre>\n<p>To reload, first create an instance of your neural network (make sure you have access to the class\/neural network you originally specified). 
In our example:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">user_model = NeuralNet().to(device)<\/pre>\n<p>Then load the new instance with your saved weights and biases:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">user_model.load_state_dict(torch.load('model_weights.pth'))\n# &lt;All keys matched successfully><\/pre>\n<p>Some modules behave differently during training than during inference. <\/p>\n<p>Specifically, in training mode, some of them apply various <em>regularization methods<\/em>, which are used to resist the onset of overfitting. <\/p>\n<p>These methods may include some randomness and can cause the network to give inconsistent results. 
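<\/p>\n<p>For example, <code>nn.Dropout<\/code> randomly zeroes part of its input while in training mode, but passes the input through unchanged in evaluation mode. A quick stand-alone illustration (not part of our network, just a demonstration of the two modes):<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import torch\nimport torch.nn as nn\n\ndrop = nn.Dropout(p=0.5)\nx = torch.ones(6)\n\ndrop.train()    # Training mode: randomly zeroes elements, scales survivors by 1\/(1-p)\nprint(drop(x))  # Varies from call to call\n\ndrop.eval()     # Evaluation mode: dropout does nothing\nprint(drop(x))  # Always tensor([1., 1., 1., 1., 1., 1.])<\/pre>\n<p>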
To avoid this, make sure you are in evaluation mode and not training mode:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">user_model.eval()<\/pre>\n<p>Output:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">NeuralNet(\n  (flat_f): Flatten(start_dim=1, end_dim=-1)\n  (neur_net): Sequential(\n    (0): Linear(in_features=784, out_features=512, bias=True)\n    (1): ReLU()\n    (2): Linear(in_features=512, out_features=256, bias=True)\n    (3): ReLU()\n    (4): Linear(in_features=256, out_features=10, bias=True)\n  )\n)<\/pre>\n<p>As you can see, this command conveniently reports the neural network structure.<\/p>\n<p>Let&#8217;s make sure our reloaded network works.<\/p>\n<p>It would be best to test with some new handwritten digits, but for the sake of convenience let&#8217;s just test it with the first ten test images (especially since the network was not trained very heavily). 
<\/p>\n<p>Let&#8217;s look at these first ten images in the test dataset:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">fig, axs = plt.subplots(2, 5, figsize=(8, 5))\nfor a_row in range(2):\n    for a_col in range(5):\n        img_no = a_row*5 + a_col\n        img = mnist_test_data[img_no][0].squeeze()\n        img_tgt = mnist_test_data[img_no][1]\n        axs[a_row][a_col].imshow(img, cmap='gray')\n        axs[a_row][a_col].set_xticks([])\n        axs[a_row][a_col].set_yticks([])\n        axs[a_row][a_col].set_title(img_tgt, fontsize=20)\nplt.show()<\/pre>\n<p>Now let&#8217;s see if the network detects these images properly:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def eval_image(model, imgno):\n    testimg = mnist_test_data[imgno][0]  # Select test image number 'imgno'\n    testimg = testimg.to(device)  # Move image data to the device\n    logits = model(testimg)  # Run image through network\n    return logits.argmax().item()  # argmax identifies the highest-scoring class\n\nfor img_no in range(10):\n    img_val = eval_image(model, img_no)\n    print(img_val)\n<\/pre>\n<p>Output:<\/p>\n<pre class=\"wp-block-preformatted\"><code>\n7\n2\n1\n0\n4\n1\n7\n9\n6\n7<\/code>\n<\/pre>\n<p>The results are not perfect, but for an incompletely trained network that&#8217;s not bad! The few failures are plausible given the incomplete training. 
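<\/p>\n<p><code>eval_image()<\/code> returns only the argmax of the logits. If you also want to know how confident the network is, you can convert the logits to probabilities with <code>torch.softmax<\/code>. A short sketch (the logit values here are made up for illustration, not taken from our network):<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import torch\n\n# Hypothetical logits for one image (ten scores, one per digit)\nlogits = torch.tensor([0.1, 0.2, 3.0, 0.1, 0.1, 0.1, 0.1, 0.2, 0.1, 0.1])\nprobs = torch.softmax(logits, dim=0)  # Non-negative values summing to 1\npred = probs.argmax().item()          # Same digit that argmax on the logits picks\nprint(pred, f\"{probs[pred].item():.2f}\")<\/pre>\n<p>Because softmax is monotonic, the predicted digit is unchanged; the probability simply indicates how strongly the network favors its top choice.<\/p>\n<p>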
Our network works with the saved and reloaded weights and biases!<\/p>\n<h2>Conclusion<\/h2>\n<p>We hope you have found this article educational, and we hope it inspires you to go and build your own working neural networks using PyTorch!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>5\/5 &#8211; (1 vote) In this article, we will use PyTorch to build a working neural network. Specifically, this network will be trained to recognize handwritten numerical digits using the famous MNIST dataset. The code in this article borrows heavily from the PyTorch tutorial &#8220;Learn the Basics&#8221;. We do this for several reasons. First, that [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[857],"tags":[73,468,528],"class_list":["post-129863","post","type-post","status-publish","format-standard","hentry","category-python-tut","tag-programming","tag-python","tag-tutorial"],"_links":{"self":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/129863","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/comments?post=129863"}],"version-history":[{"count":0,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/129863\/revisions"}],"wp:attachment":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/media?parent=129863"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/categories?post=129863"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/tags?post=129863"}],"curie
s":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}