
Python | Split String into Characters


Summary: Use list("given string") to extract each character of the given string and store them as individual items in a list.
Minimal Example:
print(list("abc"))

Problem: Given a string, how do you split it into a list of characters?

Example: Let’s visualize the problem with the help of an example:

input = "finxter"
output = ['f', 'i', 'n', 'x', 't', 'e', 'r']

Now that we have an overview of our problem let us dive into the solutions without further ado.

Method 1: Using The list Constructor

Approach: One of the simplest ways to solve the given problem is to use the list constructor and pass the given string into it as the input.

list() creates a new list object containing the items obtained by iterating over the input iterable. Since a string is an iterable of characters, iterating over it with the list constructor yields one character per iteration, and each character becomes an individual item in the newly formed list.

Code:

text = "finxter"
print(list(text)) # ['f', 'i', 'n', 'x', 't', 'e', 'r']

💎Related Tutorial: Python list() — A Simple Guide with Video

Method 2: Using a List Comprehension

Another way to split the given string into characters would be to use a list comprehension such that the list comprehension returns a new list containing each character of the given string as individual items.

Code:

text = "finxter"
print([x for x in text]) # ['f', 'i', 'n', 'x', 't', 'e', 'r']

Prerequisite: To understand the above code, it is essential to know what a list comprehension does. In simple words, a list comprehension in Python is a compact way of creating lists. The simple formula is [expression + context], where the “expression” determines what to do with each list element and the “context” determines which elements to select. The context can consist of an arbitrary number of for and if statements. To learn more, head over to this detailed guide on list comprehensions.

Explanation: Well! Now that you know what list comprehensions are, let’s try to understand what the above code does. In our solution, the context variable x is used to extract each character from the given string by iterating across each character of the string one by one with the help of a for loop. This context variable x also happens to be the expression of our list comprehension as it stores the individual characters of the given string as separate items in the newly formed list.

Multi-line Solution: Another approach to formulating the above solution is to use a for loop. The idea is pretty similar; however, we will not be using a list comprehension in this case. Instead, we will use a for loop to iterate across individual characters of the given string and store them one by one in a new list with the help of the append method.

text = "finxter"
res = []
for i in text:
    res.append(i)
print(res) # ['f', 'i', 'n', 'x', 't', 'e', 'r']

Method 3: Using map and lambda

Yet another way of solving the given problem is to use a lambda function within the map function. Now, this is complex and certainly not the best fit solution to the given problem. However, it may (or may not ;P) be appropriate when you are handling really complex tasks. So, here’s how to use the two built-in Python functions to solve the given problem:

text = "finxter"
print(list(map(lambda c: c, text))) # ['f', 'i', 'n', 'x', 't', 'e', 'r']

Explanation: The map() function is used to execute a specified function for each item of an iterable. In this case, the iterable is the given string and each character of the string represents an individual item within it. Now, all we need to do is to create a lambda function that simply returns the character passed to it as the input. That’s it! However, the map method will return a map object, so you must convert it to a list using the list() function. Silly! Isn’t it? Nevertheless, it works!

Conclusion

Hurrah! We have successfully solved the given problem using as many as three different ways. I hope you enjoyed this article and it helps you in your Python coding journey. Please subscribe and stay tuned for more interesting articles!

Related Reads:
⦿ How To Split A String And Keep The Separators?
⦿ How To Cut A String In Python?



How to Return a File From a Function in Python?


Do you need to create a function that returns a file but you don’t know how? No worries, in sixty seconds, you’ll know! Go! 👇

A Python function can return any object, including a file object. To return a file, first open the file object within the function body, handle possible errors if the file doesn’t exist, and return it to the caller of the function using the return keyword, e.g., return open(filename, mode='r').

Here’s a minimal example that tries to open a filename that was provided by the user via the input() function. If it fails, it prints an error message and asks for a different user input:

def open_file():
    while True:
        filename = input('filename: ')
        try:
            return open(filename, mode='r')
        except:
            print('Error. Try again')

f = open_file()
print(f.read())

If I type in the correct file right away, I get the following output when storing the previous code snippet in a file named code.py—the code reads itself (meta 🤯):

filename: code.py
def open_file():
    while True:
        filename = input('filename: ')
        try:
            return open(filename, mode='r')
        except:
            print('Error. Try again')

f = open_file()
print(f.read())

Note that you can open the file in writing mode rather than reading mode by replacing the line with the return statement with the following line:

return open(filename, mode='w')

A more Pythonic way, in my opinion, is to follow the single-responsibility pattern whereby a function should do only one thing. In that case, provide the relevant input values into the function like so:

def open_file(filename, mode):
    try:
        return open(filename, mode=mode)
    except:
        return None

def ask_user():
    f = open_file(input('filename: '), input('mode: '))
    while not f:
        f = open_file(input('filename: '), input('mode: '))
    return f

f = ask_user()
print(f.read())

Notice how the file handling and the user-input processing are separated into two functions. Each function does one thing only. Unix style.
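Because both versions hand an open file object back to the caller, the caller also becomes responsible for closing it. Here’s a minimal sketch of that contract — the filename example.txt is a hypothetical one created on the spot so the snippet is self-contained, and the bare except is narrowed to OSError:

```python
def open_file(filename, mode):
    # Return an open file handle, or None if the file can't be opened
    try:
        return open(filename, mode=mode)
    except OSError:
        return None

# Create a sample file so the sketch is self-contained (hypothetical name)
with open('example.txt', 'w') as f:
    f.write('hello')

f = open_file('example.txt', 'r')
if f:
    print(f.read())  # hello
    f.close()        # the caller closes the handle it received
```

Alternatively, the caller can wrap the returned handle in a with statement so it is closed automatically, even if an exception occurs.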


If you want to improve your programming skills and coding productivity creating massive success with your apps and coding projects, feel free to check out my book on the topic:


The Art of Clean Code

Most software developers waste thousands of hours working with overly complex code. The eight core principles in The Art of Clean Code will teach you how to write clear, maintainable code without compromising functionality. The book’s guiding principle is simplicity: reduce and simplify, then reinvest energy in the important parts to save you countless hours and ease the often onerous task of code maintenance.

  1. Concentrate on the important stuff with the 80/20 principle — focus on the 20% of your code that matters most
  2. Avoid coding in isolation: create a minimum viable product to get early feedback
  3. Write code cleanly and simply to eliminate clutter 
  4. Avoid premature optimization that risks over-complicating code 
  5. Balance your goals, capacity, and feedback to achieve the productive state of Flow
  6. Apply the Do One Thing Well philosophy to vastly improve functionality
  7. Design efficient user interfaces with the Less is More principle
  8. Tie your new skills together into one unifying principle: Focus

The Python-based The Art of Clean Code is suitable for programmers at any level, with ideas presented in a language-agnostic manner.



Programmer Humor

Q: How do you tell an introverted computer scientist from an extroverted computer scientist?
A: An extroverted computer scientist looks at your shoes when he talks to you.

How to Print a NumPy Array Without Scientific Notation in Python


Problem Formulation

» Problem Statement: Given a NumPy array, how do you print it without scientific notation in Python?

Note: Python represents very small or very large floating-point numbers in scientific form. Scientific notation represents a number in terms of powers of 10 to display very large or very small values compactly. For example, the number 0.000000321 is written in scientific notation as 3.21e-07.

In Python, the NumPy module generally uses scientific notation instead of the actual number while printing/displaying the array items.
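This behavior is not specific to NumPy — plain Python floats are displayed the same way once they get small enough. A quick check:

```python
x = 0.000000321
print(x)           # 3.21e-07 -- Python switches to scientific notation
print(f'{x:.9f}')  # 0.000000321 -- fixed-point formatting shows the plain digits
```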

Example: Look at the following code snippet:

import numpy as np

arr = np.array([1, 5, 10, 20, 35, 5000.5])
print(arr)

Output:

[1.0000e+00 5.0000e+00 1.0000e+01 2.0000e+01 3.5000e+01 5.0005e+03]

Expected Output: Print the given array without scientific notation in Python as:

[ 1. 5. 10. 20. 35. 5000.5]

Without further ado, let’s dive into the different ways of solving the given problem.

Method 1: Using set_printoptions() Function

set_printoptions() is a function in the NumPy module that controls how floating-point numbers, NumPy arrays, and other NumPy objects are displayed. By default, very big or very small array elements are represented using scientific notation. We can call set_printoptions() with suppress=True to remove the scientific notation from the printed array.

Approach:

  • Import the Numpy module to create the array.
  • Use the set_printoptions() function and pass suppress=True.
  • Print the array; it will get displayed without the scientific notation.

Code:

# Importing the numpy module
import numpy as np
# Creating a NumPy array
a = np.array([1, 5, 10, 20, 35, 5000.5])
print("Numpy array with scientific notation", a)
np.set_printoptions(suppress = True)
print("Numpy array without scientific notation", a)

Output:

Numpy array with scientific notation [1.0000e+00 5.0000e+00 1.0000e+01 2.0000e+01 3.5000e+01 5.0005e+03]
Numpy array without scientific notation [ 1. 5. 10. 20. 35. 5000.5]

Discussion: The set_printoptions() function only works for the numbers that fit in the default 8-character space allotted to it, as shown below:

Code:

import numpy as np
# Array with element index 1 having 8 digits
a = np.array([5.05e-5, 15.6, 2.1445678e5])
print("Numpy array with scientific notation", a)
np.set_printoptions(suppress = True)
print("Numpy array without scientific notation", a)

Output:

Numpy array with scientific notation [5.0500000e-05 1.5600000e+01 2.1445678e+05]
Numpy array without scientific notation [ 0.0000505 15.6 214456.78 ]

When we pass a number that is greater than 8 characters wide, exponential notation is imposed as shown below:

Code:

import numpy as np
# Array with element index 1 having more than 8 digits
a = np.array([5.05e-5, 15.6, 2.1445678e10])
print("Numpy array with scientific notation", a)
np.set_printoptions(suppress = True)
print("Numpy array without scientific notation", a)

Output:

Numpy array with scientific notation [5.0500000e-05 1.5600000e+01 2.1445678e+10]
Numpy array without scientific notation [5.0500000e-05 1.5600000e+01 2.1445678e+10]

Method 2: Using set_printoptions() Function with .format

As seen in Method 1, the set_printoptions() function alone does not work when a number needs more than eight characters. That is where the formatter option of set_printoptions() comes in: it lets us specify how each value is printed and rounded. Here, we supply a formatter function for floats.

Python’s built-in format(value, spec) function transforms the input of one format into the output of another format defined by you. Specifically, it applies the format specifier spec to the argument value and returns a formatted representation of value. Read more about the “Python format() Function.”
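To get a feel for how a format specifier changes the output, here are a few standalone format() calls (the values are arbitrary examples):

```python
# 'f' forces fixed-point notation instead of scientific
print(format(21445678000.0, 'f'))     # 21445678000.000000
# '.2f' limits the output to two decimal places
print(format(5.05e-5, '.2f'))         # 0.00
# ',' adds thousands separators for readability
print(format(21445678000.0, ',.2f'))  # 21,445,678,000.00
```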

Code:

import numpy as np
# Creating a NumPy array
# Array with element index 1 having more than 8 digits
a = np.array([5.05e-5, 15.6, 2.1445678e10])
print("Numpy array with scientific notation", a)
np.set_printoptions(suppress = True, formatter = {'float_kind':'{:f}'.format})
print("Numpy array without scientific notation", a)

Output:

Numpy array with scientific notation [5.0500000e-05 1.5600000e+01 2.1445678e+10]
Numpy array without scientific notation [0.000051 15.600000 21445678000.000000]

We can also format the output to only have 2 decimal places of precision by using '{:0.2f}'.format as shown below:

Code:

import numpy as np
# Array with element index 1 having more than 8 digits
a = np.array([5.05e-5, 15.6, 2.1445678e10])
print("Numpy array with scientific notation", a)
np.set_printoptions(suppress = True, formatter = {'float_kind':'{:0.2f}'.format})
print("Numpy array without scientific notation", a)

Output:

Numpy array with scientific notation [5.0500000e-05 1.5600000e+01 2.1445678e+10]
Numpy array without scientific notation [0.00 15.60 21445678000.00]

Discussion: The disadvantage of using this method to suppress the exponential notation in NumPy arrays appears when the array contains very large float values. When we try to print such an array, we get a whole page of digits.

Method 3: Using printoptions() Function

printoptions() is a function in the NumPy module that is used as a context manager for setting print options temporarily. By passing precision=3 and suppress=True to the printoptions() function, we can remove the scientific notation and print the NumPy array.

Note: This function only works if you use NumPy versions 1.15.0 or later.

Approach:

  • Import the numpy module to create the array.
  • Use the printoptions() function inside a with statement and pass precision=3 and suppress=True.
  • Print the array; it will get displayed without the scientific notation.

Code:

import numpy as np
# Creating a NumPy array
a = np.array([1, 5, 10, 20, 35, 5000.5])
print("Numpy array with scientific notation", a)
print("Numpy array without scientific notation:")
with np.printoptions(precision=3, suppress=True):
    print(a)

Output:

Numpy array with scientific notation [1.0000e+00 5.0000e+00 1.0000e+01 2.0000e+01 3.5000e+01 5.0005e+03]
Numpy array without scientific notation: [ 1. 5. 10. 20. 35. 5000.5]

Method 4: Using array2string() Function

array2string() is a function in the NumPy module that returns a string representation of an array. We can use it to print a NumPy array without scientific notation by passing the array as the argument and setting the suppress_small argument to True. When suppress_small is True, numbers close to zero are represented as zero.

Approach:

  • Import the numpy module to create the array.
  • Use the array2string() function and pass the suppress_small argument as True.
  • Finally, print the array. It will get displayed without the scientific notation.

Code:

import numpy as np
# Creating a NumPy array
a = np.array([1, 5, 10, 20, 35, 5000.5])
print("Numpy array with scientific notation", a)
a = np.array2string(a, suppress_small = True)
print("Numpy array without scientific notation:", a)

Output:

Numpy array with scientific notation [1.0000e+00 5.0000e+00 1.0000e+01 2.0000e+01 3.5000e+01 5.0005e+03]
Numpy array without scientific notation: [ 1. 5. 10. 20. 35. 5000.5]

Conclusion

Hurrah! We have successfully solved the mission-critical question in numerous ways in this article. I hope you found it helpful. Please stay tuned and subscribe for more such interesting articles. 

💎Interesting Read: How to Suppress Scientific Notation in Python?


Do you want to become a NumPy master? Check out our interactive puzzle book Coffee Break NumPy and boost your data science skills! (Amazon link opens in new tab.)

Coffee Break NumPy

How to Count the Number of Unique Values in a List in Python?


Problem Statement: Consider that you have been given a list in Python. How will you count the number of unique values in the list?

Example: Let’s visualize the problem with the help of an example:

Given: 
li = ['a', 'a', 'b', 'c', 'b', 'd', 'd', 'a']
Output: The unique values in the given list are 'a', 'b', 'c', 'd'. Thus the expected output is 4.

Now that you have a clear picture of what the question demands, let’s dive into the different ways of solving the problem.

Method 1: The Naive Approach

Approach:

  • Create an empty list that will be used to store all the unique elements from the given list. Let’s say the name of this list is res.
  • To store the unique elements in this new list, traverse through all the elements of the given list with the help of a for loop and check if each value from the given list is present in res.
    • If a particular value from the given list is not present in the newly created list then append it to the list res. This ensures that each unique value/item from the given list gets stored within res.
    • If it’s already present, then do not append the value.
  • Finally, the list res represents a newly formed list that contains all unique values from the originally given list. All that remains to be done is to find the length of the list res which gives you the number of unique values present in the given list.

Code:

# Given list
li = ['a', 'a', 'b', 'c', 'b', 'd', 'd', 'a']
res = []
for ele in li:
    if ele not in res:
        res.append(ele)
print("The count of unique values in the list:", len(res)) # The count of unique values in the list: 4

Discussion: Since you have to create an extra list to store the unique values, this approach is not the most efficient way to find and count the unique values in a list as it takes a lot of time and space.

Method 2: Using set()

A more effective and Pythonic approach to solve the given problem is to use the set() constructor. A set is a built-in data type that does not contain any duplicate elements.

Read more about sets here – “The Ultimate Guide to Python Sets”

Approach: Convert the given list into a set using the set() function. Since a set cannot contain duplicate values, only the unique values from the list will be stored within the set. Now that you have all the unique values at your disposal, you can simply count the number of unique values with the help of the len() function.

Code:

li = ['a', 'a', 'b', 'c', 'b', 'd', 'd', 'a']
s = set(li)
unique_values = len(s)
print("The count of unique values in the list:", unique_values) # The count of unique values in the list: 4

You can formulate the above solution in a single line of code by simply chaining both the functions (set() and len()) together, as shown below:

# Given list
li = ['a', 'a', 'b', 'c', 'b', 'd', 'd', 'a']
# One-liner
print("The count of unique values in the list:", len(set(li)))

Method 3: Using Dictionary fromkeys()

Python dictionaries have a method known as fromkeys() that returns a new dictionary with keys taken from the given iterable (such as a list, set, string, or tuple) and with the specified value. If the value is not specified, it defaults to None.

Approach: Well! We all know that keys in a dictionary must be unique. Thus, we will pass the list to the fromkeys() method and then use the keys of the resulting dictionary to get the unique values from the list. Once we have stored all the unique values of the given list in another list, all that remains to be done is to find the length of that list, which gives us the number of unique values.

Code:

# Given list
li = ['a', 'a', 'b', 'c', 'b', 'd', 'd', 'a']
# Using dictionary fromkeys()
# list elements get converted to dictionary keys. Keys are always unique!
x = dict.fromkeys(li)
# storing the keys of the dictionary in a list
l2 = list(x.keys())
print("Number of unique values in the list:", len(l2)) # Number of unique values in the list: 4

Method 4: Using Counter

Another way to solve the given problem is to use the Counter function from the collections module. The Counter function creates a dictionary where the dictionary’s keys represent the unique items of the list, and the corresponding values represent the count of a key (i.e. the number of occurrences of an item in the list). Once you have the dictionary all you need to do is to extract the keys of the dictionary and store them in a list and then find the length of this list.

from collections import Counter
# Given list
li = ['a', 'a', 'b', 'c', 'b', 'd', 'd', 'a']
# Creating a list containing the keys (the unique values)
key = Counter(li).keys()
# Calculating the length to get the count
res = len(key)
print("The count of unique values in the list:", res) # The count of unique values in the list: 4

Method 5: Using Numpy Module

We can also use Python’s Numpy module to get the count of unique values from the list. First, we must import the NumPy module into the code to use the numpy.unique() function that returns the unique values from the list.

Solution:

# Importing the numpy module
import numpy as np
# Given list
li = ['a', 'a', 'b', 'c', 'b', 'd', 'd', 'a']
res = []
# Using unique() function from numpy module
for ele in np.unique(li):
    res.append(ele)
# Calculating the length to get the count of unique elements
count = len(res)
print("The count of unique values in the list:", count) # The count of unique values in the list: 4

Another approach is to create an array using the array() function after importing the numpy module. Further, we will use the unique() function to remove the duplicate elements from the list. Finally, we will calculate the length of that array to get the count of the unique elements.

Solution:

# Importing the numpy module
import numpy as np
# Given list
li = ['a', 'a', 'b', 'c', 'b', 'd', 'd', 'a']
array = np.array(li)
u = np.unique(array)
c = len(u)
print("The count of unique values in the list:", c) # The count of unique values in the list: 4

Method 6: Using List Comprehension

There’s yet another way of solving the given problem. You can use a list comprehension to get the count of each element in the list and then use the zip() function to create a zip object that pairs each item with its count. Store these pairs as key-value pairs in a dictionary by converting the zip object with the dict() function. Finally, return the calculated length of the dictionary’s keys (using the len() function).

Code:

# Given list
li = ['a', 'a', 'b', 'c', 'b', 'd', 'd', 'a']
# List comprehension using zip()
l2 = dict(zip(li, [li.count(i) for i in li]))
# Using len to get the count of unique elements
l = len(list(l2.keys()))
print("The count of the unique values in the list:", l) # The count of the unique values in the list: 4

Conclusion

In this article, we learned different methods to count the unique values in a list in Python. We looked at how to do this using Counter, sets, dictionaries, the NumPy module, and list comprehensions. If you found this article helpful and want to receive more interesting solutions and discussions in the future, please subscribe and stay tuned!


Python One-Liners Book: Master the Single Line First!

Python programmers will improve their computer science skills with these useful one-liners.

Python One-Liners

Python One-Liners will teach you how to read and write “one-liners”: concise statements of useful functionality packed into a single line of code. You’ll learn how to systematically unpack and understand any line of Python code, and write eloquent, powerfully compressed Python like an expert.

The book’s five chapters cover (1) tips and tricks, (2) regular expressions, (3) machine learning, (4) core data science topics, and (5) useful algorithms.

Detailed explanations of one-liners introduce key computer science concepts and boost your coding and analytical skills. You’ll learn about advanced Python features such as list comprehension, slicing, lambda functions, regular expressions, map and reduce functions, and slice assignments.

You’ll also learn how to:

  • Leverage data structures to solve real-world problems, like using Boolean indexing to find cities with above-average pollution
  • Use NumPy basics such as array, shape, axis, type, broadcasting, advanced indexing, slicing, sorting, searching, aggregating, and statistics
  • Calculate basic statistics of multidimensional data arrays and the K-Means algorithms for unsupervised learning
  • Create more advanced regular expressions using grouping and named groups, negative lookaheads, escaped characters, whitespaces, character sets (and negative characters sets), and greedy/nongreedy operators
  • Understand a wide range of computer science topics, including anagrams, palindromes, supersets, permutations, factorials, prime numbers, Fibonacci numbers, obfuscation, searching, and algorithmic sorting

By the end of the book, you’ll know how to write Python at its most refined, and create concise, beautiful pieces of “Python art” in merely a single line.

Get your Python One-Liners on Amazon!!


Plotly Dash Bootstrap Card Components


Welcome to the bonus content of “The Book of Dash”. 🤗

💡 Here you will find additional examples of Plotly Dash components, layouts and style. To learn more about making dashboards with Plotly Dash, and how to buy your copy of “The Book of Dash”, please see the reference section at the bottom of this article.

As you read the article, feel free to run the explainer video on the Card components from one of our coauthors’ “Charming Data” YT channel:

YouTube Video

This article will focus on the Card components from the Dash Bootstrap Components library. Using cards is a great way to create eye-catching content. We’ll show you how to make the card content interactive with callbacks, but first we’ll focus on the style and layout.

Plotly Dash App with a Bootstrap Card

We’ll start with the basics – a minimal Dash app to display a single card without any additional styling. Be sure to check out the complete reference for using Dash Bootstrap cards.

Next, we’ll show how to jazz it up to make it look better — and more importantly — so it conveys key information at a glance.

from dash import Dash, html
import dash_bootstrap_components as dbc

app = Dash(__name__, external_stylesheets=[dbc.themes.SPACELAB, dbc.icons.BOOTSTRAP])

card = dbc.Card(
    dbc.CardBody(
        [
            html.H1("Sales"),
            html.H3("$104.2M")
        ],
    ),
)

app.layout = dbc.Container(card)

if __name__ == "__main__":
    app.run_server(debug=True)

Styling a Dash Bootstrap Card

An easy way to style content is by using Bootstrap utility classes. See all the utility classes at the Dash Bootstrap Cheatsheet app. This handy cheatsheet is made by a co-author of “The Book of Dash”.

In this card, we center the text and change the color with "text-center" and "text-success". The Bootstrap themes have named colors, and "success" is a shade of green.

👉 Recommended Resource: For more information about styling your app with a Bootstrap theme, see the Dash Bootstrap Theme Explorer.

card = dbc.Card(
    dbc.CardBody(
        [
            html.H1("Sales"),
            html.H3("$104.2M", className="text-success")
        ],
    ),
    className="text-center"
)

Feel free to watch Adam’s explainer video on Bootstrap and styling your app if you need to get up to speed! 👇

YouTube Video

Dash Bootstrap Card with Icons

You can add Bootstrap and/or Font Awesome icons to your Dash Bootstrap components. In this example, we will add the bank icon as well as change the background color using the Bootstrap utility class bg-primary.

card = dbc.Card(
    dbc.CardBody(
        [
            html.H1([html.I(className="bi bi-bank me-2"), "Profit"]),
            html.H3("$8.3M"),
            html.H4(html.I("10.3% vs LY", className="bi bi-caret-up-fill text-success")),
        ],
    ),
    className="text-center m-4 bg-primary text-white",
)

To learn more, see the Icons section of the dash-bootstrap-components documentation. You can also find more information about adding icons to dash components in the buttons article.

👉 Recommended Tutorial: Plotly Dash Button Component – A Simple Illustrated Guide

Dash Bootstrap Cards Side-by-Side

In business intelligence dashboards, it’s common to highlight KPIs or Key Performance Indicators in a group of cards. You can find many examples in the Plotly App Gallery.

This app places three KPI cards side-by-side. We use the dbc.Row and dbc.Col components to create this responsive card layout. When you run this app, try changing the width of the browser window to see how the cards expand to fill the row based on the screen size.

This app also demonstrates the usage of Bootstrap border utility classes to add and style a border. Here we add a border on the left and change the color to highlight the results. Another trick is to use the “text-nowrap” class to keep the icon and the text together on the same line when the cards shrink to accommodate small screen sizes.

from dash import Dash, html
import dash_bootstrap_components as dbc

app = Dash(__name__, external_stylesheets=[dbc.themes.SPACELAB, dbc.icons.BOOTSTRAP])

card_sales = dbc.Card(
    dbc.CardBody(
        [
            html.H1([html.I(className="bi bi-currency-dollar me-2"), "Sales"], className="text-nowrap"),
            html.H3("$106.7M"),
            html.Div(
                [html.I("5.8%", className="bi bi-caret-up-fill text-success"), " vs LY"]
            ),
        ],
        className="border-start border-success border-5"
    ),
    className="text-center m-4"
)

card_profit = dbc.Card(
    dbc.CardBody(
        [
            html.H1([html.I(className="bi bi-bank me-2"), "Profit"], className="text-nowrap"),
            html.H3("$8.3M"),
            html.Div(
                [html.I("12.3%", className="bi bi-caret-down-fill text-danger"), " vs LY"]
            ),
        ],
        className="border-start border-danger border-5"
    ),
    className="text-center m-4",
)

card_orders = dbc.Card(
    dbc.CardBody(
        [
            html.H1([html.I(className="bi bi-cart me-2"), "Orders"], className="text-nowrap"),
            html.H3("91.4K"),
            html.Div(
                [html.I("10.3%", className="bi bi-caret-up-fill text-success"), " vs LY"]
            ),
        ],
        className="border-start border-success border-5"
    ),
    className="text-center m-4",
)

app.layout = dbc.Container(
    dbc.Row(
        [dbc.Col(card_sales), dbc.Col(card_profit), dbc.Col(card_orders)],
    ),
    fluid=True,
)

if __name__ == "__main__":
    app.run_server(debug=True)

Creating Dash Bootstrap Cards in a Loop

In the previous example, notice that a lot of the code for creating the card is the same. To reduce the amount of repetitive code, let’s create cards in a function.

In this app, we introduce the dbc.CardHeader component and the "shadow" class to style the card. We’ll show you how to add more style later in the app that displays crypto prices.

from dash import Dash, html
import dash_bootstrap_components as dbc

app = Dash(__name__, external_stylesheets=[dbc.themes.SPACELAB])

summary = {"Sales": "$100K", "Profit": "$5K", "Orders": "6K", "Customers": "300"}

def make_card(title, amount):
    return dbc.Card(
        [
            dbc.CardHeader(html.H2(title)),
            dbc.CardBody(html.H3(amount, id=title)),
        ],
        className="text-center shadow",
    )

app.layout = dbc.Container(
    dbc.Row([dbc.Col(make_card(k, v)) for k, v in summary.items()], className="my-4"),
    fluid=True,
)

if __name__ == "__main__":
    app.run_server(debug=True)

Dash Bootstrap Card with an Image

This card uses the dbc.CardImage component. This is a great format for the “who’s who” section of your app. It works well for displaying information about products too.

from dash import Dash, html
import dash_bootstrap_components as dbc

app = Dash(__name__, external_stylesheets=[dbc.themes.SPACELAB])

count = "https://user-images.githubusercontent.com/72614349/194616425-107a62f9-06b3-4b84-ac89-2c42e04c00ac.png"

card = dbc.Card(
    [
        dbc.CardImg(src=count, top=True),
        dbc.CardBody(
            [
                html.H3("Count von Count", className="text-primary"),
                html.Div("Chief Financial Officer"),
                html.Div("Sesame Street, Inc.", className="small"),
            ]
        ),
    ],
    className="shadow my-2",
    style={"maxWidth": 350},
)

app.layout = dbc.Container(card)

if __name__ == "__main__":
    app.run_server(debug=True)

Dash Bootstrap Card with an Image and a Link

This app has a card with the dbc.CardLink component.

When you run this app, try clicking on either the logo or the title. You will see that both are links to the Plotly site displaying the current job openings.

We do this by including both the html.Img component with the Plotly logo and the html.Span with the title in the dbc.CardLink component.

from dash import Dash, html
import dash_bootstrap_components as dbc

app = Dash(__name__, external_stylesheets=[dbc.themes.SPACELAB])

plotly_logo_dark = "https://user-images.githubusercontent.com/72614349/182967824-c73218d8-acbf-4aab-b1ad-7eb35669b781.png"

card = dbc.Card(
    dbc.CardBody(
        [
            dbc.CardLink(
                [
                    html.Img(src=plotly_logo_dark, height=65),
                    html.Span("Plotly Job Openings", className="ms-2"),
                ],
                className="text-decoration-none h2",
                href="https://plotly.com/careers/",
            ),
            html.Hr(),
            html.Div("Engineering", className="h3"),
            html.Div("Intermediate Backend Engineer", className="text-danger"),
            html.Div("Remote, Canada", className="small"),
        ]
    ),
    className="shadow my-2",
    style={"maxWidth": 450},
)

app.layout = dbc.Container(card)

if __name__ == "__main__":
    app.run_server(debug=True)

Dash Bootstrap Card with a Background Image

This app puts the image in the background and uses the dbc.CardImgOverlay component to place content on top of the image.

We also use dbc.Buttons to link to other sites for more information. See the buttons article for more information. Be sure to run the app and check out the links. The Webb Telescope app is pretty cool!

👉 Recommended Tutorial: Before After Image in Plotly Dash

from dash import Dash, html
import dash_bootstrap_components as dbc

app = Dash(__name__, external_stylesheets=[dbc.themes.SPACELAB, dbc.icons.BOOTSTRAP])

webb_deep_field = "https://user-images.githubusercontent.com/72614349/192781103-2ca62422-2204-41ab-9480-a730fc4e28d7.png"

card = dbc.Card(
    [
        dbc.CardImg(src=webb_deep_field),
        dbc.CardImgOverlay(
            [
                html.H2("James Webb Space Telescope"),
                html.H3("First Images"),
                html.P(
                    "Learn how to make an app to compare before and after images "
                    "of Hubble vs Webb with ~40 lines of Python",
                    style={"marginTop": 175},
                    className="small",
                ),
                dbc.Button("See the App", href="https://jwt.pythonanywhere.com/"),
                dbc.Button(
                    [html.I(className="bi bi-github me-2"), "source code"],
                    className="ms-2 text-white",
                    href="https://github.com/AnnMarieW/webb-compare",
                ),
            ]
        ),
    ],
    style={"maxWidth": 500},
    className="my-4 text-center text-white",
)

app.layout = dbc.Container(card)

if __name__ == "__main__":
    app.run_server(debug=True)

See this Plotly Dash app live: https://jwt.pythonanywhere.com/

Plotly Dash App with Live Updates

This app shows live updates of crypto prices. We use a dcc.Interval component to fetch the data from CoinGecko every 6 seconds.

The CoinGecko API is easy to use because you don’t need an API key, and it’s free if you keep the number of updates within the free tier limits. We pull the current price, 24 hour price change, and the coin logo from the data feed and display the data in a nicely styled card.

In this app we introduce callbacks to update the data, and show how to get the data from CoinGecko. All the other styling has been covered in previous examples.

Note that in this app, the color of the text and the up and down arrows are updated dynamically based on the data in the make_card function.

import dash
from dash import Dash, dcc, html, Input, Output
import dash_bootstrap_components as dbc
import requests

app = Dash(__name__, external_stylesheets=[dbc.themes.SUPERHERO, dbc.icons.BOOTSTRAP])

coins = ["bitcoin", "ethereum", "binancecoin", "ripple"]
interval = 6000  # update frequency - adjust to keep within free tier
api_url = "https://api.coingecko.com/api/v3/coins/markets?vs_currency=usd"

def get_data():
    try:
        response = requests.get(api_url, timeout=1)
        return response.json()
    except requests.exceptions.RequestException as e:
        print(e)

def make_card(coin):
    change = coin["price_change_percentage_24h"]
    price = coin["current_price"]
    color = "danger" if change < 0 else "success"
    icon = "bi bi-arrow-down" if change < 0 else "bi bi-arrow-up"
    return dbc.Card(
        html.Div(
            [
                html.H4(
                    [
                        html.Img(src=coin["image"], height=35, className="me-1"),
                        coin["name"],
                    ]
                ),
                html.H4(f"${price:,}"),
                html.H5(
                    [f"{round(change, 2)}%", html.I(className=icon), " 24hr"],
                    className=f"text-{color}",
                ),
            ],
            className=f"border-{color} border-start border-5",
        ),
        className="text-center text-nowrap my-2 p-2",
    )

mention = html.A(
    "Data from CoinGecko", href="https://www.coingecko.com/en/api", className="small"
)
interval = dcc.Interval(interval=interval)
cards = html.Div()
app.layout = dbc.Container([interval, cards, mention], className="my-5")

@app.callback(Output(cards, "children"), Input(interval, "n_intervals"))
def update_cards(_):
    coin_data = get_data()
    if coin_data is None or type(coin_data) is dict:
        return dash.no_update

    # make a list of cards with updated prices
    coin_cards = []
    updated = None
    for coin in coin_data:
        if coin["id"] in coins:
            updated = coin.get("last_updated")
            coin_cards.append(make_card(coin))

    # make the card layout
    card_layout = [
        dbc.Row([dbc.Col(card, md=3) for card in coin_cards]),
        dbc.Row(dbc.Col(f"Last Updated {updated}")),
    ]
    return card_layout

if __name__ == "__main__":
    app.run_server(debug=True)

Plotly Dash App with a Sidebar

A common layout for Dash apps is to put inputs in a sidebar and the output in the main section of the page. We can place both the sidebar and the output in Dash Bootstrap Card components.

See the app and the code live at the Dash Example Index

Plotly Dash Example Index

See more examples of interactive apps in the Dash Example Index

Reference

Order Your Copy of “The Book of Dash” Today!

The Book Of Dash

The Book of Dash Authors

Feel free to learn more about the book’s coauthors here:

Ann Marie Ward:

Adam Schroeder:

Chris Mayer:



Python TypeError: ‘dict_keys’ Not Subscriptable (Fix This Stupid Bug)


Do you encounter the following error message?

TypeError: 'dict_keys' object is not subscriptable

You’re not alone! This short tutorial will show you why this error occurs, how to fix it, and how to never make the same mistake again.

So, let’s get started!

Solution

Python raises the “TypeError: 'dict_keys' object is not subscriptable” if you use indexing or slicing on the dict_keys object obtained with dict.keys(). To solve the error, convert the dict_keys object to a list such as in list(my_dict.keys())[0].

print(list(my_dict.keys())[0])

Example

The following minimal example leads to the error:

d = {1:'a', 2:'b', 3:'c'}
print(d.keys()[0])

Output:

Traceback (most recent call last):
  File "C:\Users\...\code.py", line 2, in <module>
    print(d.keys()[0])
TypeError: 'dict_keys' object is not subscriptable

Note that the same error message occurs if you use slicing instead of indexing:

d = {1:'a', 2:'b', 3:'c'}
print(d.keys()[:-1]) # <== same error

Fixes

The reason this error occurs is that the dictionary.keys() method returns a dict_keys object that is not subscriptable.

You can use the type() function to check it for yourself:

print(type(d.keys()))
# <class 'dict_keys'>

Note: Before Python 3.7, dictionary keys were not guaranteed to keep their order, so indexing into them wouldn't have made much sense. Since Python 3.7, insertion order is preserved, but the dict_keys view still doesn't support indexing. ⚡
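If all you need is the first key, a minimal alternative sketch (assuming any non-empty dictionary) pulls a single element from the keys iterator instead of building a whole list:

```python
d = {1: 'a', 2: 'b', 3: 'c'}

# next(iter(...)) yields the first key without materializing a full list
first_key = next(iter(d.keys()))
print(first_key)
# 1
```

This avoids copying all keys when the dictionary is large, at the cost of only ever giving you the first one.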

You can fix the non-subscriptable TypeError by converting the non-indexable dict_keys object to an indexable container type such as a list in Python using the list() or tuple() function.

Here’s an example fix:

d = {1:'a', 2:'b', 3:'c'}
print(list(d.keys())[0])
# 1

Here’s another example fix:

d = {1:'a', 2:'b', 3:'c'}
print(tuple(d.keys())[:-1])
# (1, 2)

Both lists and tuples are subscriptable so you can use indexing and slicing after converting the dict_keys object to a list or a tuple.

🌍 Full Guide: Python Fixing This Subscriptable Error (General)

Summary

Python raises the TypeError: 'dict_keys' object is not subscriptable if you try to index x[i] or slice x[i:j] a dict_keys object.

The dict_keys type is not indexable, i.e., it doesn’t define the __getitem__() method. You can fix it by converting the dictionary keys to a list using the list() built-in function.

Alternatively, you can also fix this by removing the indexing or slicing call, or by defining the __getitem__ method yourself, although the previous approach is often better.
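As a sketch of the last option, here is a hypothetical wrapper class (the name IndexableKeys is made up for illustration) that defines __getitem__ by delegating to a list copy of the keys:

```python
class IndexableKeys:
    """Hypothetical wrapper that makes dict keys indexable."""

    def __init__(self, d):
        # snapshot the keys into a list, which supports indexing and slicing
        self._keys = list(d.keys())

    def __getitem__(self, index):
        # delegating to the underlying list handles both ints and slices
        return self._keys[index]


d = {1: 'a', 2: 'b', 3: 'c'}
keys = IndexableKeys(d)
print(keys[0])    # 1
print(keys[:-1])  # [1, 2]
```

In practice, calling list(d.keys()) directly is simpler; the class only pays off if you need indexable keys in many places.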

What’s Next?

I hope you’d be able to fix the bug in your code! Before you go, check out our free Python cheat sheets that’ll teach you the basics in Python in minimal time:

If you struggle with indexing in Python, have a look at the following articles on the Finxter blog—especially the third!

🌍 Related Articles:


Python – Return NumPy Array From Function


Do you need to create a function that returns a NumPy array but you don’t know how? No worries, in sixty seconds, you’ll know! Go! 🚀

A Python function can return any object such as a NumPy Array. To return an array, first create the array object within the function body, assign it to a variable arr, and return it to the caller of the function using the keyword operation “return arr“.

👉 Recommended Tutorial: How to Initialize a NumPy Array? 6 Easy Ways

Create and Return 1D Array

For example, the following code creates a function create_array() of numbers 0, 1, 2, …, 9 using the np.arange() function and returns the array to the caller of the function:

import numpy as np

def create_array():
    ''' Function to return array '''
    return np.arange(10)

numbers = create_array()
print(numbers)
# [0 1 2 3 4 5 6 7 8 9]

The np.arange([start,] stop[, step]) function creates a new NumPy array with evenly-spaced integers between start (inclusive) and stop (exclusive).

The step size defines the difference between subsequent values. For example, np.arange(1, 6, 2) creates the NumPy array [1, 3, 5].
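To see the three call patterns side by side, here is a quick sketch:

```python
import numpy as np

print(np.arange(5))        # [0 1 2 3 4]  -> only stop
print(np.arange(2, 5))     # [2 3 4]      -> start and stop
print(np.arange(1, 6, 2))  # [1 3 5]      -> start, stop, and step
```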

To better understand the function, have a look at this video:

YouTube Video

I also created this figure to demonstrate how NumPy’s arange() function works on three examples:

In the code example, we used np.arange(10) with default start=0 and step=1 only specifying the stop=10 argument.

If you need an even deeper understanding, I’d recommend you check out our full guide on the Finxter blog.

👉 Recommended Tutorial: NumPy Arange Function — A Helpful Illustrated Guide

Create and Return 2D NumPy Array

You can also create a 2D (or multi-dimensional) array in a Python function by first creating a 2D or (xD) nested list and converting the nested list to a NumPy array by passing it into the np.array() function.

The following code snippet uses nested list comprehension to create a 2D NumPy array following a more complicated creation pattern:

import numpy as np

def create_array(a, b):
    ''' Function to return array '''
    lst = [[(i+j)**2 for i in range(a)] for j in range(b)]
    return np.array(lst)

arr = create_array(4, 3)
print(arr)

Output:

[[ 0  1  4  9]
 [ 1  4  9 16]
 [ 4  9 16 25]]

I definitely recommend reading the following tutorial to understand nested list comprehension in Python:

👉 Recommended Tutorial: Nested List Comprehension in Python

More Ways

There are many other ways to return an array in Python.

For example, you can use either of those methods inside the function body to create and initialize a NumPy array:

To get a quick overview what to put into the function and how these methods work, I’d recommend you check out our full tutorial.

👉 Recommended Tutorial: How to Initialize a NumPy Array? 6 Easy Ways

Related Tutorials

Programmer Humor

Q: How do you tell an introverted computer scientist from an extroverted computer scientist? A: An extroverted computer scientist looks at your shoes when he talks to you.

Can a Python Dictionary Have a List as a Value?


Question

💬 Question: Can you use lists as values of a dictionary in Python?

This short article will answer your question. So, let’s get started right away with the answer:

Answer

You can use Python lists as dictionary values. In fact, you can use arbitrary Python objects as dictionary values and all hashable objects as dictionary keys. You can define a list [1, 2] as a dict value either with dict[key] = [1, 2] or with d = {key: [1, 2]}.

Here’s a concrete example showing how to create a dictionary friends where each dictionary value is in fact a list of friends:

friends = {'Alice': ['Bob', 'Carl'], 'Bob': ['Alice'], 'Carl': []}

print('Alice friends: ', friends['Alice'])
# Alice friends: ['Bob', 'Carl']

print('Bob friends: ', friends['Bob'])
# Bob friends: ['Alice']

print('Carl friends: ', friends['Carl'])
# Carl friends: []

Note that you can also assign lists as values of specific keys by using the dictionary assignment operation like so:

friends = dict()
friends['Alice'] = ['Bob', 'Carl']
friends['Bob'] = ['Alice']
friends['Carl'] = []

print('Alice friends: ', friends['Alice'])
# Alice friends: ['Bob', 'Carl']

print('Bob friends: ', friends['Bob'])
# Bob friends: ['Alice']

print('Carl friends: ', friends['Carl'])
# Carl friends: []

Can I Use Lists as Dict Keys?

You cannot use lists as dictionary keys because lists are mutable and therefore not hashable. As dictionaries are built on hash tables, all keys must be hashable or Python raises an error message.

Here’s an example:

d = dict()
my_list = [1, 2, 3]
d[my_list] = 'abc'

This leads to the following error message:

Traceback (most recent call last):
  File "C:\Users\xcent\Desktop\code.py", line 3, in <module>
    d[my_list] = 'abc'
TypeError: unhashable type: 'list'

To fix this, convert the list to a Python tuple and use the Python tuple as a dictionary key. Python tuples are immutable and hashable and, therefore, can be used as set elements or dictionary keys.

Here’s the same example after converting the list to a tuple—it works! 🎉

d = dict()
my_list = [1, 2, 3]
my_tuple = tuple(my_list)
d[my_tuple] = 'abc'

Before you go, maybe you want to join our free email academy of ambitious learners like you? The goal is to become 1% better every single day (as a coder). We also have cheat sheets! 👇


Python Return String From Function


Do you need to create a function that returns a string but you don’t know how? No worries, in sixty seconds, you’ll know! Go! 🚀

A Python function can return any object such as a string. To return a string, create the string object within the function body, assign it to a variable my_string, and return it to the caller of the function using the keyword operation return my_string. Or simply create the string within the return expression like so: return "hello world"

def f():
    return 'hello world'

print(f())
# hello world

Create String in Function Body

Let’s have a look at another example:

The following code creates a function create_string() that iterates over all numbers 0, 1, 2, …, 9, appends them to the string my_string, and returns the string to the caller of the function:

def create_string():
    ''' Function to return string '''
    my_string = ''
    for i in range(10):
        my_string += str(i)
    return my_string

s = create_string()
print(s)
# 0123456789

Note that you store the resulting string in the variable s. The local variable my_string that you created within the function body is only visible within the function but not outside of it.

So, if you try to access the name my_string, Python will raise a NameError:

>>> print(my_string)
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    print(my_string)
NameError: name 'my_string' is not defined

To fix this, simply assign the return value of the function — a string — to a new variable and access the content of this new variable:

>>> s = create_string()
>>> print(s)
0123456789

There are many other ways to return a string in Python.

Return String With List Comprehension

For example, you can use a list comprehension in combination with the string.join() method, which is much more concise than the previous code but creates the same string of digits:

def create_string():
    ''' Function to return string '''
    return ''.join([str(i) for i in range(10)])

s = create_string()
print(s)
# 0123456789

For a quick recap on list comprehension, feel free to scroll down to the end of this article.

You can also add some separator strings like so:

def create_string():
    ''' Function to return string '''
    return ' xxx '.join([str(i) for i in range(10)])

s = create_string()
print(s)
# 0 xxx 1 xxx 2 xxx 3 xxx 4 xxx 5 xxx 6 xxx 7 xxx 8 xxx 9

Return String with String Concatenation

You can also use a string concatenation and string multiplication statement to create a string dynamically and return it from a function.

Here’s an example of string multiplication:

def create_string():
    ''' Function to return string '''
    return 'ho' * 10

s = create_string()
print(s)
# hohohohohohohohohoho

String Concatenation of Function Arguments

Here’s an example of string concatenation that appends all arguments to a given string and returns the result from the function:

def create_string(a, b, c):
    ''' Function to return string '''
    return 'My String: ' + a + b + c

s = create_string('python ', 'is ', 'great')
print(s)
# My String: python is great

Concatenate Arbitrary String Arguments and Return String Result

You can also use dynamic argument lists to be able to add an arbitrary number of string arguments and concatenate all of them:

def create_string(*args):
    ''' Function to return string '''
    return ' '.join(str(x) for x in args)

print(create_string('python', 'is', 'great'))
# python is great

print(create_string(42, 41, 40, 41, 42, 9999, 'hi'))
# 42 41 40 41 42 9999 hi

Background List Comprehension

💡 Knowledge: List comprehension is a very useful Python feature that allows you to dynamically create a list by using the syntax [expression + context]. You iterate over all elements in a given context, e.g., "for i in range(10)", and apply a certain expression, e.g., the identity expression i, before adding the resulting values to the newly-created list.
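As a quick sketch, the comprehension used in the examples above is equivalent to this explicit loop:

```python
# list comprehension version
digits = [str(i) for i in range(10)]

# equivalent explicit loop version
digits_loop = []
for i in range(10):
    digits_loop.append(str(i))

print(''.join(digits))
# 0123456789
print(digits == digits_loop)
# True
```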

In case you need to learn more about list comprehension, feel free to check out my explainer video:

YouTube Video


Deep Forecasting Bitcoin with LSTM Architectures


Although Neural Networks do a tremendous job learning rules in tabular, structured data, they leave a great deal to be desired in terms of ‘unstructured’ data. And there we come to a new concept: Recurrent Neural Networks.

YouTube Video

Recurrent Neural Network

A Recurrent Neural Network is to a Feedforward Neural Network as a single object is to a list: it may be thought of as a set of interrelated feedforward networks, or as a looped network.

It is specialized in picking up and highlighting the main characteristics of your data (more on that in Andrej Karpathy’s Blog). They are often followed by a Feed Forward (Dense) Layer which will weigh the output.

Long Short-Term Memory

Long Short-Term Memory (LSTM) clusters have the extra special ability to deal with time (more on it can be found in Colah’s article).

As the term memory suggests, its greatest promise is to understand correlations between past and present events. In particular, they fit naturally in time series forecasts.

Here we aim at a hands-on introduction to several LSTM-based architectures (and more is to come 😉).

Article Overview

We use Bitcoin daily closing price as a case study. Specifically, we use the Bitcoin price and sentiment analysis we have gathered in a previous article. We use TensorFlow‘s Keras API for the implementation.

In this article, we will aim at the following architectures:

  1. ‘Vanilla’ LSTM
  2. Stacked LSTM
  3. Bidirectional LSTM
  4. Encoder-Decoder LSTM-LSTM
  5. Encoder-Decoder CNN-LSTM

The last one is the most convoluted (pun intended).

There is one main issue in dealing with time series: how to frame the problem. It is common to have either the historical target values alone (a univariate problem) or the target together with other information (a multivariate problem).

Moreover, you might be interested in one-step prediction or multi-step prediction, i.e., predicting only the next day or, say, all days in the next week. Although it might not sound like it, you have to adjust your model to whichever situation you are facing.

Think of how you would deal with a multivariate multi-step problem: should you train a one-step model and forecast all features in order to feed your model to predict the following days? That would be crazy!

Kaggle’s time series course does a good job introducing the several strategies present to deal with multi-step prediction. Fortunately, setting an LSTM network for a multi-step multivariate problem is as easy as setting it for a univariate one-step problem – you just need to change two numbers.

This is another advantage of Neural Networks, apart from its capacity of memory. 

Of course, the architecture list above is not exhaustive. For instance, a new Attention layer was recently introduced, which has been working wonders. We shall come back to it in a next article, where we will walk through a hybrid Attention-CLX model.

Credits to ML Mastery blog for part of the code. 

🚫 Disclaimer: This article is a programming/data analysis tutorial only and is not intended to be any kind of investment advice.

How to Prepare the Data for LSTM?

We will use two sources of data, both explicit in our previous article: the SentiCrypt‘s Bitcoin sentiment analysis and Bitcoin’s daily closing price (by following the steps in the previous article, you can do it differently, using a minute-base data, for example).

Let us load the already-saved sentiment analysis and download the Bitcoin price:

import pandas as pd
import yfinance as yf

sentic = pd.read_csv('sentic.csv', index_col=0, parse_dates=True)
sentic.index.freq = 'D'

btc = yf.download('BTC-USD', start='2020-02-14', end='2022-09-23', period='1d')[['Close']]
btc.columns = ['btc']

data = pd.concat([sentic, btc], axis=1)
data

The LSTM layer expects a 3D array as input whose shape represents:

(data_size, timesteps, number_of_features).

Meaning, the first and last elements are the number of rows and columns from the input data, respectively. The timestep argument is the size of the time chunk you want your LSTM to process at a time. This will be the time frame the LSTM will look for relations between past and present. It is essentially the size of its (long short-term) memory.
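Here is a toy sketch of that shape, with made-up numbers (12 samples, 3 timesteps, 2 features) rather than our real data:

```python
import numpy as np

# 12 samples, each row holding 3 timesteps x 2 features flattened side by side
flat = np.arange(12 * 6).reshape(12, 6)

# reshape into (data_size, timesteps, number_of_features)
lstm_input = flat.reshape(12, 3, 2)
print(lstm_input.shape)
# (12, 3, 2)
```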

To decide how many time-steps, we recall our first time series article where we explored partial auto-correlations of Bitcoin price’s lags.

That is easily achieved through statsmodels:

from statsmodels.graphics.tsaplots import plot_pacf
import matplotlib.pyplot as plt

plot_pacf(data.btc, lags=20)
plt.show()

If you followed the first article, you might remember our curious 10-lag correlation. Here we use this magic number and feed the model a 10-day frame to make a 5-day prediction. I found the results with 10 days better than with 6 or 20 days (in most cases; see below for more about this). We also assume we have today’s data and try to forecast the next 5 days.

An easy way to accomplish the reshaping of the data is through (a slight modification) of our make_lags function together with NumPy’s reshape() method.

So, instead of a Series, we will take a DataFrame as input and will output a concatenation of the original frame with its respective lags. We use negative lags to prepare the target DataFrame. We will ignore observations with the produced NaN values and will use the align method to align their indexes. 

def make_lags(df, n_lags=1, lead_time=1):
    """
    Compute lags of a pandas.DataFrame from lead_time to lead_time + n_lags.
    Alternatively, a list can be passed as n_lags.
    Returns a pd.DataFrame resulting from the concatenation of df's shifts.
    """
    if isinstance(n_lags, int):
        lag_list = range(lead_time, n_lags + lead_time)
    else:
        lag_list = n_lags
    lags = list()
    for i in lag_list:
        df_lag = df.shift(i)
        if i != 0:
            df_lag.columns = [f'{col}_lag_{i}' for col in df.columns]
        lags.append(df_lag)
    return pd.concat(lags, axis=1)

X = make_lags(data, n_lags=20, lead_time=0).dropna()
y = make_lags(data[['btc']], n_lags=range(-5, 0)).dropna()
X, y = X.align(y, join='inner', axis=0)

Next, we train-test split the data with sklearn, taking 10% as test size. As usual for time series, we include shuffle=False as a parameter.

from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=.1, shuffle=False)

Before proceeding, it is good practice to normalize the data before feeding it into a Neural Network. We do it now, before things get 3D.

from sklearn.preprocessing import MinMaxScaler

mms = MinMaxScaler().fit(X_train)
X_train, X_val = mms.transform(X_train), mms.transform(X_val)

Finally, we use NumPy to reshape everything to 3D arrays. Observe that there is no such thing as a 3D pd.DataFrame.

import numpy as np

def add_dim(df, timesteps=5):
    """
    Transforms a pd.DataFrame into a 3D np.array with shape
    (n_samples, timesteps, n_features)
    """
    df = np.array(df)
    array_3d = df.reshape(df.shape[0], timesteps, df.shape[1] // timesteps)
    return array_3d

X_train, X_val = map(add_dim, [X_train, X_val], [timesteps] * 2)

Of course, you can always prepare a function to do everything in one shot:

def prepare_data(df, target_name, n_lags, n_steps, lead_time, test_size, normalize=True):
    ''' Prepare data for LSTM. '''
    if isinstance(n_steps, int):
        n_steps = range(1, n_steps + 1)
    n_steps = [-x for x in list(n_steps)]

    X = make_lags(df, n_lags=n_lags, lead_time=lead_time).dropna()
    y = make_lags(df[[target_name]], n_lags=n_steps).dropna()
    X, y = X.align(y, join='inner', axis=0)

    from sklearn.model_selection import train_test_split
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=test_size, shuffle=False)

    if normalize:
        from sklearn.preprocessing import MinMaxScaler
        mms = MinMaxScaler().fit(X_train)
        X_train, X_val = mms.transform(X_train), mms.transform(X_val)

    if isinstance(n_lags, int):
        timesteps = n_lags
    else:
        timesteps = len(n_lags)

    return add_dim(X_train, timesteps), add_dim(X_val, timesteps), y_train, y_val

Note that one should give positive values to n_steps to have the right negative shifts. Fortunately, y_train, y_val are not reshaped, which makes life easier when comparing predictions with reality.
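As a minimal pandas sketch (with toy numbers, not our Bitcoin data) of why negative shifts produce targets: shift(-1) pulls the next row's value back onto the current row.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], name='btc')

# shift(-1) moves future values back: row i now holds the value of row i + 1
target = s.shift(-1)
print(target.tolist())
# [20.0, 30.0, 40.0, nan]
```

The trailing NaN rows are exactly the ones make_lags drops with dropna(), since there is no future value left to predict for them.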

All set, let’s start with the most basic Vanilla model.

💡 Side note: We are keeping things simple here, but in a future post, we will prepare our own batches and explore better the stateful parameter of an LSTM layer. More on its input and output can be found in Mohammad’s Git.

How to Implement Vanilla LSTM with Keras?

A model is called Vanilla when it has no additional structure apart from the output layer.

To implement it we add an LSTM and a Dense layer. We must pass the number of units of each and the input shape for the LSTM layer.

The input shape is exactly (n_timesteps, n_features), which can be inferred from X_train.shape. The number of units for the LSTM layer is a hyperparameter and shall be tuned; for the Dense layer, it is the number of outputs we want, therefore 5.

Next follows hypertuning-friendly code, specifying the main parameters in advance.

from keras.models import Sequential
from keras.layers import Dense, LSTM

# data preparation parameters
n_lags, n_steps, lead_time, test_size = 10, 5, 0, .2

# hyperparameters
epochs, batch_size, verbose = 50, 72, 0
model_params = {}

# preparing data
X_train, X_val, y_train, y_val = prepare_data(data, 'btc', n_lags, n_steps, lead_time, test_size)

# model architecture
vanilla = Sequential()
vanilla.add(LSTM(units=200, activation='relu',
                 input_shape=(X_train.shape[1], X_train.shape[2])))
vanilla.add(Dense(n_steps))

The model_params dictionary will be useful for including additional parameters in the fit method, such as an EarlyStopping callback.

We also write a function that fits the model, plots, and assesses predictions. The present code does not return anything, so feel free to change it in order to do so. We fix the optimizer as Adam and the loss metric as Mean Squared Error.

def fit_model(model, learning_rate=0.001, time_distributed=False,
              epochs=epochs, batch_size=batch_size, verbose=verbose):
    y_ind = y_val.index
    if time_distributed:
        y_train_0 = y_train.to_numpy().reshape((y_train.shape[0], y_train.shape[1], 1))
        y_val_0 = y_val.to_numpy().reshape((y_val.shape[0], y_val.shape[1], 1))
    else:
        y_train_0 = y_train
        y_val_0 = y_val

    # fit network (pass the Adam instance so learning_rate takes effect)
    from keras.optimizers import Adam
    adam = Adam(learning_rate=learning_rate)
    model.compile(loss='mse', optimizer=adam)
    history = model.fit(X_train, y_train_0, epochs=epochs, batch_size=batch_size,
                        verbose=verbose, **model_params,
                        validation_data=(X_val, y_val_0), shuffle=False)

    # make a prediction
    if time_distributed:
        predictions = model.predict(X_val)[:, :, 0]
    else:
        predictions = model.predict(X_val)
    yhat = pd.DataFrame(predictions, index=y_ind,
                        columns=[f'pred_lag_{i}' for i in range(-n_steps, 0)])
    yhat_shifted = pd.concat(
        [yhat.iloc[:, i].shift(-n_steps + i) for i in range(len(yhat.columns))],
        axis=1)

    # calculate RMSE
    from sklearn.metrics import mean_squared_error, r2_score
    rmse = np.sqrt(mean_squared_error(y_val, yhat))

    import matplotlib.pyplot as plt
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 14))
    y_val.iloc[:, 0].plot(ax=ax2, legend=True)
    yhat_shifted.plot(ax=ax2)
    ax2.set_title('Prediction comparison')
    ax2.annotate(f'RMSE: {rmse:.5f} \n R2 score: {r2_score(yhat, y_val):.5f}',
                 xy=(.68, .93), xycoords='axes fraction')
    ax1.plot(history.history['loss'], label='train')
    ax1.plot(history.history['val_loss'], label='test')
    ax1.legend()
    plt.show()

The time_distributed parameter will be used in the last two architectures.

I opted to set a manual learning_rate since at one point the Stacked LSTM’s output was an array of NaNs. After figuring out that the gradient descent was not converging, I fixed it by decreasing Adam’s learning rate.

Use verbose=1 as a global parameter to debug your network.

Without further ado:

fit_model(vanilla)

The performance is comparable to our XGBoost 1-day prediction in the last article:

Moreover, we are predicting 5 days, not only one, making the r2 score more impressive.

What bothers me, on the other hand, is the fact the predictions for all five days look identical. It requires further analysis to understand why that is happening, which we will not do here.

How to Build a Stacked LSTM?

We also can queue two LSTM layers.

To this aim, we need to be careful to give a 3D input to the second LSTM layer and that is the role the parameter return_sequences plays. We gain a slight increase in the training score in this case.

# model architecture
stacked = Sequential()
stacked.add(LSTM(100, activation='relu', return_sequences=True,
                 input_shape=(X_train.shape[1], X_train.shape[2])))
stacked.add(LSTM(100, activation='relu'))
stacked.add(Dense(n_steps))

fit_model(stacked)

What is a Bidirectional LSTM Layer?

In general, any RNN meeting minimal requirements can be made bidirectional through Keras’ Bidirectional layer. It stacks two copies of your RNN layer, running one backward.

Image from AIM.

You can either specify the backward_layer as a second RNN layer or just wrap a single one, in which case the Bidirectional instance uses a copy as the backward model. An implementation can be found below.

The score is comparable to the Stacked LSTM.

from keras.layers import Bidirectional

bilstm = Sequential()
bilstm.add(Bidirectional(LSTM(100, activation='relu'),
                         input_shape=(X_train.shape[1], X_train.shape[2])))
bilstm.add(Dense(n_steps))

fit_model(bilstm)

Encoder-Decoder LSTM

An Encoder-Decoder structure is designed so that one network is dedicated to feature selection and a second one to the actual forecast. The architectures used can be of different types; even recurrent/non-recurrent pairs are allowed.

Here we explore two pairs: LSTM-LSTM and CNN-LSTM. 

Compared to the previously presented architectures, the main difference is the inclusion of the RepeatVector layer and the TimeDistributed wrapper.

Although the RepeatVector slots in smoothly, the TimeDistributed layer needs some care. It wraps a layer object and applies a copy of the wrapped layer to each temporal slice of its input. It treats the input's .shape[1] as the temporal dimension (our prepare_data is in accordance with that).

Moreover, one has to watch out since it outputs a 3D array; in particular, our model will output 3D predictions.

For this reason, we have to feed the model with reshaped y_val, y_train so that the loss functions can be computed. Fortunately, we already included the time_distributed parameter in the fit_model to deal with the reshaping.
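The shape bookkeeping of these two layers can be sketched in plain NumPy (illustrative only; the numbers are arbitrary). RepeatVector tiles the encoder's 2D summary into a 3D sequence, and TimeDistributed applies the same Dense weights to every temporal slice:

```python
import numpy as np

n_steps, units = 5, 3
encoded = np.array([0.1, 0.2, 0.3])        # encoder output, shape (units,)

# RepeatVector: copy the summary once per output timestep
repeated = np.tile(encoded, (n_steps, 1))  # shape (n_steps, units)

# TimeDistributed(Dense(1)): one shared weight matrix applied per timestep
W = np.ones((units, 1))
time_distributed = repeated @ W            # shape (n_steps, 1)

print(repeated.shape)          # (5, 3)
print(time_distributed.shape)  # (5, 1) -> 3D once the batch axis is added
```

This is why the targets must be reshaped: the loss compares the model's 3D output against y, timestep by timestep.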

We also increase the number of epochs since these networks seem to take longer to find a minimum, and we include an EarlyStopping callback. The result is an astonishing score!

from keras.layers import RepeatVector, TimeDistributed

# Data preparation
n_lags, n_steps, lead_time, test_size = 10, 5, 0, .2

# hyperparameters
epochs, batch_size, verbose = 300, 32, 0
model_params = {'callbacks': [EarlyStopping(monitor="val_loss", patience=20, mode="auto")]}

# preparing data
X_train, X_val, y_train, y_val = prepare_data(data, 'btc', n_lags, n_steps, lead_time, test_size, normalize=True)

# Encoder
lstmlstm = Sequential()
lstmlstm.add(LSTM(100, activation='relu', input_shape=(X_train.shape[1], X_train.shape[2])))
lstmlstm.add(RepeatVector(n_steps))

# Decoder
lstmlstm.add(LSTM(100, activation='relu', return_sequences=True))
lstmlstm.add(TimeDistributed(Dense(n_steps)))

fit_model(lstmlstm, time_distributed=True)

This is the first time the steps outputs are visibly different from each other.

The predictions seem to follow a trend, though. In theory, the NN should be powerful enough to capture trends as well; in practice, however, detrending often gives better results. Still, 0.82 is a massive increase over our 0.32 XGBoost score.

Encoder-Decoder CNN-LSTM Network

The last architecture we present is the CNN-LSTM one.

Here a Convolutional Neural Network is used as a feature extractor, a role it is well known to perform in for photos and videos.

The main reason CNNs are so useful here is mathematical: the convolutional part of the name refers to the convolution operation, which is used to emphasize translation-invariant features.

That makes complete sense for a photo, since you want your mobile phone to recognize Toto as a dog regardless of whether he is in the lower-left corner or the upper-center of the picture (of course your dog's name is Toto, right?). You may recognize the CNN's action as the smoothed lines in the graph.
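Translation invariance is easy to demonstrate on a 1D series (an illustrative NumPy sketch, not the model's layers): the same kernel produces an identical peak response to a spike pattern wherever the spike sits in the series; only the peak's position shifts.

```python
import numpy as np

kernel = np.array([1.0, 2.0, 1.0])       # a small symmetric 1D filter

early = np.zeros(12); early[2] = 1.0     # spike near the start
late = np.zeros(12); late[8] = 1.0       # same spike near the end

resp_early = np.convolve(early, kernel, mode='valid')
resp_late = np.convolve(late, kernel, mode='valid')

# Same peak magnitude, different location:
print(resp_early.max(), resp_early.argmax())
print(resp_late.max(), resp_late.argmax())
```

A Conv1D layer learns many such kernels, so a pattern in the price series is detected regardless of which lag it appears at.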

from keras.layers import RepeatVector, TimeDistributed, Conv1D, MaxPooling1D, Flatten

# Data preparation
n_lags, n_steps, lead_time, test_size = 10, 5, 0, .2

# hyperparameters
epochs, batch_size, verbose = 300, 32, 0
model_params = {'callbacks': [EarlyStopping(monitor="val_loss", patience=20, mode="auto")]}

# preparing data
X_train, X_val, y_train, y_val = prepare_data(data, 'btc', n_lags, n_steps, lead_time, test_size)

# Encoder
cnn_lstm = Sequential()
cnn_lstm.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(X_train.shape[1], X_train.shape[2])))
cnn_lstm.add(Conv1D(filters=64, kernel_size=3, activation='relu'))
cnn_lstm.add(MaxPooling1D(pool_size=2))
cnn_lstm.add(Flatten())
cnn_lstm.add(RepeatVector(n_steps))

# Decoder
cnn_lstm.add(LSTM(200, activation='relu', return_sequences=True))
cnn_lstm.add(TimeDistributed(Dense(100, activation='relu')))
cnn_lstm.add(TimeDistributed(Dense(n_steps)))

fit_model(cnn_lstm, time_distributed=True)

Extra Perks

For the sake of completeness, we tweaked the code a bit.

Do you remember the seemingly significant correlation that popped up at the 20-day lag? Well, increasing the window from 10 to 20 timesteps actually increases the R2 score of the last model:

Funnily enough, it increases even more if you use unnormalized data, yielding a stellar ~.94 score!

The last thing worth mentioning is the choice of activation function. If you got the warning below and wondered why, Keras' LSTM documentation provides the answer.

🛑 WARNING: tensorflow:Layer lstm_70 will not use cuDNN kernels since it doesn’t meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.

(No, I did not load 70 LSTM layers. I loaded around 210 😵‍💫)

The documentation says:

“The requirements to use the cuDNN implementation are:

  1. activation == tanh
  2. recurrent_activation == sigmoid
  3. recurrent_dropout == 0
  4. unroll is False
  5. use_bias is True
  6. Inputs, if use masking, are strictly right-padded.
  7. Eager execution is enabled in the outermost context.”

Changing the activation to ‘tanh‘ is enough in our case to use cuDNN, and the cuDNN kernels are incredibly fast! However, tanh fits our problem poorly:

fit_model(cnn_lstm, time_distributed=True, learning_rate=1)

(You read that right: the learning rate is 1000x larger than the default. Otherwise the loss curve does not even change.)
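One plausible reason tanh struggles here, sketched below in NumPy (an illustration, not a full diagnosis): tanh squashes everything into (-1, 1) and saturates for large inputs, so on unnormalized price-scale values its gradients become tiny, while relu passes large values through unchanged.

```python
import numpy as np

x = np.array([0.5, 5.0, 50.0])

print(np.tanh(x))        # flattens toward 1 as inputs grow: gradients vanish
print(np.maximum(x, 0))  # relu keeps the scale of large inputs intact
```

With saturated activations, only a drastically larger learning rate moves the weights at all, which matches the behavior observed above.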

Main Takeaways

There are a few points we have to keep in mind about LSTM:

  • The shape of their input
  • What time steps are
  • The shape of the layer’s output, especially when using return_sequences
  • Hyperparameter tuning is worth your time. For instance, the activation functions relu and tanh have their own pros and cons.
  • There are different architectures to play with (and many more to come – we will deal with Attention blocks and Multi-headed networks soon). Consider using them. I’ve become especially inclined toward the Encoder-Decoders.

Feel free to use and edit the code here.