Python-Tut – Page 31

Posted on October 11, 2022

Solidity Deep Dive — Syllabus + Video Tutorial Resources

Do you want to learn Solidity and create your own dApps and smart contracts? This free online course gives you a comprehensive overview that is aimed to be more accessible than the Solidity documentation but still complete and descriptive.

Multimodal Learning: Each tutorial comes with a tutorial video that helps you grasp the concepts in a more interactive manner.

Are you ready to build your skills as a highly sought-after Blockchain Developer or Solidity Engineer? Let’s dive right in!

Basic Introduction and Overview

Installation and Technical Requirements

Guided Example Smart Contracts

Solidity Layout

Solidity Language Elements

Posted on October 10, 2022

Python Print Dictionary Values Without “dict_values”

Problem Formulation and Solution Overview

If you print all values from a dictionary in Python using print(my_dict.values()), you get the representation of a dict_values object, a view of the dictionary values. It shows the values wrapped in a weird dict_values(...), for example: dict_values([1, 2, 3]).

Here’s an example:

my_dict = {'name': 'Carl', 'age': 42, 'income': 100000}
print(my_dict.values())
# dict_values(['Carl', 42, 100000])

There are multiple ways to change the string representation of the values, so that the print() output doesn’t yield the strange dict_values view object.

Method 1: Convert to List

An easy way to obtain a pretty output when printing the dictionary values without the dict_values(...) representation is to convert the dict_values object to a list using the list() built-in function. For instance, print(list(my_dict.values())) prints the dictionary values as a simple list.

Here’s an example:

my_dict = {'name': 'Carl', 'age': 42, 'income': 100000}
print(list(my_dict.values()))
# ['Carl', 42, 100000]

So far, so simple. Read on to learn or recap some important Python features and improve your skills. There are many paths to Rome!

Method 2: Unpacking

An easy and Pythonic way to print a dictionary without the dict_values prefix is to unpack all values into the print() function using the asterisk operator. This works because the print() function allows an arbitrary number of values as input. It prints those values separated by a single whitespace character per default.

Here’s an example:

my_dict = {'name': 'Carl', 'age': 42, 'income': 100000}
print(*my_dict.values())
# Carl 42 100000

It cannot get any more concise, frankly.

Of course, you can change the separator and end arguments accordingly to obtain more control of the output:

my_dict = {'name': 'Carl', 'age': 42, 'income': 100000}
print(*my_dict.values(), sep='\n', end='\nThe End')

Output:

Carl
42
100000
The End

Do you need even greater flexibility than this? No problem! See here:

Method 3: String Join Function and Generator Expression

To convert the dictionary values to a single string object without 'dict_values' in it and with maximal control, you can use the str.join() method in combination with a generator expression and the built-in str() function.

Here’s an example:

my_dict = {'name': 'Carl', 'age': 42, 'income': 100000}
print(', '.join(str(x) for x in my_dict.values()))
# Carl, 42, 100000

Note: You can replace the separator ', ' with any separator string you like, and you can change how each individual element is represented by replacing the expression str(x) in the generator expression with something arbitrarily complicated.

See here for something crazy that wouldn’t make any sense:

my_dict = {'name': 'Carl', 'age': 42, 'income': 100000}
print(' | '.join('x' + str(x) + 'x' for x in my_dict.values()))
# xCarlx | x42x | x100000x

Note that you could also use the repr() function instead of the str() function in this example. Numbers look the same either way, but string values such as 'Carl' would keep their enclosing quotes.
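To see where str() and repr() differ, here is a small sketch that reuses my_dict from above. The names with_str and with_repr are chosen only for this illustration.

```python
my_dict = {'name': 'Carl', 'age': 42, 'income': 100000}

# str() gives the plain text of each value
with_str = ', '.join(str(x) for x in my_dict.values())
print(with_str)
# Carl, 42, 100000

# repr() keeps the quotes around string values
with_repr = ', '.join(repr(x) for x in my_dict.values())
print(with_repr)
# 'Carl', 42, 100000
```

For dictionaries that hold only numbers, both variants print the same text.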

Finally, I’d recommend you check out this tutorial to learn more how generator expressions work—many Python beginners struggle with this concept even though it’s ubiquitous in expert coders’ code bases.

Recommended Tutorial: Understanding One-Line Generators in Python

Posted on October 9, 2022

Python Print Dictionary Without One Key or Multiple Keys

The most Pythonic way to print a dictionary except for one or multiple keys is to filter it using dictionary comprehension and pass the filtered dictionary into the print() function.

There are multiple ways to accomplish this and I’ll show you the best ones in this tutorial. Let’s get started!

Recommended Tutorial: How to Filter a Dictionary in Python

Method 1: Dictionary Comprehension

Say you have one or more keys stored in a variable ignore_keys, which may be a list or, for efficiency reasons, a set.

Create a filtered dictionary without one or multiple keys using the dictionary comprehension {k:v for k,v in my_dict.items() if k not in ignore_keys} that iterates over the original dictionary’s key-value pairs and confirms for each key that it doesn’t belong to the ones that should be ignored.

Here’s a minimal example:

ignore_keys = {'x', 'y'}
my_dict = {'x': 1, 'y': 2, 'z': 3}

filtered_dict = {k:v for k,v in my_dict.items() if k not in ignore_keys}
print(filtered_dict)
# {'z': 3}

The dict.items() method creates an iterable of key-value pairs over which we can iterate.

The membership operator k not in ignore_keys tests if a given key doesn’t belong to the set.

The runtime complexity of the membership check is constant, O(1), if you use a set for the ignore_keys data structure. It would be linear, O(n), in the number of elements if you used a list, which is why a list is not a good idea here.

Note that you can also use this approach to print a dictionary except a single key by putting only one key into the ignore set.
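As a quick sketch of the single-key case, the same comprehension works with a set that holds only one element. The key 'y' is picked arbitrarily for this illustration.

```python
my_dict = {'x': 1, 'y': 2, 'z': 3}

# A set with a single element excludes just that one key
ignore_keys = {'y'}
filtered_dict = {k: v for k, v in my_dict.items() if k not in ignore_keys}
print(filtered_dict)
# {'x': 1, 'z': 3}
```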

Recommended Tutorial: Dictionary Comprehension in Python

Method 2: Simple For Loop with If Condition

A not-so-Pythonic but reasonably readable way to print a dict without one or multiple keys is to use a simple for loop with if condition to avoid all keys in the ignore list.

Here’s an example using three lines and directly printing the key-value pairs:

ignore_keys = {'x', 'y'}
my_dict = {'x': 1, 'y': 2, 'z': 3}

for k, v in my_dict.items():
    if k not in ignore_keys:
        print(k, v)

The output:

z 3

Of course, you can modify the output to your own needs. See the customizations of the built-in print() function and its awesome arguments:

Recommended Tutorial: Python print() and Separator and End Arguments

My Recommendation – Use This Method!

I could have listed many more ways to solve this problem of printing a dict except one or more keys.

I have seen super inefficient ways proposed on forums that use exclude_keys that are list types.

I have also seen elaborate schemes to use set difference operations or more.

But I don’t recommend anything else than dict comprehension if you want to create a filtered dictionary object first and the simple for loop if you want to print on the fly.

That’s it.

Posted on October 8, 2022

State Variables in Solidity

In this article, I’ll be going over the different types of state variables in Solidity and how to use them. State variables are one of the most important parts of any smart contract, as they allow us to store data that can change over time.

This article is mainly focused on value types of state variables, but I’ll be continuing with another two articles on reference and complex types as well as data location. Let’s dive in!

Basics – A Quick Review

Smart contracts are pieces of code that are deployed in blockchain nodes. They are immutable, meaning they cannot be changed once they have been deployed. This can make it necessary to redeploy the code as a new smart contract or redirect calls from an old contract to new ones.

A smart contract is initiated by a message embedded in a transaction. Ethereum enables these transactions, which may carry out more sophisticated operations like conditional transfers.

A conditional transfer, such as one that depends on the age of the buyer or the value of their bid, could be required.

Example: If the buyer is over 21 and their bid is greater than the minimum bid, then accept the bid. Otherwise reject it.

Smart contracts are executed when predetermined conditions are met to automate the execution of an agreement so that all parties can be immediately certain of the outcome without the need for an intermediary.

How Do You Write a Smart Contract?

Smart contracts are similar to a class definition in an object-oriented programming language.

A smart contract consists of:

data (its state);
a collection of code (its functions or methods, with modifiers such as public or private, including getter and setter functions).

What is the structure of a smart contract?

As we have seen in other articles in Finxter, the structure of a smart contract is as follows:

A pragma directive;
The name of the contract;
Data, i.e. the state variables that define the state of the contract;
A collection of functions to carry out the intent of the smart contract.

Note that the identifiers representing these elements are restricted to the ASCII character set. Make sure you select meaningful identifiers and follow camel case convention in naming them.

Variable Declaration

To declare a variable in Solidity, you must first specify its data type. This is followed by an access modifier and the variable name.

Structure

<type> <access modifier> <variable name> ;

Example:

uint256 public a;

What Categories of Variables Exist in Solidity?

Solidity supports three categories of variables:

(1) State Variables

State variables are variables whose values are permanently stored in a contract storage.

What does this mean?

State variables are an essential part of any contract. They are variables whose values are permanently stored in the contract storage. They can be thought of as a single slot in a database that you can query and alter by calling functions of the code that manages the database. The set and get functions can be used to modify and retrieve the value of the variables.

In other words, the data (state variables) are stored contiguously item after item starting with the first state variable, stored in slot 0. For each variable, the size in bytes is determined according to its type. Several contiguous items that require less than 32 bytes are packed into a single storage slot if possible.
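The packing rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it only handles value types of a known byte size, it ignores structs, arrays, mappings, and inheritance, and it does not model the byte order inside a slot. The function name assign_slots and the example variable list are hypothetical.

```python
SLOT_SIZE = 32  # bytes per storage slot

def assign_slots(sizes):
    """Return (slot, offset) for each variable size in bytes, in declaration order."""
    layout = []
    slot, used = 0, 0
    for size in sizes:
        # Start a new slot when the variable does not fit into the rest of this one
        if used + size > SLOT_SIZE:
            slot += 1
            used = 0
        layout.append((slot, used))
        used += size
    return layout

# Hypothetical contract: uint128, uint128, uint256, uint8, address (20 bytes)
layout = assign_slots([16, 16, 32, 1, 20])
print(layout)
# [(0, 0), (0, 16), (1, 0), (2, 0), (2, 1)]
```

In this model, the two uint128 variables share slot 0, the uint256 fills slot 1 on its own, and the uint8 and the address are packed together into slot 2.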

To make it easier, if you use other languages and want to store user information for a long time, you would connect your application to a database server and then store the information in the database. In Solidity, however, you do not need to connect, you can simply store the data permanently using state variables.

(2) Local Variables

Local variables are variables whose values exist only while the function is executing; their context is the function itself, and they cannot be accessed from outside.

Typically, these variables are used to hold temporary values for processing or computing something. In the following example, “temp” is a local variable that cannot be used outside the “set” function.

(3) Global Variables

Global variables are variables whose values exist in the global namespace to obtain information about the blockchain.

Each function has its own scope, but state variables should always be defined outside the scope, like the attributes of a class.

They are permanently stored in the Ethereum blockchain, more precisely in the storage Merkle-Patricia tree, which is part of the information that forms the state of an account (that’s why we call them state variables).

What Types of Valid State Variables Exist?

Info: Solidity is a statically typed language, meaning each variable’s type must be specified at the time of its declaration.

“Undefined” or “null” values do not exist in Solidity, but newly declared variables always have a default value depending on their type, typically called the “zero state”.

For example, the default value for bool is false.

As in other languages (not Python), there are two kinds of types in Solidity: value types and reference types.

A variable of a value type stores its data directly. A variable of a reference type stores the location of the data instead.
The reference types are discussed in a separate article.

For example, consider the integer variable int i = 100;

The system stores 100 in the memory location allocated for the variable i. The following image shows how 100 is stored in a hypothetical location in memory (0x239110) for “i”:

What are the Modifiers for the State Variables?

Visibility – access modifiers

Access modifiers are the keywords used to specify the declared accessibility of a state variable and functions.

Variables in Solidity have three types of visibility: public, private, and internal. If visibility is not explicitly declared, the compiler considers it internal.

For variables of type public, the compiler automatically creates a method to retrieve them through a call. This does not apply to private or internal variables.

Example:

uint256 public a;

is conceptually the same thing as:

uint256 private a;
function a() public view returns (uint256) {
    return a;
}

When you create a public variable, it is stored the same way as a private variable, but the compiler automatically creates a getter function for it.

The difference between private and internal variables is that internal variables are inherited by child contracts, while private variables are not.

The following example shows an internal variable that a child contract inherits:

contract Addition {
    uint x; // internal variable
    uint public y;
}

contract Child is Addition {
    // no need to define x since the child contract inherits the variable
    // uint x;
    function setX(uint _x) public {
        x = _x;
    }

    function getX() public view returns (uint) {
        return x;
    }
}

Note that the data location (memory, storage, or calldata) must be specified for variables of reference type. This is necessary when function arguments are involved. We will cover this in an article on data location.

Other keywords

The following keywords can be used for state variables to restrict changes to their state.

Constant (replaced by “view” and “pure” in functions)

The constant keyword disallows assignment except at initialization: a constant variable must be initialized when it is declared and cannot be changed afterwards.

Example:

uint private constant t = 40;

The variable t has been declared once and therefore cannot be changed.

It is interesting to note that the declaration of a constant variable without initialization is forbidden and the compiler displays an error, e.g.:

contract Addition {
    uint private x;
    uint public y;
    uint private constant z; // gives an error because constant variables must be initialized when declared
}

Immutable

These variables can be declared without being initialized, but they can be assigned only once, either at declaration or in the constructor. After that, the variable is constant.

uint private immutable w;

// now we declare a constructor for the contract, using the constructor keyword
constructor() {
    w = 20; // initialize the variable
}

Override

This keyword states that a public state variable changes the behavior of a function inherited from a base contract.

Value Types

These variables are passed by value. That is, they are copied when they are used either in an assignment or in a function argument.

If this sentence is not clear, you can check here.

Here we will see the basic value types.

Value types are booleans, integers, addresses, enums, and bytes.

Booleans

Boolean values can be true or false

An example of a boolean type:

contract ExampleBool {
    // example of a bool value type in Solidity
    bool public IsVerified = false;
    bool public IsSent = true;
}

Integers

There are int/uint (signed and unsigned integer) types of various sizes. They range from int8/uint8 and int16/uint16 up to int256/uint256, in steps of 8 bits. int is the same as int256, and uint is the same as uint256.

Note: uint256 is the same as uint.

The type uint stands for non-negative integers (zero and positive). The type int stands for both positive and negative integers.

Recommended Tutorial: Solidity Data Types – Integer and Boolean

The type uint8 has 8 bits, which corresponds to 1 byte. This means that it accepts numbers between 0 and 255. A bit is a binary digit, so one byte can hold 2^8 = 256 different values, from 0 to 2^8 - 1 = 255. This is the same reason why a three-digit decimal number can represent the 10^3 = 1000 values from 0 to 999.

The type uint256 accepts numbers between 0 and 2^256 - 1.

If we try to assign the value 256 to a variable of type uint8, the compiler will print an error.
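The ranges can be checked with a little Python arithmetic. This is a sketch that models the bit widths only, not Solidity itself; the helper names uint_range, int_range, and fits_uint are made up for this illustration.

```python
def uint_range(bits):
    """Smallest and largest value of an unsigned integer with the given bit width."""
    return 0, 2**bits - 1

def int_range(bits):
    """Smallest and largest value of a signed (two's complement) integer."""
    return -(2**(bits - 1)), 2**(bits - 1) - 1

def fits_uint(value, bits):
    """True if value lies inside the unsigned range for the given bit width."""
    low, high = uint_range(bits)
    return low <= value <= high

print(uint_range(8))       # (0, 255)
print(int_range(8))        # (-128, 127)
print(fits_uint(255, 8))   # True
print(fits_uint(256, 8))   # False
```

The last line mirrors the compiler error mentioned above: 256 is one past the largest value a uint8 can hold.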

The best practice for integers is to specify the bit width at the declaration stage to use as little space as possible and reduce the cost of storage. So use uint8 or uint16 where the values allow it instead of always using uint (uint256).

contract SimpleContract {
    uint32 public uidata = 1234567; // unsigned integer
    int32 public idata = -1234567; // signed integer
}

Fixed Point Numbers

According to the Solidity documentation, fixed-point numbers are the type for numbers with a fractional part. However, the official documentation states that “Fixed point numbers are not yet fully supported by Solidity”. They can be declared, but cannot be assigned to or from.

However, you can use numbers with a fractional part in calculations, as long as the value resulting from the calculation is an integer.

Here is an example,

contract additionContract {
    uint8 result;

    function Addition(uint) public {
        result = 2/3; // error
        result = 3.5 + 1.5; // final result will be an integer
    }
}
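The two assignments can be mimicked with exact rational arithmetic in Python. This is an analogy only, assuming that such literal expressions are evaluated exactly before the result is converted to an integer type; the helper is_integer_result is a name made up for this sketch.

```python
from fractions import Fraction

def is_integer_result(value):
    """True if an exactly computed rational value is a whole number."""
    return value.denominator == 1

a = Fraction(2, 3)                     # models 2/3
b = Fraction("3.5") + Fraction("1.5")  # models 3.5 + 1.5

print(a, is_integer_result(a))  # 2/3 False
print(b, is_integer_result(b))  # 5 True
```

Only the second expression ends in a whole number, which is why it can be stored in the uint8 variable while 2/3 cannot.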

Let’s do a subtle change,

Address

The address data type is very specific to Solidity.

On the Ethereum blockchain, every account and smart contract has an address that is used to send and receive Ether from one account to another.

This is your public identity on the blockchain.

Also, when you deploy a smart contract on the blockchain, that contract is assigned an address that you can use to identify and call the smart contract.

There are two variants of the address type, which are largely identical:

address – stores a 20-byte value (the size of an Ethereum address or account). The default value for the address is 0x…followed by 40 0’s, or 20 bytes of 0’s.
address payable – like address, but with the additional members transfer and send.

The idea behind this distinction is that the address payable is an address you can send Ether to, while you should not send Ether to a plain address, as it could be a smart contract that was not built to accept Ether.

contract ExampleAddress {
    address public myAddress = 0xc895t6ea1bc39595cf849612ffta7427f5792987;
}
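The size of an address (20 bytes, written as 40 hexadecimal digits after the 0x prefix) can be illustrated with a short Python sketch. The function looks_like_address is a hypothetical helper that checks the shape only; real tooling also validates a mixed-case checksum, which this sketch ignores.

```python
import string

def looks_like_address(text):
    """Check the shape only: '0x' followed by 40 hexadecimal digits (20 bytes)."""
    if not text.startswith('0x'):
        return False
    digits = text[2:]
    return len(digits) == 40 and all(c in string.hexdigits for c in digits)

zero_address = '0x' + '00' * 20  # the default value described above
print(zero_address)
print(len(bytes.fromhex(zero_address[2:])))  # 20
print(looks_like_address(zero_address))      # True
print(looks_like_address('0x1234'))          # False
```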

Enums

An enum (short for enumeration) is a user-defined data type that restricts a variable to one of a set of predefined values.

The values in the enumerated list are called enums, and internally they are treated like numbers. This makes the contract more readable and maintainable.

contract SampleEnum {
    // Creating an enumerator
    enum animal_classes { Mammals, Fish, Amphibians, Reptiles, Birds }

    function getFirstEnum() public pure returns (animal_classes) {
        return animal_classes.Mammals;
    }
}
// result:
// 0: uint8: 0

With enums, we can also set a default value:

// Inside contract SampleEnum:
animal_classes constant defaultValue = animal_classes.Reptiles;

function getDefaultValue() public pure returns (animal_classes) {
    return defaultValue;
}
// result:
// 0: uint8: 3
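The numbering of enum members can be mirrored in Python with IntEnum. This is only a stand-in for the Solidity enum, with the members numbered from 0 in declaration order; the class name AnimalClasses is chosen for this illustration.

```python
from enum import IntEnum

# Python stand-in for the Solidity enum; members are numbered from 0 in order
class AnimalClasses(IntEnum):
    Mammals = 0
    Fish = 1
    Amphibians = 2
    Reptiles = 3
    Birds = 4

print(int(AnimalClasses.Mammals))   # 0
print(int(AnimalClasses.Reptiles))  # 3
print(AnimalClasses(2).name)        # Amphibians
```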

Bytes and Strings

A byte is a group of 8 bits. Everything in memory is stored in bits with binary values 0 and 1.

Solidity supports string literals that use both double quotes (") and single quotes ('). It provides string as a data type to declare string variables.

Strings are unique in Solidity compared to Python or other programming languages in that there are no functions for manipulating strings, except that you can concatenate strings. The reason for this is that storing strings in a blockchain is very expensive.

Bytes and strings are easy to handle in Solidity because Solidity treats them similarly to an array. The two are very similar. (See Arrays in the Reference Type article).

Conclusion

Smart contracts reside at a specific address in the Ethereum blockchain. In this article, we learned about state variables in Solidity.

We looked at state, local variables, and the different types with a value type.

We tried to understand Booleans, Integers, Enums, Addresses, Bytes, and Strings (although the last two are treated in more depth in the article on reference types).

Posted on October 7, 2022

How to Print a List Without Commas in Python

Problem Formulation

Given a Python list of elements.

If you print the list to the shell using print([1, 2, 3]), the output is enclosed in square brackets and separated by commas like so:

[1, 2, 3]

But you want the list without commas like so:

[1 2 3]

print([1, 2, 3])
# Output: [1, 2, 3]
# Desired: [1 2 3]

How to print the list without separating commas in Python?

Note that this is slightly different to those two problem variants—feel free to click there to learn more about those problem variants:

Recommended Tutorial: How to Print a List Without Brackets and Commas in Python?

Recommended Tutorial: How to Print a List Without Brackets in Python?

Method 1: Unpacking Multiple Values into Print Function

The asterisk operator * is used to unpack an iterable into the argument list of a given function.

You can unpack all list elements into the print() function to print all values individually, separated by an empty space by default (which you can override using the sep argument). For example, the expression print('[', *lst, ']') prints the elements in lst, separated by spaces, with the enclosing square brackets and without the separating commas!

Here’s an example:

lst = [1, 2, 3]
print('[', *lst, ']')
# [ 1 2 3 ]
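As a short sketch of the sep argument: the first call overrides the default separator, and the second call uses an empty separator to avoid the extra spaces inside the brackets. The separator ' | ' is an arbitrary choice for this illustration.

```python
lst = [1, 2, 3]

# Override the default separator
print(*lst, sep=' | ')
# 1 | 2 | 3

# Build the bracketed form without the extra spaces inside the brackets
print('[', ' '.join(map(str, lst)), ']', sep='')
# [1 2 3]
```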

You can learn about the ins and outs of the built-in print() function in the following video:

To master the basics of unpacking, feel free to check out this video on the asterisk operator:

Method 2: String Replace Method

A simple way to print a list without commas is to first convert the list to a string using the built-in str() function. Then modify the resulting string representation of the list by using the str.replace() method until you get the desired result.

Here’s an example:

my_list = [1, 2, 3]

# Convert List to String
s = str(my_list)
print(s)
# [1, 2, 3]

# Replace Separating Commas
s = s.replace(',', '')

# Print List Without Commas
print(s)
# [1 2 3]

The last line of the code snippet shows that the commas are removed from the output.
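One caveat is worth a quick sketch: replace() removes every comma in the string, including commas that belong to the elements themselves. The list below is a made-up example with a comma inside one element.

```python
my_list = ['a,b', 'c']

s = str(my_list).replace(',', '')
print(s)
# ['ab' 'c']   <- the comma inside 'a,b' is gone as well

# Joining the elements leaves their contents untouched
t = '[' + ' '.join(repr(x) for x in my_list) + ']'
print(t)
# ['a,b' 'c']
```

For lists of numbers this makes no difference, so the replace approach is fine there.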

Method 3: String Join With Generator Expression

You can print a list without commas using the string.join() method on any separator string such as ' ' or '\t'. Pass a generator expression to convert each list element to a string using the str() built-in function.

Specifically, the expression print('[', ' '.join(str(x) for x in my_list), ']') prints my_list to the shell without separating commas.

my_list = [1, 2, 3]
print('[', ' '.join(str(x) for x in my_list), ']')
# Output: [ 1 2 3 ]

The string.join(iterable) method concatenates the elements in the given iterable.
The str(object) built-in function converts a given object to its string representation.
Generator expressions or list comprehensions are concise one-liner ways to create a new iterable by reusing elements from another iterable.

You can dive deeper into generators in the following video:

Note: Combining the join() method with a generator expression and string concatenation is the recommended approach if you want to convert a list to a string without commas instead of printing it.

Here’s an example:

my_list = [1, 2, 3]
s = '[' + ' '.join(str(x) for x in my_list) + ']'
print(s)
# Output: [1 2 3]

Method 4: Print NumPy Array

Sometimes it is sufficient to use the NumPy default output that is without separating commas. For example, if you print a list it yields [1, 2, 3]. And if you print an array it yields [1 2 3]. You can easily convert a list to a NumPy array using the np.array(lst) constructor.

import numpy as np

my_list = [1, 2, 3]
print(np.array(my_list))
# Output: [1 2 3]

Recommended Tutorial: How to Install NumPy?

Where to Go From Here?

Enough theory. Let’s get some practice!

Coders get paid six figures and more because they can solve problems more effectively using machine intelligence and automation.

To become more successful in coding, solve more real problems for real people. That’s how you polish the skills you really need in practice. After all, what’s the use of learning theory that nobody ever needs?

You build high-value coding skills by working on practical coding projects!

Do you want to stop learning with toy projects and focus on practical code projects that earn you money and solve real problems for people?

If your answer is YES!, consider becoming a Python freelance developer! It’s the best way of approaching the task of improving your Python skills—even if you are a complete beginner.

If you just want to learn about the freelancing opportunity, feel free to watch my free webinar “How to Build Your High-Income Skill Python” and learn how I grew my coding business online and how you can, too—from the comfort of your own home.

Join the free webinar now!

Programmer Humor

Question: How did the programmer die in the shower? ☠

❗ Answer: They read the shampoo bottle instructions: Lather. Rinse. Repeat.

Posted on October 6, 2022

Python Print List Without Truncating

How to Print a List Without Truncating?

Per default, Python doesn’t truncate lists when printing them to the shell, even if they are large. For example, you can call print(my_list) and see the full list even if the list has one thousand elements or more!

Here’s an example:
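A minimal sketch of this behavior, assuming a list of 1,000 integers: the snippet captures the printed text and checks that every element is present. The names buffer and output are illustrative.

```python
import io
from contextlib import redirect_stdout

my_list = list(range(1000))

# Capture what print() writes so we can inspect it
buffer = io.StringIO()
with redirect_stdout(buffer):
    print(my_list)
output = buffer.getvalue()

print('...' in output)                        # False: no ellipsis, nothing was truncated
print(output.count(',') + 1)                  # 1000 elements in the printed text
print(output.strip().endswith('998, 999]'))   # True
```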

However, Python may squeeze the text (e.g., in programming environments such as IDLE) so you would have to press the button before seeing the output. The reason is that showing the whole output could be time-consuming and visually cluttering.

Here’s an example:

How to Print a NumPy Array Without Truncating?

In many cases, large NumPy arrays are likewise not truncated when printed in the default Python programming environment IDLE.

However, in the interactive mode of the Python shell, a NumPy array may be truncated, unlike a Python list:

>>> np.arange(10000)
array([ 0, 1, 2, ..., 9997, 9998, 9999])

To print the NumPy array without truncating, simply raise the print threshold using numpy.set_printoptions():

>>> import sys
>>> import numpy as np
>>> np.set_printoptions(threshold=sys.maxsize)
>>> np.arange(10000)

The output shows the full array without converting it to a list first:

Not all output is shown to save some space.

Of course, you could also convert the NumPy array to a Python list first.

Recommended Tutorial: How to Print a NumPy Array Without Truncating It?

Feel free to check out our Python cheat sheets and free email academy:

Posted on October 5, 2022

About Me – When The Going Gets Tough, Keep Going

Welcome to the Finxter blog! My name is Chris, and I started this coding venture a couple of years ago.

Over the years, I have chatted with tens of thousands of Finxters who shared their stories and struggles with me.

See here and here to read a lot of feedback from the community.

Today, allow me to share my story about why I started teaching freelancing.

It may inspire you to take control of your life if you’re in a tough spot right now – for example, struggling with the economic, military, and energy crises that are happening right now.

If you’re not interested in my personal story, now would be the time to stop reading. I won’t blame you!

~~~

Once upon a time, when I was a timid and naive 20-year-old dreamer, my 18-year-old girlfriend unexpectedly got pregnant.

She was still in high school, and I had just started studying computer science.

At the time, we had zero income and maybe $900 in savings.

I was living in a cheap 15-square-meter room with a desk and a bed and not much else.

As young and poor parents without any education or degree, we constantly felt judgment and pity from society.

We couldn’t even rent a flat because no landlord was crazy enough to take us in.

During all the struggle, we had love and dreams and the belief that everything would get better eventually: I was going to be a computer scientist in five years.

That is if I found a way to support my family on a shoestring – and avoided screwing up my education.

The first ten years, money was tight as hell. Little time. Lots of hard work. No TV. No Games. No Saturday night partying.

Well, maybe a little…

I am not a wunderkind. But I have a good work ethic and long-term goals, and I don’t give up easily. Finally, after ten tough years, I got my Ph.D. in computer science “summa cum laude”.

I now had a steady paycheck from my government job. But I eventually learned that the academic degrees didn’t help in improving our financial situation.

People made far more money and had far more free time coding in the private sector and without academic degrees.

I decided to take matters into my own hands again by creating my own coding business as a freelance developer.

In little time, I reached six-figure income levels. And I had much more free time compared to my government job that I held before.

My second child – now five years old – knows his father to have infinite time playing soccer, video games, or watching the Tesla Bot taking his first steps on YouTube.

(He plans to become CEO of Tesla – stay tuned @Elon).

~~~

Becoming a freelancer was a pivot point in my life.

To share all I know about creating a thriving coding business online, I have set up our freelancer course.

It focuses on the fundamentals:

find your niche,
build your skills,
create value for your customers, and
take massive action.

Simple, but sometimes not so easy…

If you want more from life and you love coding, feel free to subscribe to my free email academy. I’d love to have you in our community of ambitious coders who have not yet lost their ability to dream of a better life!

Posted on October 4, 2022

Solidity Contract Types, Byte Arrays, and {Address, Int, Rational} Literals

With this article, we continue our journey through the realm of Solidity data types with today’s topics:

contract types,
fixed-size byte arrays,
dynamically-sized byte arrays,
address literals,
rational literals, and
integer literals.

It’s part of our long-standing tradition to make this (and other) articles a faithful companion or a supplement to the official Solidity documentation.

Download PDF Slide Deck at the end of this tutorial!

Contract Types

To quote the official Solidity documentation, “every contract defines its own type”.

This statement might seem a bit cryptic, and since we’re an efficient crowd, we’d surely like to know what it means.

We can all remember that some number of articles ago, we mentioned how Solidity has key elements of an object-oriented programming language (OOPL). We also emphasized how smart contracts in Solidity are very similar to classes in an OOPL.

Classes themselves are a mesh of custom data types, i.e. structs, and functions, which qualifies classes to be treated as types.

By extension, our contracts are also treated as types, and as every contract is unique in its own right, it defines its own type. Being a type, we can implicitly convert a specific contract to a contract it inherits from, i.e. if contract “Aa” inherits from contract A, it can also be converted to contract “A”.

Besides that, we can explicitly convert each contract to and from the address type. Even more, we can conditionally convert a contract to and from the address payable type (remember, that’s the same type as the address type, but predetermined to receive Ether).

The condition is that the contract type must have a receive or payable fallback function. If it does, we can make the conversion to address payable by using address(x).

However, if the contract type does not implement (a more professional way to say “have”) a receive or payable fallback function, then the conversion to address payable has to be even more explicit (no swearing!) by stating payable(address(x)).

A local variable obc of a contract type OurBeautifulContract is declared by OurBeautifulContract obc;.

Once we point our variable obc to an instantiated (newly created) contract, we’d be able to call functions on that contract.

In terms of its data representation, a contract is identical to the address type. This is important because the contract type is not directly supported by the ABI, but the address type, as its representative, is supported by the ABI.

In contrast to the types mentioned so far, contract types don’t support any operators.

The members of contract types are the external functions of the contract (the functions callable from outside the contract, which includes public functions) and state variables whose visibility is set to public.

When we need to access type information about a contract, like the OurBeautifulContract above, we’d use the type(OurBeautifulContract) expression (docs).

Fixed-Size Byte Arrays

The value type bytesN holds a sequence of N bytes, where N ranges from 1 to 32, i.e., bytes1, …, bytes32.

The available operators for fixed-size byte arrays are:

Comparisons: <=, <, ==, !=, >=, > (evaluate to bool)
Bit operators: &, |, ^ (bitwise exclusive or), ~ (bitwise negation)
Shift operators: << (left shift), >> (right shift)
Index access: If x is of type bytesN, then x[k] for 0 <= k < N returns the k-th byte (read-only). In other words, x[0] up to and including x[N-1] are available for index access; if N = 1, i.e. x is of type bytes1, then x[0] is the only byte accessible by index.

The shifting operator always uses an unsigned integer type as a right operand, which represents the number of bits to shift by, and returns the type of the left operand.

Let’s take a look at a simple example to illustrate:

bytes2 lo = 0x1234; // (lo is the left operand)
uint8 ro = 5; // (ro is the right operand variable, must be u... type)
lo << ro // will evaluate to an lo type, bytes2
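To see which value the shift above evaluates to, here is a minimal Python sketch that mimics a bytes2 value as a 16-bit integer. The helper name shl_bytes2 is hypothetical, and the masking step is our assumption for modeling the fact that bits shifted past the fixed width are dropped.

```python
def shl_bytes2(value: int, bits: int) -> int:
    """Left-shift a 16-bit pattern and keep only the low 16 bits."""
    return (value << bits) & 0xFFFF


lo = 0x1234  # left operand, a 2-byte pattern
ro = 5       # right operand, the number of bits to shift by
result = shl_bytes2(lo, ro)
print(hex(result))  # 0x4680
```

The result still fits into two bytes, which mirrors the rule that the shift returns the type of the left operand.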

A fixed-size byte array has only one member, .length, which holds the fixed length of the byte array. This member is read-only.

Warning: Since the type bytes1 is a sequence of 1 byte in length, the type bytes1[] is an array of 1-byte sequences. However, each element of such an array is padded to 32 bytes, i.e. it wastes 31 bytes, due to the padding rules for elements kept in memory, on the stack, and in calldata (everywhere except in storage). Therefore, according to the official Solidity documentation, it’s better to use the bytes type instead of bytes1[].

Note: Value types in storage are packed/compacted together and share a storage slot, taking only as much space per value type as really needed. In contrast, the stack, memory, and calldata pad value types and store in separate slots, meaning that each variable uses a whole slot of 32 bytes, even if the value type is shorter than 32 bytes, effectively wasting the memory space.
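The cost of padding is easy to put into numbers. The following Python sketch is only a rough count of data bytes: it assumes one 32-byte word per padded element and ignores length words and slot rounding. The function names are made up for illustration.

```python
WORD_SIZE = 32  # bytes per EVM word


def padded_bytes(n_elements: int) -> int:
    """Bytes occupied when each 1-byte element fills a whole 32-byte word."""
    return n_elements * WORD_SIZE


def packed_bytes(n_elements: int) -> int:
    """Bytes occupied when the 1-byte elements sit next to each other."""
    return n_elements


n = 10
wasted = padded_bytes(n) - packed_bytes(n)
print(padded_bytes(n), packed_bytes(n), wasted)  # 320 10 310
```

In this simplified model, ten padded single-byte elements waste 31 bytes each.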

Before Solidity v0.8.0, the keyword byte was an alias for bytes1.

Dynamically-Sized Byte Arrays

There are two dynamically-sized non-value types, namely bytes and string.

bytes is a dynamically-sized byte array, while
string is a dynamically-sized UTF-8-encoded string.

Address Literals

Address literals are hexadecimal literals that pass the address checksum test, e.g. 0xdCad3a6d3569DF655070DEd06cb7A1b2Ccd1D3AF.

Hexadecimal literals will produce an error if they are between 39 and 41 digits long and do not pass the checksum test.

However, we can remove the error by prepending zeros to integer types or appending zeros to bytesNN types.

The Ethereum Improvement Proposal EIP-55 defines the mixed-case address checksum.

Integer and Rational Literals

Integer Literals

Integer literals are created using a sequence of digits from the range 0-9, and each digit is weighted by a power of 10 according to its position in the sequence.

For example, 217 is interpreted as two hundred and seventeen because, reading from right to left, we have 7 * 10⁰ + 1 * 10¹ + 2 * 10².

A reminder: 10⁰ = 1.

Octal literals don’t exist in Solidity and leading zeros are invalid.

Decimal Fractional Literals

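Decimal fractional literals consist of a dot . as the decimal separator (everyday notation in some locales uses a comma instead, but Solidity source code always uses the dot) and at least one digit after it, e.g. .1 and 1.3. The current Solidity documentation explicitly excludes a trailing dot such as 1.

Info: “A locale consists of a number of categories for which country-dependent formatting or other specifications exist” (source).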

Scientific Notation

Solidity also supports scientific notation in the form of 2e10, where 2 (left of “e”) is called the mantissa (M), which may be fractional, and the exponent (E) must be an integer. In general form, we write it as MeE, which is interpreted as M * 10**E, e.g. 2e10, -2e10, 2e-10, 2.5e1.
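The interpretation M * 10**E can be checked with exact arithmetic. Here is a small Python sketch using the standard fractions module; sci_literal is a hypothetical helper and only models the arithmetic, not the Solidity parser.

```python
from fractions import Fraction


def sci_literal(mantissa: str, exponent: int) -> Fraction:
    """Evaluate M * 10**E exactly; the exponent must be an integer."""
    if not isinstance(exponent, int):
        raise TypeError("exponent must be an integer")
    return Fraction(mantissa) * Fraction(10) ** exponent


print(sci_literal("2", 10))    # 20000000000
print(sci_literal("-2", 10))   # -20000000000
print(sci_literal("2", -10))   # 1/5000000000
print(sci_literal("2.5", 1))   # 25
```

Note how 2e-10 is a proper fraction while 2.5e1 is the integer 25, even though its mantissa is fractional.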

Readable Underscore Notation

We can also do a neat thing: separate the digits of a numeric literal for easier readability, such as in decimal 123_000, hexadecimal 0x2eff_abde, scientific decimal notation 1_2e345_678.

However, leading, trailing, or consecutive underscores are not allowed; an underscore can only be placed between two digits.
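The placement rule for underscores can be written down as a pattern. The Python sketch below checks decimal digit strings only; the pattern and the helper name are our own illustration, and other rules (such as the ban on leading zeros) are deliberately not modeled.

```python
import re

# One or more digits, optionally followed by groups of "_" plus digits,
# so every underscore sits between two digits.
DECIMAL_WITH_UNDERSCORES = re.compile(r"[0-9]+(_[0-9]+)*")


def has_valid_underscores(text: str) -> bool:
    """Return True if underscores appear only between two digits."""
    return DECIMAL_WITH_UNDERSCORES.fullmatch(text) is not None


for candidate in ["123_000", "_123", "123_", "1__2"]:
    print(candidate, has_valid_underscores(candidate))
```

Only the first candidate passes; the other three have a leading, a trailing, and a doubled underscore, respectively.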

Number Literal Expressions

Expressions containing number literals preserve their precision until they are converted to a non-literal type.

Such a conversion means an explicit conversion, or that the number literals are used with something else than a number literal expression, like boolean literals.

This behavior implies that computations don’t overflow and divisions don’t truncate in number literal expressions.

A very good example is the number literal expression (2**800 + 1) - 2**800, which results in the constant 1 (of type uint8), although the intermediate results would not even fit into the 32-byte EVM word.

One more example shows that an integer 4 is produced by computing the expression .5 * 8, although the intermediary results are not integers.
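Python’s integers and its fractions module also compute with unlimited precision, so they make a convenient stand-in for replaying the two examples. This is only an analogy for the compile-time arithmetic described above, not a model of the Solidity compiler.

```python
from fractions import Fraction

# The intermediate value 2**800 needs far more than 256 bits ...
fits_in_evm_word = (2**800).bit_length() <= 256

# ... yet the whole expression collapses to the small constant 1.
difference = (2**800 + 1) - 2**800

# .5 * 8 passes through a non-integer operand and still yields an integer.
product = Fraction(1, 2) * 8

print(fits_in_evm_word, difference, product)  # False 1 4
```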

More Operations

Warning: most operators produce a literal expression when applied to number literals, but there are also two exceptions:

Ternary operator (... ? ... : ...),
Array subscript (<array>[<index>]).

In other words, expressions like 255 + (true ? 1 : 0) or 255 + [1, 2, 3][0] are not equivalent to using the literal 256 (the result of these two expressions), as they are computed within the type uint8 and can lead to an overflow.
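The difference between the two computations can be sketched in Python. The helper checked_add_uint8 is hypothetical and assumes checked arithmetic (the default since Solidity v0.8.0), where an overflow raises an error instead of wrapping around.

```python
UINT8_MAX = 2**8 - 1  # 255


def checked_add_uint8(a: int, b: int) -> int:
    """Add two values as uint8 and fail on overflow instead of wrapping."""
    result = a + b
    if not 0 <= result <= UINT8_MAX:
        raise OverflowError(f"{a} + {b} does not fit into uint8")
    return result


# A pure literal expression keeps full precision, like the literal 256.
literal_result = 255 + 1

# The same addition carried out within uint8 overflows.
try:
    checked_add_uint8(255, 1)
    overflowed = False
except OverflowError:
    overflowed = True

print(literal_result, overflowed)  # 256 True
```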

Number literal expressions can use the same operators as integers, provided that both operands are integers:

If either of the operands is fractional, bit operations are not allowed;
If the exponent is a decimal fractional literal, exponentiation is not allowed either.

Shifts and exponentiation with a literal number as the left operand (the base, in the case of exponentiation) and an integer type as the right operand (the exponent) are always performed in uint256 for non-negative literals or in int256 for negative literals, regardless of the type of the right operand.

Warning: Since Solidity v0.4.0 division on integer literals produces a rational number, e.g. 7 / 2 = 3.5.

Solidity has a number literal type for each rational number; integer literals and rational number literals both belong to number literal types.

All number literal expressions (expressions with only number literals and operators) also belong to number literal types, e.g. 1 + 2 and 2 + 1 belong to the same number literal type, the one for the rational number three.

Note: When number literal expressions are used together with non-literal expressions, they are converted into a non-literal type, e.g. uint128 a = 1; uint128 b = 2.5 + a + 0.5;

Here, the literal 1 is converted into the non-literal type uint128 when it is assigned to variable a, but a common type for both 2.5 and uint128 doesn’t exist, so the compiler rejects the code.

Conclusion

In this article, we added even more data types in Solidity under our proverbial belt!

First, we introduced and learned about the contract type.
Second, we fixed our understanding of the fixed-size byte array type.
Third, the situation got dynamic by studying the dynamically-sized byte array type.
Fourth, we addressed the… what was it called… Aha – address literals!
Fifth, we came to the most rational decision and discovered what rational and integer literals are and, of course, how they can be put to good use.

Slide Deck Data Types

You can scroll through the data types discussed in this tutorial here:

Solidity-Overview-Data-Types-More Download

Posted on October 3, 2022 by — Leave a comment

How to Convert Bool (True/False) to a String in Python?


Question: Given a Boolean value True or False. How to convert it to a string "True" or "False" in Python?

Note that this tutorial doesn’t concern “concatenating a Boolean to a string”. If you want to do this, check out our in-depth article on the Finxter blog.

Simple Bool to String Conversion

To convert a given Boolean value to a string in Python, use the str(boolean) function and pass the Boolean value into it. This converts Boolean True to string "True" and Boolean False to string "False".

Here’s a minimal example:

>>> str(True)
'True'
>>> str(False)
'False'

Python Boolean Type is Integer

Booleans are represented by integers in Python, i.e., bool is a subclass of int. Boolean value True is represented with integer 1. And Boolean value False is represented with integer 0.

Here’s a minimal example:

>>> True == 1
True
>>> False == 0
True

Convert True to ‘1’ and False to ‘0’

To convert a Boolean value to a string '1' or '0', use the expression str(int(boolean)). For instance, str(int(True)) returns '1' and str(int(False)) returns '0'. This is because of Python’s use of integers to represent Boolean values.

Here’s a minimal example:

>>> str(int(True))
'1'
>>> str(int(False))
'0'

Convert List of Boolean to List of Strings

To convert a list of Booleans to a list of strings, use the list comprehension expression [str(x) for x in my_bools], assuming the Boolean list is stored in the variable my_bools. This converts each Boolean x to a string using the built-in str() function and repeats it for all x in the Boolean list.

Here’s a simple example:

my_bools = [True, True, False, False, True]
my_strings = [str(x) for x in my_bools]
print(my_strings)
# ['True', 'True', 'False', 'False', 'True']
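The same idea combines with the '1'/'0' conversion from the previous section. The sketch below shows two variations on the list conversion: map() as an alternative to the list comprehension, and str(int(x)) for digit strings. The variable names as_words and as_digits are just illustrative.

```python
my_bools = [True, True, False, False, True]

# map() applies str to every element, like the list comprehension does.
as_words = list(map(str, my_bools))

# Going through int() first yields '1' and '0' instead of 'True' and 'False'.
as_digits = [str(int(x)) for x in my_bools]

print(as_words)   # ['True', 'True', 'False', 'False', 'True']
print(as_digits)  # ['1', '1', '0', '0', '1']
```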

Convert String Back to Boolean

What if you want to convert the string representation 'True' and 'False' (or: '1' and '0') back to the Boolean representation True and False?

Recommended Tutorial: String to Boolean Conversion

Here’s the short summary:

You can convert a string value s to a Boolean value using the Python function bool(s).

For example, bool('True') and bool('1') return True.

However, bool('False') and bool('0') return True as well, which may come unexpected to you.

This is because every Python object has an associated Boolean value (its “truthiness”). As a rule of thumb: empty values evaluate to Boolean False and non-empty values evaluate to Boolean True. So, only bool('') on the empty string '' returns False. All other strings return True!
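The rule of thumb is not limited to strings. Here is a quick sketch that applies bool() to a few empty and non-empty values; the sample values are arbitrary.

```python
# Non-empty values are truthy, no matter what they "say".
non_empty = ["False", "0", " ", [0], {"key": None}]

# Empty values (and zero) are falsy.
empty = ["", [], {}, (), 0]

print([bool(v) for v in non_empty])  # [True, True, True, True, True]
print([bool(v) for v in empty])      # [False, False, False, False, False]
```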

You can see this in the following example:

>>> bool('True')
True
>>> bool('1')
True
>>> bool('2')
True
>>> bool('False')
True
>>> bool('0')
True
>>> bool('')
False

Okay, what to do about it?

Easy – first pass the string into the eval() function and then pass the result into the bool() function. In other words, the expression bool(eval(my_string)) converts a string to a Boolean mapping 'True' and '1' to Boolean True and 'False' and '0' to Boolean False.

Finally – this behavior is as expected by many coders just starting out.

Here’s an example:

>>> bool(eval('False'))
False
>>> bool(eval('0'))
False
>>> bool(eval('True'))
True
>>> bool(eval('1'))
True

Feel free to go over our detailed guide on the function:

Recommended Tutorial: Python eval() deep dive
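Keep in mind that eval() executes whatever Python expression the string contains, so it should only see strings you trust. For input from users or files, a plain lookup is a safer option. The sketch below is one possible approach, not the only one; parse_bool and the accepted spellings are our own choices.

```python
TRUE_STRINGS = {"true", "1"}
FALSE_STRINGS = {"false", "0"}


def parse_bool(text: str) -> bool:
    """Map 'True'/'1' to True and 'False'/'0' to False without eval()."""
    normalized = text.strip().lower()
    if normalized in TRUE_STRINGS:
        return True
    if normalized in FALSE_STRINGS:
        return False
    raise ValueError(f"not a Boolean string: {text!r}")


print(parse_bool("True"), parse_bool("0"))  # True False
```

Unlike bool(eval(...)), this version rejects anything that is not one of the four expected spellings instead of running it.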

Posted on October 2, 2022 by — Leave a comment

Python Time Series Forecast on Bitcoin Data (Part II)


A Time Series is essentially tabular data with the special feature of having a time index. The common forecast task is ‘knowing the past (and sometimes the present), predict the future’. This task, taken as a principle, reveals itself in several ways: in how to interpret your problem, in feature engineering, and in which forecast strategy to take.

This is the second article in our series. In the first article we discussed how to create features out of a time series using lags and trends. Today we follow the opposite direction by highlighting trends as something you want directly subtracted from your data before modeling.

The reason is that Machine Learning models work in different ways. Some are good with subtractions, others are not.

For example, for any feature you include in a Linear Regression, the model will automatically learn whether to subtract it from the actual data or not. A Tree Regressor (and its variants) will not behave in the same way and usually will ignore a trend in the data.

Therefore, whenever using the latter type of models, one usually calls for a hybrid model, meaning, we use a Linear(ish) first model to detect global periodic patterns and then apply a second Machine Learning model to infer more sophisticated behavior.

We use the Bitcoin Sentiment Analysis data we captured in the last article as a proof of concept.

The hybrid model part of this article is heavily based on Kaggle’s Time Series Crash Course, however, we intend to automate the process and discuss more in-depth the DeterministicProcess class.

Trends, as something you don’t want to have

(Or that you want it deducted from your model)

A streamlined way to deal with trends and seasonality is using, respectively, DeterministicProcess and CalendarFourier from statsmodels. Let us start with the former.

DeterministicProcess aims at creating features to be used in a Regression model to determine trend and periodicity. It takes your DatetimeIndex and a few other parameters and returns a DataFrame full of features for your ML model.

A usual instance of the class will read like the one below. We use the sentic_mean column to illustrate.

from statsmodels.tsa.deterministic import DeterministicProcess

y = dataset['sentic_mean'].copy()

dp = DeterministicProcess(
    index=y.index,
    constant=True,
    order=2
)
X = dp.in_sample()
X

We can use X and y as features and target to train a LinearRegression model. In this way, the LinearRegression will learn whatever characteristics from y can be inferred (in our case) solely out of:

the number of elapsed time intervals (trend column);
the square of that number (trend_squared); and
a bias term (const).
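To make the three columns concrete, here is a plain-Python sketch that builds them by hand for a handful of time steps. It assumes the trend counter starts at 1; the function name is hypothetical and the code only imitates the shape of the features, it does not call statsmodels.

```python
def quadratic_trend_features(n_rows):
    """Build const / trend / trend_squared rows for time steps 1..n_rows."""
    return [
        {"const": 1.0, "trend": float(t), "trend_squared": float(t * t)}
        for t in range(1, n_rows + 1)
    ]


features = quadratic_trend_features(4)
for row in features:
    print(row)
```

Because none of these columns depends on the observed values, the same recipe can be continued past the last observation, which is what makes such features usable for forecasting.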

Check out the result:

import pandas as pd
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(X, y)

predictions = pd.DataFrame(
    model.predict(X),
    index=X.index,
    columns=['Deterministic Curve']
)

Comparing predictions and actual values gives:

import matplotlib.pyplot as plt

plt.figure()
ax = plt.subplot()
y.plot(ax=ax, legend=True)
predictions.plot(ax=ax)
plt.show()

Even the quadratic term seems ignorable here. The DeterministicProcess class also helps us with future predictions since it carries a method that provides the appropriate future form of the chosen features.

Specifically, the out_of_sample method of dp takes the number of time intervals we want to predict as input and generates the needed features for you.

We use 60 days below as an example:

X_out = dp.out_of_sample(60)

predictions_out = pd.DataFrame(
    model.predict(X_out),
    index=X_out.index,
    columns=['Future Predictions']
)

plt.figure()
ax = plt.subplot()
y.plot(ax=ax, legend=True)
predictions.plot(ax=ax)
predictions_out.plot(ax=ax, color='red')
plt.show()

Let us repeat the process with sentic_count to have a feeling of a higher-order trend.

As a rule of thumb, the order should be one plus the total number of (trending) hills and valleys in the graph, but not much more than that.

We choose 3 for sentic_count and compare the output with the order=2 result (we do not write the code twice, though).

y = dataset['sentic_count'].copy()

from statsmodels.tsa.deterministic import DeterministicProcess, CalendarFourier

dp = DeterministicProcess(
    index=y.index,
    constant=True,
    order=3
)
X = dp.in_sample()

model = LinearRegression().fit(X, y)

predictions = pd.DataFrame(
    model.predict(X),
    index=X.index,
    columns=['Deterministic Curve']
)

X_out = dp.out_of_sample(60)

predictions_out = pd.DataFrame(
    model.predict(X_out),
    index=X_out.index,
    columns=['Future Predictions']
)

plt.figure()
ax = plt.subplot()
y.plot(ax=ax, legend=True)
predictions.plot(ax=ax)
predictions_out.plot(ax=ax, color='red')
plt.show()

Although the order-three polynomial fits the data better, use discretion in deciding whether the sentiment count will really decrease so drastically in the next 60 days. As a rule, trust short-term predictions more than long-term ones.

DeterministicProcess accepts other parameters, making it a very interesting tool. Find a description of most of them below.

dp = DeterministicProcess(
    index,
    # the DatetimeIndex of your data
    period: int or None,
    # in case the data shows some periodicity, include the size of the
    # periodic cycle here: 7 would mean 7 days in our case
    constant: bool,
    # includes a constant feature in the returned DataFrame, i.e., a feature
    # with the same value for everyone. It returns the equivalent of a bias
    # term in Linear Regression
    order: int,
    # order of the polynomial that you think better approximates your trend:
    # the simplest the better
    seasonal: bool,
    # make it True if you think the data has some periodicity. If you make it
    # True and do not specify the period, the dp will try to infer the period
    # out of the index
    additional_terms: tuple of statsmodel's DeterministicTerms,
    # we come back to this next
    drop: bool
    # drops resulting features which are collinear to others. If you will use
    # a linear model, make it True
)

Seasonality

As a hardened Mathematician, seasonality is my favorite part because it deals with Fourier analysis (and wave functions are just… cool!):

Do you remember your first ML course when you heard Linear Regression can fit arbitrary functions, not only lines? So, why not a wave function? We just did it for polynomials and didn’t even feel like it

In general, for any expression f which is a function of a feature or of your DatetimeIndex, you can create a feature column whose ith row is the value of f corresponding to the ith index.

Then linear regression finds the constant coefficient multiplying f that best fits your data. Again, this procedure works in general, not only with Datetime indexes – the trend_squared term above is an example of it.

For seasonality, we use a second amazing statsmodels class: CalendarFourier. It is another statsmodels DeterministicTerm class (i.e., with the in_sample and out_of_sample methods) and is instantiated with two parameters, freq and order.

As freq, the class expects a string such as ‘D’, ‘W’, ‘M’ for day, week or month, respectively, or any of the quite comprehensive Pandas Datetime offset aliases.

The order is the Fourier expansion order, which should be understood as the number of waves you are expecting in your chosen frequency (count the number of ups and downs: one wave would be understood as one up and one down).

CalendarFourier integrates smoothly with DeterministicProcess: include an instance of it in the list of additional_terms.
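Under the hood, Fourier features are just sine and cosine columns. The sketch below builds them by hand for a numeric time step t and a fixed period; the function and the column names sin_k and cos_k are hypothetical (the library uses its own naming), so treat it as an illustration of the idea rather than of the CalendarFourier internals.

```python
import math


def fourier_features(t, period, order):
    """Return sin/cos pairs for the first `order` harmonics of `period`."""
    features = {}
    for k in range(1, order + 1):
        angle = 2 * math.pi * k * t / period
        features[f"sin_{k}"] = math.sin(angle)
        features[f"cos_{k}"] = math.cos(angle)
    return features


# Two harmonics of a weekly cycle, evaluated one period apart.
monday = fourier_features(t=0, period=7, order=2)
next_monday = fourier_features(t=7, period=7, order=2)
print(monday)
```

Each extra order adds one faster wave (one more sine/cosine pair), and the features repeat exactly after one period, which is why a linear model can use them to fit periodic patterns.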

Here is the full code for sentic_mean:

from statsmodels.tsa.deterministic import DeterministicProcess, CalendarFourier

y = dataset['sentic_mean'].copy()

fourier = CalendarFourier(freq='A', order=2)

dp = DeterministicProcess(
    index=y.index,
    constant=True,
    order=2,
    seasonal=False,
    additional_terms=[fourier],
    drop=True
)
X = dp.in_sample()

from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(X, y)

predictions = pd.DataFrame(
    model.predict(X),
    index=X.index,
    columns=['Prediction']
)

X_out = dp.out_of_sample(60)

predictions_out = pd.DataFrame(
    model.predict(X_out),
    index=X_out.index,
    columns=['Prediction']
)

plt.figure()
ax = plt.subplot()
y.plot(ax=ax, legend=True)
predictions.plot(ax=ax)
predictions_out.plot(ax=ax, color='red')
plt.show()

If we take seasonal=True inside DeterministicProcess, we get a crispier line:

Including ax.set_xlim(('2022-08-01', '2022-10-01')) before plt.show() zooms the graph in:

Although I suggest using the seasonal=True parameter with care, it does find interesting patterns (with huge RMSE error, though).

For instance, look at this BTC percentage change zoomed chart:

Here period is set to 30 and seasonal=True. I also manually rescaled the predictions to be better visible in the graphic. Although the predictions are far away from truth, thinking as a trader, isn’t it impressive how many peaks and hills it gets right? At least for this zoomed month…

To maintain the workflow promise, I prepared a code that does everything so far in one shot:

def deseasonalize(df: pd.Series, season_freq='A', fourier_order=0,
                  constant=True, dp_order=1, dp_drop=True,
                  model=LinearRegression(), fourier=None, dp=None,
                  **DeterministicProcesskwargs):
    """
    Returns a deseasonalized and detrended df, a seasonal plot,
    and the DeterministicProcess instance used for the fit.
    """
    if fourier is None:
        fourier = CalendarFourier(freq=season_freq, order=fourier_order)
    if dp is None:
        dp = DeterministicProcess(
            index=df.index,
            constant=constant,
            order=dp_order,
            additional_terms=[fourier],
            drop=dp_drop,
            **DeterministicProcesskwargs
        )
    X = dp.in_sample()
    # fit the model that was passed in (LinearRegression by default)
    model = model.fit(X, df)
    y_pred = pd.Series(
        model.predict(X),
        index=X.index,
        name=df.name + '_pred'
    )
    # plot the original series against the fitted seasonal/trend curve
    ax = plt.subplot()
    df.plot(ax=ax, legend=True)
    y_pred.plot(ax=ax, legend=True)
    y_deseason = df - y_pred
    y_deseason.name = df.name + '_deseasoned'
    return y_deseason, ax, dp

The sentic_mean analyses get reduced to:

y_deseason, ax, dp = deseasonalize(
    y,
    season_freq='A',
    fourier_order=2,
    constant=True,
    dp_order=2,
    dp_drop=True,
    model=LinearRegression()
)

Cycles and Hybrid Models

Let us move on to a complete Machine Learning prediction. We use XGBRegressor and compare its performance among three instances:

Predict sentic_mean directly using lags;
Same prediction adding the seasonal/trending with a DeterministicProcess;
A hybrid model, using LinearRegression to infer and remove seasons/trends, and then applying an XGBRegressor.

The first part will be the bulkier since the other two follow from simple modifications in the resulting code.

Preparing the data

Before any analysis, we split the data into train and test sets. Since we are dealing with time series, this means we set the ‘present date’ as a point in the past and try to predict its respective ‘future’. Here we pick 22 days in the past.

s = dataset['sentic_mean']
s_train = s[:'2022-09-01']

We made this first split in order to not leak data while doing any analysis.

Next, we prepare target and feature sets. Recall our SentiCrypto’s data was set to be available every day at 8AM. Imagine we are doing the prediction by 9AM.

In this case, anything until the present data (the ‘lag_0’) can be used as features, and our target is s_train’s first lead (which we define as a -1 lag). To choose other lags as features, we examine the statsmodels partial auto-correlation plot of the series:

from statsmodels.graphics.tsaplots import plot_pacf

plot_pacf(s_train, lags=20)

We use the first four for sentic_mean and the first seven + the 11th for sentic_count (you can easily test different combinations with the code below.)

Now that we have chosen the features, we go back to the full series for engineering. We apply to sentic_mean and sentic_count the make_lags function we defined in the last article (which we transcribe here for convenience).

def make_lags(df, n_lags=1, lead_time=1):
    """
    Compute lags of a pandas.Series from lead_time to lead_time + n_lags.
    Alternatively, a list can be passed as n_lags.
    Returns a pd.DataFrame whose ith column is either the i+lead_time lag
    or the ith element of n_lags.
    """
    if isinstance(n_lags, int):
        lag_list = list(range(lead_time, n_lags + lead_time))
    else:
        lag_list = n_lags
    lags = {
        f'{df.name}_lag_{i}': df.shift(i) for i in lag_list
    }
    return pd.concat(lags, axis=1)

X = make_lags(s, [0, 1, 2, 3, 4])
y = make_lags(s, [-1])

display(X)
y
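If the lag/lead convention feels abstract, the following plain-Python sketch imitates what shifting does on a tiny list. The helper shift is our own stand-in for pandas.Series.shift, with None playing the role of the missing values.

```python
def shift(values, k):
    """Positive k delays the series (a lag); negative k advances it (a lead)."""
    n = len(values)
    if k >= 0:
        return [None] * min(k, n) + values[: max(n - k, 0)]
    return values[-k:] + [None] * min(-k, n)


series = [10, 20, 30, 40]
lag_1 = shift(series, 1)    # yesterday's value, usable as a feature
lead_1 = shift(series, -1)  # tomorrow's value, i.e. the "-1 lag" target

print(lag_1)   # [None, 10, 20, 30]
print(lead_1)  # [20, 30, 40, None]
```

Each row then pairs the values known today with the value to be predicted, and the rows containing None are the ones that get dropped before training.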

Now a train-test split with sklearn is convenient (Notice the shuffle=False parameter, that is key for time series):

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=22, shuffle=False)

X_train

(Observe that the final date is set correctly, in accordance with our analysis’ split.)
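Without shuffling and with an integer test_size, the split simply cuts the data at one point in time. Here is a minimal sketch of that idea on a plain list; chronological_split is a hypothetical helper, not the sklearn function.

```python
def chronological_split(rows, test_size):
    """Keep the order: everything before the cut trains, the rest tests."""
    cut = len(rows) - test_size
    return rows[:cut], rows[cut:]


days = list(range(1, 11))  # ten consecutive days
train, test = chronological_split(days, test_size=3)

print(train)  # [1, 2, 3, 4, 5, 6, 7]
print(test)   # [8, 9, 10]
```

Every test day comes after every training day, so the model never sees the future it is asked to predict.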

Applying the regressor:

from xgboost import XGBRegressor
from sklearn.metrics import r2_score

xgb = XGBRegressor(n_estimators=50)
xgb.fit(X_train, y_train)

predictions_train = pd.DataFrame(
    xgb.predict(X_train),
    index=X_train.index,
    columns=['Prediction']
)
predictions_test = pd.DataFrame(
    xgb.predict(X_test),
    index=X_test.index,
    columns=['Prediction']
)

print(f'R2 train score: {r2_score(y_train[:-1], predictions_train[:-1])}')

plt.figure()
ax = plt.subplot()
y_train.plot(ax=ax, legend=True)
predictions_train.plot(ax=ax)
plt.show()

plt.figure()
ax = plt.subplot()
y_test.plot(ax=ax, legend=True)
predictions_test.plot(ax=ax)
plt.show()

print(f'R2 test score: {r2_score(y_test[:-1], predictions_test[:-1])}')

You can reduce overfitting by reducing the number of estimators, but the R2 test score remains negative.
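A negative R2 may look odd at first, so here is the score computed by hand on made-up numbers. The function r2 below follows the usual definition, 1 minus the ratio of the squared prediction errors to the squared deviations from the mean; the data is purely illustrative.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]

print(r2(y_true, [1.0, 2.0, 3.0]))  # 1.0  (perfect prediction)
print(r2(y_true, [2.0, 2.0, 2.0]))  # 0.0  (always predicting the mean)
print(r2(y_true, [3.0, 2.0, 1.0]))  # -3.0 (worse than the mean)
```

In other words, a negative test score means the model does worse on unseen data than a constant guess at the average would.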

We can replicate the process for sentic_count (or whatever you want). Below is a function to automate it.

from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from statsmodels.tsa.stattools import pacf

def apply_univariate_prediction(series, test_size, to_predict=1, nlags=20,
                                minimal_pacf=0.1,
                                model=XGBRegressor(n_estimators=50)):
    '''
    Starting from series, breaks it in train and test subsets;
    chooses which lags to use based on pacf > minimal_pacf;
    and applies the given sklearn-type model.
    Returns the resulting features and targets and the trained model.
    It plots the graph of the training and prediction, together with their r2_score.
    '''
    s = series.iloc[:-test_size]
    if isinstance(to_predict, int):
        to_predict = [to_predict]

    s_pacf = pd.Series(pacf(s, nlags=nlags))
    column_list = s_pacf[s_pacf > minimal_pacf].index

    X = make_lags(series, n_lags=column_list).dropna()
    y = make_lags(series, n_lags=[-x for x in to_predict]).loc[X.index]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, shuffle=False)

    model.fit(X_train, y_train)

    predictions_train = pd.DataFrame(
        model.predict(X_train),
        index=X_train.index,
        columns=['Train Predictions']
    )
    predictions_test = pd.DataFrame(
        model.predict(X_test),
        index=X_test.index,
        columns=['Test Predictions']
    )

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5), sharey=True)
    y_train.plot(ax=ax1, legend=True)
    predictions_train.plot(ax=ax1)
    ax1.set_title('Train Predictions')
    y_test.plot(ax=ax2, legend=True)
    predictions_test.plot(ax=ax2)
    ax2.set_title('Test Predictions')
    plt.show()

    print(f'R2 train score: {r2_score(y_train[:-1], predictions_train[:-1])}')
    print(f'R2 test score: {r2_score(y_test[:-1], predictions_test[:-1])}')

    return X, y, model

apply_univariate_prediction(dataset['sentic_count'], 22)

apply_univariate_prediction(dataset['BTC-USD'], 22)

Predicting with Seasons

Since the features created by DeterministicProcess are only time-dependent, we can add them harmlessly to the feature DataFrame we automatically get from our univariate predictions.

The predictions, though, are still univariate. We use the deseasonalize function to obtain the season features. The data preparation is as follows:

s = dataset['sentic_mean']

X, y, _ = apply_univariate_prediction(s, 22)

s_deseason, _, dp = deseasonalize(
    s,
    season_freq='A',
    fourier_order=2,
    constant=True,
    dp_order=2,
    dp_drop=True,
    model=LinearRegression()
)

X_f = dp.in_sample().shift(-1)
X = pd.concat([X, X_f], axis=1, join='inner').dropna()

With a bit of copy and paste, we arrive at:

And we actually perform way worse!

Deseasonalizing

Nevertheless, the right-hand graphic illustrates the inability of grasping trends. Our last shot is a hybrid model.

Here we follow three steps:

We use the LinearRegression to capture the seasons and trends, rendering the series y_s. Then we acquire a deseasonalized target y_ds = y-y_s;
Train an XGBRegressor on y_ds and the lagged features, resulting in deseasonalized predictions y_pred;
Finally, we incorporate y_s back to y_pred to compare the final result.
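The three steps can be traced on a toy example in plain Python. Everything here is a stand-in chosen for brevity: the data is made up, a hand-written least-squares line replaces the LinearRegression on trend and season features, and a naive "repeat the residual from two steps ago" rule replaces the XGBRegressor. Only the structure of the procedure is the point.

```python
def fit_line(y):
    """Least-squares line through (t, y[t]) for t = 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    cov = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return slope, y_mean - slope * t_mean


# Toy data (made up for illustration): a linear trend plus a zig-zag.
y = [1.0, 3.5, 5.0, 7.5, 9.0, 11.5]

# Step 1: the linear model captures the trend y_s; y_ds is what is left.
slope, intercept = fit_line(y)
y_s = [intercept + slope * t for t in range(len(y))]
y_ds = [v - s for v, s in zip(y, y_s)]

# Step 2: stand-in for the second model (NOT an XGBRegressor):
# predict each residual with the residual observed two steps earlier.
y_pred_ds = [y_ds[t - 2] for t in range(2, len(y))]

# Step 3: add the trend back to get predictions on the original scale.
y_pred = [p + s for p, s in zip(y_pred_ds, y_s[2:])]

print([round(v, 2) for v in y_pred])
```

The second model only ever sees the detrended residuals, which is exactly the kind of target a tree-based regressor handles well; the trend it cannot extrapolate is supplied by the first model.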

Although Bitcoin-related data are hard to predict, there was a huge improvement on the r2_score (finally something positive!). We define the used function below.

get_hybrid_univariate_prediction(
    dataset['sentic_mean'], 22,
    season_freq='A',
    fourier_order=2,
    constant=True,
    dp_order=2,
    dp_drop=True,
    model1=LinearRegression(),
    fourier=None,
    is_seasonal=True,
    season_period=7,
    dp=None,
    to_predict=1,
    nlags=20,
    minimal_pacf=0.1,
    model2=XGBRegressor(n_estimators=50)
)

Instead of going through every detail, we will also automate this code. In order to get the code running smoothly, we revisit the deseasonalize and the apply_univariate_prediction functions in order to remove the plotting part of them.

The final function only plots graphs and returns nothing. It intends to give you a baseline for a hybrid model score. Change the function at will to make it return whatever you need.

def get_season(series: pd.Series, test_size, season_freq='A', fourier_order=0,
               constant=True, dp_order=1, dp_drop=True,
               model1=LinearRegression(), fourier=None,
               is_seasonal=False, season_period=None, dp=None):
    """
    Decompose series in a deseasonalized and a seasonal part.
    The parameters are relative to the fourier and DeterministicProcess used.
    Returns y_ds and y_s.
    """
    se = series.iloc[:-test_size]
    if fourier is None:
        fourier = CalendarFourier(freq=season_freq, order=fourier_order)
    if dp is None:
        dp = DeterministicProcess(
            index=se.index,
            constant=constant,
            order=dp_order,
            additional_terms=[fourier],
            drop=dp_drop,
            seasonal=is_seasonal,
            period=season_period
        )
    X_in = dp.in_sample()
    X_out = dp.out_of_sample(test_size)
    # fit on the training part only, then predict over train + test
    model1 = model1.fit(X_in, se)
    X = pd.concat([X_in, X_out], axis=0)
    y_s = pd.Series(
        model1.predict(X),
        index=X.index,
        name=series.name
    )
    y_ds = series - y_s
    y_ds.name = series.name + '_deseasoned'
    return y_ds, y_s

def prepare_data(series, test_size, to_predict=1, nlags=20, minimal_pacf=0.1):
    '''
    Creates a feature dataframe by making lags and a target series
    by a negative to_predict-shift. Returns X, y.
    '''
    s = series.iloc[:-test_size]
    if isinstance(to_predict, int):
        to_predict = [to_predict]
    from statsmodels.tsa.stattools import pacf
    s_pacf = pd.Series(pacf(s, nlags=nlags))
    column_list = s_pacf[s_pacf > minimal_pacf].index
    X = make_lags(series, n_lags=column_list).dropna()
    y = make_lags(series, n_lags=[-x for x in to_predict]).loc[X.index].squeeze()
    return X, y

def get_hybrid_univariate_prediction(series: pd.Series, test_size,
                                     season_freq='A', fourier_order=0,
                                     constant=True, dp_order=1, dp_drop=True,
                                     model1=LinearRegression(), fourier=None,
                                     is_seasonal=False, season_period=None,
                                     dp=None, to_predict=1, nlags=20,
                                     minimal_pacf=0.1,
                                     model2=XGBRegressor(n_estimators=50)):
    """
    Apply the hybrid model method by deseasonalizing/detrending a time series
    with model1 and investigating the resulting series with model2.
    It plots the respective graphs and computes r2_scores.
    """
    y_ds, y_s = get_season(series, test_size, season_freq=season_freq,
                           fourier_order=fourier_order, constant=constant,
                           dp_order=dp_order, dp_drop=dp_drop, model1=model1,
                           fourier=fourier, dp=dp, is_seasonal=is_seasonal,
                           season_period=season_period)
    # pass the lag-selection settings through instead of using the defaults
    X, y_ds = prepare_data(y_ds, test_size=test_size, to_predict=to_predict,
                           nlags=nlags, minimal_pacf=minimal_pacf)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y_ds, test_size=test_size, shuffle=False)
    y = y_s.squeeze() + y_ds.squeeze()

    model2 = model2.fit(X_train, y_train)

    predictions_train = pd.Series(
        model2.predict(X_train),
        index=X_train.index,
        name='Prediction'
    ) + y_s[X_train.index]
    predictions_test = pd.Series(
        model2.predict(X_test),
        index=X_test.index,
        name='Prediction'
    ) + y_s[X_test.index]

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5), sharey=True)
    y_train_ps = y.loc[y_train.index]
    y_test_ps = y.loc[y_test.index]
    y_train_ps.plot(ax=ax1, legend=True)
    predictions_train.plot(ax=ax1)
    ax1.set_title('Train Predictions')
    y_test_ps.plot(ax=ax2, legend=True)
    predictions_test.plot(ax=ax2)
    ax2.set_title('Test Predictions')
    plt.show()

    print(f'R2 train score: {r2_score(y_train_ps[:-to_predict], predictions_train[:-to_predict])}')
    print(f'R2 test score: {r2_score(y_test_ps[:-to_predict], predictions_test[:-to_predict])}')

A note of warning: if you do not expect your data to follow time patterns, do focus on cycles! The hybrid model succeeds well for many tasks, but it actually decreases the R2 score of our previous Bitcoin prediction:

get_hybrid_univariate_prediction(
    dataset['BTC-USD'], 22,
    season_freq='A',
    fourier_order=4,
    constant=True,
    dp_order=5,
    dp_drop=True,
    model1=LinearRegression(),
    fourier=None,
    is_seasonal=True,
    season_period=30,
    dp=None,
    to_predict=1,
    nlags=20,
    minimal_pacf=0.05,
    model2=XGBRegressor(n_estimators=20)
)

The former score was around 0.31.

Conclusion

This article aims at presenting functions for your time series workflow, especially for lags and deseasonalization. Use them with care, though: apply them to have baseline scores before delving into more sophisticated models.

In future articles we will bring forth multi-step predictions (predict more than one day ahead) and compare performance of different models, both univariate and multivariate.