Do you need to create a function that returns a string but you don’t know how? No worries, in sixty seconds, you’ll know! Go!
A Python function can return any object, such as a string. To return a string, create the string object within the function body, assign it to a variable my_string, and return it to the caller with the statement return my_string. Or simply create the string within the return expression, like so: return "hello world"
def f():
    return 'hello world'

f()
# hello world
Create String in Function Body
Let’s have a look at another example:
The following code creates a function create_string() that iterates over all numbers 0, 1, 2, …, 9, appends them to the string my_string, and returns the string to the caller of the function:
def create_string():
    ''' Function to return string '''
    my_string = ''
    for i in range(10):
        my_string += str(i)
    return my_string

s = create_string()
print(s)
# 0123456789
Note that you store the resulting string in the variable s. The local variable my_string that you created within the function body is only visible within the function but not outside of it.
So, if you try to access the name my_string, Python will raise a NameError:
>>> print(my_string)
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    print(my_string)
NameError: name 'my_string' is not defined
To fix this, simply assign the return value of the function — a string — to a new variable and access the content of this new variable:
>>> s = create_string()
>>> print(s)
0123456789
There are many other ways to return a string in Python.
Return String With List Comprehension
For example, you can use a list comprehension in combination with the string.join() method. This is much more concise than the previous code but creates the same string of digits:
def create_string():
    ''' Function to return string '''
    return ''.join([str(i) for i in range(10)])

s = create_string()
print(s)
# 0123456789
For a quick recap on list comprehension, feel free to scroll down to the end of this article.
You can also add some separator strings like so:
def create_string():
    ''' Function to return string '''
    return ' xxx '.join([str(i) for i in range(10)])

s = create_string()
print(s)
# 0 xxx 1 xxx 2 xxx 3 xxx 4 xxx 5 xxx 6 xxx 7 xxx 8 xxx 9
You can also build a string by repeating a shorter one with the multiplication operator:
def create_string():
    ''' Function to return string '''
    return 'ho' * 10

s = create_string()
print(s)
# hohohohohohohohohoho
String Concatenation of Function Arguments
Here’s an example of string concatenation that appends all arguments to a given string and returns the result from the function:
def create_string(a, b, c):
    ''' Function to return string '''
    return 'My String: ' + a + b + c

s = create_string('python ', 'is ', 'great')
print(s)
# My String: python is great
Concatenate Arbitrary String Arguments and Return String Result
You can also use dynamic argument lists to be able to add an arbitrary number of string arguments and concatenate all of them:
def create_string(*args):
    ''' Function to return string '''
    return ' '.join(str(x) for x in args)

print(create_string('python', 'is', 'great'))
# python is great

print(create_string(42, 41, 40, 41, 42, 9999, 'hi'))
# 42 41 40 41 42 9999 hi
Background List Comprehension
Knowledge: List comprehension is a very useful Python feature that allows you to dynamically create a list by using the syntax [expression context]. You iterate over all elements in a given context “for i in range(10)“, and apply a certain expression, e.g., the identity expression i, before adding the resulting values to the newly-created list.
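As a quick illustration of the [expression context] pattern, the following sketch builds the digits from the earlier examples in two ways. The variable names are only illustrative.

```python
# Context: for i in range(10); expression: the identity i
digits = [i for i in range(10)]
print(digits)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Same context, different expression: convert each number to a string
digit_strings = [str(i) for i in range(10)]
print(''.join(digit_strings))
# 0123456789
```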
Programmer Humor
Q: How do you tell an introverted computer scientist from an extroverted computer scientist? A: An extroverted computer scientist looks at your shoes when he talks to you.
Although Neural Networks do a tremendous job learning rules in tabular, structured data, they leave a great deal to be desired when it comes to ‘unstructured’ data. And there we come to a new concept: Recurrent Neural Networks.
Recurrent Neural Network
A Recurrent Neural Network is to a Feedforward Neural Network as a single object is to a list: it may be thought as a set of interrelated feedforward networks, or a looped network.
It is specialized in picking up and highlighting the main characteristics of your data (more on that in Andrej Karpathy’s Blog). Recurrent layers are often followed by a Feed Forward (Dense) layer, which weighs their output.
Long Short-Term Memory
Long Short-Term Memory (LSTM) clusters have the extra special ability to deal with time (more on it can be found in Colah’s article).
As the term memory suggests, its greatest promise is to understand correlations between past and present events. In particular, they fit naturally in time series forecasts.
Here we aim at a hands-on introduction to several LSTM-based architectures (and more is to come).
Article Overview
We use Bitcoin daily closing price as a case study. Specifically, we use the Bitcoin price and sentiment analysis we have gathered in a previous article. We use TensorFlow‘s Keras API for the implementation.
In this article, we will cover the following architectures:
‘Vanilla’ LSTM
Stacked LSTM
Bidirectional LSTM
Encoder-Decoder LSTM-LSTM
Encoder-Decoder CNN-LSTM
The last one is the most convoluted (pun intended).
One main issue when dealing with time series is how to frame the problem. Two situations are common: you have only the historical values of the target (a univariate problem), or you have them together with other information (a multivariate problem).
Moreover, you might be interested in one-step prediction or a multi-step prediction, i.e., predicting only the next day or, say, all days in the next week. Although it doesn’t sound so, you have to adjust your model to whatever situation you are facing.
Think of how you would deal with a multivariate multi-step problem: should you train a one-step model and forecast all features in order to feed your model to predict the following days? That would be crazy!
Kaggle’s time series course does a good job introducing the several strategies present to deal with multi-step prediction. Fortunately, setting an LSTM network for a multi-step multivariate problem is as easy as setting it for a univariate one-step problem – you just need to change two numbers.
This is another advantage of Neural Networks, apart from their capacity for memory.
Of course, the architecture list above is not exhaustive. For instance, a new Attention layer was recently introduced, which has been working wonders. We shall come back to it in a next article, where we will walk through a hybrid Attention-CLX model.
Disclaimer: This article is a programming/data analysis tutorial only and is not intended to be any kind of investment advice.
How to Prepare the Data for LSTM?
We will use two sources of data, both described in our previous article: SentiCrypt‘s Bitcoin sentiment analysis and Bitcoin’s daily closing price (by following the steps in the previous article, you can do it differently, using minute-based data, for example).
Let us load the already-saved sentiment analysis and download the Bitcoin price:
import pandas as pd
import yfinance as yf

sentic = pd.read_csv('sentic.csv', index_col=0, parse_dates=True)
sentic.index.freq = 'D'

# interval='1d' requests daily bars; start and end already fix the date range
btc = yf.download('BTC-USD', start='2020-02-14', end='2022-09-23', interval='1d')[['Close']]
btc.columns = ['btc']

data = pd.concat([sentic, btc], axis=1)
data
The LSTM layer expects a 3D array as input whose shape represents:
(data_size, timesteps, number_of_features).
Meaning, the first and last elements are the number of rows and columns from the input data, respectively. The timestep argument is the size of the time chunk you want your LSTM to process at a time. This will be the time frame the LSTM will look for relations between past and present. It is essentially the size of its (long short-term) memory.
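To make the three dimensions concrete, here is a dependency-free sketch that groups consecutive daily rows into overlapping windows. The helper make_windows and the toy values are hypothetical. This is not the make_lags approach used later in this article, only an illustration of what (data_size, timesteps, number_of_features) means.

```python
def make_windows(rows, timesteps):
    """Group consecutive rows into overlapping windows of length `timesteps`."""
    return [rows[i:i + timesteps] for i in range(len(rows) - timesteps + 1)]

# Six daily observations with two features each (hypothetical values)
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0],
        [4.0, 40.0], [5.0, 50.0], [6.0, 60.0]]

windows = make_windows(rows, timesteps=3)

# (data_size, timesteps, number_of_features)
shape = (len(windows), len(windows[0]), len(windows[0][0]))
print(shape)
# (4, 3, 2)
```

Each of the four samples holds three consecutive days, and each day holds two features.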
To decide how many time-steps, we recall our first time series article where we explored partial auto-correlations of Bitcoin price’s lags.
from statsmodels.graphics.tsaplots import plot_pacf
import matplotlib.pyplot as plt

plot_pacf(data.btc, lags=20)
plt.show()
If you were with me in the first article, you might remember our curious 10-lag correlation. Here we use this magic number: we feed the model a 10-day window and ask it for a 5-day prediction. I found the results with 10 days better than with 6 or 20 days (for most cases – see below for more about this). We also assume we have today’s data and try to forecast the next 5 days.
An easy way to accomplish the reshaping of the data is through a slight modification of our make_lags function together with NumPy’s reshape() method.
So, instead of a Series, we will take a DataFrame as input and will output a concatenation of the original frame with its respective lags. We use negative lags to prepare the target DataFrame. We will ignore observations with the produced NaN values and will use the align method to align their indexes.
def make_lags(df, n_lags=1, lead_time=1):
    """
    Compute lags of a pandas.DataFrame from lead_time to lead_time + n_lags.
    Alternatively, a list can be passed as n_lags.
    Returns a pd.DataFrame resulting from the concatenation of df's shifts.
    """
    if isinstance(n_lags, int):
        lag_list = range(lead_time, n_lags + lead_time)
    else:
        lag_list = n_lags
    lags = list()
    for i in lag_list:
        df_lag = df.shift(i)
        if i != 0:
            df_lag.columns = [f'{col}_lag_{i}' for col in df.columns]
        lags.append(df_lag)
    return pd.concat(lags, axis=1)

X = make_lags(data, n_lags=20, lead_time=0).dropna()
y = make_lags(data[['btc']], n_lags=range(-5, 0)).dropna()

X, y = X.align(y, join='inner', axis=0)
Next, we train-test split the data with sklearn, taking 10% as test size. As usual for time series, we include shuffle=False as a parameter.
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=.1, shuffle=False)
Before proceeding, it is good practice to normalize the data before feeding it into a Neural Network. We do it now, before things get 3D.
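One common choice is min-max scaling, with the minimum and maximum taken from the training set only, so that no information from the validation period leaks into training. The prepare_data function later in this article uses scikit-learn's MinMaxScaler for this. The following is only a dependency-free sketch of the idea, with hypothetical helper names and toy values.

```python
def fit_min_max(train_rows):
    """Return per-column (min, max) computed on the training rows only."""
    columns = list(zip(*train_rows))
    return [(min(col), max(col)) for col in columns]

def transform(rows, stats):
    """Scale each column relative to the training min and max."""
    return [
        [(value - lo) / (hi - lo) if hi != lo else 0.0
         for value, (lo, hi) in zip(row, stats)]
        for row in rows
    ]

train = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
val = [[4.0, 250.0]]

stats = fit_min_max(train)          # statistics come from the training set
train_scaled = transform(train, stats)
val_scaled = transform(val, stats)  # the validation set reuses them
print(val_scaled)
# [[1.5, 0.75]]
```

Note that validation values may fall outside [0, 1] when they exceed the range seen during training, as the first column does here.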
Finally, we use NumPy to reshape everything to 3D arrays. Observe that there is no such thing as a 3D pd.DataFrame.
import numpy as np

def add_dim(df, timesteps=5):
    """
    Transforms a pd.DataFrame into a 3D np.array
    with shape (n_samples, timesteps, n_features)
    """
    df = np.array(df)
    array_3d = df.reshape(df.shape[0], timesteps, df.shape[1] // timesteps)
    return array_3d

# number of lags per feature used to build X (columns of X / original features)
timesteps = X_train.shape[1] // data.shape[1]
X_train, X_val = map(add_dim, [X_train, X_val], [timesteps]*2)
Of course, you can always prepare a function to do everything in one shot:
def prepare_data(df, target_name, n_lags, n_steps, lead_time, test_size, normalize=True):
    '''
    Prepare data for LSTM.
    '''
    if isinstance(n_steps, int):
        n_steps = range(1, n_steps + 1)
    n_steps = [-x for x in list(n_steps)]

    X = make_lags(df, n_lags=n_lags, lead_time=lead_time).dropna()
    y = make_lags(df[[target_name]], n_lags=n_steps).dropna()
    X, y = X.align(y, join='inner', axis=0)

    from sklearn.model_selection import train_test_split
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=test_size, shuffle=False)

    if normalize:
        from sklearn.preprocessing import MinMaxScaler
        mms = MinMaxScaler().fit(X_train)
        X_train, X_val = mms.transform(X_train), mms.transform(X_val)

    if isinstance(n_lags, int):
        timesteps = n_lags
    else:
        timesteps = len(n_lags)

    return add_dim(X_train, timesteps), add_dim(X_val, timesteps), y_train, y_val
Note that one should give positive values to n_steps to have the right negative shifts. Fortunately, y_train, y_val are not reshaped, which makes life easier when comparing predictions with reality.
All set, let’s start with the most basic Vanilla model.
Side note: We are keeping things simple here, but in a future post, we will prepare our own batches and explore better the stateful parameter of an LSTM layer. More on its input and output can be found in Mohammad’s Git.
How to Implement Vanilla LSTM with Keras?
A model is called Vanilla when it has no additional structure apart from the output layer.
To implement it we add an LSTM and a Dense layer. We must pass the number of units of each and the input shape for the LSTM layer.
The input shape is exactly (n_timesteps, n_features), which can be inferred from X_train.shape. The number of units of the LSTM layer is a hyperparameter and should be tuned; for the Dense layer, it is the number of outputs we want, which is 5 here.
Next follows a hypertuning-friendly code, specifying the main parameters in advance.
The model_params dictionary will be useful for passing additional parameters to the fit method, such as an EarlyStopping callback.
We also write a function that fits the model, then plots and assesses the predictions. The present code does not return anything, so feel free to change it in order to do so. We fix the optimizer as Adam and the loss metric as Mean Squared Error.
def fit_model(model, learning_rate=0.001, time_distributed=False,
              epochs=epochs, batch_size=batch_size, verbose=verbose):
    # epochs, batch_size, verbose, n_steps and model_params are globals defined beforehand
    y_ind = y_val.index
    if time_distributed:
        y_train_0 = y_train.to_numpy().reshape((y_train.shape[0], y_train.shape[1], 1))
        y_val_0 = y_val.to_numpy().reshape((y_val.shape[0], y_val.shape[1], 1))
    else:
        y_train_0 = y_train
        y_val_0 = y_val

    # fit network
    from keras.optimizers import Adam
    adam = Adam(learning_rate=learning_rate)
    # pass the optimizer instance so that learning_rate actually takes effect
    model.compile(loss='mse', optimizer=adam)
    history = model.fit(X_train, y_train_0, epochs=epochs, batch_size=batch_size,
                        verbose=verbose, **model_params,
                        validation_data=(X_val, y_val_0), shuffle=False)

    # make a prediction
    if time_distributed:
        predictions = model.predict(X_val)[:, :, 0]
    else:
        predictions = model.predict(X_val)
    yhat = pd.DataFrame(predictions, index=y_ind,
                        columns=[f'pred_lag_{i}' for i in range(-n_steps, 0)])
    yhat_shifted = pd.concat([yhat.iloc[:, i].shift(-n_steps + i)
                              for i in range(len(yhat.columns))], axis=1)

    # calculate RMSE
    from sklearn.metrics import mean_squared_error, r2_score
    rmse = np.sqrt(mean_squared_error(y_val, yhat))

    import matplotlib.pyplot as plt
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 14))
    y_val.iloc[:, 0].plot(ax=ax2, legend=True)
    yhat_shifted.plot(ax=ax2)
    ax2.set_title('Prediction comparison')
    # r2_score expects the true values first, then the predictions
    ax2.annotate(f'RMSE: {rmse:.5f} \n R2 score: {r2_score(y_val, yhat):.5f}',
                 xy=(.68, .93), xycoords='axes fraction')
    ax1.plot(history.history['loss'], label='train')
    ax1.plot(history.history['val_loss'], label='test')
    ax1.legend()
    plt.show()
The time_distributed parameter will be used in the last two architectures.
I opted to set the learning_rate manually because, at one point, the Stacked LSTM’s output was an array of NaNs. After figuring out that gradient descent was not converging, I fixed that by decreasing Adam’s learning rate.
Use verbose=1 as a global parameter to debug your network.
Without further ado:
fit_model(vanilla)
The performance is comparable to our XGBoost 1-day prediction in the last article:
Moreover, we are predicting 5 days, not only one, making the r2 score more impressive.
What bothers me, on the other hand, is the fact the predictions for all five days look identical. It requires further analysis to understand why that is happening, which we will not do here.
How to Build a Stacked LSTM?
We can also stack two LSTM layers.
To do so, we need to be careful to give a 3D input to the second LSTM layer, and that is the role the parameter return_sequences plays. We gain a slight increase in the training score in this case.
In general, any RNN layer that meets some minimal requirements can be made bidirectional through Keras’ Bidirectional layer. It stacks two copies of your RNN layer, making one of them run backward.
You can either specify the backward_layer as a second RNN layer or just wrap a single one, which will make the Bidirectional instance use a copy as the backward model. An implementation can be found below.
An Encoder-Decoder structure is designed so that one network is dedicated to feature selection and a second one to the actual forecast. The two networks can be of different types; even pairs of a recurrent and a non-recurrent network are allowed.
Here we explore two pairs: LSTM-LSTM and CNN-LSTM.
Compared to the previous presented architectures, the main difference is the inclusion of the RepeatVector layer and the wrapper TimeDistributed.
Although the RepeatVector is smoothly included, the TimeDistributed layer needs some care. It wraps a layer object and applies a copy of that layer to each temporal slice of its input. It treats .shape[1] of the input as the temporal dimension (our prepare_data is in accordance with that).
Moreover, one has to watch out, since it outputs a 3D array; in particular, our model will output 3D predictions.
For this reason, we have to feed the model with reshaped y_val, y_train so that the loss functions can be computed. Fortunately, we already included the time_distributed parameter in the fit_model to deal with the reshaping.
We also increase the number of Epochs since these networks seem to take longer to find a minimum. We include an EarlyStopping though. It already gives an astonishing score!
This is the first time the steps outputs are visibly different from each other.
Nevertheless, it seems to be following some trend. In theory, the NN should be so powerful that it can capture trends as well. However, in practice detrending often gives better results. Nevertheless, 0.82 is a massive increase from our 0.32 XGBoost.
Encoder-Decoder CNN-LSTM Network
The last architecture we present is the CNN-LSTM one.
Here a Convolutional Neural Network is used as a feature selector, being well-known to perform well in this role for photos and videos.
The main reason they are so useful in this case is mathematical: the convolutional part of CNN’s name refers to the convolution operation in mathematics, which is used to emphasize translation-invariant features.
That makes complete sense when you have a photo, since you want your mobile phone to recognize Toto as a dog, regardless of whether it is in the lower-left corner or in the upper-center of the picture (of course your dog’s name is Toto, right?). You may recognize the CNN action as the smoothed lines in the graph.
For the sake of completion, we tweaked the code around a bit.
Do you remember the seemingly significant correlation that popped up at the 20-day lag? Well, increasing from 10 to 20 timesteps actually increases the R2 score in the last model:
Funnily enough, it increases even more if you use unnormalized data, making a stellar ~.94 score!
The last thing worth mentioning is the choice of the activation function. If you got the Warning below and wonder why, the Keras’ LSTM documentation provides an answer.
WARNING: tensorflow:Layer lstm_70 will not use cuDNN kernels since it doesn’t meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
(No, I did not load 70 LSTM layers. I loaded around 210.)
The documentation says:
“The requirements to use the cuDNN implementation are:
activation == tanh
recurrent_activation == sigmoid
recurrent_dropout == 0
unroll is False
use_bias is True
Inputs, if use masking, are strictly right-padded.
Eager execution is enabled in the outermost context.”
Changing the activation to 'tanh' is enough in our case to use cuDNN, and the cuDNN kernels are much faster! However, tanh fits poorly into our problem:
(You saw it right, the learning rate is 1000x larger than the default. Otherwise the loss curve does not even change.)
Main Takeaways
There are a few points we have to keep in mind about LSTM:
The shape of their input
What are time steps
The shape of the layer’s output, especially when using return_sequences
Hyperparameter tuning is worth your time. For instance, the activation functions relu and tanh have their own pros and cons.
There are different architectures to play with (and many more to come – we will deal with Attention blocks and Multi-headed networks soon). Consider using them. I’ve become especially fond of the Encoder-Decoders.
If you print all values from a dictionary in Python using print(dict.values()), Python returns a dict_values object, a view of the dictionary values. The representation prints the values enclosed in an odd-looking dict_values(...) wrapper, for example: dict_values([1, 2, 3]).
There are multiple ways to change the string representation of the values, so that the print() output doesn’t yield the strange dict_values view object.
Method 1: Convert to List
An easy way to obtain a pretty output when printing the dictionary values without the dict_values(...) representation is to convert the dict_values object to a list using the list() built-in function. For instance, print(list(my_dict.values())) prints the dictionary values as a simple list.
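A minimal sketch of the difference, reusing the example dictionary that appears later in this article:

```python
my_dict = {'name': 'Carl', 'age': 42, 'income': 100000}

# Printing the view object directly shows the dict_values wrapper
print(my_dict.values())
# dict_values(['Carl', 42, 100000])

# Converting the view to a list gives the plain list representation
print(list(my_dict.values()))
# ['Carl', 42, 100000]
```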
So far, so simple. Read on to learn or recap some important Python features and improve your skills. There are many paths to Rome!
Method 2: Unpacking
An easy and Pythonic way to print a dictionary without the dict_values prefix is to unpack all values into the print() function using the asterisk operator. This works because the print() function allows an arbitrary number of values as input. It prints those values separated by a single whitespace character per default.
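For example, with the same dictionary as in the other methods (the sep argument is optional and shown only to illustrate the separator):

```python
my_dict = {'name': 'Carl', 'age': 42, 'income': 100000}

# The asterisk unpacks the values into separate print() arguments
print(*my_dict.values())
# Carl 42 100000

# The sep argument of print() changes the separator
print(*my_dict.values(), sep=', ')
# Carl, 42, 100000
```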
Do you need even greater flexibility than this? No problem! See here:
Method 3: String Join Function and Generator Expression
To convert the dictionary values to a single string object without 'dict_values' in it and with maximal control, you can use the string.join() function in combination with a generator expression and the built-in str() function.
Here’s an example:
my_dict = {'name': 'Carl', 'age': 42, 'income': 100000}
print(', '.join(str(x) for x in my_dict.values()))
# Carl, 42, 100000
Note: You can replace the comma ',' with your desired separator character and modify the representation of each individual element by changing the expression str(x) of the generator expression to something arbitrarily complicated.
See here for something crazy that wouldn’t make any sense:
my_dict = {'name': 'Carl', 'age': 42, 'income': 100000}
print(' | '.join('x' + str(x) + 'x' for x in my_dict.values()))
# xCarlx | x42x | x100000x
Note that you could also use the repr() function instead of the str() function in this example. The only visible difference is that repr() wraps string values in quotes.
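A short sketch of the repr() variant on the same dictionary:

```python
my_dict = {'name': 'Carl', 'age': 42, 'income': 100000}

# repr() keeps the quotes around string values; numbers look the same as with str()
joined = ', '.join(repr(x) for x in my_dict.values())
print(joined)
# 'Carl', 42, 100000
```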
Finally, I’d recommend you check out this tutorial to learn more how generator expressions work—many Python beginners struggle with this concept even though it’s ubiquitous in expert coders’ code bases.
The most Pythonic way to print a dictionary except for one or multiple keys is to filter it using dictionary comprehension and pass the filtered dictionary into the print() function.
There are multiple ways to accomplish this and I’ll show you the best ones in this tutorial. Let’s get started!
Say, you have one or more keys stored in a variable ignore_keys that may be a list or a set for efficiency reasons.
Create a filtered dictionary without one or multiple keys using the dictionary comprehension {k:v for k,v in my_dict.items() if k not in ignore_keys} that iterates over the original dictionary’s key-value pairs and confirms for each key that it doesn’t belong to the ones that should be ignored.
Here’s a minimal example:
ignore_keys = {'x', 'y'}
my_dict = {'x': 1, 'y': 2, 'z': 3}

filtered_dict = {k:v for k,v in my_dict.items() if k not in ignore_keys}
print(filtered_dict)
# {'z': 3}
The dict.items() method creates an iterable of key-value pairs over which we can iterate.
The membership operator k not in ignore_keys tests if a given key doesn’t belong to the set.
The runtime complexity of the membership check is constant O(1) if you use a set for the ignore_keys data structure. It would be linear O(n) in the number of elements if you used a list which is not a good idea for that reason.
Note that you can also use this approach to print a dictionary except a single key by putting only one key into the ignore list.
A not-so-Pythonic but reasonably readable way to print a dict without one or multiple keys is to use a simple for loop with if condition to avoid all keys in the ignore list.
Here’s an example using three lines and directly printing the key-value pairs:
ignore_keys = {'x', 'y'}
my_dict = {'x': 1, 'y': 2, 'z': 3}

for k, v in my_dict.items():
    if k not in ignore_keys:
        print(k, v)
The output:
z 3
Of course, you can modify the output to your own needs. See the customizations of the built-in print() function and its awesome arguments:
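As a small sketch of such a customization, the sep and end arguments of print() control what is placed between the printed values and after them. The extra key 'w' and the chosen separators are only illustrative.

```python
ignore_keys = {'x', 'y'}
my_dict = {'x': 1, 'y': 2, 'z': 3, 'w': 4}

for k, v in my_dict.items():
    if k not in ignore_keys:
        # sep goes between the printed values, end replaces the trailing newline
        print(k, v, sep=' -> ', end='; ')
print()
# z -> 3; w -> 4; 
```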
I could have listed many more ways to solve this problem of printing a dict except one or more keys.
I have seen super inefficient ways proposed on forums that use exclude_keys that are list types.
I have also seen elaborate schemes to use set difference operations or more.
But I don’t recommend anything else than dict comprehension if you want to create a filtered dictionary object first and the simple for loop if you want to print on the fly.
In this article, I’ll be going over the different types of state variables in Solidity and how to use them. State variables are one of the most important parts of any smart contract, as they allow us to store data that can change over time.
This article is mainly focused on value types of state variables, but I’ll be continuing with another two articles on reference and complex types as well as data location. Let’s dive in!
Basics – A Quick Review
Smart contracts are pieces of code that are deployed in blockchain nodes. They are immutable, meaning they cannot be changed once they have been deployed. This can make it necessary to redeploy the code as a new smart contract or redirect calls from an old contract to new ones.
A smart contract is initiated by a message embedded in a transaction. Ethereum enables these transactions, which may carry out more sophisticated operations like conditional transfers.
A conditional transfer, such as one that depends on the age of the buyer or the value of their bid, could be required.
Example: If the buyer is over 21 and their bid is greater than the minimum bid, then accept the bid. Otherwise reject it.
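The rule in this example can be sketched in plain Python to make the logic explicit. This is only an illustration of the condition, not contract code; the function name and its parameters are hypothetical.

```python
def accept_bid(buyer_age, bid, minimum_bid):
    """Accept the bid only if both conditions from the example hold."""
    return buyer_age > 21 and bid > minimum_bid

print(accept_bid(buyer_age=30, bid=150, minimum_bid=100))
# True
print(accept_bid(buyer_age=19, bid=150, minimum_bid=100))
# False
```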
Smart contracts are executed when predetermined conditions are met to automate the execution of an agreement so that all parties can be immediately certain of the outcome without the need for an intermediary.
In short, a smart contract is a collection of code (its functions or methods, with public or private modifiers and with getter and setter functions).
What is the structure of a smart contract?
As we have seen in other articles in Finxter, the structure of a smart contract is as follows:
A pragma directive;
The name of the contract;
Data, or the state variables that define the state of the contract;
A collection of functions that carry out the intent of the smart contract.
Note that the identifiers representing these elements are restricted to the ASCII character set. Make sure you select meaningful identifiers and follow camel case convention in naming them.
Variable Declaration
To declare a variable in Solidity, you must first specify its data type. This is followed by an access modifier and the variable name.
Structure
<type> <access modifier> <variable name> ;
Example:
What Categories of Variables Exist in Solidity?
Solidity supports three categories of variables:
(1) State Variables
State variables are variables whose values are permanently stored in a contract storage.
What does this mean?
State variables are an essential part of any contract. They are variables whose values are permanently stored in the contract storage. They can be thought of as a single slot in a database that you can query and alter by calling functions of the code that manages the database. The set and get functions can be used to modify and retrieve the value of the variables.
In other words, the data (state variables) are stored contiguously item after item starting with the first state variable, stored in slot 0. For each variable, the size in bytes is determined according to its type. Several contiguous items that require less than 32 bytes are packed into a single storage slot if possible.
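The packing rule described above can be sketched in Python. This is a simplified model under stated assumptions: it covers only value types with a known size in bytes, laid out in declaration order, and it ignores mappings, dynamic arrays, structs, and inheritance. The function assign_slots and the variable names are hypothetical.

```python
SLOT_SIZE = 32  # bytes per storage slot

def assign_slots(variables):
    """Assign (slot, offset) to each (name, size_in_bytes) in declaration order."""
    layout = {}
    slot, offset = 0, 0
    for name, size in variables:
        # Start a new slot if the variable does not fit in the remaining space
        if offset + size > SLOT_SIZE:
            slot, offset = slot + 1, 0
        layout[name] = (slot, offset)
        offset += size
    return layout

# Hypothetical state variables: uint128 (16 bytes), uint64 (8), uint256 (32), bool (1)
variables = [('a', 16), ('b', 8), ('c', 32), ('d', 1)]
print(assign_slots(variables))
# {'a': (0, 0), 'b': (0, 16), 'c': (1, 0), 'd': (2, 0)}
```

Here a and b share slot 0, c needs a full slot of its own, and d starts slot 2 because slot 1 is already full.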
To make it easier, if you use other languages and want to store user information for a long time, you would connect your application to a database server and then store the information in the database. In Solidity, however, you do not need to connect, you can simply store the data permanently using state variables.
(2) Local Variables
Local variables are variables whose values exist only while the function is executing; their scope is the function, and they cannot be accessed from outside it.
Typically, these variables are used to hold temporary values for processing or computing something. In the following example, “temp” is a local variable that cannot be used outside the “set” function.
(3) Global Variables
Global variables are variables whose values exist in the global namespace to obtain information about the blockchain.
Each function has its own scope, but state variables should always be defined outside the scope, like the attributes of a class.
They are permanently stored in the Ethereum blockchain, more precisely in the storage Merkle-Patricia tree, which is part of the information that forms the state of an account (that’s why we call them state variables).
What Types of Valid State Variables Exist?
Info: Solidity is a statically typed language, meaning each variable’s type must be specified at the time of its declaration.
“Undefined” or “null” values do not exist in Solidity, but newly declared variables always have a default value depending on their type, typically called the “zero-state”.
For example, the default value for bool is false.
As in other languages (though not Python), there are two kinds of types in Solidity: value types and reference types.
A value type stores its data directly in the variable. If the variable instead holds the location of the data, it is a reference type.
The reference types are discussed in a separate article.
For example, consider the integer variable int i = 100;
The system stores 100 in the memory location allocated for the variable i. The following image shows how 100 is stored in a hypothetical location in memory (0x239110) for “i”:
What are the Modifiers for the State Variables?
Visibility – access modifiers
Access modifiers are the keywords used to specify the declared accessibility of a state variable and functions.
Variables in Solidity have three types of visibility: public, private, and internal. If visibility is not explicitly declared, the compiler considers it internal.
For variables of type public, the compiler automatically creates a method to retrieve them through a call. This does not apply to private or internal variables.
Example:
uint256 public a;

behaves essentially like:

uint256 private a;

function a() public view returns (uint256) {
    return a;
}
When you create a public variable, it is stored the same way as a private variable, but the compiler automatically creates a getter function for it.
The difference between private and internal variables is that internal variables are inherited by child contracts, while private variables are not.
To learn more about private variables:
contract Addition {
    uint x; // internal variable
    uint public y;
}

contract Child is Addition {
    // no need to define x since the child contract inherits the variable
    // uint x;
    function setX(uint _x) public {
        x = _x;
    }

    function getX() public view returns (uint) {
        return x;
    }
}
Note that the data location (memory, storage, or calldata) must be specified for variables of reference type. This is necessary when function arguments are involved. We will cover this in an article on data location.
Other keywords
The following keywords can be used for state variables to restrict changes to their state.
Constant (replaced by “view” and “pure” in functions)
Constant disallows assignment except at initialization: a constant variable must be initialized where it is declared and cannot be changed afterwards.
Example:
uint private constant t = 40;
The variable t has been declared as a constant and therefore cannot be changed.
It is interesting to note that declaring a constant variable without initializing it is forbidden, and the compiler displays an error, e.g.:
contract Addition {
    uint private x;
    uint public y;
    uint private constant z; // gives an error because constant variables must be initialized when declared
}
Immutable
These variables can be declared without being initialized, but they must then be assigned exactly once, in the constructor. After that, the value can no longer be changed.
uint private immutable w;

// now we declare a constructor for the contract, using the constructor keyword
constructor() {
    w = 20; // initialize the variable
}
Override
This keyword states that a public state variable overrides a function of a base contract: the variable's automatically generated getter takes the place of that function.
Value Types
These variables are passed by value. That is, they are copied when they are used either in an assignment or in a function argument.
Here we will see the basic value types.
Value types are booleans, integers, addresses, enums, and bytes.
Booleans
Boolean values can be true or false.
An example of a boolean type:
contract ExampleBool {
    // example of a bool value type in Solidity
    bool public IsVerified = false;
    bool public IsSent = true;
}
Integers
There are int/uint (signed and unsigned integer) types of various sizes. The sizes go in steps of 8 bits: int8, int16, … up to int256, and likewise uint8 up to uint256. int256 is the same as int.
Note: uint256 is the same as uint.
The type uint stands for non-negative integers (zero and positive). The type int stands for both positive and negative integers.
The type uint8 has 8 bits, which corresponds to 1 byte. This means that it accepts numbers between 0 and 255. A bit is a binary digit, so one byte can hold 2^8 different numbers, from 0 to 2^8-1 = 255. This is the same reasoning as for why a three-digit decimal number can represent the values 0 to 999.
The type uint256 accepts numbers between 0 and 2^256-1.
If we try to assign the value 256 to a variable of type uint8, the compiler will print an error.
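The ranges above follow directly from the bit width. As a rough illustration in Python (not Solidity), the sketch below computes them; the helper names uint_range, int_range, and fits_uint are made up for this example.

```python
def uint_range(bits):
    """Return the (min, max) values of an unsigned integer with `bits` bits."""
    return 0, 2**bits - 1


def int_range(bits):
    """Return the (min, max) values of a signed (two's complement) integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits_uint(value, bits):
    """Check whether `value` can be stored in an unsigned integer of `bits` bits."""
    low, high = uint_range(bits)
    return low <= value <= high


print(uint_range(8))       # (0, 255)
print(int_range(8))        # (-128, 127)
print(fits_uint(255, 8))   # True
print(fits_uint(256, 8))   # False
```

The last line mirrors the compiler error mentioned above: 256 is one past the largest value a uint8 can hold.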
The best practice for integers is to specify the number of bits at the declaration stage to use as little space as possible and reduce the cost of storage. So consider uint8 or uint16 instead of always using uint (uint256).
contract SimpleContract {
    uint32 public uidata = 1234567; // unsigned integer
    int32 public idata = -1234567; // signed integer
}
Fixed Point Numbers
According to the Solidity documentation, fixed-point numbers are the type for numbers with a fractional part. However, the official documentation states that “Fixed point numbers are not yet fully supported by Solidity”. They can be declared, but cannot be assigned to or from.
However, you can use fractional number literals in calculations, as long as the value resulting from the calculation is an integer.
Here is an example:
contract additionContract {
    uint8 result;

    function Addition(uint) public {
        result = 2/3; // error: the result is not an integer
        result = 3.5 + 1.5; // works: the final result, 5, is an integer
    }
}
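To see why the first assignment fails and the second works, it can help to redo the arithmetic with exact fractions. The following Python sketch only imitates the idea with the standard fractions module; is_integer_result is a hypothetical helper, and this is not how the Solidity compiler is implemented.

```python
from fractions import Fraction


def is_integer_result(value):
    """A rational number is an integer exactly when its denominator is 1."""
    return value.denominator == 1


two_thirds = Fraction(2, 3)                    # corresponds to 2/3
total = Fraction("3.5") + Fraction("1.5")      # corresponds to 3.5 + 1.5

print(is_integer_result(two_thirds))   # False
print(is_integer_result(total))        # True
print(total)                           # 5
```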
Let’s do a subtle change,
Address
The address data type is very specific to Solidity.
On the Ethereum blockchain, every account and smart contract has an address that is used to send and receive Ether from one account to another.
This is your public identity on the blockchain.
Also, when you deploy a smart contract on the blockchain, that contract is assigned an address that you can use to identify and call the smart contract.
There are two variants of the address type, which are largely identical:
address – stores a 20-byte value (the size of an Ethereum address or account). The default value for the address is 0x followed by 40 zeros, i.e. 20 bytes of zeros.
address payable – like address, but with the additional members transfer and send.
The idea behind this distinction is that the address payable is an address you can send Ether to, while you should not send Ether to a plain address, as it could be a smart contract that was not built to accept Ether.
contract ExampleAddress {
    // placeholder value; a real address literal must consist of 40 hexadecimal digits
    address public myAddress = 0xc895t6ea1bc39595cf849612ffta7427f5792987;
}
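Since an address is a 20-byte value, its hexadecimal form has exactly 40 digits after the 0x prefix. The Python sketch below checks only this format (it does not verify any checksum); looks_like_address and ADDRESS_PATTERN are names invented for this illustration.

```python
import re

# "0x" followed by exactly 40 hexadecimal digits (20 bytes)
ADDRESS_PATTERN = re.compile(r"0x[0-9a-fA-F]{40}")


def looks_like_address(text):
    """Return True if `text` has the shape of a 20-byte hex address."""
    return ADDRESS_PATTERN.fullmatch(text) is not None


ZERO_ADDRESS = "0x" + "0" * 40  # the default value of an address

print(looks_like_address(ZERO_ADDRESS))                                   # True
print(len(bytes.fromhex(ZERO_ADDRESS[2:])))                               # 20
print(looks_like_address("0xc895t6ea1bc39595cf849612ffta7427f5792987"))   # False
```

The last value is rejected because it contains the letter t, which is not a hexadecimal digit.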
Enums
An enum (short for enumeration) is a user-defined data type that restricts a variable to exactly one of a set of predefined values.
The values listed in the enumeration are called enums, and internally they are treated like numbers. Using named values makes the contract more readable and maintainable.
contract SampleEnum {
    // Creating an enumerator
    enum animal_classes { Mammals, Fish, Amphibians, Reptiles, Birds }

    function getFirstEnum() public pure returns (animal_classes) {
        return animal_classes.Mammals;
    }
    // result:
    // 0: uint8: 0
}
With enums, we can also set a default value:
// inside the SampleEnum contract
animal_classes constant defaultValue = animal_classes.Reptiles;

function getDefaultValue() public pure returns (animal_classes) {
    return defaultValue;
}
// result:
// 0: uint8: 3
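The numbering of enum members starts at 0 and follows the order of declaration. A comparable structure in Python, given here only as an analogy with the standard enum module and a hypothetical class name AnimalClasses, makes the numbers explicit:

```python
from enum import IntEnum


class AnimalClasses(IntEnum):
    """Python analogue of the animal_classes enum, numbered from 0."""
    MAMMALS = 0
    FISH = 1
    AMPHIBIANS = 2
    REPTILES = 3
    BIRDS = 4


print(int(AnimalClasses.MAMMALS))    # 0
print(int(AnimalClasses.REPTILES))   # 3
print(AnimalClasses(2).name)         # AMPHIBIANS
```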
Bytes and Strings
A byte is a group of 8 bits. Everything in memory is stored in bits with the binary values 0 and 1.
Solidity supports string literals that use both double quotes (") and single quotes ('). It provides string as a data type to declare a variable of type string.
Strings are unique in Solidity compared to Python or other programming languages in that there are no functions for manipulating strings, except that you can concatenate strings. The reason for this is that storing strings in a blockchain is very expensive.
Bytes and strings are easy to handle in Solidity because Solidity treats them similarly to an array. The two are very similar. (See Arrays in the Reference Type article).
Conclusion
Smart contracts reside at a specific address in the Ethereum blockchain. In this article, we learned about state variables in Solidity.
We looked at state and local variables, and at the different value types.
We tried to understand Booleans, Integers, Enums, Addresses, Bytes, and Strings (although the last ones are treated in more depth with the reference types).
You can unpack all list elements into the print() function to print all values individually, separated by a single space by default (which you can override using the sep argument). For example, the expression print('[', *lst, ']') prints the elements of lst, separated by spaces, with the enclosing square brackets and without the separating commas!
Here’s an example:
lst = [1, 2, 3]
print('[', *lst, ']')
# [ 1 2 3 ]
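The sep argument mentioned above can be sketched as follows. The example also uses the file argument of print() with an in-memory buffer, purely so that the exact output can be inspected; the variable names buffer and tab_separated are chosen for this illustration.

```python
import io

lst = [1, 2, 3]

# Default separator: a single space
print(*lst)               # 1 2 3

# Custom separator via the sep argument
print(*lst, sep=' | ')    # 1 | 2 | 3

# Write into a string buffer instead of the shell to inspect the result
buffer = io.StringIO()
print(*lst, sep='\t', file=buffer)
tab_separated = buffer.getvalue()
print(repr(tab_separated))   # '1\t2\t3\n'
```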
Method 2: String Replace Method
A simple way to print a list without commas is to first convert the list to a string using the built-in str() function. Then modify the resulting string representation of the list by using the string.replace() method until you get the desired result.
Here’s an example:
my_list = [1, 2, 3]

# Convert List to String
s = str(my_list)
print(s)
# [1, 2, 3]

# Replace Separating Commas
s = s.replace(',', '')

# Print List Without Commas
print(s)
# [1 2 3]
The last line of the code snippet shows that the commas are removed from the output.
Method 3: String Join With Generator Expression
You can print a list without commas using the string.join() method on any separator string such as ' ' or '\t'. Pass a generator expression to convert each list element to a string using the str() built-in function.
Specifically, the expression print('[', ' '.join(str(x) for x in my_list), ']') prints my_list to the shell without separating commas.
my_list = [1, 2, 3]
print('[', ' '.join(str(x) for x in my_list), ']')
# Output: [ 1 2 3 ]
The str(object) built-in function converts a given object to its string representation.
Generator expressions and list comprehensions are concise one-liner ways to create a new iterable by reusing elements from another iterable.
Note: Combining the join() method with a generator expression and string concatenation is the recommended approach if you want to convert a list to a string without commas instead of printing it.
Here’s an example:
my_list = [1, 2, 3]
s = '[' + ' '.join(str(x) for x in my_list) + ']'
print(s)
# Output: [1 2 3]
Method 4: Print NumPy Array
Sometimes it is sufficient to use the NumPy default output that is without separating commas. For example, if you print a list it yields [1, 2, 3]. And if you print an array it yields [1 2 3]. You can easily convert a list to a NumPy array using the np.array(lst) constructor.
By default, Python doesn’t truncate lists when printing them to the shell, even if they are large. For example, you can call print(my_list) and see the full list even if the list has one thousand elements or more!
Here’s an example:
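A minimal sketch of this, assuming a list of the integers 0 to 999: instead of dumping the whole output, it checks that the string representation really contains every element and no ellipsis.

```python
my_list = list(range(1000))

# str() produces the same text that print(my_list) would show
text = str(my_list)

print(text.count(',') + 1)           # 1000 (one element per comma, plus one)
print(text.endswith('998, 999]'))    # True
print('...' in text)                 # False (nothing was truncated)
```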
However, Python may squeeze the text (e.g., in programming environments such as IDLE) so you would have to press the button before seeing the output. The reason is that showing the whole output could be time-consuming and visually cluttering.
How to Print a NumPy Array Without Truncating?
In many cases, large NumPy arrays are not truncated either when printed in the default Python programming environment IDLE.
However, in the interactive mode of the Python shell, a NumPy array may be truncated, unlike a Python list.
It’s part of our long-standing tradition to make this (and other) articles a faithful companion or a supplement to the official Solidity documentation.
Contract Types
To quote the official Solidity documentation, “every contract defines its own type”.
This statement might seem a bit cryptic, and since we’re an efficient crowd, we’d surely like to know what it means.
We can all remember that some number of articles ago, we mentioned how Solidity has key elements of an object-oriented programming language (OOPL). We also emphasized how smart contracts in Solidity are very similar to classes in an OOPL.
Classes themselves are a mesh of custom data types, i.e. structs, and functions, which qualifies classes to be treated as types.
By extension, our contracts are also treated as types, and as every contract is unique in its own right, it defines its own type. Since it is a type, we can implicitly convert a specific contract to a contract it inherits from, i.e. if contract Aa inherits from contract A, it can also be converted to contract A.
Besides that, we can explicitly convert each contract to and from the address type. Even more, we can conditionally convert a contract to and from the address payable type (remember, that’s the same type as the address type, but predetermined to receive Ether).
The condition is that the contract type must have a receive or payable fallback function. If it does, we can make the conversion to address payable by using address(x).
However, if the contract type does not implement (a more professional way to say “have”) a receive or payable fallback function, then the conversion to address payable has to be even more explicit (no swearing!) by stating payable(address(x)).
A local variable obc of a contract type OurBeautifulContract is declared by OurBeautifulContract obc;.
Once we point our variable obc to an instantiated (newly created) contract, we’d be able to call functions on that contract.
In terms of its data representation, a contract is identical to the address type. This is important because the contract type is not directly supported by the ABI, but the address type, as its representative, is supported by the ABI.
In contrast to the types mentioned so far, contract types don’t support any operators.
The members of contract types are the external functions (the functions only available to other contracts) and state variables whose visibility is set to public.
When we need to access type information about the contract, like the OurBeautifulContract above, we’d call the type(OurBeautifulContract) function.
Fixed-Size Byte Arrays
The value type bytesN holds a sequence of N bytes, where N goes from 1 up to 32, i.e., bytes1, …, bytes32.
The available operators for fixed-size byte arrays are:
Comparisons: <=, <, ==, !=, >=, > (evaluate to bool)
Bit operators: &, |, ^ (bitwise exclusive or), ~ (bitwise negation)
Shift operators: << (left shift), >> (right shift)
Index access: If x is of type bytesN, then x[k] for 0 <= k < N returns the k-th byte (read-only). In other words, x[0] up to (inclusive) x[N-1] is available for index access; if N = 1, i.e. x is of type bytes1, then x[0] is the only byte accessible by index.
The shifting operator always uses an unsigned integer type as a right operand, which represents the number of bits to shift by, and returns the type of the left operand.
Let’s take a look at a simple example to illustrate:
bytes2 lo = 0x1234; // lo is the left operand
uint8 ro = 5; // ro is the right operand; it must be of an unsigned integer type
lo << ro // evaluates to the type of lo, i.e. bytes2
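To get a feeling for what the shift does to the two bytes, here is a rough Python imitation. Python integers are unbounded, so the sketch masks the result back to 16 bits to mimic the fixed width of bytes2; shift_left_bytes2 is a hypothetical helper, not a Solidity or Python built-in.

```python
def shift_left_bytes2(value, bits):
    """Shift a 2-byte value left and keep only the lowest 16 bits."""
    return (value << bits) & 0xFFFF


lo = 0x1234
ro = 5

result = shift_left_bytes2(lo, ro)
print(hex(result))            # 0x4680

# Index access: the byte at index 0 is the leftmost (most significant) byte
as_bytes = lo.to_bytes(2, 'big')
print(hex(as_bytes[0]))       # 0x12
print(hex(as_bytes[1]))       # 0x34
print(len(as_bytes))          # 2, comparable to the .length member
```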
A fixed-size byte array has only one member, .length, that holds the fixed length of the byte array. This member is accessible as the read-only value.
Warning: Since the type bytes1 is a sequence of 1 byte in length, the type bytes1[] is an array of 1-byte sequences. However, each element of the array is padded with 31 bytes, due to padding rules for elements stored in memory, on the stack, and in calldata, i.e., everywhere except in storage. Therefore, according to the official Solidity documentation, it’s better to use the bytes type instead of bytes1[].
Note: Value types in storage are packed together and can share a storage slot, taking only as much space per value as really needed. In contrast, the stack, memory, and calldata pad value types and store them in separate slots, meaning that each variable uses a whole slot of 32 bytes, even if the value type is shorter than 32 bytes, effectively wasting space.
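The cost of this padding is simple arithmetic. The Python sketch below counts only the element slots of a bytes1[] outside of storage and deliberately ignores any bookkeeping such as a length field; the function names are invented for this illustration.

```python
WORD_SIZE = 32  # bytes per slot in memory, on the stack, and in calldata


def bytes1_array_footprint(n_elements):
    """Bytes occupied by the padded elements of a bytes1[] outside storage."""
    return n_elements * WORD_SIZE


def wasted_padding(n_elements):
    """Padding bytes: 31 per 1-byte element."""
    return n_elements * (WORD_SIZE - 1)


print(bytes1_array_footprint(10))   # 320
print(wasted_padding(10))           # 310
```

So ten useful bytes end up occupying 320 bytes, 310 of which are padding.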
Before Solidity v0.8.0, the keyword byte was an alias for bytes1.
Dynamically-Sized Byte Arrays
There are two dynamically-sized non-value types, namely bytes and string.
bytes is a dynamically-sized byte array, while
string is a dynamically-sized UTF-8-encoded string.
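The difference between a sequence of characters and its UTF-8 encoding can be seen in any language with Unicode strings. As a small Python illustration (not Solidity), a five-character word with one accented letter needs six bytes:

```python
text = "h\u00e9llo"               # "héllo", written with an escape for the accented e

encoded = text.encode("utf-8")    # the raw bytes of the UTF-8 encoding

print(len(text))                  # 5 characters
print(len(encoded))               # 6 bytes, because the accented e takes two bytes
print(encoded.decode("utf-8") == text)   # True
```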
Address Literals
Address literals are hexadecimal literals that pass the address checksum test, e.g. 0xdCad3a6d3569DF655070DEd06cb7A1b2Ccd1D3AF.
Hexadecimal literals will produce an error if they are between 39 and 41 digits long and do not pass the checksum test.
However, we can remove the error by prepending zeros to integer types or appending zeros to bytesNN types.
The Ethereum Improvement Proposal EIP-55 defines the mixed-case address checksum.
Integer and Rational Literals
Integer Literals
Integer literals are created using a sequence of digits from the range 0-9, and each digit is interpreted (weighted) based on its position in the sequence: it is multiplied by a power of 10.
For example, 217 is interpreted as two hundred and seventeen, because, reading from right to left, we have 7 * 10^0 + 1 * 10^1 + 2 * 10^2.
A reminder: 10^0 = 1.
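The positional weighting can be spelled out in a few lines of Python. This is only a sketch of the idea; positional_value is a hypothetical helper that rebuilds the number from its digits, reading from right to left.

```python
def positional_value(digits):
    """Sum each digit times 10 raised to its position, counted from the right."""
    total = 0
    for position, digit in enumerate(reversed(digits)):
        total += int(digit) * 10**position
    return total


print(positional_value("217"))   # 217, i.e. 7 * 1 + 1 * 10 + 2 * 100
print(10**0)                     # 1
```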
Octal literals don’t exist in Solidity and leading zeros are invalid.
Decimal Fractional Literals
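Decimal fractional literals consist of a dot . (the decimal separator, which in everyday notation may instead be a comma, depending on the locale) and at least one digit on either side, e.g. 1., .1, and 1.3. Note that recent compiler versions require at least one digit after the dot, so 1. may be rejected there.
Info: “A locale consists of a number of categories for which country-dependent formatting or other specifications exist”.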
Scientific Notation
Solidity also supports scientific notation in the form of 2e10, where 2 (left of “e”) is called mantissa (M) and the exponent (E) must be an integer. In a general form, we would write it as MeE and it is interpreted as M * 10**E, e.g. 2e10, -2e10, 2e-10, 2.5e1.
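The interpretation M * 10**E can be reproduced exactly with Python's fractions module. This is a sketch of the arithmetic only; scientific is a made-up helper that takes the mantissa as a string and the exponent as an integer.

```python
from fractions import Fraction


def scientific(mantissa, exponent):
    """Evaluate MeE exactly as mantissa * 10**exponent."""
    return Fraction(mantissa) * Fraction(10) ** exponent


print(scientific("2", 10))     # 20000000000
print(scientific("-2", 10))    # -20000000000
print(scientific("2", -10))    # 1/5000000000
print(scientific("2.5", 1))    # 25
```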
Readable Underscore Notation
We can also do a neat thing: separate the digits of a numeric literal for easier readability, such as in decimal 123_000, hexadecimal 0x2eff_abde, scientific decimal notation 1_2e345_678.
However, there are no leading, trailing, or multiple underscores; they can only be added between two digits.
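Python happens to follow a very similar underscore rule for its own numeric literals and for int(), so it can serve as a quick playground. The sketch below is Python behavior, not a test of the Solidity compiler; parses is a hypothetical helper.

```python
def parses(text):
    """Return True if Python's int() accepts the given digit string."""
    try:
        int(text)
        return True
    except ValueError:
        return False


print(123_000 == 123000)            # True
print(0x2eff_abde == 0x2effabde)    # True

print(parses("123_000"))   # True:  single underscore between two digits
print(parses("_123"))      # False: leading underscore
print(parses("123_"))      # False: trailing underscore
print(parses("1__23"))     # False: multiple underscores in a row
```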
Number Literal Expressions
Expressions containing number literals preserve their precision until they are converted to a non-literal type.
Such a conversion means an explicit conversion, or that the number literals are used with something else than a number literal expression, like boolean literals.
This behavior implies that computations don’t overflow and divisions don’t truncate in number literal expressions.
A very good example would be the number literal expression (2**800 + 1) - 2**800, which results in the constant 1 (of type uint8), although the intermediate results would not fit into the EVM word length of 32 bytes.
One more example shows that an integer 4 is produced by computing the expression .5 * 8, although the intermediary results are not integers.
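Both examples can be re-enacted in Python, whose integers are unbounded and whose fractions module computes exactly. This only mirrors the arithmetic; it says nothing about how the compiler evaluates literals internally.

```python
from fractions import Fraction

# The intermediate value 2**800 needs far more than 256 bits ...
print((2**800).bit_length() > 256)    # True

# ... yet the overall expression is exactly 1
big = (2**800 + 1) - 2**800
print(big)                            # 1

# A fractional intermediate value can still give an integer result
half_times_eight = Fraction("0.5") * 8
print(half_times_eight)               # 4
```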
More Operations
Warning: most operators produce a literal expression when applied to number literals, but there are also two exceptions:
Ternary operator (... ? ... : ...),
Array subscript (<array>[<index>]).
In other words, expressions like 255 + (true ? 1 : 0) or 255 + [1, 2, 3][0] are not equivalent to using the literal 256 (the result of these two expressions), as they are computed within the type uint8 and can lead to an overflow.
Number literal expressions can use the same operators as integers, as long as both operands are integers.
If either of the operands is fractional, bit operations cannot be used;
If the exponent is a decimal fractional literal, the exponentiation operation cannot be used either.
Shifts and exponentiation with literal numbers as the left (or base) operand and integer types as the right (or exponent) operand are always performed in uint256 for non-negative literals or int256 for negative literals.
Warning: Since Solidity v0.4.0 division on integer literals produces a rational number, e.g. 7 / 2 = 3.5.
Solidity has a number literal type for each rational number; integer literals and rational number literals both belong to number literal types.
All number literal expressions (expressions with only number literals and operators) also belong to number literal types, e.g. 1 + 2 and 2 + 1 belong to the same number literal type, the one for the rational number three.
Note: When number literal types are used with non-literal expressions, they are converted into a non-literal type, e.g. uint128 a = 1; uint128 b = 2.5 + a + 0.5;
Here, 1 is converted into a non-literal type uint128, i.e. variable a, but a common type for both 2.5 and uint128 doesn’t exist and the compiler will reject the code.
Conclusion
In this article, we added even more data types in Solidity under our proverbial belt!
First, we introduced and learned about the contract type.
Second, we fixed our understanding of the fixed-size byte array type.
Third, the situation got dynamic by studying the dynamically-sized byte array type.
Fourth, we addressed the… what was it called… Aha – address literals!
Fifth, we came to the most rational decision and discovered what rational and integer literals are and, of course, how they can be put to good use.