
Demonstrating Perl with Tic-Tac-Toe, Part 4

This is the final article in the series demonstrating Perl with Tic-Tac-Toe. This article provides a module that can compute better game moves than the previously presented modules. For fun, the modules chip1.pm through chip3.pm can be incrementally moved out of the hal subdirectory in reverse order. With each chip that is removed, the game will become easier to play. The game must be restarted each time a chip is removed.

An example Perl program

Copy and paste the code below into a plain text file and use the same one-liner that was provided in the first article of this series to strip the leading numbers. Name the version without the line numbers chip3.pm and move it into the hal subdirectory. Use the version of the game that was provided in the second article so that the chip below will automatically load when placed in the hal subdirectory. Be sure to also include both chip1.pm and chip2.pm from the second and third articles, respectively, in the hal subdirectory.

00 # artificial intelligence chip
01
02 package chip3;
03 require chip2;
04 require chip1;
05
06 use strict;
07 use warnings;
08
09 sub moverama {
10    my $game = shift;
11    my @nums = $game =~ /[1-9]/g;
12    my $rama = qr/[1973]/;
13    my %best;
14
15    for (@nums) {
16       my $ra = $_;
17       next unless $ra =~ $rama;
18       $best{$ra} = 0;
19       for (@nums) {
20          my $ma = $_;
21          next unless $ma =~ $rama;
22          if (($ra-$ma)*(10-$ra-$ma)) {
23             $best{$ra} += 1;
24          }
25       }
26    }
27
28    @nums = sort { $best{$b} <=> $best{$a} } keys %best;
29
30    return $nums[0];
31 }
32
33 sub hal_move {
34    my $game = shift;
35    my $mark = shift;
36    my @mark = @{ shift; };
37    my $move;
38
39    $move = chip2::win_move $game, $mark, \@mark;
40
41    if (not defined $move) {
42       $mark = ($mark eq $mark[0]) ? $mark[1] : $mark[0];
43       $move = chip2::win_move $game, $mark, \@mark;
44    }
45
46    if (not defined $move) {
47       $move = moverama $game;
48    }
49
50    if (not defined $move) {
51       $move = chip1::hal_move $game;
52    }
53
54    return $move;
55 }
56
57 sub complain {
58    print 'Just what do you think you\'re doing, ',
59       ((getpwnam($ENV{'USER'}))[6]||$ENV{'USER'}) =~ s! .*!!r, "?\n";
60 }
61
62 sub import {
63    no strict;
64    no warnings;
65
66    my $p = __PACKAGE__;
67    my $c = caller;
68
69    *{ $c . '::hal_move' } = \&{ $p . '::hal_move' };
70    *{ $c . '::complain' } = \&{ $p . '::complain' };
71
72    if (&::MARKS->[0] ne &::HAL9K) {
73       @{ &::MARKS } = reverse @{ &::MARKS };
74    }
75 }
76
77 1;

How it works

Rather than making a random move or making a move based on probability, this final module to the Perl Tic-Tac-Toe game uses a more deterministic algorithm to calculate the best move.

The big takeaway from this Perl module is that it is yet another example of how references can be misused or abused, and as a consequence lead to unexpected program behavior. With the addition of this chip, the computer learns to cheat. Can you figure out how it is cheating? Hints:

  1. Constants are implemented as subroutines.
  2. References allow data to be modified out of scope.

Final notes

Line 12 demonstrates that a regular expression can be pre-compiled and stored in a scalar for later use. This is useful as a performance optimization when you intend to re-use the same regular expression many times over.

Line 59 demonstrates that some system library calls are available directly in Perl’s built-in core functionality. Using the built-in functions alleviates some overhead that would otherwise be required to launch an external program and set up the I/O channels to communicate with it.

Lines 72 and 73 demonstrate the use of &:: as a shorthand for &main::.

The full source code for this Perl game can be cloned from the git repository available here: https://pagure.io/tic-tac-toe.git


LaTeX typesetting, Part 3: formatting

This series covers basic formatting in LaTeX. Part 1 introduced lists. Part 2 covered tables. In part 3, you will learn about another great feature of LaTeX: the flexibility of granular document formatting. This article covers customizing the page layout, table of contents, title sections, and page style.

Page dimension

When you first wrote your LaTeX document you may have noticed that the default margins are wider than you might expect. The margins depend on the paper size you specified (for example, a4 or letter) and on the document class (article, book, report, and so on). There are a few ways to modify the page margins; one of the simplest is the fullpage package.

This package sets the body of the page such that the page is almost full.

Fullpage package documentation

The illustration below demonstrates the LaTeX default body compared to using the fullpage package.

Another option is to use the geometry package. Before you explore how the geometry package can manipulate margins, first look at the page dimensions as depicted below.

  1. one inch + \hoffset
  2. one inch + \voffset
  3. \oddsidemargin = 31pt
  4. \topmargin = 20pt
  5. \headheight = 12pt
  6. \headsep = 25pt
  7. \textheight = 592pt
  8. \textwidth = 390pt
  9. \marginparsep = 35pt
  10. \marginparwidth = 35pt
  11. \footskip = 30pt

To set the margin to 1 (one) inch using the geometry package, use the following example:

\usepackage{geometry}
\geometry{a4paper, margin=1in}

In addition to the above example, the geometry package can modify the paper size and orientation. To set the paper size together with the size of the text area (7in wide by 8in high here), use the example below:

\usepackage[a4paper, total={7in, 8in}]{geometry}

To change the page orientation, you need to add landscape to the geometry options as shown below:

\usepackage{geometry}
\geometry{a4paper, landscape, margin=1.5in}
Landscape Orientation

Table of contents

By default, a LaTeX table of contents is titled “Contents”. There are times when you prefer to relabel the text to be “Table of Contents”, change the vertical spacing between the ToC and your first section or chapter, or simply change the color of the text.

To change the text, add the following lines to your preamble, substituting english with your desired language:

\usepackage[english]{babel}
\addto\captionsenglish{
\renewcommand{\contentsname}
{\bfseries{Table of Contents}}}

To manipulate the vertical spacing between the ToC and the list of figures, sections, and chapters, use the tocloft package. The two options used in this article are cftbeforesecskip and cftaftertoctitleskip.

The tocloft package provides means of controlling the typographic design of the ToC, List of Figures and List of Tables.

Tocloft package documentation

\usepackage{tocloft}
\setlength\cftbeforesecskip{2pt}
\setlength\cftaftertoctitleskip{30pt}

cftbeforesecskip is the spacing between the sections in the ToC, while cftaftertoctitleskip is the space between the text “Table of Contents” and the first section in the ToC. The image below shows the differences between the default and the modified ToC.

Default ToC
Customized ToC

Borders

When using the package hyperref in your document, LaTeX section lists in the ToC and references including \url have a border, as shown in the images below.

To remove these borders, include the following in the preamble. In the previous section, “Table of contents,” you can see that there are no borders in the ToC.

\usepackage{hyperref}
\hypersetup{ pdfborder = {0 0 0}}

Title section

To modify the title section font, style, and/or color, use the package titlesec. In this example, you will change the font size, font style, and font color of the section, subsection, and subsubsection. First, add the following to the preamble.

\usepackage{titlesec}
\titleformat*{\section}{\Huge\bfseries\color{darkblue}}
\titleformat*{\subsection}{\huge\bfseries\color{darkblue}}
\titleformat*{\subsubsection}{\Large\bfseries\color{darkblue}}

Taking a closer look at the code, \titleformat*{\section} specifies the depth of section to use. The above example uses up to the third depth. The {\Huge\bfseries\color{darkblue}} portion specifies the font size, font style, and font color. Note that darkblue is a custom color; it is defined with \definecolor in the preamble file shown under Tips.

Page style

To customize the page headers and footers, one of the packages you can use is fancyhdr. This example uses this package to modify the page style, header, and footer. The comments in the code below briefly describe what each option does.

\usepackage{fancyhdr} %load the package that provides the fancy page style
\pagestyle{fancy} %for header to be on each page
\fancyhead[L]{} %keep left header blank
\fancyhead[C]{} %keep centre header blank
\fancyhead[R]{\leftmark} %add the section/chapter to the header right
\fancyfoot[L]{Static Content} %add static text to the left footer
\fancyfoot[C]{} %keep centre footer blank
\fancyfoot[R]{\thepage} %add the page number to the right footer
\setlength\voffset{-0.25in} %space between page border and header (1in + space)
\setlength\headheight{12pt} %height of the actual header.
\setlength\headsep{25pt} %separation between header and text.
\renewcommand{\headrulewidth}{2pt} % add header horizontal line
\renewcommand{\footrulewidth}{1pt} % add footer horizontal line

The results of this change are shown below:

Header
Footer

Tips

Centralize the preamble

If you write many TeX documents, you can create a .tex file that holds the whole preamble for each category of document and reference this file. For example, I use a structure.tex as shown below.

$ cat article_structure.tex
\usepackage[english]{babel}
\addto\captionsenglish{
\renewcommand{\contentsname}
{\bfseries{\color{darkblue}Table of Contents}}
} % Relabel the contents
%\usepackage[margin=0.5in]{geometry} % specifies the margin of the document
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{graphicx} % allows you to add graphics to the document
\usepackage{hyperref} % permits redirection of URL from a PDF document
\usepackage{fullpage} % format the content to utilise the full page
%\usepackage{a4wide}
\usepackage[export]{adjustbox} % to force image position
%\usepackage[section]{placeins} % to have multiple images in a figure
\usepackage{tabularx} % for wrapping text in a table
%\usepackage{rotating}
\usepackage{multirow}
\usepackage{subcaption} % to have multiple images in a figure
%\usepackage{smartdiagram} % initialise smart diagrams
\usepackage{enumitem} % to manage the spacing between lists and enumeration
\usepackage{fancyhdr} %, graphicx} %for header to be on each page
\pagestyle{fancy} %for header to be on each page
%\fancyhf{}
\fancyhead[L]{}
\fancyhead[C]{}
\fancyhead[R]{\leftmark}
\fancyfoot[L]{Static Content} %\includegraphics[width=0.02\textwidth]{virgin_voyages.png}}
\fancyfoot[C]{} % clear center
\fancyfoot[R]{\thepage}
\setlength\voffset{-0.25in} %Space between page border and header (1in + space)
\setlength\headheight{12pt} %Height of the actual header.
\setlength\headsep{25pt} %Separation between header and text.
\renewcommand{\headrulewidth}{2pt} % adds horizontal line
\renewcommand{\footrulewidth}{1pt} % add horizontal line (footer)
%\renewcommand{\oddsidemargin}{2pt} % adjust the margin spacing
%\renewcommand{\pagenumbering}{roman} % change the numbering style
%\renewcommand{\hoffset}{20pt}
%\usepackage{color}
\usepackage[table]{xcolor}
\hypersetup{ pdfborder = {0 0 0}} % removes the red border from the table of contents
%\usepackage{wasysym} %add checkbox
%\newcommand\insq[1]{%
% \Square\ #1\quad%
%} % specify the command to add checkbox
%\usepackage{xcolor}
%\usepackage{colortbl}
%\definecolor{Gray}{gray}{0.9} % create new colour
%\definecolor{LightCyan}{rgb}{0.88,1,1} % create new colour
%\usepackage[first=0,last=9]{lcg}
%\newcommand{\ra}{\rand0.\arabic{rand}}
%\newcolumntype{g}{>{\columncolor{LightCyan}}c} % create new column type g
%\usesmartdiagramlibrary{additions}
%\setcounter{figure}{0}
\setcounter{secnumdepth}{0} % sections are level 1
\usepackage{csquotes} % the proper way of using double quotes
%\usepackage{draftwatermark} % Enable watermark
%\SetWatermarkText{DRAFT} % Specify watermark text
%\SetWatermarkScale{5} % Toggle watermark size
\usepackage{listings} % add code blocks
\usepackage{titlesec} % Manipulate section/subsection
\titleformat*{\section}{\Huge\bfseries\color{darkblue}} % update sections to bold with the colour blue
\titleformat*{\subsection}{\huge\bfseries\color{darkblue}} % update subsections to bold with the colour blue
\titleformat*{\subsubsection}{\Large\bfseries\color{darkblue}} % update subsubsections to bold with the colour blue
\usepackage[toc]{appendix} % Include appendix in TOC
\usepackage{xcolor}
\usepackage{tocloft} % For manipulating Table of Contents vertical spacing
%\setlength\cftparskip{-2pt}
\setlength\cftbeforesecskip{2pt} %spacing between the sections
\setlength\cftaftertoctitleskip{30pt} % space between the first section and the text ``Table of Contents''
\definecolor{navyblue}{rgb}{0.0,0.0,0.5}
\definecolor{zaffre}{rgb}{0.0, 0.08, 0.66}
\definecolor{white}{rgb}{1.0, 1.0, 1.0}
\definecolor{darkblue}{rgb}{0.0, 0.2, 0.6}
\definecolor{darkgray}{rgb}{0.66, 0.66, 0.66}
\definecolor{lightgray}{rgb}{0.83, 0.83, 0.83}
%\pagenumbering{roman}

In your articles, refer to the structure.tex file as shown in the example below:

\documentclass[a4paper,11pt]{article}
\input{/path_to_structure.tex}
\begin{document}
…...
\end{document}

Add watermarks

To enable watermarks in your LaTeX document, use the draftwatermark package. The code snippet and image below demonstrate how to add a watermark to your document. By default the watermark color is grey, which can be changed to your desired color.

\usepackage{draftwatermark}
\SetWatermarkText{\color{red}Classified} %add watermark text
\SetWatermarkScale{4} %specify the size of the text

Conclusion

In this series you saw some of the basic, but rich features that LaTeX provides for customizing your document to cater to your needs or the audience the document will be presented to. With LaTeX, there are many packages available to customize the page layout, style, and more.


SCP user’s migration guide to rsync

As part of the 8.0 pre-release announcement, the OpenSSH project stated that they consider the scp protocol outdated, inflexible, and not readily fixed. They then go on to recommend the use of sftp or rsync for file transfer instead.

Many users grew up on the scp command, however, and so are not familiar with rsync. Additionally, rsync can do much more than just copy files, which can give a beginner the impression that it’s complicated and opaque, especially because the scp flags broadly map directly to the cp flags while the rsync flags do not.

This article will provide an introduction and transition guide for anyone familiar with scp. Let’s jump into the most common scenarios: Copying Files and Copying Directories.

Copying files

For copying a single file, the scp and rsync commands are effectively equivalent. Let’s say you need to ship foo.txt to your home directory on a server named server.

$ scp foo.txt me@server:/home/me/

The equivalent rsync command requires only that you type rsync instead of scp:

$ rsync foo.txt me@server:/home/me/

Copying directories

For copying directories, things diverge quite a bit, which probably explains why rsync is seen as more complex than scp. If you want to copy the directory bar to server, the corresponding scp command looks exactly like the cp command except for specifying ssh information:

$ scp -r bar/ me@server:/home/me/

With rsync, there are more considerations, as it’s a more powerful tool. First, let’s look at the simplest form:

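$ rsync -r bar me@server:/home/me/

Looks simple, right? For the simple case of a directory that contains only directories and regular files, this will work. Note that the source is written without a trailing slash: to rsync, bar/ means “the contents of bar”, which would place foo.txt and baz directly in /home/me/ instead of creating bar there. However, rsync cares a lot about sending files exactly as they are on the host system. Let’s create a slightly more complex, but not uncommon, example.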

# Create a multi-level directory structure
$ mkdir -p bar/baz
# Create a file at the root directory
$ touch bar/foo.txt
# Now create a symlink which points back up to this file
$ cd bar/baz
$ ln -s ../foo.txt link.txt
# Return to our original location
$ cd -

We now have a directory tree that looks like the following:

bar
├── baz
│   └── link.txt -> ../foo.txt
└── foo.txt

1 directory, 2 files

If we try the commands from above to copy bar, we’ll notice very different (and surprising) results. First, let’s give scp a go:

$ scp -r bar/ me@server:/home/me/

If you ssh into your server and look at the directory tree of bar you’ll notice an important and subtle difference from your host system:

bar
├── baz
│   └── link.txt
└── foo.txt

1 directory, 2 files

Note that link.txt is no longer a symlink. It is now a full-blown copy of foo.txt. This might be surprising behavior if you’re used to cp. If you did try to copy the bar directory using cp -r, you would get a new directory with the exact symlinks that bar had. Now if we try the same rsync command from before we’ll get a warning:

$ rsync -r bar me@server:/home/me/
skipping non-regular file "bar/baz/link.txt"

Rsync has warned us that it found a non-regular file and is skipping it. Because you didn’t tell it to copy symlinks, it’s ignoring them. Rsync has an extensive manual section titled “SYMBOLIC LINKS” that explains all of the possible behavior options available to you. For our example, we need to add the --links flag.

$ rsync -r --links bar me@server:/home/me/

On the remote server we see that the symlink was copied over as a symlink. Note that this is different from how scp copied the symlink.

bar/
├── baz
│   └── link.txt -> ../foo.txt
└── foo.txt

1 directory, 2 files

To save some typing and take advantage of more file-preserving options, use the --archive (-a for short) flag whenever copying a directory. The archive flag will do what most people expect as it enables recursive copy, symlink copy, and many other options.

$ rsync -a bar me@server:/home/me/

The rsync man page has in-depth explanations of what the archive flag enables if you’re curious.

Caveats

There is one caveat, however, to using rsync. It’s much easier to specify a non-standard ssh port with scp than with rsync. If server was using port 8022 for SSH connections, for instance, then those commands would look like this:

$ scp -P 8022 foo.txt me@server:/home/me/

With rsync, you have to specify the “remote shell” command to use. This defaults to ssh. You do so using the -e flag.

$ rsync -e 'ssh -p 8022' foo.txt me@server:/home/me/

Rsync does use your ssh config, however, so if you are connecting to this server frequently, you can add the following snippet to your ~/.ssh/config file. Then you no longer need to specify the port for the rsync or ssh commands!

Host server
    Port 8022

Alternatively, if every server you connect to runs on the same non-standard port, you can configure the RSYNC_RSH environment variable.
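As a minimal sketch of that option, the variable can be exported from your shell startup file. The port 8022 matches the example above; the helper name push_foo is purely illustrative and not part of rsync.

```shell
# Tell rsync to run ssh on port 8022 whenever it needs a remote shell.
export RSYNC_RSH='ssh -p 8022'

# Hypothetical helper: the same transfer as before, now without the -e flag.
push_foo() {
    rsync foo.txt me@server:/home/me/
}
```

RSYNC_RSH accepts the same kind of value as the -e flag, so any ssh options you would pass there can go here as well.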

Why else should you switch to rsync?

Now that we’ve covered the everyday use cases and caveats for switching from scp to rsync, let’s take some time to explore why you probably want to use rsync on its own merits. Many people have made the switch to rsync long before now on these merits alone.

In-flight compression

If you have a slow or otherwise limited network connection between you and your server, rsync can spend more CPU cycles to save network bandwidth. It does this by compressing data before sending it. Compression can be enabled with the -z flag.

Delta transfers

Rsync also only copies a file if the target file is different than the source file. This works recursively through directories. For instance, if you took our final bar example above and re-ran that rsync command multiple times, it would do no work after the initial transfer. Using rsync even for local copies is worth it if you know you will repeat them, such as backing up to a USB drive, for this feature alone as it can save a lot of time with large data sets.

Syncing

As the name implies, rsync can do more than just copy data. So far, we’ve only demonstrated how to copy files with rsync. If you instead want rsync to make the target directory look like your source directory, you can add the --delete flag to rsync. With the delete flag, rsync copies files from the source directory that don’t exist in the target directory, and then removes files from the target directory that do not exist in the source directory. The result is that the target directory is identical to the source directory. By contrast, scp will only ever add files to the target directory.
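Because the delete flag removes files, it can be worth previewing a sync before running it for real. The sketch below assumes rsync is installed and relies on its --dry-run and --itemize-changes options; the function name preview_sync is an illustrative choice, and the two arguments can be local directories or a remote target.

```shell
# preview_sync SRC DST: list what "rsync -a --delete" would change,
# without copying or deleting anything.
preview_sync() {
    # --dry-run performs a trial run; --itemize-changes prints one line per change.
    # The trailing slash on the source means "the contents of SRC".
    rsync -a --delete --dry-run --itemize-changes "$1"/ "$2"/
}
```

Files that would be removed from the target appear on lines marked "deleting". Once the list looks right, run the same rsync command without --dry-run.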

Conclusion

For simple use cases, rsync is not significantly more complicated than the venerable scp tool. The only significant difference is the use of -a instead of -r for recursive copying of directories. However, as we saw, rsync’s -a flag behaves more like cp’s -r flag than scp’s -r flag does.

Hopefully, with these new commands, you can speed up your file transfer workflow!


Spam Classification with ML-Pack

Introduction

ML-Pack is a small footprint C++ machine learning library that can be easily integrated into other programs. It is an actively developed open source project and released under a BSD-3 license. Machine learning has gained popularity due to the large amount of electronic data that can be collected. Some other popular machine learning frameworks include TensorFlow, MxNet, PyTorch, Chainer and Paddle Paddle; however, these are designed for more complex workflows than ML-Pack. On Fedora, ML-Pack is packaged by its lead developer Ryan Curtin. In addition to a command line interface, ML-Pack has bindings for Python and Julia. Here, we will focus on the command line interface since this may be useful for system administrators to integrate into their workflows.

Installation

You can install ML-Pack on the Fedora command line using

$ sudo dnf -y install mlpack mlpack-bin

You can also install the documentation, development headers and Python bindings by using …

$ sudo dnf -y install mlpack-doc \
mlpack-devel mlpack-python3

though they will not be used in this introduction.

Example

As an example, we will train a machine learning model to classify spam SMS messages. To keep this article brief, Linux commands will not be fully explained, but you can find out more about them by using the man command. For example, for the first command used below, wget,

$ man wget

will give you information that wget will download files from the web and options you can use for it.

Get a dataset

We will use an example spam dataset in Indonesian provided by Yudi Wibisono

 
$ wget https://drive.google.com/file/d/1-stKadfTgJLtYsHWqXhGO3nTjKVFxm_Q/view
$ unzip dataset_sms_spam_bhs_indonesia_v1.zip

Pre-process dataset

We will try to classify a message as spam or ham by the number of occurrences of a word in a message. We first change the file line endings, remove line 243 which is missing a label and then remove the header from the dataset. Then, we split our data into two files, labels and messages. Since the labels are at the end of the message, the message is reversed and then the label removed and placed in one file. The message is then removed and placed in another file.

$ tr '\r' '\n' < dataset_sms_spam_v1.csv > dataset.txt
$ sed '243d' dataset.txt > dataset1.csv
$ sed '1d' dataset1.csv > dataset.csv
$ rev dataset.csv | cut -c1 | rev > labels.txt
$ rev dataset.csv | cut -c2- | rev > messages.txt
$ rm dataset.csv
$ rm dataset1.csv
$ rm dataset.txt

Machine learning works on numeric data, so we will use labels of 0 for ham and 1 for spam. The dataset contains three labels: 0 for normal SMS (ham), 1 for fraud (spam), and 2 for promotion (spam). We will label all spam as 1, so promotions and fraud will both be labelled as 1.

$ tr '2' '1' < labels.txt > labels.csv
$ rm labels.txt

The next step is to convert all text in the messages to lower case and, for simplicity, remove punctuation and any symbols that are not spaces, line endings or in the range a-z (one would need to expand this range of symbols for production use):

$ tr '[:upper:]' '[:lower:]' < \
messages.txt > messagesLower.txt
$ tr -Cd 'abcdefghijklmnopqrstuvwxyz \n' < \
messagesLower.txt > messagesLetters.txt
$ rm messagesLower.txt
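To see what this cleaning step does to a single message, the two tr calls can be chained in a small function. This is only a sketch: the name clean_text and the sample message are made up for illustration.

```shell
# clean_text: read text on stdin, lower-case it, and delete every character
# that is not a-z, a space, or a newline.
clean_text() {
    tr '[:upper:]' '[:lower:]' | tr -cd 'abcdefghijklmnopqrstuvwxyz \n'
}

# Punctuation and digits disappear, spaces are kept.
printf '%s\n' 'Halo, DUNIA! 123' | clean_text
```

The sample line comes out as "halo dunia " with a trailing space, because the space before the deleted digits is kept.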

We now obtain a sorted list of unique words used (this step may take a few minutes, so use nice to give it a low priority while you continue with other tasks on your computer).

$ nice -20 xargs -n1 < messagesLetters.txt > temp.txt
$ sort temp.txt > temp2.txt
$ uniq temp2.txt > words.txt
$ rm temp.txt
$ rm temp2.txt
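The same word list can also be sketched as a single pipeline without temporary files. This is an alternative under the assumption that the cleaned text contains only lower-case letters, spaces and newlines; the function name unique_words is illustrative.

```shell
# unique_words: read cleaned text on stdin and print each distinct word once,
# one per line, in sorted order.
unique_words() {
    # Turn spaces into newlines (squeezing repeats), sort with duplicates
    # removed, and drop any empty line left by leading spaces.
    tr -s ' ' '\n' | LC_ALL=C sort -u | sed '/^$/d'
}
```

For example, unique_words < messagesLetters.txt > words.txt would write the list in one step.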

We then create a matrix, where for each message, the frequency of word occurrences is counted (more on this on Wikipedia, here and here). This requires a few lines of code, so the full script, which should be saved as ‘makematrix.sh’, is below:

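#!/bin/bash
declare -a words=()
declare -a letterstartind=()
declare -a letterstart=()
letter=" "
i=0
lettercount=0
while IFS= read -r line; do
    labels[$((i))]=$line
    let "i++"
done < labels.csv
i=0
while IFS= read -r line; do
    words[$((i))]=$line
    firstletter="$( echo $line | head -c 1 )"
    if [ "$firstletter" != "$letter" ]
    then
        letterstartind[$((lettercount))]=$((i))
        letterstart[$((lettercount))]=$firstletter
        letter=$firstletter
        let "lettercount++"
    fi
    let "i++"
done < words.txt
letterstartind[$((lettercount))]=$((i))
echo "Created list of letters"
touch wordfrequency.txt
rm wordfrequency.txt
touch wordfrequency.txt
messagecount=0
messagenum=0
messages="$( wc -l messages.txt )"
i=0
while IFS= read -r line; do
    let "messagenum++"
    declare -a wordcount=()
    declare -a wordarray=()
    read -r -a wordarray <<< "$line"
    # The middle of this loop is missing from the listing. The lines down to
    # the append to wordfrequency.txt are a reconstruction, not the original
    # code: they count how often each known word occurs in this message.
    for ((j = 0; j < ${#words[@]}; j++)); do wordcount[j]=0; done
    for word in "${wordarray[@]}"; do
        for ((k = 0; k < lettercount; k++)); do
            [ "${letterstart[k]}" == "${word:0:1}" ] || continue
            for ((j = letterstartind[k]; j < letterstartind[k + 1]; j++)); do
                if [ "${words[j]}" == "$word" ]; then
                    wordcount[j]=$((wordcount[j] + 1))
                    break
                fi
            done
            break
        done
    done
    echo "${wordcount[*]}" >> wordfrequency.txt
    echo "Processed message ""$messagenum"
    let "i++"
done < messagesLetters.txt
# Create csv file
tr ' ' ',' < wordfrequency.txt > data.csv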

Since Bash is an interpreted language, this simple implementation can take up to 30 minutes to complete. If using the above Bash script on your primary workstation, run it as a task with low priority so that you can continue with other work while you wait:

$ nice -20 bash makematrix.sh
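For intuition about what ends up in data.csv, the row for a single message can be sketched as a small function. This is a simplified illustration rather than the script's implementation; the function name count_row and the three sample words are made up.

```shell
# count_row MESSAGE WORD...: print one comma-separated row of counts,
# one column per vocabulary word, in the order the words are given.
count_row() {
    local message="$1" vocab token n row=""
    shift
    for vocab in "$@"; do
        n=0
        # Unquoted on purpose: split the cleaned message on whitespace.
        for token in $message; do
            if [ "$token" = "$vocab" ]; then
                n=$((n + 1))
            fi
        done
        row="$row$n,"
    done
    # Drop the trailing comma.
    printf '%s\n' "${row%,}"
}

count_row "promo gratis promo" gratis pulsa promo   # prints 1,0,2
```

Each message becomes one such row, and each column corresponds to one line of words.txt.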

Once the script has finished running, split the data into testing (30%) and training (70%) sets:

$ mlpack_preprocess_split \
--input_file data.csv \
--input_labels_file labels.csv \
--training_file train.data.csv \
--training_labels_file train.labels.csv \
--test_file test.data.csv \
--test_labels_file test.labels.csv \
--test_ratio 0.3 \
--verbose

Train a model

Now train a Logistic regression model:

$ mlpack_logistic_regression \
--training_file train.data.csv \
--labels_file train.labels.csv --lambda 0.1 \
--output_model_file lr_model.bin

Test the model

Finally we test our model by producing predictions,

$ mlpack_logistic_regression \
--input_model_file lr_model.bin \
--test_file test.data.csv \
--output_file lr_predictions.csv

and comparing the predictions with the exact results,

$ export incorrect=$(diff -U 0 lr_predictions.csv \
test.labels.csv | grep '^@@' | wc -l)
$ export tests=$(wc -l < lr_predictions.csv)
$ echo "scale=2; 100 * ( 1 - $((incorrect)) \
/ $((tests)))" | bc

This gives approximately 90% validation rate, similar to that obtained here.
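The same percentage can also be computed by comparing the two files line by line. The sketch below assumes that predictions and labels are written one per line in the same format; the function name accuracy is an illustrative choice.

```shell
# accuracy PRED_FILE LABEL_FILE: print the percentage of lines on which the
# prediction equals the label, with two decimal places.
accuracy() {
    local pred label total=0 correct=0 scaled
    # Read one prediction (fd 3) and one label (fd 4) per iteration.
    while IFS= read -r pred <&3 && IFS= read -r label <&4; do
        total=$((total + 1))
        if [ "$pred" = "$label" ]; then
            correct=$((correct + 1))
        fi
    done 3< "$1" 4< "$2"
    if [ "$total" -eq 0 ]; then
        echo "no data to compare" >&2
        return 1
    fi
    # Integer arithmetic scaled by 100 avoids the need for bc.
    scaled=$((10000 * correct / total))
    printf '%d.%02d\n' $((scaled / 100)) $((scaled % 100))
}
```

Running accuracy lr_predictions.csv test.labels.csv prints the validation rate. Unlike counting diff hunks, which merges neighbouring mismatches into one hunk, this counts every mismatching line individually.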

The dataset is composed of approximately 50% spam messages, so the validation rates are quite good without doing much parameter tuning. In typical cases, datasets are unbalanced with many more entries in some categories than in others. In these cases a good validation rate can be obtained even while mispredicting the class with few entries. Thus to better evaluate these models, one can compare the number of misclassifications of spam, and the number of misclassifications of ham. Of particular importance in applications is the number of false positive spam results as these are typically not transmitted. The script below produces a confusion matrix which gives a better indication of misclassification. Save it as ‘confusion.sh’:

#!/bin/bash
declare -a labels
declare -a lr
i=0
while IFS= read -r line; do
    labels[i]=$line
    let "i++"
done < test.labels.csv
i=0
while IFS= read -r line; do
    lr[i]=$line
    let "i++"
done < lr_predictions.csv
TruePositiveLR=0
FalsePositiveLR=0
TrueZeroLR=0
FalseZeroLR=0
Positive=0
Zero=0
for i in "${!labels[@]}"; do
    if [ "${labels[$i]}" == "1" ]
    then
        let "Positive++"
        if [ "${lr[$i]}" == "1" ]
        then
            let "TruePositiveLR++"
        else
            let "FalseZeroLR++"
        fi
    fi
    if [ "${labels[$i]}" == "0" ]
    then
        let "Zero++"
        if [ "${lr[$i]}" == "0" ]
        then
            let "TrueZeroLR++"
        else
            let "FalsePositiveLR++"
        fi
    fi
done
echo "Logistic Regression"
echo "Total spam" $Positive
echo "Total ham" $Zero
echo "Confusion matrix"
echo "              Predicted class"
echo "               Ham | Spam "
echo "              ---------------"
# Row "actual ham": predicted ham (true zero), predicted spam (false positive)
echo " Actual| Ham  | " $TrueZeroLR "|" $FalsePositiveLR
# Row "actual spam": predicted ham (false zero), predicted spam (true positive)
echo " class | Spam | " $FalseZeroLR " |" $TruePositiveLR
echo ""

Then run the script:

$ bash confusion.sh

You should get output similar to:

Logistic Regression
Total spam 183
Total ham 159
Confusion matrix
              Predicted class
               Ham | Spam
              ---------------
 Actual| Ham  |  128 | 31
 class | Spam |  26  | 157

which indicates a reasonable level of classification. Other methods you can try in ML-Pack for this problem include Naive Bayes, random forest, decision tree, AdaBoost and perceptron.

To improve the error rating, you can try other pre-processing methods on the initial data set. Neural networks can give up to 99.95% validation rates, see for example here, here and here. However, using these techniques with ML-Pack cannot be done on the command line interface at present and is best covered in another post.

For more on ML-Pack, please see the documentation.


How to configure an SSH proxy server with Squid

Sometimes you can’t connect to an SSH server from your current location. Other times, you may want to add an extra layer of security to your SSH connection. In these cases connecting to another SSH server via a proxy server is one way to get through.

Squid is a full-featured proxy server application that provides caching and proxy services. It’s normally used to help improve response times and reduce network bandwidth by reusing and caching previously requested web pages during browsing.

However, for this setup you’ll configure Squid to be used as an SSH proxy server, since it’s a robust, trusted proxy server that is easy to configure.

Installation and configuration

Install the squid package using sudo:

$ sudo dnf install squid -y

The squid configuration file is quite extensive but there are only a few things we need to configure. Squid uses access control lists to manage connections.

Edit the /etc/squid/squid.conf file to make sure you have the two lines explained below.

First, specify your local IP network. The default configuration file already has a list of the most common ones but you will need to add yours if it’s not there. For example, if your local IP network range is 192.168.1.X, this is how the line would look:

acl localnet src 192.168.1.0/24

Next, add the SSH port as a safe port by adding the following line:

acl Safe_ports port 22

Save that file. Now enable and restart the squid proxy service:

$ sudo systemctl enable squid
$ sudo systemctl restart squid
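To double-check that both ACL lines made it into the configuration, a small helper can search the file for them. This is a sketch only: check_squid_acls is a hypothetical function name, and it takes the path of the configuration file and your local network range as arguments.

```shell
# check_squid_acls CONF NETWORK: succeed only if CONF contains both
# "acl localnet src NETWORK" and "acl Safe_ports port 22".
check_squid_acls() {
    local conf="$1" localnet="$2" normalized
    # Strip comments, collapse runs of blanks, and trim both ends so that
    # each line can be compared literally.
    normalized=$(sed -e 's/#.*//' \
                     -e 's/[[:space:]][[:space:]]*/ /g' \
                     -e 's/^ //' -e 's/ $//' "$conf")
    printf '%s\n' "$normalized" | grep -Fxq "acl localnet src $localnet" &&
    printf '%s\n' "$normalized" | grep -Fxq "acl Safe_ports port 22"
}

# Example call (commented out because the file may not exist on your machine):
# check_squid_acls /etc/squid/squid.conf 192.168.1.0/24 && echo "ACLs present"
```

Squid can also check the syntax of its own configuration with squid -k parse, which is a useful complement to this kind of text search.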

By default, the squid proxy listens on port 3128. Configure firewalld to allow for this:

$ sudo firewall-cmd --add-service=squid --permanent
$ sudo firewall-cmd --reload

Testing the ssh proxy connection

To connect to a server via ssh through a proxy server we’ll be using netcat.

Install nmap-ncat if it’s not already installed:

$ sudo dnf install nmap-ncat -y

Here is an example of a standard ssh connection:

$ ssh user@example.com

Here is how you would connect to that same server using the squid proxy server as a gateway.

This example assumes the squid proxy server’s IP address is 192.168.1.63. You can also use the host-name or the FQDN of the squid proxy server:

$ ssh user@example.com -o "ProxyCommand nc --proxy 192.168.1.63:3128 %h %p"

Here are the meanings of the options:

  • ProxyCommand – Tells ssh a proxy command is going to be used.
  • nc – The command used to establish the connection to the proxy server. This is the netcat command.
  • %h – The placeholder for the host name or IP address of the SSH server you are connecting to (example.com here).
  • %p – The placeholder for the port number of that SSH server.

There are many ways to configure an SSH proxy server but this is a simple way to get started.


Fedora Classroom Session: Git 101 with Pagure

The Fedora Classroom is a project that helps people by spreading knowledge on subjects related to Fedora. If you would like to propose a session, feel free to open a ticket here with the tag classroom. If you’re interested in taking a proposed session, kindly let us know. Once you take it, you will be awarded the Sensei Badge as a token of appreciation. Recordings from the previous sessions can be found here.

We’re back with another awesome classroom on Git 101 with Pagure led by Akashdeep Dhar (t0xic0der).

About the session

In short, the Git 101 with Pagure session will be a guide for newcomers on how to get started with Git using Pagure, the git forge used by the Fedora community. After finishing the session you will know enough about Git and Pagure to make your first contributions to the Fedora Project.

When and where

The Classroom session will be organized on Jul 17th, 17:00 UTC. Here’s a link to see what time it is in your timezone. The session will be streamed on Fedora Project’s YouTube channel.

Topics covered in the session

  • Version Control Systems
  • Why Git?
  • VCS Hosting Sites
  • Fedora Pagure
  • Exploring Pagure
  • Git Fundamentals

About the instructor

Akashdeep Dhar is a cybersecurity enthusiast with keen interests in networking, cloud computing and operating systems. He is currently in the final year of his bachelor’s degree, majoring in computer science with a minor in cybersecurity. He has over five years of experience in using GNU/Linux systems and is new to the Fedora community, with contributions made so far in infrastructure, classroom and documentation.

If you miss the session, the recording will also be uploaded to the Fedora Project’s YouTube channel.

We hope you can attend and enjoy this experience from some of the awesome people that work in Fedora Project. We look forward to seeing you in the Classroom session.


Photograph used in feature image is San Simeon School House by Anita Ritenour, CC-BY 2.0.


Automating Network Devices with Ansible

Ansible is a great automation tool for system and network engineers. With Ansible, we can automate anything from a small network to a large-scale enterprise network. I have been using Ansible to automate both Aruba and Cisco switches from my Fedora-powered laptops for a couple of years. This article covers the requirements and walks through executing a couple of playbooks.

Configuring Ansible

If Ansible is not installed, it can be installed using the command below:

$ sudo dnf -y install ansible

Once installed, create a folder in your home directory or a directory of your preference and copy the ansible configuration file. For this demonstration, I will be using the following.

$ mkdir -pv /home/$USER/network_automation
$ sudo cp -v /etc/ansible/ansible.cfg /home/$USER/network_automation
$ cd /home/$USER/network_automation
$ sudo chown $USER:$USER ansible.cfg && chmod 0600 ansible.cfg

To prevent lengthy commands from failing, edit the ansible.cfg and append the following lines. We must add the persistent connection and set the desired time in seconds for the command_timeout as demonstrated below. A use case where this is useful is when you are performing backups of a network device that has a lengthy configuration.

$ vim ansible.cfg
[persistent_connection]
command_timeout = 300
connect_timeout = 30

Requirements

If SELinux is enabled, you will need to install the SELinux bindings, which are required when using the copy module.

# Install SELinux bindings
$ sudo dnf -y install python3-libselinux python3-libsemanage

Creating the inventory

The inventory holds the names of the network assets. Assets are grouped under a group name written in square brackets []. Below is a sample inventory.

[site_a]
Core_A ansible_host=192.168.122.200
Distro_A ansible_host=192.168.122.201
Distro_B ansible_host=192.168.122.202

Group vars can be used to set variables that are common to a group, for example, credentials, the network operating system, and so on. The Ansible documentation on inventory provides additional details.
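To make the structure of the inventory concrete, here is a rough Python sketch that reads host groups like the sample above into nested dictionaries. It is a deliberately simplified reader, not Ansible's own parser: parse_inventory is a hypothetical helper that understands only [group] headers and lines of the form name key=value, and it does not handle sections such as [all:vars].

```python
SAMPLE_INVENTORY = """\
[site_a]
Core_A ansible_host=192.168.122.200
Distro_A ansible_host=192.168.122.201
Distro_B ansible_host=192.168.122.202
"""


def parse_inventory(text):
    """Map each group name to {host name: {variable: value}}."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # a new group starts here
            groups.setdefault(current, {})
            continue
        if current is None:
            current = "ungrouped"  # hosts listed before any group header
            groups.setdefault(current, {})
        name, *pairs = line.split()
        groups[current][name] = dict(pair.split("=", 1) for pair in pairs)
    return groups


inventory = parse_inventory(SAMPLE_INVENTORY)
print(sorted(inventory["site_a"]))  # ['Core_A', 'Distro_A', 'Distro_B']
print(inventory["site_a"]["Core_A"]["ansible_host"])  # 192.168.122.200
```

The point of the sketch is the shape of the data: a group contains hosts, and each host carries its own variables such as ansible_host.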

Playbook

Playbooks are Ansible’s configuration, deployment, and orchestration language. They can describe a policy you want your remote systems to enforce, or a set of steps in a general IT process.
Ansible Playbook

Read Operations

Let us create a simple playbook to run a show command to read the configuration on a few switches.

 
  1 ---
  2 - name: Basic Playbook
  3   hosts: site_a
  4   connection: local
  5
  6   tasks:
  7   - name: Get Interface Brief
  8     ios_command:
  9       commands:
 10         - show ip interface brief | e una
 11     register: interfaces
 12
 13   - name: Print results
 14     debug:
 15       msg: "{{ interfaces.stdout[0] }}"
Without Debug

With Debug

The above images show the differences without and with the debug module respectively.

Let’s break the playbook into three blocks, starting with lines 1 to 4.

  • The three dashes/hyphens start the YAML document.
  • The hosts key defines the hosts or host groups to run against; multiple groups are comma-separated.
  • The connection key defines the method used to connect to the network devices. Another option is network_cli (the recommended method), which will be used later in this article. See IOS Platform Options for more details.

Lines 6 to 11 start the tasks. In this article we will be using the ios_command and ios_config modules. This play executes the show command show ip interface brief | e una and, with the register key, saves the command output into the interfaces variable.

Lines 13 to 15 print the result. By default, when you execute a show command from a playbook, you will not see its output. The output is rarely needed during automation, but it is very useful for debugging; therefore, the debug module is used here.
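The following Python sketch models what the filter and the registered variable amount to. It rests on two assumptions: that e una is an abbreviated form of exclude that drops lines containing "una" (such as interfaces whose address is unassigned), and that the registered result exposes a stdout list with one entry per command. The device output below is invented for illustration.

```python
# Invented sample output; real devices will list different interfaces.
raw_output = [
    "Interface        IP-Address      OK? Method Status                Protocol",
    "FastEthernet0/0  192.168.122.200 YES NVRAM  up                    up",
    "FastEthernet0/1  unassigned      YES NVRAM  administratively down down",
]


def exclude(lines, pattern):
    """Keep only the lines that do not contain pattern, like '| exclude'."""
    return [line for line in lines if pattern not in line]


kept = exclude(raw_output, "una")

# Rough shape of the registered variable: one stdout entry per command run.
interfaces = {"stdout": ["\n".join(kept)]}

# Equivalent of the debug task printing interfaces.stdout[0].
print(interfaces["stdout"][0])
```

Because only one command is listed in the task, index 0 is the only entry in stdout.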

The below video shows the execution of the playbook. There are a couple of ways you can execute the playbook.

  • Passing arguments to the command line, for example, include -u <username> -k to prompt for the remote user credentials
 
ansible-playbook -i inventory show_demo.yaml -u admin -k
  • Include the credentials in the host or group vars
 
ansible-playbook -i inventory show_demo.yaml

Never store passwords in plain text. We recommend using SSH keys to authenticate SSH connections. Ansible supports ssh-agent to manage your SSH keys. If you must use passwords to authenticate SSH connections, we recommend encrypting them with Vault (see Using Vault in Playbooks).

Passing arguments to the command line
Credentials in the inventory

If we want to save the output to a file, we will use the copy module as shown in the playbook below. In addition to using the copy module, we will include the backup_dir variable to specify the directory path.

 
---
- name: Get System Information
  hosts: site_a
  connection: network_cli
  gather_facts: no
 
  vars:
    backup_dir: /home/eramirez/dev/ansible/fedora_magazine
 
  tasks:
  - name: get system interfaces
    ios_command:
      commands:
        - show ip int br | e una
    register: interface
   
  - name: Save result to disk
    copy:
      content: "{{ interface.stdout[0] }}"
      dest: "{{ backup_dir }}/{{ inventory_hostname }}.txt"

To demonstrate the use of variables in the inventory, we will store the credentials in plain text. This method must not be used in production.

 
[site_a]
Core_A ansible_host=192.168.122.200
Distro_A ansible_host=192.168.122.201
Distro_B ansible_host=192.168.122.202
[all:vars]
ansible_connection=network_cli
ansible_network_os=ios
ansible_user=admin
ansible_password=fedora
ansible_become=yes
ansible_become_password=yes
ansible_become_method=enable

Write Operations

In the previous section, we saw that we could get information from the network devices. In this section, we will write (add or modify) configuration on these network devices. To make changes to a network device, we will use the ios_config module.

Let us create a playbook to configure a couple of interfaces in all of the network devices in site_a. We will first take a backup of the current configuration of all devices in site_a. Lastly, we will save the configuration.

 
---
- name: Get System Information
  hosts: site_a
  connection: network_cli
  gather_facts: no
 
  vars:
    backup_dir: /home/eramirez/dev/ansible/fedora_magazine
 
  tasks:
  - name: Backup configs
    ios_config:
      backup: yes
      backup_options:
        filename: "{{ inventory_hostname }}_running_cfg.txt"
        dir_path: "{{ backup_dir }}"
   
  - name: Configure interfaces
    ios_config:
      lines:
        - description Raspberry Pi
        - switchport mode access
        - switchport access vlan 100
        - spanning-tree portfast
        - logging event link-status
        - no shutdown
      parents: "{{ item }}"
    with_items:
      - interface FastEthernet1/12
      - interface FastEthernet1/13
     
  - name: Save switch configuration
    ios_config:
      save_when: modified

Before we execute the playbook, we will first validate the interface configuration. We will then run the playbook and confirm the changes as illustrated below.

Conclusion

This article is a basic introduction, meant to whet your appetite, that demonstrates how Ansible can be used to manage network devices. Ansible is capable of automating a vast network, including tasks such as configuring MPLS routing and performing validation before executing the next task.


Use DNS over TLS

The Domain Name System (DNS) that modern computers use to find resources on the internet was designed 35 years ago without consideration for user privacy. It is exposed to security risks and attacks like DNS Hijacking. It also allows ISPs to intercept the queries.

Luckily, DNS over TLS and DNSSEC are available. DNS over TLS encrypts the queries that travel between a computer and its configured DNS servers, and DNSSEC lets the resolver verify that the answers it receives are authentic. On Fedora, the steps to implement these technologies are easy and all the necessary tools are readily available.

This guide will demonstrate how to configure DNS over TLS on Fedora using systemd-resolved. Refer to the documentation for further information about the systemd-resolved service.

Step 1 : Set-up systemd-resolved

Modify /etc/systemd/resolved.conf so that it is similar to what is shown below. Be sure to enable DNS over TLS and to configure the IP addresses of the DNS servers you want to use.

$ cat /etc/systemd/resolved.conf
[Resolve]
DNS=1.1.1.1 9.9.9.9
DNSOverTLS=yes
DNSSEC=yes
FallbackDNS=8.8.8.8 1.0.0.1 8.8.4.4
#Domains=~.
#LLMNR=yes
#MulticastDNS=yes
#Cache=yes
#DNSStubListener=yes
#ReadEtcHosts=yes

A quick note about the options:

  • DNS: A space-separated list of IPv4 and IPv6 addresses to use as system DNS servers
  • FallbackDNS: A space-separated list of IPv4 and IPv6 addresses to use as the fallback DNS servers.
  • Domains: These domains are used as search suffixes when resolving single-label host names. The special value ~. means that the system DNS servers defined with DNS= are preferred for all domains.
  • DNSOverTLS: If true, all connections to the server will be encrypted. Note that this mode requires a DNS server that supports DNS-over-TLS and has a valid certificate for its IP.

NOTE: The DNS servers listed in the above example are my personal choices. You should decide which DNS servers you want to use, keeping in mind that these servers will see every name you ask them to resolve.
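Since resolved.conf uses an INI-like layout, a short Python sketch with the standard configparser module can be used to double-check which options are active and which are still commented out. This is only an illustration of how the file reads: summarize is a hypothetical helper, and the sample text is an abbreviated copy of the example above rather than a read of the real file.

```python
import configparser

SAMPLE_CONF = """\
[Resolve]
DNS=1.1.1.1 9.9.9.9
DNSOverTLS=yes
DNSSEC=yes
FallbackDNS=8.8.8.8 1.0.0.1 8.8.4.4
#Domains=~.
"""


def summarize(conf_text):
    """Return the active settings of the [Resolve] section as a dict."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive (DNS, DNSSEC)
    parser.read_string(conf_text)
    resolve = parser["Resolve"]
    return {
        "dns": resolve.get("DNS", "").split(),
        "fallback": resolve.get("FallbackDNS", "").split(),
        "dns_over_tls": resolve.get("DNSOverTLS", "no"),
        "domains": resolve.get("Domains"),  # None while the line is commented
    }


settings = summarize(SAMPLE_CONF)
print(settings["dns"])           # ['1.1.1.1', '9.9.9.9']
print(settings["dns_over_tls"])  # yes
print(settings["domains"])       # None
```

Lines that start with # are treated as comments, so Domains stays unset until the # is removed.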

Step 2 : Tell NetworkManager to push info to systemd-resolved

Create a file in /etc/NetworkManager/conf.d named 10-dns-systemd-resolved.conf.

$ cat /etc/NetworkManager/conf.d/10-dns-systemd-resolved.conf
[main]
dns=systemd-resolved

The setting shown above (dns=systemd-resolved) will cause NetworkManager to push DNS information acquired from DHCP to the systemd-resolved service. This will override the DNS settings configured in Step 1. This is fine on a trusted network, but feel free to set dns=none instead to use the DNS servers configured in /etc/systemd/resolved.conf.

Step 3 : start & restart services

To make the settings configured in the previous steps take effect, start and enable systemd-resolved. Then restart NetworkManager.

CAUTION: This will lead to a loss of connection for a few seconds while NetworkManager is restarting.

$ sudo systemctl start systemd-resolved
$ sudo systemctl enable systemd-resolved
$ sudo systemctl restart NetworkManager

NOTE: Currently, the systemd-resolved service is disabled by default and its use is opt-in. There are plans to enable systemd-resolved by default in Fedora 33.

Step 4 : Check if everything is fine

Now you should be using DNS over TLS. Confirm this by checking DNS resolution status with:

$ resolvectl status
MulticastDNS setting: yes
DNSOverTLS setting: yes
DNSSEC setting: yes
DNSSEC supported: yes
Current DNS Server: 1.1.1.1
DNS Servers: 1.1.1.1 9.9.9.9
Fallback DNS Servers: 8.8.8.8 1.0.0.1 8.8.4.4

/etc/resolv.conf should point to 127.0.0.53

$ cat /etc/resolv.conf
# Generated by NetworkManager
search lan
nameserver 127.0.0.53

To see the local address and port on which systemd-resolved listens for queries from applications on this machine, run:

$ sudo ss -lntp | grep '\(State\|:53 \)'
State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0      4096   127.0.0.53%lo:53   0.0.0.0:*         users:(("systemd-resolve",pid=10410,fd=18))

To make a secure query, run:

$ resolvectl query fedoraproject.org
fedoraproject.org: 8.43.85.67 -- link: wlp58s0
                   8.43.85.73 -- link: wlp58s0
[..]
-- Information acquired via protocol DNS in 36.3ms.
-- Data is authenticated: yes

BONUS Step 5 : Use Wireshark to verify the configuration

First, install and run Wireshark:

$ sudo dnf install wireshark
$ sudo wireshark

It will ask you which link device it should capture packets on. In my case, because I use a wireless interface, I will go ahead with wlp58s0. Set up a filter in Wireshark like tcp.port == 853 (853 is the DNS over TLS protocol port). You need to flush the local DNS caches before you can capture a DNS query:

$ sudo resolvectl flush-caches

Now run:

$ nslookup fedoramagazine.org

You should see a TLS-encrypted exchange between your computer and your configured DNS server:

Poster in Cover Image Approved for Release by NSA on 04-17-2018, FOIA Case # 83661


Running Rosetta@home on a Raspberry Pi with Fedora IoT

The Rosetta@home project is a not-for-profit distributed computing project created by the Baker laboratory at the University of Washington. The project uses idle compute capacity from volunteer computers to study protein structure, which is used in research into diseases such as HIV, malaria, cancer, and Alzheimer’s.

In common with many other scientific organizations, Rosetta@home is currently expending significant resources on the search for vaccines and treatments for COVID-19.

Rosetta@home uses the open source BOINC platform to manage donated compute resources. BOINC was originally developed to support the SETI@home project searching for Extraterrestrial Intelligence. These days, it is used by a number of projects in many different scientific fields. A single BOINC client can contribute compute resources to many such projects, though not all projects support all architectures.

For the example shown in this article a Raspberry Pi 3 Model B was used, which is one of the tested reference devices for Fedora IoT. This device, with only 1GB of RAM, is only just powerful enough to be able to make a meaningful contribution to Rosetta@home, and there’s certainly no way the Raspberry Pi can be used for anything else – such as running a desktop environment – at the same time.

It’s also worth mentioning at this point that the first rule of Raspberry Pi computing is to get the recommended power supply. It is important to get as close to the specified 2.5A as you can, and to use a good quality micro-USB cable.

Getting Fedora IoT

To install Fedora IoT on a Raspberry Pi, the first step is to download the aarch64 Raw Image from the iot.fedoraproject.org download page.

Then use the arm-image-installer utility (sudo dnf install fedora-arm-installer) to write the image to the SD card. As always, be very sure which device name corresponds to your SD Card before continuing. Check the device with the lsblk command like this:

$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb 8:16 1 59.5G 0 disk
└─sdb1 8:17 1 59.5G 0 part /run/media/gavin/154F-1CEC
nvme0n1 259:0 0 477G 0 disk
├─nvme0n1p1 259:1 0 600M 0 part
...

If you’re still not sure, try running lsblk with the SD card removed, then again with the SD card inserted and comparing the outputs. In this case it lists the SD card as /dev/sdb. If you’re really unsure, there are some more tips described in the Getting Started guide.
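The before-and-after comparison can also be done mechanically. The Python sketch below assumes the default lsblk column layout shown above (NAME, MAJ:MIN, RM, SIZE, RO, TYPE, MOUNTPOINT); disk_names is a hypothetical helper, and the two sample outputs are shortened copies of the listing in this article.

```python
BEFORE = """\
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 477G 0 disk
├─nvme0n1p1 259:1 0 600M 0 part
"""

AFTER = """\
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb 8:16 1 59.5G 0 disk
└─sdb1 8:17 1 59.5G 0 part /run/media/gavin/154F-1CEC
nvme0n1 259:0 0 477G 0 disk
├─nvme0n1p1 259:1 0 600M 0 part
"""


def disk_names(lsblk_output):
    """Collect the names of whole disks (TYPE column equal to 'disk')."""
    names = set()
    for line in lsblk_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 6 and fields[5] == "disk":
            names.add(fields[0])
    return names


new_disks = disk_names(AFTER) - disk_names(BEFORE)
print(new_disks)  # {'sdb'}
```

Whatever appears only in the second listing is the device that was just inserted, here /dev/sdb.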

We need to tell arm-image-installer which image file to use, what type of device we’re going to be using, and the device name – determined above – to use for writing the image. The arm-image-installer utility is also able to expand the filesystem to use the entire SD card at the point of writing the image.

Since we’re not going to use the zezere provisioning server to deploy SSH keys to the Raspberry Pi, we need to specify the option to remove the root password so that we can log in and set it at first boot.

In my case, the full command was:

sudo arm-image-installer --image ~/Downloads/Fedora-IoT-32-20200603.0.aarch64.raw.xz --target=rpi3 --media=/dev/sdb --resizefs --norootpass

After a final confirmation prompt:

= Selected Image:
= /var/home/gavin/Downloads/Fedora-IoT-32-20200603.0.aarc...
= Selected Media : /dev/sdb
= U-Boot Target : rpi3
= Root Password will be removed.
= Root partition will be resized
=====================================================

*****************************************************
*****************************************************
******** WARNING! ALL DATA WILL BE DESTROYED ********
*****************************************************
*****************************************************

Type 'YES' to proceed, anything else to exit now

the image is written to the SD Card.

...
= Installation Complete! Insert into the rpi3 and boot.

Booting the Raspberry Pi

For the initial setup, you’ll need to attach a keyboard and monitor to the Raspberry Pi. Alternatively, you can follow the instructions for connecting with a USB-to-Serial cable.

When the Raspberry Pi boots up, just type root at the login prompt and press enter.

localhost login: root
[root@localhost~]#

The first task is to set a password for the root user.

[root@localhost ~]# passwd
Changing password for user root.
New password:
Retype new password:
passwd: all authentication tokens updated successfully
[root@localhost ~]#

Verifying Network Connectivity

To verify the network connectivity, the checklist in the Fedora IoT Getting Started guide was followed. This system is using a wired ethernet connection, which shows as eth0. If you need to set up a wireless connection this can be done with nmcli.

ip addr will allow you to check that you have a valid IP address.

[root@localhost ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether b8:27:eb:9d:6e:13 brd ff:ff:ff:ff:ff:ff
inet 192.168.178.60/24 brd 192.168.178.255 scope global dynamic noprefixroute eth0
valid_lft 863928sec preferred_lft 863928sec
inet6 fe80::ba27:ebff:fe9d:6e13/64 scope link
valid_lft forever preferred_lft forever
3: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
link/ether fe:d3:c9:dc:54:25 brd ff:ff:ff:ff:ff:ff

ip route will check that the network has a default gateway configured.

[root@localhost ~]# ip route
default via 192.168.178.1 dev eth0 proto dhcp metric 100
192.168.178.0/24 dev eth0 proto kernel scope link src 192.168.178.60 metric 100

To verify internet access and name resolution, use ping:

[root@localhost ~]# ping -c3 iot.fedoraproject.org
PING wildcard.fedoraproject.org (8.43.85.67) 56(84) bytes of data.
64 bytes from proxy14.fedoraproject.org (8.43.85.67): icmp_seq=1 ttl=46 time=93.4 ms
64 bytes from proxy14.fedoraproject.org (8.43.85.67): icmp_seq=2 ttl=46 time=90.0 ms
64 bytes from proxy14.fedoraproject.org (8.43.85.67): icmp_seq=3 ttl=46 time=91.3 ms

--- wildcard.fedoraproject.org ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 90.043/91.573/93.377/1.374 ms

Optional: Configuring sshd so we can disconnect the keyboard and monitor

Before disconnecting the keyboard and monitor, we need to ensure that we can connect to the Raspberry Pi over the network.

First we verify that sshd is running

[root@localhost~]# systemctl is-active sshd
active

and that there is a firewall rule present to allow ssh.

[root@localhost ~]# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources:
  services: dhcpv6-client mdns ssh
  ports:
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:

In the file /etc/ssh/sshd_config, find the section named

# Authentication

and add the line

PermitRootLogin yes

There will already be a line

#PermitRootLogin prohibit-password

which you can edit by removing the # comment character and changing the value to yes.

Restart the sshd service to pick up the change

[root@localhost ~]# systemctl restart sshd

If all this is in place, we should be able to ssh to the Raspberry Pi.

[gavin@desktop ~]$ ssh root@192.168.178.60
The authenticity of host '192.168.178.60 (192.168.178.60)' can't be established.
ECDSA key fingerprint is SHA256:DLdFaYbvKhB6DG2lKmJxqY2mbrbX5HDRptzWMiAUgBM.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.178.60' (ECDSA) to the list of known hosts.
root@192.168.178.60's password:
Boot Status is GREEN - Health Check SUCCESS
Last login: Wed Apr 1 17:24:50 2020
[root@localhost ~]#

It’s now safe to log out from the console (exit) and disconnect the keyboard and monitor.

Disabling unneeded services

Since we’re right on the lower limit of viable hardware for Rosetta@home, it’s worth disabling any unneeded services. Fedora IoT is much more lightweight than desktop distributions, but there are still a few optimizations we can do.

For example, we can disable Bluetooth, ModemManager (used for cellular data connections), WPA supplicant (used for Wi-Fi) and the zezere services, which are used to centrally manage a fleet of Fedora IoT devices.

[root@localhost /]# for serviceName in bluetooth ModemManager wpa_supplicant zezere_ignition zezere_ignition.timer zezere_ignition_banner; do sudo systemctl stop $serviceName; sudo systemctl disable $serviceName; sudo systemctl mask $serviceName; done

Getting the BOINC client

Instead of installing the BOINC client directly onto the operating system with rpm-ostree, we’re going to use podman to run the containerized version of the client.

This image uses a volume mount to store its data, so we create the directories it needs in advance.

[root@localhost ~]# mkdir -p /opt/appdata/boinc/slots /opt/appdata/boinc/locale

We also need to add a firewall rule to allow the container to resolve external DNS names.

[root@localhost ~]# firewall-cmd --permanent --zone=trusted --add-interface=cni-podman0
success
[root@localhost ~]# systemctl restart firewalld

Finally we are ready to pull and run the BOINC client container.

[root@localhost ~]# podman run --name boinc -dt -p 31416:31416 -v /opt/appdata/boinc:/var/lib/boinc:Z -e BOINC_GUI_RPC_PASSWORD="blah" -e BOINC_CMD_LINE_OPTIONS="--allow_remote_gui_rpc" boinc/client:arm64v8 
Trying to pull...
...
... 787a26c34206e75449a7767c4ad0dd452ec25a501f719c2e63485479f...

We can inspect the container logs to make sure everything is working as expected:

[root@localhost ~]# podman logs boinc
20-Jun-2020 09:02:44 [---] cc_config.xml not found - using defaults
20-Jun-2020 09:02:44 [---] Starting BOINC client version 7.14.12 for aarch64-unknown-linux-gnu
...
...
...
20-Jun-2020 09:02:44 [---] Checking presence of 0 project files
20-Jun-2020 09:02:44 [---] This computer is not attached to any projects
20-Jun-2020 09:02:44 Initialization completed

Configuring the BOINC container to run at startup

We can automatically generate a systemd unit file for the container with podman generate systemd.

[root@localhost ~]# podman generate systemd --files --name boinc
/root/container-boinc.service

This creates a systemd unit file in root’s home directory.

[root@localhost ~]# cat container-boinc.service 
# container-boinc.service
# autogenerated by Podman 1.9.3
# Sat Jun 20 09:13:58 UTC 2020

[Unit]
Description=Podman container-boinc.service
Documentation=man:podman-generate-systemd(1)
Wants=network.target
After=network-online.target

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=on-failure
ExecStart=/usr/bin/podman start boinc
ExecStop=/usr/bin/podman stop -t 10 boinc
PIDFile=/var/run/containers/storage/overlay-containers/787a26c34206e75449a7767c4ad0dd452ec25a501f719c2e63485479fbe21631/userdata/conmon.pid
KillMode=none
Type=forking

[Install]
WantedBy=multi-user.target default.target

We install the file by moving it to the appropriate directory.

[root@localhost ~]# mv -Z container-boinc.service /etc/systemd/system
[root@localhost ~]# systemctl enable /etc/systemd/system/container-boinc.service
Created symlink /etc/systemd/system/multi-user.target.wants/container-boinc.service → /etc/systemd/system/container-boinc.service.
Created symlink /etc/systemd/system/default.target.wants/container-boinc.service → /etc/systemd/system/container-boinc.service.

Connecting to the Rosetta@home project

You need to create an account at the Rosetta@home signup page, and retrieve your account key from your account home page. The key to copy is the “Weak Account Key”.

Finally, we execute the boinccmd configuration utility inside the container using podman exec, passing the Rosetta@home url and our account key.

[root@localhost ~]# podman exec boinc boinccmd --project_attach https://boinc.bakerlab.org/rosetta/ 2160739_cadd20314e4ef804f1d95ce2862c8f73

Running podman logs --follow boinc will allow us to see the container connecting to the project. You will probably see errors of the form:

20-Jun-2020 10:18:40 [Rosetta@home] Rosetta needs 1716.61 MB RAM but only 845.11 MB is available for use.

This is because most, but not all, of the work units in Rosetta@home require more memory than we have to offer. However, if you leave the device running for a while, it should eventually get some jobs to process. The polling interval seems to be approximately 10 minutes. We can also tweak the memory settings using BOINC Manager to allow BOINC to use slightly more memory. This will increase the probability that Rosetta@home will be able to find tasks for us.
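To get a feel for how far off a rejected work unit is, the two numbers in such a log line can be pulled out and subtracted. The Python sketch below assumes the message keeps the wording shown above; memory_shortfall is a hypothetical helper, not part of BOINC.

```python
import re

# Matches the wording of the log line quoted above.
PATTERN = re.compile(r"needs ([\d.]+) MB RAM but only ([\d.]+) MB is available")


def memory_shortfall(log_line):
    """Return how many MB are missing, or None if the line does not match."""
    match = PATTERN.search(log_line)
    if match is None:
        return None
    needed, available = (float(value) for value in match.groups())
    return round(needed - available, 2)


line = (
    "20-Jun-2020 10:18:40 [Rosetta@home] Rosetta needs 1716.61 MB RAM "
    "but only 845.11 MB is available for use."
)
print(memory_shortfall(line))  # 871.5
```

A gap of this size cannot be closed on a 1GB board, which is why only the smaller work units are ever scheduled.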

Installing BOINC Manager for remote access

You can use dnf to install the BOINC manager component to remotely manage the BOINC client on the Raspberry Pi.

[gavin@desktop ~]$ sudo dnf install boinc-manager

If you switch to “Advanced View”, you will be able to select “File -> Select Computer” and connect to your Raspberry Pi, using the IP address of the Pi and the value supplied for BOINC_GUI_RPC_PASSWORD in the podman run command, in my case “blah”.

Press Shift+Ctrl+I to connect BOINC manager to a remote computer

Under “Options -> Computing Preferences”, increase the value for “When Computer is not in use, use at most _ %”. I’ve been using 93%; this seems to allow Rosetta@home to schedule work on the Pi, whilst still leaving it just about usable. It is possible that further fine tuning of the operating system might allow this percentage to be increased.

Using the Computing Preferences Dialog to set the memory threshold

These settings can also be changed through the Rosetta@home website settings page, but bear in mind that changes made through the BOINC Manager client override preferences set in the web interface.

Wait

It may take a while, possibly several hours, for Rosetta@home to send work to our newly installed client, particularly as most work units are too big to run on a Raspberry Pi. COVID-19 has resulted in a large number of new computers being joined to the Rosetta@home project, which means that there are times when there isn’t enough work to do.

When we are assigned some work units, BOINC will download several hundred megabytes of data. This will be stored on the SD Card and can be viewed using BOINC manager.

We can also see the tasks running in the Tasks pane:

The client has downloaded four tasks, but only one of them is currently running due to memory constraints. At times, two tasks can run simultaneously, but I haven’t seen more than that. This is OK as long as the tasks are completed by the deadline shown on the right. I’m fairly confident these will be completed as long as the Raspberry Pi is left running. I have found that the additional memory overhead created by the BOINC Manager connection and sshd services can reduce parallelism, so I try to disconnect these when I’m not using them.

Conclusion

Rosetta@home, in common with many other distributed computing projects, is currently experiencing a large spike in participation due to COVID-19. That aside, the project has been doing valuable work for many years to combat a number of other diseases.

Whilst a Raspberry Pi is never going to appear at the top of the contribution chart, I think this is a worthwhile project to undertake with a spare Raspberry Pi. The existence of work units aimed at low-spec ARM devices indicates that the project organizers agree with this sentiment. I’ll certainly be leaving mine running for the foreseeable future.


Demonstrating Perl with Tic-Tac-Toe, Part 3

The articles in this series have mainly focused on Perl’s ability to manipulate text. Perl was designed to manipulate and analyze text. But Perl is capable of much more. More complex problems often require working with sets of data objects and indexing and comparing them in elaborate ways to compute some desired result.

For working with sets of data objects, Perl provides arrays and hashes. Hashes are also known as associative arrays or dictionaries. This article will prefer the term hash because it is shorter.

The remainder of this article builds on the previous articles in this series by demonstrating basic use of arrays and hashes in Perl.

An example Perl program

Copy and paste the code below into a plain text file and use the same one-liner that was provided in the first article of this series to strip the leading numbers. Name the version without the line numbers chip2.pm and move it into the hal subdirectory. Use the version of the game that was provided in the second article so that the chip below will automatically load when placed in the hal subdirectory.

00 # advanced operations chip
01 
02 package chip2;
03 require chip1;
04 
05 use strict;
06 use warnings;
07 
08 use constant SCORE=>'
09 ┌───┬───┬───┐
10 │ 3 │ 2 │ 3 │
11 ├───┼───┼───┤
12 │ 2 │ 4 │ 2 │
13 ├───┼───┼───┤
14 │ 3 │ 2 │ 3 │
15 └───┴───┴───┘
16 ';
17 
18 sub get_prob {
19    my $game = shift;
20    my @nums;
21    my %odds;
22 
23    while ($game =~ /[1-9]/g) {
24       $odds{$&} = substr(SCORE, $-[0], 1);
25    }
26 
27    @nums = sort { $odds{$b} <=> $odds{$a} } keys %odds;
28 
29    return $nums[0];
30 }
31 
32 sub win_move {
33    my $game = shift;
34    my $mark = shift;
35    my $tkns = shift;
36    my @nums = $game =~ /[1-9]/g;
37    my $move;
38 
39    TRY: for (@nums) {
40       my $num = $_;
41       my $try = $game =~ s/$num/$mark/r;
42       my $vic = chip1::get_victor $try, $tkns;
43 
44       if (defined $vic) {
45          $move = $num;
46          last TRY;
47       }
48    }
49 
50    return $move;
51 }
52 
53 sub hal_move {
54    my $game = shift;
55    my $mark = shift;
56    my @mark = @{ shift; };
57    my $move;
58 
59    $move = win_move $game, $mark, \@mark;
60 
61    if (not defined $move) {
62       $mark = ($mark eq $mark[0]) ? $mark[1] : $mark[0];
63       $move = win_move $game, $mark, \@mark;
64    }
65 
66    if (not defined $move) {
67       $move = get_prob $game;
68    }
69 
70    return $move;
71 }
72 
73 sub complain {
74    print "My mind is going. I can feel it.\n";
75 }
76 
77 sub import {
78    no strict;
79    no warnings;
80 
81    my $p = __PACKAGE__;
82    my $c = caller;
83 
84    *{ $c . '::hal_move' } = \&{ $p . '::hal_move' };
85    *{ $c . '::complain' } = \&{ $p . '::complain' };
86 }
87 
88 1;

How it works

In the above example Perl module, each position on the Tic-Tac-Toe board is assigned a score based on the number of winning combinations that intersect it. The center square is crossed by four winning combinations – one horizontal, one vertical, and two diagonal. The corner squares each intersect one horizontal, one vertical, and one diagonal combination. The side squares each intersect one horizontal and one vertical combination.
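The scores can be checked with a short sketch that counts the winning combinations passing through each square. It assumes the squares are numbered 1 through 9, row by row. The names @lines and %score are illustrative only and do not appear in chip2.pm.

```perl
use strict;
use warnings;

# The eight winning combinations on a board numbered 1-9, row by row
# (the numbering is an assumption made for this sketch).
my @lines = (
    [1, 2, 3], [4, 5, 6], [7, 8, 9],   # rows
    [1, 4, 7], [2, 5, 8], [3, 6, 9],   # columns
    [1, 5, 9], [3, 5, 7],              # diagonals
);

# Count how many winning combinations pass through each square.
my %score;
for my $line (@lines) {
    $score{$_}++ for @$line;
}

# A hash slice pulls out the scores for squares 1 through 9 in order.
print join(' ', @score{1 .. 9}), "\n";   # prints "3 2 3 2 4 2 3 2 3"
```

The output matches the SCORE constant when it is read row by row.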

The get_prob subroutine creates a hash named odds (line 21) and uses it to map the numbers on the current game board to their score (line 24). The keys of the hash are then sorted by their score and the resulting list is copied to the nums array (line 27). The get_prob subroutine then returns the first element of the nums array ($nums[0]) which is the number from the original game board that has the highest score.
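The lookup on line 24 relies on the game board and the SCORE string sharing the same layout, so that the offset of a match in one is also the offset of its score in the other. The following sketch shows the same idea on a hypothetical one-line board. $board and $scores are made-up stand-ins for the real game board and the SCORE constant.

```perl
use strict;
use warnings;

# Hypothetical one-line "board" and a score string with the same layout.
my $board  = 'X 2 3 O 5 6';
my $scores = '3 2 3 2 4 2';
my %odds;

# $& holds the matched text and $-[0] holds the offset where the match began.
while ($board =~ /[1-9]/g) {
    $odds{$&} = substr($scores, $-[0], 1);
}

# Sort the remaining numbers by score, highest first.
my @nums = sort { $odds{$b} <=> $odds{$a} } keys %odds;

print "best move: $nums[0]\n";   # prints "best move: 5"
```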

The algorithm described above is an example of what is called a heuristic in artificial intelligence programming. With the addition of this module, the Tic-Tac-Toe game can be considered a very rudimentary artificial intelligence program. It is really just playing the odds though and it is quite beatable. The next module (chip3.pm) will provide an algorithm that actually calculates the best possible move based on the opponent’s counter moves.

The win_move subroutine simply tries placing the provided mark in each available position and passing the resulting game board to chip1’s get_victor subroutine to see if it contains a winning combination. Notice that the r flag is being passed to the substitution operation (s/$num/$mark/r) on line 41 so that, rather than modifying the original game board, a new copy of the board containing the substitution is created and returned.
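As a minimal illustration of the r flag, the sketch below performs a substitution on a hypothetical three-number board. The flag requires Perl 5.14 or later. The values used here are invented for the example.

```perl
use strict;
use warnings;

my $game = '1 2 3';            # hypothetical miniature board
my $try  = $game =~ s/2/X/r;   # /r returns a modified copy

print "$game\n";   # prints "1 2 3" (the original is untouched)
print "$try\n";    # prints "1 X 3"
```

Without the r flag, the substitution would modify $game in place and return the number of substitutions made instead of the new string.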

Arrays

It was mentioned in part one that arrays are variables whose names are prefixed with an at symbol (@) when they are created. In Perl, these prefixed symbols are called sigils.

Context

In Perl, many things return a different value depending on the context in which they are accessed. The two contexts to be aware of are called scalar context and list context. In the following example, $value1 and $value2 are different because @nums is accessed first in scalar context and then in list context.

$value1 = @nums;
($value2) = @nums;

In the above example, it might seem like @nums should return the same value each time it is accessed, but it doesn’t because what is accessing it (the context) is different. $value1 is a scalar, so it receives the scalar value of @nums which is its length. ($value2) is a list, so it receives the list value of @nums. In the above example, $value2 will receive the value of the first element of the nums array.

In part one, the below statement from the get_mark subroutine copied the numbers from the current Tic-Tac-Toe board into an array named nums.

@nums = $game =~ /[1-9]/g;

Since the nums array in the above statement receives one copy of each board number in each of its elements, the count of the board numbers is equal to the length of the array. In Perl, the length of an array is obtained by accessing it in scalar context.

Next, the following formula was used to compute which mark should be placed on the Tic-Tac-Toe board in the next turn.

$indx = (@nums+1) % 2;

Because the plus operator requires a single value (a scalar) on its left hand side, not a list of values, the nums array evaluates to its length, not the list of its values. The parentheses in the above example are only there to set the order of operations so that the addition (+) happens before the modulo (%).
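The two contexts can be seen side by side in the following sketch. The board string is a hypothetical game in which two marks have already been placed. It is not taken from the game script.

```perl
use strict;
use warnings;

my $game = 'X 2 3 4 O 6 7 8 9';   # hypothetical board, two marks placed
my @nums = $game =~ /[1-9]/g;     # list context: every remaining number

my $count = @nums;                # scalar context: the length of @nums
my $indx  = (@nums + 1) % 2;      # @nums is a length here too: (7 + 1) % 2

print "$count numbers left, index $indx\n";   # prints "7 numbers left, index 0"
```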

Copying

In Perl you can create a list for immediate use by surrounding the list values with parentheses and separating them with commas. The following example creates a three-element list and copies its values to an array.

@nums = (4, 5, 6);

As long as the elements of the list are variables and not constants, you can also copy the elements of an array to a list:

($four, $five, $six) = @nums;

If there were more elements in the array than the list in the above example, the extra elements would simply be discarded.
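A short sketch of both mismatches follows. The variable names are illustrative. The second assignment shows the opposite case, where the list has more variables than the array has elements and the leftover variable is simply left undefined.

```perl
use strict;
use warnings;

my @nums = (4, 5, 6, 7);

my ($four, $five) = @nums;             # 6 and 7 are silently discarded
my ($w, $x, $y, $z, $extra) = @nums;   # nothing is left over for $extra

print "$four $five\n";                                # prints "4 5"
print defined $extra ? "defined\n" : "undefined\n";   # prints "undefined"
```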

Different from lists in scalar context

Be aware that lists and arrays are different things in Perl. A list accessed in scalar context returns its last value, not its length. In the following example, $value3 receives 3 (the length of @nums) while $value4 receives 6 (the last element of the list).

$value3 = @nums;
$value4 = (4, 5, 6);

Indexing

To access an individual element of an array or list, suffix it with the desired index in square brackets as shown on line 29 of the above example Perl module.

Notice that the nums array on line 29 is prefixed with the dollar sigil ($) rather than the at sigil (@). This is done because the get_prob subroutine is supposed to return a single value, not a list. If @nums[0] were used instead of $nums[0], the subroutine would return a one-element list. Since a list evaluates to its last element in scalar context, this program would probably work if I had used @nums[0], but if you mean to retrieve a single element from an array, be sure to use the dollar sigil ($), not the at sigil (@).

It is possible to retrieve a subset from an array (or a list) rather than just one value. In that case, use the at sigil and provide a series of indexes or a range instead of a single index. This is known in Perl as an array slice or, when it is applied to a list, a list slice.
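The forms described above might look like the following. The array contents are arbitrary values chosen for the example.

```perl
use strict;
use warnings;

my @nums = (10, 20, 30, 40, 50);

my $one   = $nums[0];          # single element: dollar sigil
my @some  = @nums[0, 2];       # array slice with a series of indexes
my @range = @nums[1 .. 3];     # array slice with a range
my @tail  = (4, 5, 6)[1, 2];   # list slice taken from a literal list

print "$one | @some | @range | @tail\n";   # prints "10 | 10 30 | 20 30 40 | 5 6"
```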

Hashes

Hashes are variables whose names are prefixed with the percent sigil (%) when they are created. They are subscripted with curly brackets ({}) when accessing individual elements or subsets of elements (hash slices). Like arrays, hashes are variables that can hold multiple discrete data elements. They differ from arrays in the following ways:

  1. Hashes are indexed by strings (or anything that can be converted to a string), not numbers.
  2. Hashes are unordered. If you retrieve a list of their keys, values or key-value pairs, the order of the listing will be random.
  3. The number of elements in the hash will be equal to the number of keys that have been assigned values. If a value is assigned to index 99 of an array that has only three elements (indexes 0-2), the array will grow to a length of 100 elements (indexes 0-99). If a value is assigned to a new key in a hash that has only three elements, the hash will grow by only one element.

As with arrays, if you mean to access (or assign to) a single element of a hash, you should prefix it with the dollar sigil ($). When accessing a single element, Perl will go by the type of the subscript to determine the type of variable being accessed – curly brackets ({}) for hashes or square brackets ([]) for arrays. The get_prob subroutine in the above Perl module demonstrates assigning to and accessing individual elements of a hash.
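The difference in how arrays and hashes grow can be demonstrated with a small sketch. The names and values are illustrative only.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
$array[99] = 'x';              # the array grows to cover indexes 0-99
my $array_length = @array;

my %hash = (a => 1, b => 2, c => 3);
$hash{99} = 'x';               # the hash gains exactly one key
my $hash_count = keys %hash;   # keys in scalar context returns the key count

print "$array_length $hash_count\n";   # prints "100 4"
```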

Perl has two special built-in functions for working with hashes – keys and values. The keys function, when provided a hash, returns a list of all the hash’s keys (indexes). Similarly, the values function will return a list of all the hash’s values. Remember though that the order in which the list is returned is random. This randomness can be seen when playing the Tic-Tac-Toe game. If there is more than one move available with the highest score, the computer will choose one at random because the keys function returns the available moves from the odds hash in random order.

On line 27 of the above example Perl module, the keys function is being used to retrieve the list of keys from the odds hash. The keys of the odds hash are the numbers that were found on the current game board. The values of the odds hash are the corresponding probabilities that were retrieved from the SCORE constant on line 24.

Admittedly, this example could have used an array instead of a string to store and retrieve the scores. I chose to use a string simply because I think it presents the layout of the board a little nicer. An array would likely perform better, but with such a small data set, the difference is probably too small to measure.

Sort

On line 27, the list of keys from the odds hash is being fed to Perl’s built-in sort function. Beware that Perl’s sort function sorts lexicographically by default, not numerically. For example, provided the list (10, 9, 8, 1), Perl’s sort function will return the list (1, 10, 8, 9).

The behavior of Perl’s sort function can be modified by providing it a code block as its first parameter as demonstrated on line 27. The result of the last statement in the code block should be a number less-than, equal-to, or greater-than zero depending on whether element $a should be placed before, concurrent-with, or after element $b in the resulting list respectively. $a and $b are pairs of elements from the provided list. The code in the block is executed repeatedly with $a and $b set to different pairs of elements from the original list until all the pairs have been compared and sorted.

The <=> operator is a special Perl operator that returns -1, 0, or 1 depending on whether the left argument is numerically less-than, equal-to, or greater-than the right argument respectively. By using the <=> operator in the code block of the sort function, Perl’s sort function can be made to sort numerically rather than lexicographically.

Notice that rather than comparing $a and $b directly, they are first being passed through the odds hash. Since the values of the odds hash are the probabilities that were retrieved from the SCORE constant, what is being compared is actually the score of $a versus the score of $b. Consequently, the numbers from the original game board are being sorted by their score, not their value. Numbers with an equal score are left in the same random order that the keys function returned them.

Notice also that I have reversed the typical order of the parameters to <=> in the code block of the sort function ($b on the left and $a on the right). By switching their order in this way, I have caused the sort function to return the elements in reverse order – from greatest to least – so that the number(s) with the highest score will be first in the list.
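The three sort behaviors discussed in this section can be compared directly. This sketch uses the same four-number list as the example above. The array names are illustrative.

```perl
use strict;
use warnings;

my @list = (10, 9, 8, 1);

my @lexical    = sort @list;                 # default: compares as strings
my @ascending  = sort { $a <=> $b } @list;   # numeric, least to greatest
my @descending = sort { $b <=> $a } @list;   # numeric, greatest to least

print "@lexical\n";      # prints "1 10 8 9"
print "@ascending\n";    # prints "1 8 9 10"
print "@descending\n";   # prints "10 9 8 1"
```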

References

References provide an indirect means of accessing a variable. They are often used when making copies of the variable is either undesirable or impractical. References are a sort of short cut that allows you to skip performing the copy and instead provide access to the original variable.

Why to use references

There is a cost in time and memory associated with making copies of variables. References are sometimes used as a means of reducing that cost. Be aware, however, that recent versions of Perl (5.20 and later) implement an optimization called copy-on-write, which defers copying a string’s contents until one of the copies is modified. This greatly reduces the cost of copying scalars. The optimization works transparently. You don’t have to do anything special to enable it.

Why not to use references

References make action at a distance possible, which runs against the principle that was mentioned in part one of this series. References are just as bad as global variables in terms of their tendency to trip up programmers by allowing data to be modified outside the local scope. You should generally try to avoid using references. But there are times when they are necessary.

How to create references

An example of passing a reference is provided on line 59 of the above Perl module. Rather than placing the mark array directly in the list of parameters to the win_move subroutine, a reference to the array is provided instead by prefixing the variable’s sigil with a backslash (\).

It is necessary to use a reference (\@mark) on line 59 because if the array were placed directly on the list, it would expand such that the first element of the mark array would become the third parameter to the win_move function, the second element of the mark array would become the fourth parameter to the win_move function, and so on for as many elements as the mark array has. Whereas an array will expand in list context, a reference will not. If the array were passed in expanded form, the receiving subroutine would need to call shift once for each element of the array. Also, the receiving function would not be able to tell how long the original array was.
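The expansion is easy to observe by counting the parameters that a subroutine receives. In the sketch below, count_args is a hypothetical helper written for this demonstration. It is not part of the game.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many parameters it was given.
sub count_args {
    return scalar @_;
}

my @mark = ('X', 'O');

my $flattened = count_args('board', 'X', @mark);    # array expands in the list
my $by_ref    = count_args('board', 'X', \@mark);   # reference stays one value

print "$flattened $by_ref\n";   # prints "4 3"
```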

Three ways to dereference references

In the receiving subroutine, the reference has to be dereferenced to get at the original values. An example of dereferencing an array reference is provided on line 56. On line 56, the shift statement has been enclosed in curly brackets and the opening bracket has been prefixed with the array sigil (@).

There is also a shorter form for dereferencing an array reference that is demonstrated on line 43 of the chip1.pm module. The short form allows you to omit the curly brackets and instead place the array sigil directly in front of the sigil of the scalar that holds the array reference. The short form only works when you have an array reference stored in a scalar. When the array reference is coming from a function, as it is on line 56 of the above Perl module, the long form must be used.

There is yet a third way of dereferencing an array reference that is demonstrated on line 29 of the game script. Line 29 shows the MARKS array reference being dereferenced with the arrow operator (->) and an index enclosed in square brackets. The MARKS array reference is missing its sigil because it is a constant. You can tell that what is being dereferenced is an array reference because the arrow operator is followed by square brackets ([]). Had the MARKS constant been a hash reference, the arrow operator would have been followed by curly brackets ({}).

There are also corresponding long and short forms for dereferencing hash references that use the hash sigil (%) instead of the array sigil. Note also that hashes, just like arrays, need to be passed by reference to subroutines unless you want them to expand into their constituent elements. The latter is sometimes done in Perl as a clever way of emulating named parameters.
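The dereferencing forms described in this section are collected in the sketch below. The contents of the MARKS constant, the get_marks subroutine, and the hash are hypothetical values chosen for illustration. They are not copied from the game script.

```perl
use strict;
use warnings;

use constant MARKS => ['X', 'O'];   # hypothetical values

# Hypothetical subroutine that returns an array reference.
sub get_marks {
    return ['X', 'O'];
}

my $aref = ['X', 'O'];

my @long  = @{ get_marks() };   # long form: works on any expression
my @short = @$aref;             # short form: needs a scalar holding the reference
my $first = $aref->[0];         # arrow operator with square brackets
my $const = MARKS->[1];         # arrow operator on a constant (no sigil)

my $href = { X => 'human', O => 'hal' };
my %copy = %$href;              # short form for a hash reference
my $who  = $href->{O};          # arrow operator with curly brackets

print "@long | @short | $first | $const | $who\n";
# prints "X O | X O | X | O | hal"
```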

A word of caution about references

It was stated earlier that references allow data to be modified outside of their declared scope and, just as with global variables, this non-local manipulation of the data can be confusing to the programmer(s) and thereby lead to unintended bugs. This is an important point to emphasize and explain.

On line 35 of the win_move subroutine, you can see that I did not dereference the provided array reference (\@mark) but rather I chose to store the reference in a scalar named tkns. I did this because I do not need to access the individual elements of the provided array in the win_move subroutine. I only need to pass the reference on to the get_victor subroutine. Not making a local copy of the array is a short cut, but it is dangerous. Because $tkns is only a copy of the reference, not a copy of the original data being referred to, if I or a later program developer were to write something like $tkns->[0] = 'Y' in the win_move subroutine, it would actually modify the value of the mark array in the hal_move subroutine. By passing a reference to its mark array (\@mark) to the win_move subroutine, the hal_move subroutine has granted access to modify its local copy of @mark. In this case, it would probably be better to make a local copy of the mark array in the win_move subroutine using syntax similar to what is shown on line 56 rather than preserving the reference as I have done for the purpose of demonstration on line 35.
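The danger can be reproduced in a few lines. The subroutines careless and careful below are hypothetical. They mimic, respectively, keeping the reference as on line 35 and making a local copy as on line 56.

```perl
use strict;
use warnings;

# Keeps the reference, so the assignment reaches the caller's array.
sub careless {
    my $tkns = shift;
    $tkns->[0] = 'Y';
    return;
}

# Copies the array first, so the assignment stays local.
sub careful {
    my @tkns = @{ shift; };
    $tkns[0] = 'Y';
    return;
}

my @mark = ('X', 'O');

careful(\@mark);
my $after_careful = "@mark";    # still "X O"

careless(\@mark);
my $after_careless = "@mark";   # now "Y O"

print "$after_careful | $after_careless\n";   # prints "X O | Y O"
```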

Aliases

In addition to references, there is another way that a local variable created with the my or state keyword can leak into the scope of a called subroutine. The list of parameters that you provide to a subroutine is directly accessible in the @_ array.

To demonstrate, the following example script prints b, not a, because the inc subroutine accesses the first element of @_ directly rather than first making a local copy of the parameter.

#!/usr/bin/perl

sub inc {
   $_[0]++;
}

MAIN: {
   my $var = 'a';
   inc $var;
   print "$var\n";
}

Aliases are different from references in that you don’t have to dereference them to get at their values. They really are just alternative names for the same variable. Be aware that aliases occur in a few other places as well. One such place is the list returned from the sort function – if you were to modify an element of the returned list directly, without first copying it to another variable, you would actually be modifying the element in the original list that was provided to the sort function. Other places where aliases occur include the code blocks of functions like grep and map. The grep and map functions are not covered in this series of articles. See the provided links if you want to know more about them.
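Aliasing by the for keyword is probably the most common case in practice. The following sketch, with illustrative names and values, shows a loop variable modifying the original array and then the same loop made safe by copying the value first.

```perl
use strict;
use warnings;

my @nums = (1, 2, 3);
for my $n (@nums) {
    $n *= 10;           # $n is an alias, so this changes @nums itself
}

my @safe = (1, 2, 3);
for my $n (@safe) {
    my $copy = $n;      # copy the value out of the alias first
    $copy *= 10;        # only the copy changes
}

print "@nums | @safe\n";   # prints "10 20 30 | 1 2 3"
```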

Final notes

Many of Perl’s built-in functions will operate on the default scalar ($_) or default array (@_) if they are not explicitly provided a variable to read from or write to. Line 40 of the above Perl module provides an example. The numbers from the nums array are sequentially aliased to $_ by the for keyword. If you choose to use these variables, in most cases you will probably want to retrieve your data from $_ or @_ fairly quickly to prevent it being accidentally overwritten by a subsequent command.

The substitution command (s/…/…/), for example, will manipulate the data stored in $_ if it is not explicitly bound to another variable by one of the =~ or !~ operators. Likewise, the shift function operates on @_ (or @ARGV if called in the global scope) if it is not explicitly provided an array to operate on. There is no obvious rule as to which functions support this shortcut. You will have to consult the documentation for the command you are interested in to see if it will operate on a default variable when not provided one explicitly.
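Both defaults are shown in the sketch below. The string and the first_arg subroutine are invented for the example.

```perl
use strict;
use warnings;

local $_ = 'daisy daisy';
s/daisy/Daisy/;        # no =~ binding, so the substitution edits $_
my $default = $_;      # copy the result out of $_ right away

# Hypothetical subroutine: shift with no array reads from @_.
sub first_arg {
    return shift;
}

my $first = first_arg('X', 'O');

print "$default | $first\n";   # prints "Daisy daisy | X"
```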

As demonstrated on lines 55 and 56, the same name can be reused for variables of different types. Reusing variable names generally makes the code harder to follow. It is probably better for the sake of readability to avoid variable name reuse.

Beware that making copies of arrays or hashes in Perl (as demonstrated on line 56) is shallow by default. If any of the elements of the array or hash are references, the corresponding elements in the duplicated array or hash will be references to the same original data. To make deep copies of data structures, use one of the Clone or Storable Perl modules. An alternative workaround that may work in the case of multi-dimensional arrays is to emulate them with a one-dimensional hash.
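The difference between a shallow and a deep copy can be seen in the sketch below. It uses the dclone function from the Storable module. The nested array is an arbitrary example.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my @rows = ([1, 2], [3, 4]);

my @shallow = @rows;          # copies the two references, not the inner arrays
$shallow[0][0] = 'X';         # also changes $rows[0][0]

my @deep = @{ dclone(\@rows) };   # recursively copies the inner arrays too
$deep[1][1] = 'O';                # $rows[1][1] is unaffected

print "$rows[0][0] $rows[1][1]\n";   # prints "X 4"
```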

Similar in form to Perl’s syntax for creating lists – (1, 2, 3) – unnamed array references and unnamed hash references can be constructed on the fly by enclosing a comma-separated set of elements in square brackets ([]) or curly brackets ({}) respectively. Line 07 of the game script demonstrates an unnamed (anonymous) array reference being constructed and assigned to the MARKS constant.

Notice that the import subroutine at the end of the above Perl module (chip2.pm) is assigning to some of the same names in the calling namespace as the previous module (chip1.pm). This is intentional. The hal_move and complain aliases created by chip1’s import subroutine will simply be overridden by the identically named aliases created by chip2’s import subroutine (assuming chip2.pm is loaded after chip1.pm in the calling namespace). Only the aliases are updated/overridden. The original subroutines from chip1 will still exist and can still be called with their full names – chip1::hal_move and chip1::complain.
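The overriding behavior can be reproduced in a single file. In the sketch below, chipA and chipB are hypothetical stand-ins for chip1 and chip2, and their import subroutines are called by hand rather than by loading separate module files.

```perl
use strict;
use warnings;

package chipA;
sub hal_move { return 'chipA move' }
sub import {
    no strict 'refs';
    no warnings 'redefine';
    my $c = caller;
    # Alias the caller's hal_move to this package's hal_move.
    *{ $c . '::hal_move' } = \&{ __PACKAGE__ . '::hal_move' };
}

package chipB;
sub hal_move { return 'chipB move' }
sub import {
    no strict 'refs';
    no warnings 'redefine';
    my $c = caller;
    *{ $c . '::hal_move' } = \&{ __PACKAGE__ . '::hal_move' };
}

package main;

chipA->import;
my $first = hal_move();            # alias points at chipA
chipB->import;
my $second = hal_move();           # alias now points at chipB
my $original = chipA::hal_move();  # the full name still works

print "$first | $second | $original\n";
# prints "chipA move | chipB move | chipA move"
```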