Posted on Leave a comment

TCP window scaling, timestamps and SACK

The Linux TCP stack has a myriad of sysctl knobs that allow to change its behavior.  This includes the amount of memory that can be used for receive or transmit operations, the maximum number of sockets and optional features and protocol extensions.

There are  multiple articles that recommend to disable TCP extensions, such as timestamps or selective acknowledgments (SACK) for various “performance tuning” or “security” reasons.

This article provides background on what these extensions do, why they
are enabled by default, how they relate to one another and why it is normally a bad idea to turn them off.

TCP Window scaling

The data transmission rate that TCP can sustain is limited by several factors. Some of these are:

  • Round trip time (RTT).  This is the time it takes for a packet to get to the destination and a reply to come back. Lower is better.
  • lowest link speed of the network paths involved
  • frequency of packet loss
  • the speed at which new data can be made available for transmission
    For example, the CPU needs to be able to pass data to the network adapter fast enough. If the CPU needs to encrypt the data first, the adapter might have to wait for new data. In similar fashion disk storage can be a bottleneck if it can’t read the data fast enough.
  • The maximum possible size of the TCP receive window. The receive window determines how much data (in bytes) TCP can transmit before it has to wait for the receiver to report reception of that data. This is announced by the receiver. The receiver will constantly update this value as it reads and acknowledges reception of the incoming data. The receive windows current value is contained in the TCP header that is part of every segment sent by TCP. The sender is thus aware of the current receive window whenever it receives an acknowledgment from the peer. This means that the higher the round-trip time, the longer it takes for sender to get receive window updates.

TCP is limited to at most 64 kilobytes of unacknowledged (in-flight) data. This is not even close to what is needed to sustain a decent data rate in most networking scenarios. Let us look at some examples.

Theoretical data rate

With a round-trip-time of 100 milliseconds, TCP can transfer at most 640 kilobytes per second. With a 1 second delay, the maximum theoretical data rate drops down to only 64 kilobytes per second.

This is because of the receive window. Once 64kbyte of data have been sent the receive window is already full.  The sender must wait until the peer informs it that at least some of the data has been read by the application. 

The first segment sent reduces the TCP window by the size of that segment. It takes one round-trip before an update of the receive window value will become available. When updates arrive with a 1 second delay, this results in a 64 kilobyte limit even if the link has plenty of bandwidth available.

In order to fully utilize a fast network with several milliseconds of delay, a window size larger than what classic TCP supports is a must. The ’64 kilobyte limit’ is an artifact of the protocols specification: The TCP header reserves only 16bits for the receive window size. This allows receive windows of up to 64KByte. When the TCP protocol was originally designed, this size was not seen as a limit.

Unfortunately, its not possible to just change the TCP header to support a larger maximum window value. Doing so would mean all implementations of TCP would have to be updated simultaneously or they wouldn’t understand one another anymore. To solve this, the interpretation of the receive window value is changed instead.

The ‘window scaling option’ allows to do this while keeping compatibility to existing implementations.

TCP Options: Backwards-compatible protocol extensions

TCP supports optional extensions. This allows to enhance the protocol with new features without the need to update all implementations at once. When a TCP initiator connects to the peer, it also send a list of supported extensions. All extensions follow the same format: an unique option number followed by the length of the option and the option data itself.

The TCP responder checks all the option numbers contained in the connection request. If it does not understand an option number it skips
‘length’ bytes of data and checks the next option number. The responder omits those it did not understand from the reply. This allows both the sender and receiver to learn the common set of supported options.

With window scaling, the option data always consist of a single number.

The window scaling option

 
Window Scale option (WSopt): Kind: 3, Length: 3
    +---------+---------+---------+
    | Kind=3  |Length=3 |shift.cnt|
    +---------+---------+---------+
         1         1         1

The window scaling option tells the peer that the receive window value found in the TCP header should be scaled by the given number to get the real size.

For example, a TCP initiator that announces a window scaling factor of 7 tries to instruct the responder that any future packets that carry a receive window value of 512 really announce a window of 65536 byte. This is an increase by a factor of 128. This would allow a maximum TCP Window of 8 Megabytes.

A TCP responder that does not understand this option ignores it. The TCP packet sent in reply to the connection request (the syn-ack) then does not contain the window scale option. In this case both sides can only use a 64k window size. Fortunately, almost every TCP stack supports and enables this option by default, including Linux.

The responder includes its own desired scaling factor. Both peers can use a different number. Its also legitimate to announce a scaling factor of 0. This means the peer should treat the receive window value it receives verbatim, but it allows scaled values in the reply direction — the recipient can then use a larger receive window.

Unlike SACK or TCP timestamps, the window scaling option only appears in the first two packets of a TCP connection, it cannot be changed afterwards. It is also not possible to determine the scaling factor by looking at a packet capture of a connection that does not contain the initial connection three-way handshake.

The largest supported scaling factor is 14. This allows TCP window sizes
of up to one Gigabyte.

Window scaling downsides

It can cause data corruption in very special cases. Before you disable the option – it is impossible under normal circumstances. There is also a solution in place that prevents this. Unfortunately, some people disable this solution without realizing the relationship with window scaling. First, let’s have a look at the actual problem that needs to be addressed. Imagine the following sequence of events:

  1. The sender transmits segments: s_1, s_2, s_3, … s_n
  2.  The receiver sees: s_1, s_3, .. s_n and sends an acknowledgment for s_1.
  3.  The sender considers s_2 lost and sends it a second time. It also sends new data contained in segment s_n+1.
  4.  The receiver then sees: s_2, s_n+1, s_2: the packet s_2 is received twice.

This can happen for example when a sender triggers re-transmission too early. Such erroneous re-transmits are never a problem in normal cases, even with window scaling. The receiver will just discard the duplicate.

Old data to new data

The TCP sequence number can be at most 4 Gigabyte. If it becomes larger than this, the sequence wraps back to 0 and then increases again. This is not a problem in itself, but if this occur fast enough then the above scenario can create an ambiguity.

If a wrap-around occurs at the right moment, the sequence number s_2 (the re-transmitted packet) can already be larger than s_n+1. Thus, in the last step (4), the receiver may interpret this as: s_2, s_n+1, s_n+m, i.e. it could view the ‘old’ packet s_2 as containing new data.

Normally, this won’t happen because a ‘wrap around’ occurs only every couple of seconds or minutes even on high bandwidth links. The interval between the original and a unneeded re-transmit will be a lot smaller.

For example,with a transmit speed of 50 Megabytes per second, a
duplicate needs to arrive more than one minute late for this to become a problem. The sequence numbers do not wrap fast enough for small delays to induce this problem.

Once TCP approaches ‘Gigabyte per second’ throughput rates, the sequence numbers can wrap so fast that even a delay by only a few milliseconds can create duplicates that TCP cannot detect anymore. By solving the problem of the too small receive window, TCP can now be used for network speeds that were impossible before – and that creates a new, albeit rare problem. To safely use Gigabytes/s speed in environments with very low RTT receivers must be able to detect such old duplicates without relying on the sequence number alone.

TCP time stamps

A best-before date

In the most simple terms, TCP timestamps just add a time stamp to the packets to resolve the ambiguity caused by very fast sequence number wrap around. If a segment appears to contain new data, but its timestamp is older than the last in-window packet, then the sequence number has wrapped and the ”new” packet is actually an older duplicate. This resolves the ambiguity of re-transmits even for extreme corner cases.

But this extension allows for more than just detection of old packets. The other major feature made possible by TCP timestamps are more precise round-trip time measurements (RTTm).

A need for precise round-trip-time estimation

When both peers support timestamps,  every TCP segment carries two additional numbers: a timestamp value and a timestamp echo.

 
TCP Timestamp option (TSopt): Kind: 8, Length: 10
+-------+----+----------------+-----------------+
|Kind=8 | 10 |TS Value (TSval)|EchoReply (TSecr)|
+-------+----+----------------+-----------------+
    1      1         4                4

An accurate RTT estimate is crucial for TCP performance. TCP automatically re-sends data that was not acknowledged. Re-transmission is triggered by a timer: If it expires, TCP considers one or more packets that it has not yet received an acknowledgment for to be lost. They are then sent again.

But “has not been acknowledged” does not mean the segment was lost. It is also possible that the receiver did not send an acknowledgment so far or that the acknowledgment is still in flight. This creates a dilemma: TCP must wait long enough for such slight delays to not matter, but it can’t wait for too long either.

Low versus high network delay

In networks with a high delay, if the timer fires too fast, TCP frequently wastes time and bandwidth with unneeded re-sends.

In networks with a low delay however,  waiting for too long causes reduced throughput when a real packet loss occurs. Therefore, the timer should expire sooner in low-delay networks than in those with a high delay. The tcp retransmit timeout therefore cannot use a fixed constant value as a timeout. It needs to adapt the value based on the delay that it experiences in the network.

Round-trip time measurement

TCP picks a retransmit timeout that is based on the expected round-trip time (RTT). The RTT is not known in advance. RTT is estimated by measuring the delta between the time a segment is sent and the time TCP receives an acknowledgment for the data carried by that segment.

This is complicated by several factors.

  • For performance reasons, TCP does not generate a new acknowledgment for every packet it receives. It waits  for a very small amount of time: If more segments arrive, their reception can be acknowledged with a single ACK packet. This is called “cumulative ACK”.
  •  The round-trip-time is not constant. This is because of a myriad of factors. For example, a client might be a mobile phone switching to different base stations as its moved around. Its also possible that packet switching takes longer when link or CPU utilization increases.
  • a packet that had to be re-sent must be ignored during computation. This is because the sender cannot tell if the ACK for the re-transmitted segment is acknowledging the original transmission (that arrived after all) or the re-transmission.

This last point is significant: When TCP is busy recovering from a loss, it may only receives ACKs for re-transmitted segments. It then can’t measure (update) the RTT during this recovery phase. As a consequence it can’t adjust the re-transmission timeout, which then keeps growing exponentially. That’s a pretty specific case (it assumes that other mechanisms such as fast retransmit or SACK did not help). Nevertheless, with TCP timestamps, RTT evaluation is done even in this case.

If the extension is used, the peer reads the timestamp value from the TCP segments extension space and stores it locally. It then places this value in all the segments it sends back as the “timestamp echo”.

Therefore the option carries two timestamps: Its senders own timestamp and the most recent timestamp it received from the peer. The “echo timestamp” is used by the original sender to compute the RTT. Its the delta between its current timestamp clock and what was reflected in the “timestamp echo”.

Other timestamp uses

TCP timestamps even have other uses beyond PAWS and RTT measurements. For example it becomes possible to detect if a retransmission was unnecessary. If the acknowledgment carries an older timestamp echo, the acknowledgment was for the initial packet, not the re-transmitted one.

Another, more obscure use case for TCP timestamps is related to the TCP syn cookie feature.

TCP connection establishment on server side

When connection requests arrive faster than a server application can accept the new incoming connection, the connection backlog will eventually reach its limit. This can occur because of a mis-configuration of the system or a bug in the application. It also happens when one or more clients send connection requests without reacting to the ‘syn ack’ response. This fills the connection queue with incomplete connections. It takes several seconds for these entries to time out. This is called a “syn flood attack”.

TCP timestamps and TCP syn cookies

Some TCP stacks allow to accept new connections even if the queue is full. When this happens, the Linux kernel will print a prominent message to the system log:

Possible SYN flooding on port P. Sending Cookies. Check SNMP counters.

This mechanism bypasses the connection queue entirely. The information that is normally stored in the connection queue is encoded into the SYN/ACK responses TCP sequence number. When the ACK comes back, the queue entry can be rebuilt from the sequence number.

The sequence number only has limited space to store information. Connections established using the ‘TCP syn cookie’ mechanism can not support TCP options for this reason.

The TCP options that are common to both peers can be stored in the timestamp, however. The ACK packet reflects the value back in the timestamp echo field which allows to recover the agreed-upon TCP options as well. Else, cookie-connections are restricted by the standard 64 kbyte receive window.

Common myths – timestamps are bad for performance

Unfortunately some guides recommend disabling TCP timestamps to reduce the number of times the kernel needs to access the timestamp clock to get the current time. This is not correct. As explained before, RTT estimation is a necessary part of TCP. For this reason, the kernel always takes a microsecond-resolution time stamp when a packet is received/sent.

Linux re-uses the clock timestamp taken for the RTT estimation for the remainder of the packet processing step. This also avoids the extra clock access to add a timestamp to an outgoing TCP packet.

The entire timestamp option only requires 10 bytes of TCP option space in each packet, this is not a significant decrease in space available for packet payload.

common myths – timestamps are a security problem

Some security audit tools and (older) blog posts recommend to disable TCP
timestamps because they allegedly leak system uptime: This would then allow to estimate the patch level of the system/kernel. This was true in the past: The timestamp clock is based on a constantly increasing value that starts at a fixed value on each system boot. A timestamp value would give a estimate as to how long the machine has been running (uptime).

As of Linux 4.12 TCP timestamps do not reveal the uptime anymore. All timestamp values sent use a peer-specific offset. Timestamp values also wrap every 49 days.

In other words, connections from or to address “A” see a different timestamp than connections to the remote address “B”.

Run sysctl net.ipv4.tcp_timestamps=2 to disable the randomization offset. This makes analyzing packet traces recorded by tools like wireshark or tcpdump easier – packets sent from the host then all have the same clock base in their TCP option timestamp.  For normal operation the default setting should be left as-is.

Selective Acknowledgments

TCP has problems if several packets in the same window of data are lost. This is because TCP Acknowledgments are cumulative, but only for packets
that arrived in-sequence. Example:

  • Sender transmits segments s_1, s_2, s_3, … s_n
  • Sender receives ACK for s_2
  • This means that both s_1 and s_2 were received and the
    sender no longer needs to keep these segments around.
  • Should s_3 be re-transmitted? What about s_4? s_n?

The sender waits for a “retransmission timeout” or ‘duplicate ACKs’ for s_2 to arrive. If a retransmit timeout occurs or several duplicate ACKs for s_2 arrive, the sender transmits s_3 again.

If the sender receives an acknowledgment for s_n, s_3 was the only missing packet. This is the ideal case. Only the single lost packet was re-sent.

If the sender receives an acknowledged segment that is smaller than s_n, for example s_4, that means that more than one packet was lost. The
sender needs to re-transmit the next segment as well.

Re-transmit strategies

Its possible to just repeat the same sequence: re-send the next packet until the receiver indicates it has processed all packet up to s_n. The problem with this approach is that it requires one RTT until the sender knows which packet it has to re-send next. While such strategy avoids unnecessary re-transmissions, it can take several seconds and more until TCP has re-sent the entire window of data.

The alternative is to re-send several packets at once. This approach allows TCP to recover more quickly when several packets have been lost. In the above example TCP re-send s_3, s_4, s_5, .. while it can only be sure that s_3 has been lost.

From a latency point of view, neither strategy is optimal. The first strategy is fast if only a single packet has to be re-sent, but takes too long when multiple packets were lost.

The second one is fast even if multiple packet have to be re-sent, but at the cost of wasting bandwidth. In addition, such a TCP sender could have transmitted new data already while it was doing the unneeded re-transmissions.

With the available information TCP cannot know which packets were lost. This is where TCP Selective Acknowledgments (SACK) come in. Just like window scaling and timestamps, it is another optional, yet very useful TCP feature.

The SACK option

 
   TCP Sack-Permitted Option: Kind: 4, Length 2
   +---------+---------+
   | Kind=4  | Length=2|
   +---------+---------+

A sender that supports this extension includes the “Sack Permitted” option in the connection request. If both endpoints support the extension, then a peer that detects a packet is missing in the data stream can inform the sender about this.

 
   TCP SACK Option: Kind: 5, Length: Variable
                     +--------+--------+
                     | Kind=5 | Length |
   +--------+--------+--------+--------+
   |      Left Edge of 1st Block       |
   +--------+--------+--------+--------+
   |      Right Edge of 1st Block      |
   +--------+--------+--------+--------+
   |                                   |
   /            . . .                  /
   |                                   |
   +--------+--------+--------+--------+
   |      Left Edge of nth Block       |
   +--------+--------+--------+--------+
   |      Right Edge of nth Block      |
   +--------+--------+--------+--------+

A receiver that encounters segment_s2 followed by s_5…s_n, it will include a SACK block when it sends the acknowledgment for s_2:

 
                +--------+-------+
                | Kind=5 |   10  |
+--------+------+--------+-------+
| Left edge: s_5                 |
+--------+--------+-------+------+
| Right edge: s_n                |
+--------+-------+-------+-------+

This tells the sender that segments up to s_2 arrived in-sequence, but it also lets the sender know that the segments s_5 to s_n were also received. The sender can then re-transmit these two packets and proceed to send new data.

The mythical lossless network

In theory SACK provides no advantage if the connection cannot experience packet loss. Or the connection has such a low latency that even waiting one full RTT does not matter.

In practice lossless behavior is virtually impossible to ensure.
Even if the network and all its switches and routers have ample bandwidth and buffer space packets can still be lost:

  • The host operating system might be under memory pressure and drop
    packets. Remember that a host might be handling tens of thousands of packet streams simultaneously.
  • The CPU might not be able to drain incoming packets from the network interface fast enough. This causes packet drops in the network adapter itself.
  • If TCP timestamps are not available even a connection with a very small RTT can stall momentarily during loss recovery.

Use of SACK does not increase the size of TCP packets unless a connection experiences packet loss. Because of this, there is hardly a reason to disable this feature. Almost all TCP stacks support SACK – it is typically only absent on low-power IOT-alike devices that are not doing TCP bulk data transfers.

When a Linux system accepts a connection from such a device, TCP automatically disables SACK for the affected connection.

Summary

The three TCP extensions examined in this post are all related to TCP performance and should best be left to the default setting: enabled.

The TCP handshake ensures that only extensions that are understood by both parties are used, so there is never a need to disable an extension globally just because a peer might not support it.

Turning these extensions off results in severe performance penalties, especially in case of TCP Window Scaling and SACK. TCP timestamps can be disabled without an immediate disadvantage, however there is no compelling reason to do so anymore. Keeping them enabled also makes it possible to support TCP options even when SYN cookies come into effect.

Posted on Leave a comment

Backup and restore Toolboxes

Toolboxes started life often described as disposable containers – and that is still one of their major uses: install stuff, then try it out in the relative safety of a container, and lastly, cleanly dispose of it. Minimal risk, fuss and without pesky residual libraries and applications hanging around on the host long after you have finished.

So — why would you backup a Toolbox? Sometimes, they have more permanent uses, contain complex and lengthy installs, or are being used for critical applications. For example, Toolboxes can be used as a development environment, containing hardware associated drivers and applications. Or they could be used for an application you want to run in a container for which there is no Flatpak, or one that has requirements a Flatpak doesn’t satisfy. While they can be handy to use on Fedora Workstation, toolbox containers are often essential for Silverblue users since they offer an easy solution to installing applications that can’t successfully be installed by rpm-ostree. Or for applications that may not have a Flatpak version readily available. In the above situations a busted Toolbox can be a major headache. But if a backup exists, you can quickly restore a Toolbox or move it to another workstation.

The backup process uses Podman to create an image of an existing toolbox container, and save that image to an archive file. To restore the toolbox container, load the image from the archive file and then create a Toolbox from that image. The new toolbox container will be an identical copy of your backed up toolbox container.

It is important to note this process does not backup data, just what you have installed in the toolbox container. This includes packages installed from repositories or from a local rpm file using dnf. If you need to backup data, Podman’s commit command that will be used to capture an image of the toolbox container, has an option to include volumes attached to the container.

Creating a backup

To backup a toolbox container you will need it’s name and container ID which can be gotten by using toolbox list. For this example I am going to backup my golang development toolbox container, imaginatively named go.

$ toolbox list CONTAINER ID CONTAINER NAME CREATED STATUS IMAGE NAME
00ff783a102f go 5 weeks ago exited registry.fedoraproject.org/f32/fedora-toolbox:32

If the container’s status shows as running , you should stop it using podman container stop container_name. Although the commit command has a -p for pause option, make sure that the Toolbox is not running, which helps it initialize correctly when restored from backup.

$ podman container stop go

To create an image of the toolbox container use

podman container commit -p container_ID backup-image-name

Depending on the complexity of the Toolbox, this can take a little while.

 $ podman container commit -p 00ff783a102f go-backup

Now to confirm the image has been created type…

$ toolbox list

You should get output similar to what is below…

IMAGE ID IMAGE NAME CREATED
cfcb13046db7 localhost/go-backup:latest About a minute ago CONTAINER ID CONTAINER NAME CREATED STATUS IMAGE NAME
00ff783a102f go 5 weeks ago exited registry.fedoraproject.org/f32/fedora-toolbox:32

Now to save the backup image to a tar archive file using podman save -o backup-filename.tar backup-image-name.

$ podman save -o go.tar go-backup

Confirm the archive file, our toolbox container backup, was created.

$ ls go.tar 

Do some tidying up, remove the backup image and, if needed, remove the original Toolbox.

$ podman rmi go-backup $ toolbox rm go

Restore a backup

To create an image from the backup file that was made above, you do it with the command podman load -i backup_filename.

$ podman load -i go.tar

Then you can confirm the image was created with…

$ toolbox list IMAGE ID IMAGE NAME CREATED
cfcb13046db7 localhost/go-backup:latest 17 minutes ago

Now create a toolbox container from the restored image, with toolbox create –container container_name ––image image_name, specifying the full repository and version tag as the image name.

$ toolbox create --container go --image localhost/go-backup:latest

Confirm that the toolbox was created.

$ toolbox list IMAGE ID IMAGE NAME CREATED
cfcb13046db7 localhost/go-backup:latest 20 minutes ago CONTAINER ID CONTAINER NAME CREATED STATUS IMAGE NAME
34cef6b7e28d go 21 seconds ago configured localhost/go-backup:latest

Finally, you can test that the restored Toolbox works…

$ toolbox enter --container go

If you can enter the newly created toolbox container, you will see the toolbox prompt and will have successfully backed up and restored your Pet toolbox container.

Posted on Leave a comment

LaTeX typesetting, Part 3: formatting

This series covers basic formatting in LaTeX. Part 1 introduced lists. Part 2 covered tables. In part 3, you will learn about another great feature of LaTeX: the flexibility of granular document formatting. This article covers customizing the page layout, table of contents, title sections, and page style.

Page dimension

When you first wrote your LaTeX document you may have noticed that the default margin is slightly bigger than you may imagine. The margins have to do with the type of paper you specified, for example, a4, letter, and the document class: article, book, report, and so on. To modify the page margins there are a few options, one of the simplest options is using the fullpage package.

This package sets the body of the page such that the page is almost full.

Fullpage package documentation

The illustration below demonstrates the LaTeX default body compared to using the fullpage package.

Another option is to use the geometry package. Before you explore how the geometry package can manipulate margins, first look at the page dimensions as depicted below.

  1. one inch + \hoffset
  2. one inch + \voffset
  3. \oddsidemargin = 31pt
  4. \topmargin = 20pt
  5. \headheight = 12pt
  6. \headsep = 25pt
  7. \textheight = 592pt
  8. \textwidth = 390pt
  9. \marginparsep = 35pt
  10. \marginparwidth = 35pt
  11. \footskip = 30pt

To set the margin to 1 (one) inch using the geometry package use the following example

\usepackage{geometry}
\geometry{a4paper, margin=1in}

In addition to the above example, the geometry command can modify the paper size, and orientation. To change the size of the paper, use the example below:

\usepackage[a4paper, total={7in, 8in}]{geometry}

To change the page orientation, you need to add landscape to the geometry options as shown below:

\usepackage{geometery}
\geometry{a4paper, landscape, margin=1.5in
Landscape Orientation

Table of contents

By default, a LaTeX table of contents is titled “Contents”. There are times when you prefer to relabel the text to be “Table of Content”, change the vertical spacing between the ToC and your first section of chapter, or simply change the color of the text.

To change the text you add the following lines to your preamble, substitute english with your desired language :

\usepackage[english]{babel}
\addto\captionsenglish{
\renewcommand{\contentsname}
{\bfseries{Table of Contents}}}

To manipulate the virtual spacing between ToC and the list of figures, sections, and chapters, use the tocloft package. The two options used in this article are cftbeforesecskip and cftaftertoctitleskip.

The tocloft package provides means of controlling the typographic design of the ToC, List of Figures and List of Tables.

Tocloft package doucmentation

\usepackage{tocloft}
\setlength\ctfbeforesecskip{2pt}
\setlength\cftaftertoctitleskip{30pt}

cftbeforesecskip is the spacing between the sections in the ToC, while
cftaftertoctitleskip is the space between text “Table of Contents” and the first section in the ToC. The below image shows the differences between the default and the modified ToC.

Default ToC
Customized ToC

Borders

When using the package hyperref in your document, LaTeX section lists in the ToC and references including \url have a border, as shown in the images below.

To remove these borders, include the following in the preamble, In the previous section, “Table of Contents,” you will see that there are not any borders in the ToC.

\usepackage{hyperref}
\hypersetup{ pdfborder = {0 0 0}}

Title section

To modify the title section font, style, and/or color, use the package titlesec. In this example, you will change the font size, font style, and font color of the section, subsection, and subsubsection. First, add the following to the preamble.

\usepackage{titlesec}
\titleformat*{\section}{\Huge\bfseries\color{darkblue}}
\titleformat*{\subsection}{\huge\bfseries\color{darkblue}}
\titleformat*{\subsubsection}{\Large\bfseries\color{darkblue}}

Taking a closer look at the code, \titleformat*{\section} specifies the depth of section to use. The above example, uses up to the third depth. The {\Huge\bfseries\color{darkblue}} portion specifies the size of the font, font style and, font color

Page style

To customize the page headers and footers one of the packages, use fancyhdr. This example uses this package to modify the page style, header, and footer. The code below provides a brief description of what each option does.

\pagestyle{fancy} %for header to be on each page
\fancyhead[L]{} %keep left header blank
\fancyhead[C]{} %keep centre header blank
\fancyhead[R]{\leftmark} %add the section/chapter to the header right
\fancyfoot[L]{Static Content} %add static test to the left footer
\fancyfoot[C]{} %keep centre footer blank
\fancyfoot[R]{\thepage} %add the page number to the right footer
\setlength\voffset{-0.25in} %space between page border and header (1in + space)
\setlength\headheight{12pt} %height of the actual header.
\setlength\headsep{25pt} %separation between header and text.
\renewcommand{\headrulewidth}{2pt} % add header horizontal line
\renewcommand{\footrulewidth}{1pt} % add footer horizontal line

The results of this change are shown below:

Header
Footer

Tips

Centralize the preamble

If write many TeX documents, you can create a .tex file with all your preamble based on your document categories and reference this file. For example, I use a structure.tex as shown below.

$ cat article_structure.tex
\usepackage[english]{babel}
\addto\captionsenglish{
\renewcommand{\contentsname}
{\bfseries{\color{darkblue}Table of Contents}}
} % Relable the contents
%\usepackage[margin=0.5in]{geometry} % specifies the margin of the document
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{graphicx} % allows you to add graphics to the document
\usepackage{hyperref} % permits redirection of URL from a PDF document
\usepackage{fullpage} % formate the content to utilise the full page
%\usepackage{a4wide}
\usepackage[export]{adjustbox} % to force image position
%\usepackage[section]{placeins} % to have multiple images in a figure
\usepackage{tabularx} % for wrapping text in a table
%\usepackage{rotating}
\usepackage{multirow}
\usepackage{subcaption} % to have multiple images in a figure
%\usepackage{smartdiagram} % initialise smart diagrams
\usepackage{enumitem} % to manage the spacing between lists and enumeration
\usepackage{fancyhdr} %, graphicx} %for header to be on each page
\pagestyle{fancy} %for header to be on each page
%\fancyhf{}
\fancyhead[L]{}
\fancyhead[C]{}
\fancyhead[R]{\leftmark}
\fancyfoot[L]{Static Content} %\includegraphics[width=0.02\textwidth]{virgin_voyages.png}}
\fancyfoot[C]{} % clear center
\fancyfoot[R]{\thepage}
\setlength\voffset{-0.25in} %Space between page border and header (1in + space)
\setlength\headheight{12pt} %Height of the actual header.
\setlength\headsep{25pt} %Separation between header and text.
\renewcommand{\headrulewidth}{2pt} % adds horizontal line
\renewcommand{\footrulewidth}{1pt} % add horizontal line (footer)
%\renewcommand{\oddsidemargin}{2pt} % adjuct the margin spacing
%\renewcommand{\pagenumbering}{roman} % change the numbering style
%\renewcommand{\hoffset}{20pt}
%\usepackage{color}
\usepackage[table]{xcolor}
\hypersetup{ pdfborder = {0 0 0}} % removes the red boarder from the table of content
%\usepackage{wasysym} %add checkbox
%\newcommand\insq[1]{%
% \Square\ #1\quad%
%} % specify the command to add checkbox
%\usepackage{xcolor}
%\usepackage{colortbl}
%\definecolor{Gray}{gray}{0.9} % create new colour
%\definecolor{LightCyan}{rgb}{0.88,1,1} % create new colour
%\usepackage[first=0,last=9]{lcg}
%\newcommand{\ra}{\rand0.\arabic{rand}}
%\newcolumntype{g}{>{\columncolor{LightCyan}}c} % create new column type g
%\usesmartdiagramlibrary{additions}
%\setcounter{figure}{0}
\setcounter{secnumdepth}{0} % sections are level 1
\usepackage{csquotes} % the proper was of using double quotes
%\usepackage{draftwatermark} % Enable watermark
%\SetWatermarkText{DRAFT} % Specify watermark text
%\SetWatermarkScale{5} % Toggle watermark size
\usepackage{listings} % add code blocks
\usepackage{titlesec} % Manipulate section/subsection
\titleformat{\section}{\Huge\bfseries\color{darkblue}} % update sections to bold with the colour blue \titleformat{\subsection}{\huge\bfseries\color{darkblue}} % update subsections to bold with the colour blue
\titleformat*{\subsubsection}{\Large\bfseries\color{darkblue}} % update subsubsections to bold with the colour blue
\usepackage[toc]{appendix} % Include appendix in TOC
\usepackage{xcolor}
\usepackage{tocloft} % For manipulating Table of Content virtical spacing
%\setlength\cftparskip{-2pt}
\setlength\cftbeforesecskip{2pt} %spacing between the sections
\setlength\cftaftertoctitleskip{30pt} % space between the first section and the text ``Table of Contents''
\definecolor{navyblue}{rgb}{0.0,0.0,0.5}
\definecolor{zaffre}{rgb}{0.0, 0.08, 0.66}
\definecolor{white}{rgb}{1.0, 1.0, 1.0}
\definecolor{darkblue}{rgb}{0.0, 0.2, 0.6}
\definecolor{darkgray}{rgb}{0.66, 0.66, 0.66}
\definecolor{lightgray}{rgb}{0.83, 0.83, 0.83}
%\pagenumbering{roman}

In your articles, refer to the structure.tex file as shown in the example below:

\documentclass[a4paper,11pt]{article}
\input{/path_to_structure.tex}}
\begin{document}
…...
\end{document}

Add watermarks

To enable watermarks in your LaTeX document, use the draftwatermark package. The below code snippet and image demonstrates the how to add a watermark to your document. By default the watermark color is grey which can be modified to your desired color.

\usepackage{draftwatermark} \SetWatermarkText{\color{red}Classified} %add watermark text \SetWatermarkScale{4} %specify the size of the text

Conclusion

In this series you saw some of the basic, but rich features that LaTeX provides for customizing your document to cater to your needs or the audience the document will be presented to. With LaTeX, there are many packages available to customize the page layout, style, and more.

Posted on Leave a comment

Spam Classification with ML-Pack

Introduction

ML-Pack is a small footprint C++ machine learning library that can be easily integrated into other programs. It is an actively developed open source project and released under a BSD-3 license. Machine learning has gained popularity due to the large amount of electronic data that can be collected. Some other popular machine learning frameworks include TensorFlow, MxNet, PyTorch, Chainer and Paddle Paddle, however these are designed for more complex workflows than ML-Pack. On Fedora, ML-Pack is packaged by its lead developer Ryan Curtin. In addition to a command line interface, ML-Pack has bindings for Python and Julia. Here, we will focus on the command line interface since this may be useful for system administrators to integrate into their workflows.

Installation

You can install ML-Pack on the Fedora command line using

$ sudo dnf -y install mlpack mlpack-bin

You can also install the documentation, development headers and Python bindings by using …

$ sudo dnf -y install mlpack-doc \
mlpack-devel mlpack-python3

though they will not be used in this introduction.

Example

As an example, we will train a machine learning model to classify spam SMS messages. To keep this article brief, linux commands will not be fully explained, but you can find out more about them by using the man command, for example for the command first command used below, wget

$ man wget

will give you information that wget will download files from the web and options you can use for it.

Get a dataset

We will use an example spam dataset in Indonesian provided by Yudi Wibisono

 
$ wget https://drive.google.com/file/d/1-stKadfTgJLtYsHWqXhGO3nTjKVFxm_Q/view
$ unzip dataset_sms_spam_bhs_indonesia_v1.zip

Pre-process dataset

We will try to classify a message as spam or ham by the number of occurrences of a word in a message. We first change the file line endings, remove line 243 which is missing a label and then remove the header from the dataset. Then, we split our data into two files, labels and messages. Since the labels are at the end of the message, the message is reversed and then the label removed and placed in one file. The message is then removed and placed in another file.

$ tr 'r' 'n' < dataset_sms_spam_v1.csv > dataset.txt
$ sed '243d' dataset.txt > dataset1.csv
$ sed '1d' dataset1.csv > dataset.csv
$ rev dataset.csv | cut -c1 | rev > labels.txt
$ rev dataset.csv | cut -c2- | rev > messages.txt
$ rm dataset.csv
$ rm dataset1.csv
$ rm dataset.txt

Machine learning works on numeric data, so we will use labels of 1 for ham and 0 for spam. The dataset contains three labels, 0, normal sms (ham), 1, fraud (spam), and 2 promotion (spam). We will label all spam as 1, so promotions and fraud will be labelled as 1.

$ tr '2' '1' < labels.txt > labels.csv
$ rm labels.txt

The next step is to convert all text in the messages to lower case and for simplicity remove punctuation and any symbols that are not spaces, line endings or in the range a-z (one would need expand this range of symbols for production use)

$ tr '[:upper:]' '[:lower:]' < \
messages.txt > messagesLower.txt
$ tr -Cd 'abcdefghijklmnopqrstuvwxyz n' < \ messagesLower.txt > messagesLetters.txt
$ rm messagesLower.txt

We now obtain a sorted list of unique words used (this step may take a few minutes, so use nice to give it a low priority while you continue with other tasks on your computer).

$ nice -20 xargs -n1 < messagesLetters.txt > temp.txt
$ sort temp.txt > temp2.txt
$ uniq temp2.txt > words.txt
$ rm temp.txt
$ rm temp2.txt

We then create a matrix, where for each message, the frequency of word occurrences is counted (more on this on Wikipedia, here and here). This requires a few lines of code, so the full script, which should be saved as ‘makematrix.sh’ is below

#!/bin/bash
declare -a words=()
declare -a letterstartind=()
declare -a letterstart=()
letter=" "
i=0
lettercount=0
while IFS= read -r line; do labels[$((i))]=$line let "i++"
done < labels.csv
i=0
while IFS= read -r line; do words[$((i))]=$line firstletter="$( echo $line | head -c 1 )" if [ "$firstletter" != "$letter" ] then letterstartind[$((lettercount))]=$((i)) letterstart[$((lettercount))]=$firstletter letter=$firstletter let "lettercount++" fi let "i++"
done < words.txt
letterstartind[$((lettercount))]=$((i))
echo "Created list of letters" touch wordfrequency.txt
rm wordfrequency.txt
touch wordfrequency.txt
messagecount=0
messagenum=0
messages="$( wc -l messages.txt )"
i=0
while IFS= read -r line; do let "messagenum++" declare -a wordcount=() declare -a wordarray=() read -r -a wordarray <<> wordfrequency.txt echo "Processed message ""$messagenum" let "i++"
done < messagesLetters.txt
# Create csv file
tr ' ' ',' data.csv

Since Bash is an interpreted language, this simple implementation can take upto 30 minutes to complete. If using the above Bash script on your primary workstation, run it as a task with low priority so that you can continue with other work while you wait:

$ nice -20 bash makematrix.sh

Once the script has finished running, split the data into testing (30%) and training (70%) sets:

$ mlpack_preprocess_split \ --input_file data.csv \ --input_labels_file labels.csv \ --training_file train.data.csv \ --training_labels_file train.labels.csv \ --test_file test.data.csv \ --test_labels_file test.labels.csv \ --test_ratio 0.3 \ --verbose

Train a model

Now train a Logistic regression model:

$ mlpack_logistic_regression \
--training_file train.data.csv \
--labels_file train.labels.csv --lambda 0.1 \
--output_model_file lr_model.bin

Test the model

Finally we test our model by producing predictions,

$ mlpack_logistic_regression \
--input_model_file lr_model.bin \ --test_file test.data.csv \
--output_file lr_predictions.csv

and comparing the predictions with the exact results,

$ export incorrect=$(diff -U 0 lr_predictions.csv \
test.labels.csv | grep '^@@' | wc -l)
$ export tests=$(wc -l < lr_predictions.csv)
$ echo "scale=2; 100 * ( 1 - $((incorrect)) \
/ $((tests)))" | bc

This gives approximately 90% validation rate, similar to that obtained here.

The dataset is composed of approximately 50% spam messages, so the validation rates are quite good without doing much parameter tuning. In typical cases, datasets are unbalanced with many more entries in some categories than in others. In these cases a good validation rate can be obtained by mispredicting the class with a few entries. Thus to better evaluate these models, one can compare the number of misclassifications of spam, and the number of misclassifications of ham. Of particular importance in applications is the number of false positive spam results as these are typically not transmitted. The script below produces a confusion matrix which gives a better indication of misclassification. Save it as ‘confusion.sh’

#!/bin/bash
declare -a labels
declare -a lr
i=0
while IFS= read -r line; do labels[i]=$line let "i++"
done < test.labels.csv
i=0
while IFS= read -r line; do lr[i]=$line let "i++"
done < lr_predictions.csv
TruePositiveLR=0
FalsePositiveLR=0
TrueZerpLR=0
FalseZeroLR=0
Positive=0
Zero=0
for i in "${!labels[@]}"; do if [ "${labels[$i]}" == "1" ] then let "Positive++" if [ "${lr[$i]}" == "1" ] then let "TruePositiveLR++" else let "FalseZeroLR++" fi fi if [ "${labels[$i]}" == "0" ] then let "Zero++" if [ "${lr[$i]}" == "0" ] then let "TrueZeroLR++" else let "FalsePositiveLR++" fi fi done
echo "Logistic Regression"
echo "Total spam" $Positive
echo "Total ham" $Zero
echo "Confusion matrix"
echo " Predicted class"
echo " Ham | Spam "
echo " ---------------"
echo " Actual| Ham | " $TrueZeroLR "|" $FalseZeroLR
echo " class | Spam | " $FalsePositiveLR " |" $TruePositiveLR
echo ""

then run the script

$ bash confusion.sh

You should get output similar to

Logistic Regression
Total spam 183
Total ham 159
Confusion matrix

    Predicted class
    Ham Spam
Actual class Ham 128 26
Spam 31 157

which indicates a reasonable level of classification. Other methods you can try in ML-Pack for this problem include Naive Bayes, random forest, decision tree, AdaBoost and perceptron.

To improve the error rating, you can try other pre-processing methods on the initial data set. Neural networks can give upto 99.95% validation rates, see for example here, here and here. However, using these techniques with ML-Pack cannot be done on the command line interface at present and is best covered in another post.

For more on ML-Pack, please see the documentation.

Posted on Leave a comment

LaTeX typesetting part 2 (tables)

LaTeX offers a number of tools to create and customise tables, in this series we will be using the tabular and tabularx environment to create and customise tables.

Basic table

To create a table you simply specify the environment \begin{tabular}{columns}

\begin{tabular}{c|c} Release &;\;Codename \\ \hline Fedora Core 1 &;Yarrow \\ Fedora Core 2 &;Tettnang \\ Fedora Core 3 &;Heidelberg \\ Fedora Core 4 &;Stentz \\ \end{tabular}

In the above example “{c|c}” in the curly bracket refers to the position of the text in the column. The below table summarises the positional argument together with the description.

Position Argument
c Position text in the centre
l Position text left-justified
r Position text right-justified
p{width} Align the text at the top of the cell
m{width} Align the text in the middle of the cell
b{width} Align the text at the bottom of the cell

Both m{width} and b{width} requires the array package to be specified in the preamble.

Using the example above, let us breakdown the important points used and describe a few more options that you will see in this series

Option Description
&; Defines each cell, the ampersand is only used from the second column
\ This terminates the row and start a new row
| Specifies the vertical line in the table (optional)
\hline Specifies the horizontal line (optional)
*{num}{form} This is handy when you have many columns and is an efficient way of limiting the repetition
|| Specifies the double vertical line

Customizing a table

Now that some of the options available are known, let us create a table using the options described in the previous section.

\begin{tabular}{*{3}{|l|}}
\hline \textbf{Version} &;\textbf{Code name} &;\textbf{Year released} \\
\hline Fedora 6 &;Zod &;2006 \\ \hline Fedora 7 &;Moonshine &;2007 \\ \hline Fedora 8 &;Werewolf &;2007 \\
\hline
\end{tabular}

Managing long text

With LaTeX, if there are many texts in a column it will not be formatted well and does not look presentable.

The below example shows how long text is formatted, we will use “blindtext” in the preamble so that we can produce sample text.

\begin{tabular}{|l|l|}\hline Summary &;Description \\ \hline Test &;\blindtext \\
\end{tabular}

As you can see the text exceeds the page width; however, there are a couple of options to overcome this challenge.

  • Specify the column width, for example, m{5cm}
  • Utilise the tabularx environment, this requires tabularx package in the preamble.

Managing long text with column width

By specifying the column width the text will be wrapped into the width as shown in the example below.

\begin{tabular}{|l|m{14cm}|} \hline Summary &;Description \\ \hline Test &;\blindtext \\ \hline
\end{tabular}\vspace{3mm}

Managing long text with tabularx

Before we can leverage tabularx we need to add it in the preamble. Tabularx takes the following example

**\begin{tabularx}{width}{columns}** \begin{tabularx}{\textwidth}{|l|X|} \hline
Summary &; Tabularx Description\\ \hline
Text &;\blindtext \\ \hline
\end{tabularx}

Notice that the column that we want the long text to be wrapped has a capital “X” specified.

Multi-row and multi-column

There are times when you will need to merge rows and/or column. This section describes how it is accomplished. To use multi-row and multi-column add multi-row to the preamble.

Multirow

Multirow takes the following argument \multirow{number_of_rows}{width}{text}, let us look at the below example.

\begin{tabular}{|l|l|}\hline Release &;Codename \\ \hline Fedora Core 4 &Stentz \\ \hline \multirow{2}{*}{MultiRow} &;Fedora 8 \\ &;Werewolf \\ \hline
\end{tabular}

In the above example, two rows were specified, the ‘*’ tells LaTeX to automatically manage the size of the cell.

Multicolumn

Multicolumn argument is \multicolumn{number_of_columns}{cell_position}{text}, below example demonstrates multicolumn.

\begin{tabular}{|l|l|l|}\hline Release &;Codename &;Date \\ \hline Fedora Core 4 &;Stentz &;2005 \\ \hline \multicolumn{3}{|c|}{Mulit-Column} \\ \hline
\end{tabular}

Working with colours

Colours can be assigned to the text, an individual cell or the entire row. Additionally, we can configure alternating colours for each row.

Before we can add colour to our tables we need to include \usepackage[table]{xcolor} into the preamble. We can also define colours using the following colour reference LaTeX Colour or by adding an exclamation after the colour prefixed by the shade from 0 to 100. For example, gray!30

\definecolor{darkblue}{rgb}{0.0, 0.0, 0.55}
\definecolor{darkgray}{rgb}{0.66, 0.66, 0.66}

Below example demonstrate this a table with alternate colours, \rowcolors take the following options \rowcolors{row_start_colour}{even_row_colour}{odd_row_colour}.

\rowcolors{2}{darkgray}{gray!20}
\begin{tabular}{c|c} Release &;Codename \\ \hline Fedora Core 1 &;Yarrow \\ Fedora Core 2 &;Tettnang \\ Fedora Core 3 &;Heidelberg \\ Fedora Core 4 &;Stentz \\
\end{tabular}

In addition to the above example, \rowcolor can be used to specify the colour of each row, this method works best when there are multi-rows. The following examples show the impact of using the \rowcolours with multi-row and how to work around it.

As you can see the multi-row is visible in the first row, to fix this we have to do the following.

\begin{tabular}{|l|l|}\hline \rowcolor{darkblue}\textsc{\color{white}Release} &;\textsc{\color{white}Codename} \\ \hline \rowcolor{gray!10}Fedora Core 4 &;Stentz \\ \hline \rowcolor{gray!40}&;Fedora 8 \\ \rowcolor{gray!40}\multirow{-2}{*}{Multi-Row} &;Werewolf \\ \hline
\end{tabular}

Let us discuss the changes that were implemented to resolve the multi-row with the alternate colour issue.

  • The first row started above the multi-row
  • The number of rows was changed from 2 to -2, which means to read from the line above
  • \rowcolor was specified for each row, more importantly, the multi-rows must have the same colour so that you can have the desired results.

One last note on colour, to change the colour of a column you need to create a new column type and define the colour. The example below illustrates how to define the new column colour.

\newcolumntype{g}{>{\columncolor{darkblue}}l} 

Let’s break it down:

  • \newcolumntype{g}: defines the letter g as the new column
  • {>{\columncolor{darkblue}}l}: here we select our desired colour, and l tells the column to be left-justified, this can be subsitued with c or r
\begin{tabular}{g|l} \textsc{Release} &;\textsc{Codename} \\ \hline Fedora Core 4 &;Stentz \\ &Fedora 8 \\ \multirow{-2}{*}{Multi-Row} &;Werewolf \\ \end{tabular}\

Landscape table

There may be times when your table has many columns and will not fit elegantly in portrait. With the rotating package in preamble you will be able to create a sideways table. The below example demonstrates this.

For the landscape table, we will use the sidewaystable environment and add the tabular environment within it, we also specified additional options.

  • \centering to position the table in the centre of the page
  • \caption{} to give our table a name
  • \label{} this enables us to reference the table in our document
\begin{sidewaystable}
\centering
\caption{Sideways Table}
\label{sidetable}
\begin{tabular}{ll} \rowcolor{darkblue}\textsc{\color{white}Release} &;\textsc{\color{white}Codename} \\ \rowcolor{gray!10}Fedora Core 4 &;Stentz \\ \rowcolor{gray!40} &;Fedora 8 \\ \rowcolor{gray!40}\multirow{-2}{*}{Multi-Row} &;Werewolf \\ \end{tabular}\vspace{3mm}
\end{sidewaystable}

Lists in tables

To include a list into a table you can use tabularx and include the list in the column where the X is specified. Another option will be to use tabular but you must specify the column width.

List in tabularx

\begin{tabularx}{\textwidth}{|l|X|} \hline Fedora Version &;Editions \\ \hline Fedora 32 &;\begin{itemize}[noitemsep] \item CoreOS \item Silverblue \item IoT \end{itemize} \\ \hline
\end{tabularx}\vspace{3mm}

List in tabular

\begin{tabular}{|l|m{6cm}|}\hline Fedora Version &;Editions \\ \hline Fedora 32 &;\begin{itemize}[noitemsep] \item CoreOS \item Silverblue \item IoT \end{itemize} \\ \hline
\end{tabular}

Conclusion

LaTeX offers many ways to customise your table with tabular and tabularx, you can also add both tabular and tabularx within the table environment (\begin\table) to add the table name and to position the table.

The packages used in this series are:

\usepackage{fullpage}
\usepackage{blindtext} % add demo text
\usepackage{array} % used for column positions
\usepackage{tabularx} % adds tabularx which is used for text wrapping
\usepackage{multirow} % multi-row and multi-colour support
\usepackage[table]{xcolor} % add colour to the columns \usepackage{rotating} % for landscape/sideways tables

Additional Reading

This was an intermediate lesson on tables. For more advanced information about tables and LaTex in general, you can go to the LaTeX Wiki.

Posted on Leave a comment

Internet connection sharing with NetworkManager

NetworkManager is the network configuration daemon used on Fedora and many other distributions. It provides a consistent way to configure network interfaces and other network-related aspects on a Linux machine. Among many other features, it provides a Internet connection sharing functionality that can be very useful in different situations.

For example, suppose you are in a place without Wi-Fi and want to share your laptop’s mobile data connection with friends. Or maybe you have a laptop with broken Wi-Fi and want to connect it via Ethernet cable to another laptop; in this way the first laptop become able to reach the Internet and maybe download new Wi-Fi drivers.

In cases like these it is useful to share Internet connectivity with other devices. On smartphones this feature is called “Tethering” and allows sharing a cellular connection via Wi-Fi, Bluetooth or a USB cable.

This article shows how the connection sharing mode offered by NetworkManager can be set up easily; it addition, it explains how to configure some more advanced features for power users.

How connection sharing works

The basic idea behind connection sharing is that there is an upstream interface with Internet access and a downstream interface that needs connectivity. These interfaces can be of a different type—for example, Wi-Fi and Ethernet.

If the upstream interface is connected to a LAN, it is possible to configure our computer to act as a bridge; a bridge is the software version of an Ethernet switch. In this way, you “extend” the LAN to the downstream network. However this solution doesn’t always play well with all interface types; moreover, it works only if the upstream network uses private addresses.

A more general approach consists in assigning a private IPv4 subnet to the downstream network and turning on routing between the two interfaces. In this case, NAT (Network Address Translation) is also necessary. The purpose of NAT is to modify the source of packets coming from the downstream network so that they look as if they originate from your computer.

It would be inconvenient to configure manually all the devices in the downstream network. Therefore, you need a DHCP server to assign addresses automatically and configure hosts to route all traffic through your computer. In addition, in case the sharing happens through Wi-Fi, the wireless network adapter must be configured as an access point.

There are many tutorials out there explaining how to achieve this, with different degrees of difficulty. NetworkManager hides all this complexity and provides a shared mode that makes this configuration quick and convenient.

Configuring connection sharing

The configuration paradigm of NetworkManager is based on the concept of connection (or connection profile). A connection is a group of settings to apply on a network interface.

This article shows how to create and modify such connections using nmcli, the NetworkManager command line utility, and the GTK connection editor. If you prefer, other tools are available such as nmtui (a text-based user interface), GNOME control center or the KDE network applet.

A reasonable prerequisite to share Internet access is to have it available in the first place; this implies that there is already a NetworkManager connection active. If you are reading this, you probably already have a working Internet connection. If not, see this article for a more comprehensive introduction to NetworkManager.

The rest of this article assumes you already have a Wi-Fi connection profile configured and that connectivity must be shared over an Ethernet interface enp1s0.

To enable sharing, create a connection for interface enp1s0 and set the ipv4.method property to shared instead of the usual auto:

$ nmcli connection add type ethernet ifname enp1s0 ipv4.method shared con-name local

The shared IPv4 method does multiple things:

  • enables IP forwarding for the interface;
  • adds firewall rules and enables masquerading;
  • starts dnsmasq as a DHCP and DNS server.

NetworkManager connection profiles, unless configured otherwise, are activated automatically. The new connection you have added should be already active in the device status:

$ nmcli device
DEVICE         TYPE      STATE         CONNECTION
enp1s0         ethernet  connected     local
wlp4s0         wifi      connected     home-wifi

If that is not the case, activate the profile manually with nmcli connection up local.

Changing the shared IP range

Now look at how NetworkManager configured the downstream interface enp1s0:

$ ip -o addr show enp1s0
8: enp1s0 inet 10.42.0.1/24 brd 10.42.0.255 ...

10.42.0.1/24 is the default address set by NetworkManager for a device in shared mode. Addresses in this range are also distributed via DHCP to other computers. If the range conflicts with other private networks in your environment, change it by modifying the ipv4.addresses property:

$ nmcli connection modify local ipv4.addresses 192.168.42.1/24

Remember to activate again the connection profile after any change to apply the new values:

$ nmcli connection up local $ ip -o addr show enp1s0
8: enp1s0 inet 192.168.42.1/24 brd 192.168.42.255 ...

If you prefer using a graphical tool to edit connections, install the nm-connection-editor package. Launch the program and open the connection to edit; then select the Shared to other computers method in the IPv4 Settings tab. Finally, if you want to use a specific IP subnet, click Add and insert an address and a netmask.

Adding custom dnsmasq options

In case you want to further extend the dnsmasq configuration, you can add new configuration snippets in /etc/NetworkManager/dnsmasq-shared.d/. For example, the following configuration:

dhcp-option=option:ntp-server,192.168.42.1
dhcp-host=52:54:00:a4:65:c8,192.168.42.170

tells dnsmasq to advertise a NTP server via DHCP. In addition, it assigns a static IP to a client with a certain MAC.

There are many other useful options in the dnsmasq manual page. However, remember that some of them may conflict with the rest of the configuration; so please use custom options only if you know what you are doing.

Other useful tricks

If you want to set up sharing via Wi-Fi, you could create a connection in Access Point mode, manually configure the security, and then enable connection sharing. Actually, there is a quicker way, the hotspot mode:

$ nmcli device wifi hotspot [ifname $dev] [password $pw]

This does everything needed to create a functional access point with connection sharing. The interface and password options are optional; if they are not specified, nmcli chooses the first Wi-Fi device available and generates a random password. Use the ‘nmcli device wifi show-password‘ command to display information for the active hotspot; the output includes the password and a text-based QR code that you can scan with a phone:

What about IPv6?

Until now this article discussed sharing IPv4 connectivity. NetworkManager also supports sharing IPv6 connectivity through DHCP prefix delegation. Using prefix delegation, a computer can request additional IPv6 prefixes from the DHCP server. Those public routable addresses are assigned to local networks via Router Advertisements. Again, NetworkManager makes all this easier through the shared IPv6 mode:

$ nmcli connection modify local ipv6.method shared

Note that IPv6 sharing requires support from the Internet Service Provider, which should give out prefix delegations through DHCP. If the ISP doesn’t provides delegations, IPv6 sharing will not work; in such case NM will report in the journal that no prefixes are available:

policy: ipv6-pd: none of 0 prefixes of wlp1s0 can be shared on enp1s0

Also, note that the Wi-Fi hotspot command described above only enables IPv4 sharing; if you want to also use IPv6 sharing you must edit the connection manually.

Conclusion

Remember, the next time you need to share your Internet connection, NetworkManager will make it easy for you.

If you have suggestions on how to improve this feature or any other feedback, please reach out to the NM community using the mailing list, the issue tracker or joining the #nm IRC channel on freenode.

Posted on Leave a comment

LaTeX Typesetting – Part 1 (Lists)

This series builds on the previous articles: Typeset your docs with LaTex and TeXstudio on Fedora and LaTeX 101 for beginners. This first part of the series is about LaTeX lists.

Types of lists

LaTeX lists are enclosed environments, and each item in the list can take a line of text to a full paragraph. There are three types of lists available in LaTeX. They are:

  • Itemized: unordered or bullet
  • Enumerated: ordered
  • Description: descriptive

Creating lists

To create a list, prefix each list item with the \item command. Precede and follow the list of items with the \begin{<type>} and \end{<type>} commands respectively where <type> is substituted with the type of the list as illustrated in the following examples.

Itemized list

\begin{itemize} \item Fedora \item Fedora Spin \item Fedora Silverblue
\end{itemize}

Enumerated list

\begin{enumerate} \item Fedora CoreOS \item Fedora Silverblue \item Fedora Spin
\end{enumerate}

Descriptive list

\begin{description} \item[Fedora 6] Code name Zod \item[Fedora 8] Code name Werewolf
\end{description}

Spacing list items

The default spacing can be customized by adding \usepackage{enumitem} to the preamble. The enumitem package enables the noitemsep option and the \itemsep command which you can use on your lists as illustrated below.

Using the noitemsep option

Enclose the noitemsep option in square brackets and place it on the \begin command as shown below. This option removes the default spacing.

\begin{itemize}[noitemsep] \item Fedora \item Fedora Spin \item Fedora Silverblue
\end{itemize}

Using the \itemsep command

The \itemsep command must be suffixed with a number to indicate how much space there should be between the list items.

\begin{itemize} \itemsep0.75pt \item Fedora Silverblue \item Fedora CoreOS
\end{itemize}

Nesting lists

LaTeX supports nested lists up to four levels deep as illustrated below.

Nested itemized lists

\begin{itemize}[noitemsep] \item Fedora Versions \begin{itemize} \item Fedora 8 \item Fedora 9 \begin{itemize} \item Werewolf \item Sulphur \begin{itemize} \item 2007-05-31 \item 2008-05-13 \end{itemize} \end{itemize} \end{itemize} \item Fedora Spin \item Fedora Silverblue
\end{itemize}

Nested enumerated lists

\begin{enumerate}[noitemsep] \item Fedora Versions \begin{enumerate} \item Fedora 8 \item Fedora 9 \begin{enumerate} \item Werewolf \item Sulphur \begin{enumerate} \item 2007-05-31 \item 2008-05-13 \end{enumerate} \end{enumerate} \end{enumerate} \item Fedora Spin \item Fedora Silverblue
\end{enumerate}

List style names for each list type

Enumerated Itemized
\alph* $\bullet$
\Alph* $\cdot$
\arabic* $\diamond$
\roman* $\ast$
\Roman* $\circ$
$-$

Default style by list depth

Level Enumerated Itemized
1 Number Bullet
2 Lowercase alphabet Dash
3 Roman numerals Asterisk
4 Uppercase alphabet Period

Setting list styles

The below example illustrates each of the different itemiszed list styles.

% Itemize style
\begin{itemize} \item[$\ast$] Asterisk \item[$\diamond$] Diamond \item[$\circ$] Circle \item[$\cdot$] Period \item[$\bullet$] Bullet (default) \item[--] Dash \item[$-$] Another dash
\end{itemize}

There are three methods of setting list styles. They are illustrated below. These methods are listed by priority; highest priority first. A higher priority will override a lower priority if more than one is defined for a list item.

List styling method 1 – per item

Enclose the name of the desired style in square brackets and place it on the \item command as demonstrated below.

% First method
\begin{itemize} \item[$\ast$] Asterisk \item[$\diamond$] Diamond \item[$\circ$] Circle \item[$\cdot$] period \item[$\bullet$] Bullet (default) \item[--] Dash \item[$-$] Another dash
\end{itemize}

List styling method 2 – on the list

Prefix the name of the desired style with label=. Place the parameter, including the label= prefix, in square brackets on the \begin command as demonstrated below.

% Second method
\begin{enumerate}[label=\Alph*.] \item Fedora 32 \item Fedora 31 \item Fedora 30
\end{enumerate}

List styling method 3 – on the document

This method changes the default style for the entire document. Use the \renewcommand to set the values for the labelitems. There is a different labelitem for each of the four label depths as demonstrated below.

% Third method
\renewcommand{\labelitemi}{$\ast$}
\renewcommand{\labelitemii}{$\diamond$}
\renewcommand{\labelitemiii}{$\bullet$}
\renewcommand{\labelitemiv}{$-$}

Summary

LaTeX supports three types of lists. The style and spacing of each of the list types can be customized. More LaTeX elements will be explained in future posts.

Additional reading about LaTeX lists can be found here: LaTeX List Structures

Posted on Leave a comment

Freeplane: the Swiss Army knife for your brain

A previous Fedora Magazine article covered tracking your time and tasks. Another introduced some mind mapping tools. There you learned that mind mapping is a visual technique for structuring and organizing thoughts and ideas. This article covers another mind mapping app you can use in Fedora: Freeplane.

Freeplane is a free and open source software application that supports thinking, sharing information and getting things done. Freeplane runs on any operating system that has a current version of Java installed.

Installing Freeplane

Freeplane is not currently packaged in the Fedora repositories, so you will need to install it from the project’s website.

  1. Go to the project’s Sourceforge site and click Download to download the file
  2. Open a terminal and extract the file (note that the version you download may be different): unzip freeplane_bin-1.8.5.zip
  3. Move the extracted contents to the /opt directory: sudo mv freeplane-1.8.5 /opt/freeplane

The configuration file is located in: ~/.config/freeplane. You can launch Freeplane by running /opt/freeplane/freeplane.sh from a terminal, but if you want to launch it from the desktop environment you can create a desktop file.

Open your favorite text editor and save the contents below to ~/.local/share/applications/freeplane.desktop.

[Desktop Entry]
Version=1.0
Name=Freeplane
Icon=/opt/freeplane/freeplane.svg Exec=/opt/freeplane/freeplane.sh
Terminal=false
Icon=freeplane
Type=Application
MimeType=text/x-troff-mm; Categories=Office;
GenericName=Freeplane
Comment=A free tool to organise your information
Keywords=Mindmaps; Knowledge management; Brainstorming;

Next, update the desktop file database with update-desktop-database ~/.local/share/applications

Now you can launch Freeplane from your desktop environment.

Using Freeplane

At its first startup, Freeplane’s main window includes an example mind map with links to documentation about all the different things you can do with Freeplane.

Start your First Mind Mapping

You have a choice of templates when you create a new mind map. The standard template works for most cases. Start typing the idea and your text will replace the center text.

Press the Insert key to add a branch (or node) off the center with a blank field where you can fill in something associated with the idea. Press Insert again to add another node connected to the first one.

Press Enter on a node to add a node parallel to that one.

All keyboard shortcuts are in the Freeplane documentation.

Freeplane Plug-ins

Plug-ins can be used to extend and customize the use of the app. Two important ones are:

  1. Freeplane GTD+: A generic task management add-on, with a special focus in supporting the Getting Things Done (GTD) methodology
  2. Study Planner: helps organize learning

To install a new add-on to Freeplane, find the add-on you want on the Freemind add-ons directory.

  • Download the desired add-on
  • In Freeplane, select Tools > Add-ons
  • Click the Search button
  • Find and select the file you just downloaded
  • Click Install
  • Depending on the add-on, you may have additional questions to answer
  • Restart Freeplane to use the new add-on

Integrating mind mapping in your everyday life

Mind mapping is a very powerful method that can be of great assistance in many aspects of life.

  • Learning Linux or any certification
  • Learning a computer language
  • Learning a human language
  • Even earning a degree

Whatever the objective this will always help to keep the ideas together and organized.

Personally I’m earning a few Linux Professional Institute certifications. The image below shows a mind map I am creating as I go through the systemd materials.

Conclusion

Now you have a start on how you can use Freeplane. Freeplane gives you all the tools you’ll need to create great, vibrant, and useful mind maps. Share how you use it in the comments.

Posted on Leave a comment

How to generate an EPUB file on Fedora

It is becoming more popular to read content on smartphones. Every phone comes with its own ebook reader. Believe or not, it is very easy to create your own ebook files on Fedora.

This article shows two different methods to create an EPUB. The epub format is one of the most popular formats and is supported by many open-source applications.

Most people will ask “Why bother creating an EPUB file when PDFs are so easy to create?” The answer is: “Have you ever tried reading a sheet of paper when you can only see a small section at a time?” In order to read a PDF you have to keep zooming and moving around the document or scale it down to a small size to fit the screen. An EPUB file, on the other hand, is designed to fit many different screen types.

Method 1: ghostwriter and pandoc

This first method creates a quick ebook file. It uses a Markdown editor named ghostwriter and a command-line document conversion tool named pandoc.

You can either search for them and install them from the Software Center or you can install them from the terminal. If you are going to use the terminal to install them, run this command: sudo dnf install pandoc ghostwriter.

For those who are not aware of what Markdown is, here is a quick explanation. It is a simple markup language created a little over 15 years ago. It uses simple syntax to format plain text. Markdown files can then be converted to a whole slew of other document formats.

ghostwriter

Now for the tools. ghostwriter is a cross-platform Markdown editor that is easy to use and does not get in the way. pandoc is a very handy document converting tool that can handle hundreds of different formats.

To create your ebook, open ghostwriter, and start writing your document. If you have used Markdown before, you may be used to making the title of your document Heading 1 by putting a pound sign in front of it. Like this: # My Man Jeeves. However, pandoc will not recognize that as the title and put a big UNTITLED at the top of your ebook. Instead put a % in front of your title. For example, % My Man Jeeves. Sections or chapters should be formatted as Heading 2, i.e. ## Leave It to Jeeves. If you have subsections, use Heading 3 (###).

Once your document is complete, click File -> Export (or press Ctrl + E). In the dialog box, select between several options for the Markdown converter. If this is the first time you have used ghostwriter, the Sundown converter will be picked by default. From the dialog box, select pandoc. Next click Export. Your EPUB file is now created.

ghostwriter export dialog box

Note: If you get an error saying that there was an issue with pandoc, turn off Smart Typography and try again.

Method 2: calibre

If you want a more polished ebook, this is the method that you are looking for. It takes a few more steps, but it’s worth it.

First, install an application named calibre. calibre is not just an ebook reader, it is an ebook management system. You can either install it from the Software Center or from the terminal via sudo dnf install calibre.

In this method, you can either write your document in LibreOffice, ghostwriter, or another editor of your choice. Make sure that the title of the book is formatted as Heading 1, chapters as Heading 2, and sub-sections as Heading 3.

Next, export your document as an HTML file.

Now add the file to calibre. Open calibre and click “Add books“. It will take calibre a couple of seconds to add the file.

Once the file is imported, edit the file’s metadata by clicking on the “Edit metadata” button. Here you can fill out the title of the book and the author’s name. You can also upload a cover image (if you have one) or calibre will generate one for you.

Next, click the “Convert books” button. In the new dialog box, select the “Look & Feel” section and the “Layout” tab. Check the “Remove spacing between paragraphs” option. This will tighten up the contents as indent each paragraph.

Now, set up the table of contents. Select the “Table of Contents” section. There are three options to focus on: Level 1 TOC, Level 2 TOC, and Level 3 TOC. For each, click the wand at the end. In this new dialog box, select the HTML tag that applies to the table of contents entry. For example, select h1 for Level 1 TOC and so on.

Next, tell calibre to include the table of contents. Select the “EPUB output” section and check the “Insert Inline Table of Contents“. To create the epub file, click “OK“.

Now you have a professional-looking ebook file.

Posted on Leave a comment

Use FastAPI to build web services in Python

FastAPI is a modern Python web framework that leverage the latest Python improvement in asyncio. In this article you will see how to set up a container based development environment and implement a small web service with FastAPI.

Getting Started

The development environment can be set up using the Fedora container image. The following Dockerfile prepares the container image with FastAPI, Uvicorn and aiofiles.

FROM fedora:32
RUN dnf install -y python-pip \ && dnf clean all \ && pip install fastapi uvicorn aiofiles
WORKDIR /srv
CMD ["uvicorn", "main:app", "--reload"]

After saving this Dockerfile in your working directory, build the container image using podman.

$ podman build -t fastapi .
$ podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
localhost/fastapi latest 01e974cabe8b 18 seconds ago 326 MB

Now let’s create a basic FastAPI program and run it using that container image.

from fastapi import FastAPI app = FastAPI() @app.get("/")
async def root(): return {"message": "Hello Fedora Magazine!"}

Save that source code in a main.py file and then run the following command to execute it:

$ podman run --rm -v $PWD:/srv:z -p 8000:8000 --name fastapi -d fastapi
$ curl http://127.0.0.1:8000
{"message":"Hello Fedora Magazine!"

You now have a running web service using FastAPI. Any changes to main.py will be automatically reloaded. For example, try changing the “Hello Fedora Magazine!” message.

To stop the application, run the following command.

$ podman stop fastapi

Building a small web service

To really see the benefits of FastAPI and the performance improvement it brings (see comparison with other Python web frameworks), let’s build an application that manipulates some I/O. You can use the output of the dnf history command as data for that application.

First, save the output of that command in a file.

$ dnf history | tail --lines=+3 > history.txt

The command is using tail to remove the headers of dnf history which are not needed by the application. Each dnf transaction can be represented with the following information:

  • id : number of the transaction (increments every time a new transaction is run)
  • command : the dnf command run during the transaction
  • date: the date and time the transaction happened

Next, modify the main.py file to add that data structure to the application.

from fastapi import FastAPI
from pydantic import BaseModel app = FastAPI() class DnfTransaction(BaseModel): id: int command: str date: str

FastAPI comes with the pydantic library which allow you to easily build data classes and benefit from type annotation to validate your data.

Now, continue building the application by adding a function that will read the data from the history.txt file.

import aiofiles from fastapi import FastAPI
from pydantic import BaseModel app = FastAPI() class DnfTransaction(BaseModel): id: int command: str date: str async def read_history(): transactions = [] async with aiofiles.open("history.txt") as f: async for line in f: transactions.append(DnfTransaction( id=line.split("|")[0].strip(" "), command=line.split("|")[1].strip(" "), date=line.split("|")[2].strip(" "))) return transactions

This function makes use of the aiofiles library which provides an asyncio API to manipulate files in Python. This means that opening and reading the file will not block other requests made to the server.

Finally, change the root function to return the data stored in the transactions list.

@app.get("/")
async def read_root(): return await read_history()

To see the output of the application, run the following command

$ curl http://127.0.0.1:8000 | python -m json.tool
[
{ "id": 103, "command": "update", "date": "2020-05-25 08:35"
},
{ "id": 102, "command": "update", "date": "2020-05-23 15:46"
},
{ "id": 101, "command": "update", "date": "2020-05-22 11:32"
},
....
]

Conclusion

FastAPI is gaining a lot a popularity in the Python web framework ecosystem because it offers a simple way to build web services using asyncio. You can find more information about FastAPI in the documentation.

The code of this article is available in this GitHub repository.


Photo by Jan Kubita on Unsplash.