Linux, FreeBSD, and Unix types

Posted on March 29, 2019

SREs Wish Automation Solved All Their Problems

Although the SRE job role is often defined as being about automation, the reality is that 59 percent of SREs agree there is too much toil (defined as manual, repetitive, tactical work that scales linearly) in their organization. Based on 188 survey responses from people holding SRE job roles, Catchpoint’s second annual SRE Report surprisingly found that almost half (49 percent) of the SREs believe their organization has not used automation to reduce toil.

Often inspired by DevOps, SREs have high expectations for automation. Yet there are key differences between the two, and SRE responsibilities are much closer to those associated with systems administrators. SREs have the capability to automate and innovate, but they are often burdened by IT operations' historical focus on incident management and reliability.

Handling Complex Memory Situations

Jérôme Glisse felt that the time had come for the Linux kernel to seriously address the issue of having many different types of memory installed on a single running system. There was main system memory and device-specific memory, along with associated hierarchies governing which memory to use at which time and under which circumstances. This complicated situation, Jérôme said, was now the norm, and it should be treated as such.

The physical connections between the various CPUs, devices, and RAM chips (that is, the bus topology) were also relevant, because they could influence the speed of each of those components.

Jérôme wanted to be clear that his proposal went beyond existing efforts to handle heterogeneous RAM. He wanted to take into account the wide range of hardware and its topological relationships, in order to eke out the highest possible performance from a given system.

Solus 4 Linux Gaming Report: A Great Nvidia, Radeon And Steam User Experience

This article is the third in a series on Linux-powered gaming that aims to capture the various nuances in setup, as well as uncover potential performance variations between nine different desktop Linux operating systems.

Solus is a fascinating Linux distribution. It's built from scratch, follows a rolling-release model, and by default ships with the Budgie desktop environment, which was also developed by the Solus Project. ISOs with other desktop environments, such as GNOME and MATE, are also available.

Solus, which recently updated to version 4.0, is aimed at home desktop users and Linux beginners. It made a positive first impression on me, so I’ll be covering it outside of this Linux Gaming Report in the near future.

Kubernetes 1.14: Production-level support for Windows Nodes, Kubectl Updates, Persistent Local Volumes GA

We’re pleased to announce the delivery of Kubernetes 1.14, our first release of 2019!

Kubernetes 1.14 consists of 31 enhancements: 10 moving to stable, 12 in beta, and 7 net new. The main themes of this release are extensibility and supporting more workloads on Kubernetes with three major features moving to general availability, and an important security feature moving to beta.

More enhancements graduated to stable in this release than any prior Kubernetes release. This represents an important milestone for users and operators in terms of setting support expectations. In addition, there are notable Pod and RBAC enhancements in this release, which are discussed in the “additional notable features” section below.

Let’s dive into the key features of this release:

Production-level Support for Windows Nodes

Until now, Windows node support in Kubernetes has been in beta, allowing many users to experiment and see the value of Kubernetes for Windows containers. Kubernetes now officially supports adding Windows nodes as worker nodes and scheduling Windows containers, enabling a vast ecosystem of Windows applications to leverage the power of our platform. Enterprises with investments in both Windows-based and Linux-based applications don't have to look for separate orchestrators to manage their workloads, leading to increased operational efficiencies across their deployments, regardless of operating system.

Can Better Task Stealing Make Linux Faster?

Oracle Linux kernel developer Steve Sistare contributes this discussion on kernel scheduler improvements.

Load balancing via scalable task stealing

The Linux task scheduler balances load across a system by pushing waking tasks to idle CPUs, and by pulling tasks from busy CPUs when a CPU becomes idle. Scaling efficiently is a challenge on both the push and pull sides on large systems. For pulls, the scheduler searches all CPUs in successively larger domains until an overloaded CPU is found, and pulls a task from the busiest group. This is very expensive, costing tens to hundreds of microseconds on large systems, so the search time is limited by the CPU's average idle time and some domains are not searched. As a result, balance is not always achieved, and idle CPUs go unused.

I have implemented an alternate mechanism that is invoked after the existing search in idle_balance() limits itself and finds nothing. I maintain a bitmap of overloaded CPUs, where a CPU sets its bit when its runnable CFS task count exceeds 1. The bitmap is sparse, with a limited number of significant bits per cacheline. This reduces cache contention when many threads concurrently set, clear, and visit elements. There is a bitmap per last-level cache. When a CPU becomes idle, it searches the bitmap to find the first overloaded CPU with a migratable task, and steals it. This simple stealing yields a higher CPU utilization than idle_balance() alone, because the search is cheap, costing 1 to 2 microseconds, so it may be called every time the CPU is about to go idle. Stealing does not offload the globally busiest queue, but it is much better than running nothing at all.

Results

Stealing improves utilization with only a modest CPU overhead in scheduler code. In the following experiment, hackbench is run with varying numbers of groups (40 tasks per group), and the delta in /proc/schedstat is shown for each run, averaged per CPU, augmented with these non-standard stats:

%find – percent of time spent in the old and new functions that search for idle CPUs and tasks to steal, and that set the overloaded-CPUs bitmap.
steal – number of times a task is stolen from another CPU.

Elapsed time improves by 8 to 36%, costing at most 0.4% more find time.

CPU busy utilization is close to 100% for the new kernel, as shown by the green curve in the following graph, versus the orange curve for the baseline kernel:

Stealing improves Oracle database OLTP performance by up to 9% depending on load, and we have seen some nice improvements for mysql, pgsql, gcc, java, and networking. In general, stealing is most helpful for workloads with a high context switch rate.

The code

As of this writing, this work is not yet upstream, but the latest patch series is at https://lkml.org/lkml/2018/12/6/1253. If your kernel is built with CONFIG_SCHED_DEBUG=y, you can verify that it contains the stealing optimization using:

 # grep -q STEAL /sys/kernel/debug/sched_features && echo Yes
 Yes

If you try it, note that stealing is disabled for systems with more than 2 NUMA nodes, because hackbench regresses on such systems, as I explain in https://lkml.org/lkml/2018/12/6/1250. However, I suspect this effect is specific to hackbench and that stealing will help other workloads on many-node systems. To try it, reboot with the kernel parameter sched_steal_node_limit=8 (or larger).

Future work

After the basic stealing algorithm is pushed upstream, I am considering the following enhancements:

If stealing within the last-level cache does not find a candidate, steal across LLCs and NUMA nodes.
Maintain a sparse bitmap to identify stealing candidates in the RT scheduling class. Currently pull_rt_task() searches all run queues.
Remove the core and socket levels from idle_balance(), as stealing handles those levels. Remove idle_balance() entirely when stealing across LLC is supported.
Maintain a bitmap to identify idle cores and idle CPUs, for push balancing.

This article originally appeared on the Oracle Developers Blog.

Posted on March 28, 2019

Linux Release Roundup: Applications and Distros Released This Week

This is a continually updated article that lists various Linux distribution and Linux-related application releases of the week.

At It's FOSS, we try to provide you with all the major happenings of the Linux and Open Source world. But it's not always possible to cover all the news, especially the minor releases of a popular application or distribution.

Hence, I have created this page, which I’ll be continually updating with the links and short snippets of the new releases of the current week. Eventually, I’ll remove releases older than 2 weeks from the page.

How to Install NTP Server and Client(s) on Ubuntu 18.04 LTS

NTP, or Network Time Protocol, is a protocol used to synchronize all system clocks in a network to the same time. When we use the term NTP, we are referring to the protocol itself and also to the client and server programs running on the networked computers. NTP belongs to the traditional TCP/IP protocol suite and can easily be classified as one of its oldest parts.

When you are initially setting up the clock, it takes six exchanges over 5 to 10 minutes before the clock is set. Once the clocks in a network are synchronized, the clients update their clocks with the server once every 10 minutes. This is usually done through a single exchange of messages (a transaction). These transactions use UDP port 123 on your system.

In this article, we will describe a step-by-step procedure on how to:

Install and configure the NTP server on an Ubuntu machine.
Configure the NTP client to synchronize its time with the server.

We have run the commands and procedures mentioned in this article on an Ubuntu 18.04 LTS system.

KubeCon + CloudNativeCon North America

The Cloud Native Computing Foundation's flagship conference gathers adopters and technologists from leading open source and cloud native communities in San Diego, California, from November 18-21, 2019. Join Kubernetes, Prometheus, Envoy, CoreDNS, OpenTracing, Fluentd, gRPC, containerd, rkt, CNI, Jaeger, Notary, TUF, Vitess, NATS, Linkerd, Helm, Harbor and etcd as the community gathers for four days to further the education and advancement of cloud native computing.


Posted on March 27, 2019

Linux Security Summit Europe

The Linux Security Summit (LSS) is a technical forum for collaboration between Linux developers, researchers, and end users with the primary aim of fostering community efforts in analyzing and solving Linux security challenges.

LSS is where key Linux security community members and maintainers gather to present and discuss their work and research to peers, joined by those who wish to keep up with the latest in Linux security development and who would like to provide input to the development process.
