
Van Jones & Linus Torvalds to Keynote at Open Source Summit North America

We’ve announced additional keynotes at Open Source Summit North America, including:

  • Van Jones, President & Founder of the nonprofit Dream Corps; CNN Contributor; Best-Selling Author; Human Rights, Education and Clean Energy Advocate
  • Austen Collins, Founder & CEO, Serverless Inc.
  • Linus Torvalds, creator of Linux and Git, in conversation with Dirk Hohndel, VP & Chief Open Source Officer, VMware

View the Full Schedule >>

The event also features the Open Collaboration Conference where ecosystem leaders learn to navigate open source transformation with sessions covering compliance, community leadership and open source program office management in the new TODO track. The Diversity Empowerment Summit is also featured, which highlights the ways in which the community can benefit from expanding diversity and inclusion practices.

Register now to save $150 through July 21.

Register Now >>

Read more at The Linux Foundation


Stack vs Heap. What’s the Difference and Why Should I Care?

I’m four months into the curriculum at Holberton School and we’ve solved multiple problems using the malloc, realloc, calloc and free functions in the C programming language. What better way to build a solid foundation of how memory gets allocated than to write a technical post on the stack versus the heap?

This article explains in depth:

What are the five segments of memory?

What is the stack?

What is the heap?

How does understanding the two make you a better software engineer?

What are the five segments of memory?

When we write applications, or any logic that is typed in an editor and executed on a computer, the computer has to allocate memory for the program to run. The memory assigned to a program or application can be divided into five parts. The amount of memory that gets assigned to an application depends on the computer’s architecture and will vary across devices, but what remains constant is the five parts of an application’s memory: the heap, the stack, the initialized data segment, the uninitialized data segment, and the text segment.

The initialized data segment consists of all the global and static variables that are given an explicit initial value in the source code. The uninitialized data segment consists of all global and static variables that are initialized to zero or do not have explicit initialization in the source code.

At Holberton, most of the time we are not concerned with the uninitialized data segment, because we compile our programs with gcc using the flags -Wall -Wextra -pedantic -Werror, and we use an internal stylistic checker called Betty; together they treat warnings as errors, so unused or uninitialized variables get flagged, which is not a best practice. The text segment, also known as the code segment, contains the machine instructions that make up your program. The text segment is often read-only, which prevents a program from accidentally modifying its own instructions.

What is the stack?

The stack is a segment of memory where data like your local variables and function calls get added and/or removed in a last-in-first-out (LIFO) manner. When you run a program, execution enters through the main function and a stack frame is created on the stack. A frame, also known as an activation record, is the collection of all data on the stack associated with one subprogram call. The main function and all of its local variables are stored in that initial frame.

Program vs Stack usage

In the picture above, we have one stack frame on the stack that holds the main function, along with the local variables a, b and sum. After the printf() call, the frame we created, along with its local variables, is only accessible in memory for the duration of the frame and is no longer accessible after main returns the value 0.

What happens with the stack when we call multiple functions? To illustrate the stack in its LIFO manner, let’s solve a problem using recursion. When we call multiple functions in our application, we use multiple stack frames in a last-in-first-out approach, meaning that the last stack frame we’ve created on the stack is the first frame that will be released after its function is done executing its logic. Let’s go over an example of printing out the name “Holberton” recursively and show how our code affects the stack memory segment.

Yes, I have a whiteboard on the back of my door at my house.

When we compile our code using gcc _putchar.c 0-puts_recursion.c 0-main.c and run it, execution enters our program through int main(void) and creates a frame with the function int main(void) and the call _puts_recursion("Holberton") living on that frame, as illustrated in the image above. When execution reaches the _puts_recursion() function, it calls that function and creates another stack frame on top of the previous stack frame where int main(void) lives. We are now in our second stack frame and have entered the _puts_recursion(char *s) function, where *s is equal to 'H' and is only accessible in that stack frame. Because 'H' does not equal '\0', we continue with our function calls: we execute _putchar('H') and then enter the same function again as _puts_recursion(++s). The argument ++s advances the pointer s by one byte, because the size of a char is 1 byte on our machine, so the next call to _puts_recursion runs with *s equal to 'o'. Each time the _puts_recursion function is called, a new stack frame is pushed onto the stack until we hit the terminating condition, which is if (*s == '\0').

Every time a new stack frame is created, the stack pointer moves with it. The stack pointer is a register that stores the address of the top of the stack, i.e., the most recent frame. When we hit the terminating condition, we execute its logic, then start to unwind the stack, popping off stack frames in last-in-first-out order until we reach our return (0) in the int main(void) function in our first stack frame.

If you don’t have a terminating case for the recursive example above, the stack will continue to grow, adding additional stack frames on top of each other and moving the stack pointer with each call, toward the heap, which will be explained in the next section. In a recursive function, if there is no valid terminating condition, the stack will grow until you’ve completely consumed all the memory that’s been allocated for your program by the operating system. When the stack pointer exceeds the stack bound, you have a condition called stack overflow. Bad things happen when you have a stack overflow.

Let’s first refer back to the other four segments of your application’s memory: the uninitialized and initialized data segments, the text segment and the stack segment. These four segments have a constant maximum size, predetermined before your program runs. When software engineers write programs that consume large amounts of memory from a machine, they have to consider where and how much memory is being consumed in their application.

The maximum stack size is constant and predetermined before a program runs. At Holberton, we use the Ubuntu Trusty64 Linux distribution. To find information about the stack size and other neat limits, type the command below into your terminal.

ulimit -a

Where ulimit is a function that gets and sets user limits and the -a flag lists all the current limits.
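To query just the stack limit, you can pass the -s flag (the number is reported in kilobytes and may differ on your machine):

```shell
# soft stack size limit, in KB; commonly 8192 (~8 MB) on a default Ubuntu install
ulimit -s
```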

The stack size limit is 8192 KB, i.e., roughly 8 MB of memory.

If the stack is limited in size and a program needs more memory for it to execute, where can a software engineer pull memory from for his/her application? This is where the heap comes into play.

What is the heap?

The heap is the segment of memory that is not set to a constant size before compilation and can be controlled dynamically by the programmer. Think of the heap as a “free pool” of memory you can use when running your application. The size of the heap for an application is determined by the physical constraints of your RAM (Random access memory) and is generally much larger in size than the stack.

We use memory from the heap when we don’t know how much space a data structure will take up in our program, when we need to allocate more memory than what’s available on the stack, or when we need to create variables that last the duration of our application. We can do that in the C programming language by using malloc, realloc, calloc and/or free. Check out the example below.

Allocating 4000 bytes of memory to our program, then releasing it.

We allocate memory from the heap using the malloc() function. The argument we pass to malloc is the amount of memory we want to allocate to our application, in bytes. Malloc returns a void pointer that is cast into an integer pointer, which now points to the first address of our 4000-byte block of memory. We can store information at those memory addresses and do as we please with that information for the duration of our program, or for the duration of our function, because we have a pointer that references the first memory address of the newly allocated heap memory.

If you aren’t intentionally creating variables that last the duration of your application from the heap, you always want to release the memory back to the machine using the free() function. If we do not release the memory from our program before terminating the application, our application has memory leaks. If your application has enough memory leaks, it can consume more memory than is physically available, which can cause programs to crash. This is why we use a program called Valgrind. Valgrind is easy to use and checks for memory leaks.

Valgrind in use: 4,000 bytes allocated, 0 bytes leaked.

Another thing to consider while using the heap: memory allocated on the heap is accessible by any function, anywhere in your program, as long as it hasn’t been freed and you have a pointer that references it.

How does understanding the stack and heap make you a better software engineer?

If you understand the advantages and disadvantages of using the stack vs the heap for your application, then it gives you a strategic advantage for creating scalable programs. You, the programmer, have to decide when to use memory from the stack vs heap based on each problem you are trying to solve.

If you have a variable like an array or struct that needs to be stored in a large block of memory, needs to persist throughout the lifetime of your application, or could change in size throughout the duration of your program, then you should allocate it from the heap.

If you need to create helper functions with variables that only persist within the lifetime of the function, then you should allocate memory from the stack. Memory from the stack is easier to keep track of because it is only locally available within the function call, does not persist after the function is completed, and is managed for you automatically.

Photo credit: Gribble Lab

Questions, comments or concerns, feel free to comment below, follow me or find me on Twitter @ NTTL_LTTN.

References:

My Code School. (February 23rd, 2013). Pointers and dynamic memory — stack vs heap. Retrieved from https://www.youtube.com/watch?v=_8-ht2AKyH4

Paul Gribble (2012). C Programming Boot Camp — 7. Memory: Stack vs Heap. [Blog post]. Retrieved from https://www.gribblelab.org/CBootCamp/7_Memory_Stack_vs_Heap.html#orgheadline1

GeeksforGeeks. Memory Layout of C Programs. [Blog post]. Retrieved from https://www.geeksforgeeks.org/memory-layout-of-c-program/

Sandra Henry-Stocker. (November 18th, 2012). NETWORK WORLD — Setting limits with ulimit. [Blog post]. Retrieved from https://www.networkworld.com/article/2693414/operating-systems/setting-limits-with-ulimit.html

Valgrind Developers (2000–2017). Valgrind. Retrieved from http://valgrind.org/

Die.net Linux Documentation. Retrieved from https://linux.die.net/


Google Becomes a Platinum Member of The Linux Foundation

Demonstrating its commitment to open source software, we are thrilled to announce that Google is now a Platinum Member of The Linux Foundation. Google has been an active and committed contributor to the open source community for many years, releasing and contributing to more than 10,000 open source projects to date. Some of The Linux Foundation communities Google supports include Cloud Foundry, Node.js Foundation, Open API Initiative and Cloud Native Computing Foundation, which it helped found with its Kubernetes contribution.

“The Linux Foundation is a fixture in the open source community. By working closely with the organization, we can better engage with the community-at-large and continue to build a more inclusive ecosystem where everyone can benefit,” said Sarah Novotny, head of open source strategy, Google Cloud. 

Read more at The Linux Foundation


CIP: Keeping the Lights On with Linux

Modern civil infrastructure is all around us — in power plants, radar systems, traffic lights, dams, weather systems, and so on. Many of these infrastructure projects exist for decades, if not longer, so security and longevity are paramount.

And, many of these systems are powered by Linux, which offers technology providers more control over these issues. However, if every provider is building their own solution, this can lead to fragmentation and duplication of effort. Thus, the primary goal of Civil Infrastructure Platform (CIP) is to create an open source base layer for industrial use-cases in these systems, such as embedded controllers and gateway devices.

“We have a very conservative culture in this area because once we create a system, it has to be supported for more than ten years; in some cases for over 60 years. That’s why this project was created, because every player in this industry had the same issue of being able to use Linux for a long time,” says Yoshitake Kobayashi, Technical Steering Committee Chair of CIP.

CIP’s concept is to create a very fundamental system to use open source software on controllers. This base layer comprises the Linux kernel and a small set of common open source software like libc, busybox, and so on. Because longevity of software is a primary concern, CIP chose Linux kernel 4.4, an LTS release of the kernel maintained by Greg Kroah-Hartman.

Collaboration

Since CIP has an upstream-first policy, the code that they want in the project must be in the upstream kernel. To create a proactive feedback loop with the kernel community, CIP hired Ben Hutchings as the official maintainer of CIP. Hutchings is known for the work he has done on the Debian LTS release, which also led to an official collaboration between CIP and the Debian project.

Under the newly forged collaboration, CIP will use Debian LTS to build the platform. CIP will also help Debian Long Term Support (LTS) to extend the lifetime of all Debian stable releases. CIP will work closely with Freexian, a company that offers commercial services around Debian LTS. The two organizations will focus on interoperability, security, and support for open source software for embedded systems. CIP will also provide funding for some of the Debian LTS activities.

“We are excited about this collaboration as well as the CIP’s support of the Debian LTS project, which aims to extend the support lifetime to more than five years. Together, we are committed to long-term support for our users and laying the ‘foundation’ for the cities of the future,” said Chris Lamb, Debian Project Leader.

Security

Security is the biggest concern, said Kobayashi. Although most of the civil infrastructure is not connected to the Internet for obvious security reasons (you definitely don’t want a nuclear power plant to be connected to the Internet), there are many other risks.

Just because the system itself is not connected to the Internet, that doesn’t mean it’s immune to all threats. Other systems — like users’ laptops — may connect to the Internet and then be plugged into the local systems. If someone receives a malicious file as an email attachment, it can “contaminate” the internal infrastructure.

Thus, it’s critical to keep all software running on such controllers up to date and fully patched. To ensure security, CIP has also backported many components of the Kernel Self Protection project. CIP also follows one of the strictest cybersecurity standards — IEC 62443 — which defines processes and tests to ensure the system is more secure.

Going forward

As CIP matures, it’s extending its collaboration with providers of Linux. In addition to collaboration with Debian and Freexian, CIP recently added Cybertrust Japan Co., Ltd., a supplier of an enterprise Linux operating system, as a new Silver member.

Cybertrust joins other industry leaders, such as Siemens, Toshiba, Codethink, Hitachi, Moxa, Plat’Home, and Renesas, in their work to create a reliable and secure Linux-based embedded software platform that is sustainable for decades to come.

The ongoing work of these companies under the umbrella of CIP will ensure the integrity of the civil infrastructure that runs our modern society.

Learn more at the Civil Infrastructure Platform website.


Linux Professionals Hard to Find, Say Hiring Managers

It’s a very good time to be a Linux professional. Linux is back on top as the most in-demand open source skill and hiring these professionals has become a higher priority for 83% of hiring managers this year compared to 76% in 2017, according to the newly released 2018 Open Source Jobs Report.  

That’s not surprising when you consider how popular cloud and container technologies have become, as well as DevOps practices, all of which typically run on Linux. What’s also not surprising is that Linux professionals are in high demand:

  • 87% of hiring managers experience difficulties recruiting enough open source talent. This is similar to last year, when 89% said it was a challenge finding the right mix of experience and skills.

  • 44% of respondents rated it very difficult to recruit open source pros, a percentage that jumped from 34% in 2017.

At the same time, 52% of hiring managers say they are planning to hire more open source professionals in the next six months than they did in the previous six months. And, hiring of open source professionals will also increase more than hiring for other areas of the business for 60% of hiring managers, the report found. That’s down slightly from last year when 58% projected more hiring in six months and 67% predicted more open source hiring than other areas of the business.

This high demand has prompted many companies to pay premiums above base salary, especially for professionals with skills in cybersecurity, big data and process management.

And companies are finding that supporting open source projects can be a valuable recruiting and retention tool. This year, 57% of hiring managers reported that their organization contributes to open source projects, up from 50% in 2017. Nearly half (48%) of hiring managers report that their organization has decided to financially support or contribute code to an open source project specifically with the goal of recruiting developers who work on that project.

The role of the economy

The strong global economy is encouraging half of hiring managers to hire more staff this year, up from 43% in 2017. Only 6% said the economy is leading them to decrease their open source hiring.

The report found 55% of open source pros say it would be easier to find a new open source job — a slight increase compared to 52% in 2017 and 50% in 2016. Only 19% reported not receiving a recruitment call in the past six months. This is a significant decline from the 27% who said they didn’t receive a recruitment call in the 2016 and 2017 jobs surveys.

The overall unemployment rate for tech professionals is on the decline – in April it dropped to 1.9%, compared to 3% one year ago.  

The 2018 Open Source Jobs Report is an annual partnership between The Linux Foundation and IT career site Dice. This year’s survey includes responses from more than 750 hiring managers at corporations, small and medium businesses (SMBs), government agencies and staffing firms worldwide, plus more than 6,500 open source professionals.

Download the complete Open Source Jobs Report now.


Last Chance to Speak at Open Source Summit and ELC + OpenIoT Summit Europe – Submit by July 1

Submit a proposal to speak at Open Source Summit Europe & ELC + OpenIoT Summit Europe, taking place October 22-24, 2018, in Edinburgh, UK, and share your knowledge and expertise with 2,000+ open source technologists and community leaders. Proposals are being accepted through 11:59pm PDT, Sunday, July 1.

This year’s tracks and content will cover the following areas at Open Source Summit Europe:

  • Cloud Native Apps/Serverless/Microservices
  • Infrastructure & Automation (Cloud/Cloud Native/DevOps)
  • Linux Systems
  • Artificial Intelligence & Data Analytics
  • Emerging Technologies & Wildcard (Networking, Edge, IoT, Hardware, Blockchain)
  • Community, Compliance, Governance, Culture, Open Source Program Management (Open Collaboration Conference track)
  • Diversity & Inclusion (Diversity Empowerment Summit)
  • Innovation at Apache/Apache Projects
  • TODO / Open Source Program Management

View the full list of suggested topics for Open Source Summit Europe.

Suggested Embedded Linux Conference (ELC) Topics:

Read more at The Linux Foundation


Open Source Guides for the Enterprise Now Available in Chinese

The popular Open Source Guides for the Enterprise, developed by The Linux Foundation in collaboration with the TODO Group, are now available in Chinese. This set of guides provides industry-proven best practices to help organizations successfully leverage open source.

“Making these resources available to Chinese audiences in their native language will encourage even greater adoption of and participation with open source projects,” said Chris Aniszczyk, CTO of Cloud Native Computing Foundation and co-founder of the TODO Group. The guides span various stages of the open source project lifecycle, from initial planning and formation to winding down a project.

The 10 guides now available in Mandarin include topics such as:

  • Creating an Open Source Program by Chris Aniszczyk, Cloud Native Computing Foundation; Jeff McAffer, Microsoft; Will Norris, Google; and Andrew Spyker, Netflix
  • Using Open Source Code by Ibrahim Haddad, Samsung Research America
  • Participating in Open Source Communities by Stormy Peters, Red Hat; and Nithya Ruff, Comcast
  • Recruiting Open Source Developers by Guy Martin, Autodesk; Jeff Osier-Mixon, Intel Corporation; Nithya Ruff; and Gil Yehuda, Oath
  • Measuring Your Open Source Program’s Success by Christine Abernathy, Facebook; Chris Aniszczyk; Joe Beda, Heptio; Sarah Novotny, Google; and Gil Yehuda

The translated guides were launched at the LinuxCon + ContainerCon + CloudOpen China conference in Beijing, where The Linux Foundation also welcomed Chinese Internet giant Tencent as a Platinum Member.

This post originally appeared at The Linux Foundation.


Python 3: Sometimes Immutable Is Mutable and Everything Is an Object

What is Python?

Python is an interpreted, interactive, object-oriented programming language; it incorporates modules, classes, exceptions, dynamic typing and high-level data types. Python also has clear, powerful syntax. It is a high-level general-purpose programming language that can be applied to many different classes of problems — with a large standard library that covers string processing (regular expressions, Unicode, calculating differences between files), Internet protocols (HTTP, FTP, SMTP, XML-RPC, POP, IMAP, CGI programming), software engineering (unit testing, logging, profiling, parsing Python code), and operating system interfaces (system calls, filesystems, TCP/IP sockets). Here are some of Python’s features:

  • An interpreted (as opposed to compiled) language. Contrary to C, for example, Python code does not need to be compiled before executing it. In addition, Python can be used interactively: many Python interpreters are available, from which commands and scripts can be executed.
  • Free software released under an open-source license: Python can be used and distributed free of charge, even for building commercial software.
  • Multi-platform: Python is available for all major operating systems — Windows, Linux/Unix, Mac OS X, most likely your mobile phone OS, etc.
  • A very readable language with clear, non-verbose syntax.
  • A language for which a large variety of high-quality packages are available for various applications, from web frameworks to scientific computing.
  • A language very easy to interface with other languages, in particular C and C++.
  • Some other features of the language are illustrated just below. For example, Python is an object-oriented language, with dynamic typing (an object’s type can change during the course of a program).

What does it mean to be an object-oriented language?

Python is a multi-paradigm programming language, meaning it supports different programming approaches. One popular approach to solving a programming problem is by creating objects. This is known as Object-Oriented Programming (OOP).

An object has two characteristics:
1) attributes
2) behavior

Let’s take an example:

Dog is an object:
a) name, age, color are data
b) singing, dancing are behavior

In object-oriented programming, we call data attributes and behavior methods. Again:

Data → Attributes & Behavior → Methods

The concept of OOP in Python focuses on creating reusable code. This concept is also known as DRY (Don’t Repeat Yourself). In Python, the concept of OOP follows some basic principles:

Inheritance — The process of using details from an existing class in a new class without modifying the existing class.
Encapsulation — Hiding the private details of a class from other objects.
Polymorphism — The concept of using a common operation in different ways for different data input.
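A tiny sketch of those three principles in action (the Dog and Puppy classes here are invented for illustration):

```python
class Dog:
    def __init__(self, name):
        self._name = name  # encapsulation: the underscore marks it as internal

    def speak(self):
        return "Woof"


class Puppy(Dog):  # inheritance: reuse Dog without modifying it
    def speak(self):  # polymorphism: the same operation, different behavior
        return "Yip"


print([animal.speak() for animal in (Dog("Blu"), Puppy("Woo"))])  # ['Woof', 'Yip']
```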

Class

A class is a blueprint for the object.

We can think of a class as a sketch of a dog with labels. It contains all the details about the name, colors, size, etc. Based on these descriptions, we can learn about the dog. Here, the dog is an object.

An example of a class for a dog can be:

class Dog: pass

Here, we use the class keyword to define an empty class, Dog. From a class, we construct instances. An instance is a specific object created from a particular class.

A class is the blueprint from which individual objects are created. In the real world, we often find many objects of the same type — like cars. All cars of the same make and model have an engine, wheels, doors, and so on. Each car was built from the same set of blueprints and has the same components.

Object

Think of an object in Python as a block of memory, and a variable is just something that points/references to that block of memory. All the information relevant to your data is stored within the object itself. And the variable stores the address to that object. So it actually doesn’t matter if you reassign a variable pointing to an integer to point to a different data type.

>>> a = 1
>>> a = "I am a string now"
>>> print(a)
I am a string now

Every object has its own identity/ID that stores its address in memory. Every object has a type. An object can also hold references to other objects. For example, an integer will not have references to other objects, but if the object is a list, it will contain references to each object within that list. We will touch on this when we look at tuples later.

The built-in function id() will return an object’s id and type() will return an object’s type:

>>> list_1 = [1, 2, 3]
>>> list_1 # access this object's value
[1, 2, 3]
>>> id(list_1) # access this object's ID
140705683311624
>>> type(list_1) # access this object's data type
<class 'list'>

So, an object (instance) is an instantiation of a class. When a class is defined, only the description for the object is defined; therefore, no memory or storage is allocated.

An example of an object of class Dog can be:

obj = Dog()

Here, obj is an object of class Dog.

Suppose we have details of Dog. Now, we are going to show how to build the class and objects of Dog.

class Dog:
    # class attribute
    species = "animal"

    # instance attribute
    def __init__(self, name, age):
        self.name = name
        self.age = age
# instantiate the Dog class
blu = Dog("Blu", 10)
woo = Dog("Woo", 15)
# access the class attributes
print("Blu is an {}".format(blu.__class__.species))
print("Woo is also an {}".format(woo.__class__.species))
# access the instance attributes
print("{} is {} years old".format(blu.name, blu.age))
print("{} is {} years old".format(woo.name, woo.age))

When we run the program, the output will be:

Blu is an animal
Woo is also an animal
Blu is 10 years old
Woo is 15 years old

In the above program, we create a class with name Dog. Then, we define attributes. The attributes are a characteristic of an object.

Then, we create instances of the Dog class. Here, blu and woo are references to our new objects.

Then, we access the class attribute using __class__.species. Class attributes are the same for all instances of a class. Similarly, we access the instance attributes using blu.name and blu.age. However, instance attributes are different for every instance of a class.

Let’s try to understand how value and identity are affected when we use the operators “==” and “is”.

The “==” operator compares values, whereas the “is” operator compares identities. Hence, a is b is equivalent to id(a) == id(b). Two different objects may share the same value, but they will never share the same identity.

Example:

>>> a = ['blu', 'woof']
>>> id(a)
1877152401480
>>> b = a
>>> id(b)
1877152401480
>>> id(a) == id(b)
True
>>> a is b
True
>>> c = ['blu', 'woof']
>>> a == c
True
>>> id(c)
1877152432200
>>> id(a) == id(c)
False

Hashability

What is a hash?

According to the Python documentation, “An object is hashable if it has a hash value which never changes during its lifetime.” For built-in types, this is the case if and only if the object is immutable.

A hash is an integer that depends on an object’s value, and objects with the same value always have the same hash. (Objects with different values will occasionally have the same hash too. This is called a hash collision.) While id() will return an integer based on an object’s identity, the hash() function will return an integer (the object’s hash) based on the hashable object’s value:

>>> a = ('cow', 'bull')
>>> b = ('cow', 'bull')
>>> a == b
True
>>> a is b
False
>>> hash(a)
6950940451664727300
>>> hash(b)
6950940451664727300
>>> hash(a) == hash(b)
True

Immutable objects can be hashable; mutable objects can’t be. This is important to know, because (for reasons beyond the scope of this post) only hashable objects can be used as keys in a dictionary or as items in a set. Since hashes are based on values and only immutable objects can be hashable, this means that hashes will never change during the object’s lifetime.

Hashability will be covered more in the mutable vs. immutable objects section, since sometimes a tuple can behave as if it were mutable, and that changes how we understand mutable and immutable objects.
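As a preview of that point: a tuple itself is immutable, but it only stores references, so if one of them points to a list, the tuple's contents can still change — and the tuple stops being hashable:

```python
t = (1, 2, [3, 4])
t[2].append(5)  # legal: we mutate the list, not the tuple
print(t)  # (1, 2, [3, 4, 5])

# such a tuple can no longer be hashed, because its value can change
try:
    hash(t)
except TypeError as e:
    print(e)  # unhashable type: 'list'
```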

To summarize: EVERYTHING is an object in Python; the only difference is that some objects are mutable and some are immutable. But what kinds of objects are possible in Python, and which ones are mutable and which ones aren’t?

Objects of built-in types like (bytes, int, float, bool, str, tuple, unicode, complex) are immutable. Objects of built-in types like (list, set, dict, array, bytearray) are mutable. Custom classes are mutable. To simulate immutability in a class, one should override attribute setting and deletion to raise exceptions.
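
A minimal sketch of simulating immutability in a custom class (FrozenPoint and its attributes are made-up names for illustration): attribute setting and deletion are overridden to raise exceptions once construction is done.

```python
class FrozenPoint:
    """A toy class that rejects attribute changes after __init__."""

    def __init__(self, x, y):
        # Bypass our own __setattr__ during construction.
        object.__setattr__(self, 'x', x)
        object.__setattr__(self, 'y', y)

    def __setattr__(self, name, value):
        raise AttributeError(f"FrozenPoint is immutable, cannot set {name!r}")

    def __delattr__(self, name):
        raise AttributeError(f"FrozenPoint is immutable, cannot delete {name!r}")

p = FrozenPoint(1, 2)
print(p.x, p.y)   # 1 2
try:
    p.x = 99      # rejected by __setattr__
except AttributeError as err:
    print(err)
```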

Now, how would a newbie know which variables name mutable objects and which do not? For this we use two very handy built-in functions called id() and type().

What is id() and type()?

Syntax to use id()
id(object)

As we can see, the function accepts a single parameter and returns the identity of an object. This identity is unique and constant for the object during its lifetime; two objects with non-overlapping lifetimes may have the same id() value. If we relate this to C, the identity in CPython is actually the object’s memory address; in Python terms it is simply a unique id. This function is also used internally by Python.

Examples:

The output is the identity of the object passed. It varies between runs, but within a single run of a program each live object has a unique, constant identity.
Input : id(2507)
Output : 140365829447504
Output varies with different runs
Input : id("Holberton")
Output : 139793848214784

What is an Alias?

>>> a = 1
>>> id(a)
1904391232
>>> b = a #aliasing a
>>> id(b)
1904391232
>>> b
1

An alias is a second name for a piece of data. Programmers use aliases because it’s often easier and faster to refer to data than to copy it. If the data is immutable, aliasing does not matter, since the data won’t change anyway; but if the data is mutable, aliasing can lead to bugs and issues like the following:

>>> a = 1
>>> id(a)
1904391232
>>> b = a #aliasing a
>>> id(b)
1904391232
>>> b
1
>>> a = 2
>>> id(2)
1904391264
>>> id(b)
1904391232
>>> b
1
>>> a
2

As can be seen, a now points to 2 and its id differs from that of b, which is still pointing to 1. In Python, aliasing happens whenever one variable’s value is assigned to another variable, because variables are just names that store references to values.
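
Here is a sketch of the actual aliasing pitfall with a mutable object: mutating through one name changes what the other name sees, while making a real copy avoids it.

```python
a = ['blu', 'woof']
b = a                 # b is an alias of a, not a copy
b.append('meow')      # mutate through b...
print(a)              # ['blu', 'woof', 'meow'] -- a changed too!
print(a is b)         # True: one object, two names

c = list(a)           # a real (shallow) copy breaks the alias
c.append('moo')
print(a)              # ['blu', 'woof', 'meow'] -- unchanged
```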

The type() function returns the class type of the argument (object) passed as a parameter. It is mostly used for debugging purposes.

type() can be called with either one or three arguments. When a single argument is passed, type(obj) returns the type of the given object; the three-argument form creates a new type.

Syntax :

type(object)

We can find out what class an object belongs to using the built-in type() function:

>>> Blue = [1, 2, 3]
>>> type(Blue)
<class 'list'>
>>> def my_func(x):
...     x = 89
...
>>> type(my_func)
<class 'function'>
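
The three-argument form type(name, bases, dict) creates a new class dynamically; a small sketch (the Point name is made up for illustration):

```python
# Equivalent to:  class Point: x = 0; y = 0
Point = type('Point', (object,), {'x': 0, 'y': 0})
p = Point()
print(type(p))    # <class '__main__.Point'>
print(p.x, p.y)   # 0 0
```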

Now that we can compare variables to see their type and id’s, we can dive in deeper to understand how mutable and immutable objects work.

Mutable Objects vs. Immutable Objects

Not all Python objects handle changes the same way. Some objects are mutable, meaning they can be altered. Others are immutable; they cannot be changed but rather return new objects when attempting to update. What does this mean when writing Python code?

The following are some mutable objects:

  • list
  • dict
  • set
  • bytearray
  • user-defined classes (unless specifically made immutable)

The following are some immutable objects:

  • int
  • float
  • decimal
  • complex
  • bool
  • string
  • tuple
  • range
  • frozenset
  • bytes

The distinction is rather simple: mutable objects can change, whereas immutable objects cannot. Immutable literally means not mutable.

A standard example pair is tuple and list: a tuple is filled on creation and then frozen; its content cannot change anymore. To a list, one can append elements, set elements, and delete elements at any time. Keep in mind the parallel pair: tuple is an immutable list, whereas frozenset is an immutable set. Quoting a Stack Overflow answer: tuples are indeed an ordered collection of objects, but they can contain duplicates and unhashable objects, and they have slice functionality. Frozensets aren’t indexed, but you have the functionality of sets (O(1) element lookups, and operations such as unions and intersections), and they can’t contain duplicates, like their mutable counterparts.

Let’s create a dictionary with immutable objects for keys —

>>> a = {'blu': 42, True: 'woof', ('x', 'y', 'z'): ['hello']}
>>> a.keys()
dict_keys(['blu', True, ('x', 'y', 'z')])

As seen above, the keys in a are immutable, hashable objects. But if you try to call hash() on a mutable object (such as a list or dict), or try to use a mutable object as a dictionary key, an error will be raised:

>>> spam = {['hello', 'world']: 42}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> d = {'a': 1}
>>> spam = {d: 42}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'

So, tuples, being immutable objects, can be used as dictionary keys? Usually yes, but not when the tuple itself contains a mutable object:

>>> spam = {('a', 'b', 'c'): 'hello'}
>>> spam = {('a', 'b', [1, 2]): 'hello'}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

As seen above, if a tuple contains a mutable object, then by the previous explanation of hashability it cannot be hashed. So, immutable objects can be hashable, but this doesn’t necessarily mean they’re always hashable. And remember, the hash is derived from the object’s value.

This is an interesting corner case: a tuple (which should be immutable) that contains a mutable list cannot be hashed. This is because the hash of the tuple depends on the tuple’s value, but if that list’s value can change, that means the tuple’s value can change and therefore the hash can change during the tuple’s lifetime.

So far we have seen that some tuples are hashable (effectively immutable), while other tuples are not hashable (effectively mutable). The official Python documentation defines the terms as an immutable object being “an object with a fixed value”, while “mutable objects can change their value”. Since mutability is a property of an object’s value, it makes sense that some tuples behave as mutable while others don’t.

>>> a = ('dogs', 'cats', [1, 2, 3])
>>> b = ('dogs', 'cats', [1, 2, 3])
>>> a == b
True
>>> a is b
False
>>> a[2].append(99)
>>> a
('dogs', 'cats', [1, 2, 3, 99])
>>> a == b
False

In this example, the tuples a and b have equal (==) values but are different objects. When the list inside tuple a is changed, a’s value changes with it: a is no longer == b, while b itself is untouched. In this sense, a tuple holding a mutable object can behave as if it were mutable.

While Python tends towards mutability, there are many use-cases for immutability as well. Here are some straightforward ones:

  • Mutable objects are great for efficiently passing around data. Let’s say objects anton and berta have access to the same list. anton adds “lemons” to the list, and berta automatically has access to this information.
    If both used a tuple instead, anton would have to copy the entries of his shopping tuple, add the new element, create a new tuple, and then send that to berta. Even if both can talk directly, that is a lot of work.
  • Immutable objects are great for working with the data. So berta is going to buy all that stuff: she can read everything, make a plan, and does not have to double-check for changes. If next week she needs to buy more stuff for the same shopping tuple, berta just reuses the old plan. She has the guarantee that anton cannot change anything unnoticed.
    If both used a list instead, berta could not plan ahead. She has no guarantee that “lemons” are still on the list when she arrives at the shop, and no guarantee that next week she can just repeat what was appropriate last week.
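
The shopping-list analogy above can be sketched in code (anton and berta are just the names from the analogy):

```python
# Shared mutable list: both names see updates immediately.
anton = ['bread', 'milk']
berta = anton              # same list object, no copy
anton.append('lemons')
print(berta)               # ['bread', 'milk', 'lemons']

# Immutable tuple: "adding" means building a whole new object,
# so berta's view is guaranteed not to change behind her back.
anton_t = ('bread', 'milk')
berta_t = anton_t
anton_t = anton_t + ('lemons',)  # new tuple; berta_t is untouched
print(berta_t)             # ('bread', 'milk')
```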

You should generally use mutable objects when having to deal with growing data. For example, when parsing a file, you may append information from each line to a list. Custom objects are usually mutable, buffering data, adjusting to new conditions and so on. In general, whenever something can change, mutable objects are much easier.

Immutable objects are used sparingly in Python; usually the choice is implicit, such as using int or other basic immutable types. Often, you will use mutable types as de facto immutable: many lists are filled at construction and never changed. There is also no built-in immutable dict. You can enforce immutability to optimize algorithms, e.g. to make caching safe.

Interestingly enough, Python’s often-used dict requires its keys to be hashable (in practice, immutable). It is a data structure that cannot work with mutable keys, since it relies on certain guarantees about its elements.

Mutable example

>>> my_list = [10, 20, 30]
>>> print(my_list)
[10, 20, 30]
>>> my_list[0] = 40
>>> print(my_list)
[40, 20, 30]

Immutable example

>>> tuple_ = (10, 20, 30)
>>> print(tuple_)
(10, 20, 30)
>>> tuple_[0] = 40
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

If you want to write the most efficient code, you should know the difference between mutable and immutable objects in Python. Concatenating strings in a loop wastes lots of memory: because strings are immutable, concatenating two strings actually creates a third string that is the combination of the previous two. If you are iterating a lot and building a large string, you will waste a lot of memory creating and throwing away objects. Instead, collect the pieces in a list and use the str.join() technique.
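
The join technique mentioned above can be sketched like this; both approaches build the same string, but the second creates far fewer intermediate objects:

```python
words = ['spam'] * 1000

# Wasteful: every += creates a brand-new string object,
# since the existing string cannot be modified in place.
s = ''
for w in words:
    s += w

# Better: collect the pieces in a list, then join once.
t = ''.join(words)

print(s == t)   # True
print(len(t))   # 4000
```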

Python handles mutable and immutable objects differently. Immutable objects are quicker to access than mutable objects. Also, immutable objects are fundamentally expensive to “change”, because doing so involves creating a copy, whereas changing mutable objects is cheap.

Interning, integer caching and everything called: NSMALLPOSINTS & NSMALLNEGINTS

Easy things first —

NSMALLNEGINTS covers the range -5 to -1 and NSMALLPOSINTS the range 0 to 256. These are macros defined in CPython; earlier versions used ranges such as -1 to 99 and -5 to 99 before settling on -5 to 256. Python keeps an array of integer objects for all integers between -5 and 256, so when you create an int in that range, you actually just get a reference to the existing object in memory.

If x = 42, what actually happens is that Python performs a lookup in this preallocated integer block for values in the range -5 to 256. For integers outside the range, a new object is created on each use and garbage collected (destroyed) once nothing references it. Creating a new integer object and destroying it immediately wastes a lot of calculation cycles, which is exactly why Python preallocates the commonly used integers.

There are exceptions among immutable objects, beyond the tuple made “mutable” above. Although normally a new object is created each time a variable references a new value, things happen slightly differently for a few cases:

a) Strings without whitespace and fewer than 20 characters
b) Integers between -5 to 256 (including both as explained above)
c) empty immutable objects (tuples)

These objects are always reused, or interned. This is a memory optimization in the Python implementation. The rationale behind it is as follows:

  1. Since programmers use these objects frequently, interning existing objects saves memory.
  2. Since immutable objects like tuples and strings cannot be modified, there is no risk in interning the same object.

So what does it mean by “interning”?

Interning allows two variables to refer to the same string object. Python does this automatically, although the exact rules remain fuzzy. One can also forcibly intern strings by calling the intern() function (sys.intern() in Python 3). Guillo’s article provides an in-depth look into string interning.

Strings with more than 20 characters, or containing whitespace, are not interned and will be new objects:

>>> a = "Howdy! How are you?"
>>> b = "Howdy! How are you?"
>>> a is b
False

but if a string is shorter than 20 characters and contains no whitespace, it will look somewhat like this:

>>> a = "python"
>>> b = "python"
>>> a is b
True

Here, a and b refer to the same object.
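
In Python 3, the intern() function mentioned above lives in the sys module. With sys.intern(), even a long string containing whitespace, which would not be interned automatically, can be forced to share a single object (a minimal sketch):

```python
import sys

# Both calls return the one canonical interned string object.
a = sys.intern("Howdy! How are you?")
b = sys.intern("Howdy! How are you?")
print(a is b)  # True
```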

Let’s move on to integers now.

As explained above in the macro definitions, integer caching happens because Python preloads the commonly used integers. Hence, variables referring to an integer within the range point to the same object that already exists in memory:

>>> a = 256
>>> b = 256
>>> a is b
True

This is not the case if the object referred to is outside the range:

>>> a = 1024
>>> b = 1024
>>> a is b
False

Lastly, let’s talk about empty immutable objects:

>>> a = ()
>>> b = ()
>>> a is b
True

Here a and b refer to the same object in memory as it is an empty tuple, but this changes if the tuple is not empty.

>>> a = (1, 2)
>>> b = (1, 2)
>>> a == b
True
>>> a is b
False

Passing mutable and immutable objects into functions:

Immutable and mutable objects are handled differently when working with function arguments. A variable such as a, b or name simply points to the memory location where the actual value of its object is stored.

Major Concepts of Function Argument Passing in Python

Arguments are always passed to functions by object reference in Python. The caller and the function code block share the same object, so when we mutate a function argument inside the function’s scope, the change is also visible in the caller’s scope, regardless of the argument’s name. This behaves differently for mutable and immutable arguments, though.

In Python, integer, float, string and tuple are immutable objects; list, dict and set fall in the mutable category. This means that if the value of an integer, float, string or tuple argument is changed inside a function or method block, the calling block does not see the change, but changes to a list, dict or set object are visible to the caller.

Python Immutable Function Arguments

Python immutable objects, such as numbers, tuples and strings, are passed by reference just like mutable objects such as lists, sets and dicts. But because immutable objects cannot change, “changing” an integer or string inside a function block behaves much like copying the object: a new local object is created and manipulated inside the function’s scope, while the caller’s object remains unchanged. Therefore, the calling block will not notice any changes made to the immutable object inside the function. Let’s take a look at the following example.

Python Immutable Function Argument — Example and Explanation

def foo1(a):
    # function block
    a += 1
    print('id of a:', id(a))  # id of y and a are the same
    return a

# main or caller block
x = 10
y = foo1(x)
# value of x is unchanged
print('x:', x)
# value of y is the return value of foo1,
# after adding 1 to argument 'a', which was the variable 'x'
print('y:', y)
print('id of x:', id(x))  # id of x
print('id of y:', id(y))  # id of y, different from x

Result:

id of a: 1456621360
x: 10
y: 11
id of x: 1456621344
id of y: 1456621360
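
For contrast with the immutable example above, here is a sketch of the mutable case (foo2 and its variable names are made up for illustration): the caller's list is changed in place, so the change is visible in the caller's scope.

```python
def foo2(items):
    # function block: mutate the caller's list in place
    items.append(99)
    print('id of items:', id(items))  # same id as 'nums' below

# main or caller block
nums = [10, 20, 30]
print('id of nums: ', id(nums))
foo2(nums)
# the caller sees the mutation, because both names share one object
print('nums:', nums)  # [10, 20, 30, 99]
```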

Why Open Source Matters to Alibaba

Alibaba has more than 150 open source projects and is a long-time contributor to many others. Wei Cao, a Senior Staff Engineer at Alibaba, says that sharing knowledge and receiving feedback from the community helps Alibaba refine their projects. We spoke with Wei Cao, who heads the Alibaba Cloud Database Department and leads the R&D of the Alibaba RDS and POLARDB products, to learn more about the company’s open source focus and about some of the database-related projects they contribute to.

Linux.com: Why is open source so important for Alibaba?

Wei Cao: At present, Alibaba has more than 150 open source projects. We work on the open source projects with the aim to contribute to the industry and solve real-life problems. We share our experiences with the rest of the open source enthusiasts.

As a long-time contributor to various other open source projects, Alibaba and Alibaba Cloud have fostered a culture that encourages our teams to voluntarily contribute to various open source projects, either by sharing experiences or helping others to solve problems. Sharing and contributing to the community altogether is in the DNA of Alibaba’s culture.

When we first started to use open source projects like MySQL, Redis, and PostgreSQL, we received a lot of help from the community. Now we would like to give back to those same communities by sharing our accumulated knowledge, and to receive feedback from the community so that we can refine our projects.

We believe this truly represents the essence of open source development, where everyone can build on each other’s knowledge. We are dedicated to making our technology inclusive through continuously contributing to bug-fixing and patch optimization of different open source projects.

Linux.com: Can you tell us what kind of culture is within Alibaba to encourage its developers to consume and contribute to Open Source project?

Wei Cao: Alibaba has always had a culture of integrity, partnership, sharing and mutual assistance. At the same time, we always believe that more people participating in the community can promote the industry better and also make us more profitable. Therefore, our staff members are willing to pay close attention to open source projects in the community. They keep using open source projects and accumulating experience to give feedback on projects and jointly promote the development of the industry.

Linux.com: Can you tell us what kind of open source projects you are using in your company?

Wei Cao: Our database products use many open source projects such as MySQL, Redis, PostgreSQL, etc. Our teams have done feature and performance enhancement and optimization, depending on various use-cases. We have done compression for IoT and security improvements for financial industries.

Linux.com: Can you tell us about the open source projects that you have created?

Wei Cao: We will be releasing a new open source project, called Mongo-Shake, at the LC3 Conference. Based on MongoDB’s oplog, Mongo-Shake is a universal platform for services.

It reads the oplog operation logs of a MongoDB cluster and replicates MongoDB data, then implements specific requirements on top of the operation logs, which enable many scenario-based applications.

Through the operation logs, we provide log data subscription and consumption (PUB/SUB functions), which can be flexibly connected to different scenarios (such as log subscription, data center synchronization, and asynchronous cache invalidation) through an SDK, Kafka, MetaQ, etc. Cluster data synchronization is a core application scenario, achieved by capturing oplogs and replaying them. Application scenarios include:

  • Asynchronous replication of MongoDB data between clusters eliminates the need for double write costs.

  • Mirror backup of MongoDB cluster data. (Not supported in this open source version.)

  • Log offline analysis.

  • Log subscription.

  • Cache synchronization: through the results of log analysis, it is known which caches can be evicted and which can be preloaded, prompting the cache to update.

  • Log-based monitoring.

Linux.com: Can you tell us about the major open source projects you contribute to?

Wei Cao: We have contributed many database-related open source projects. In addition, we have released open source projects, like AliSQL and ApsaraCache, which are widely used in Alibaba.

AliSQL: AliSQL is a MySQL branch developed by the Alibaba Cloud database team; it serves Alibaba’s business and Alibaba Cloud’s RDS. AliSQL is verified against many Alibaba workloads and is widely used within Alibaba Cloud.

AliSQL adds many feature and performance enhancements on top of MySQL. It carries more than 300 patches: we have added many monitoring indicators and features, and optimized it for different use cases. The latest AliSQL also merges many useful enhancements from other branches such as Percona, MariaDB, and WebScaleSQL, and contains a lot of patches based on Alibaba’s experience.

In general test cases, AliSQL shows a 70% performance improvement over the official MySQL version, according to the R&D team’s sysbench benchmarks. In comparison with MySQL, AliSQL offers:

  • Better support for TokuDB, more monitoring and performance optimization.

  • CPU time statistics for SQL queries.

  • Sequence support.

  • Add Column dynamically.

  • ThreadPool support, plus a lot of bug fixes and performance improvements.

Michael “Monty” Widenius, the founder of MySQL/MariaDB, has praised Alibaba for open sourcing AliSQL. We got a lot of help from the open source community in the early development of AliSQL.

Open sourcing AliSQL is now the best contribution we have made to this community, and we hope to continue our open source journey in the future. Full cooperation with the open source community can make the MySQL/MariaDB ecosystem more robust.

ApsaraCache: ApsaraCache is based on Redis 4.0, with additional features and performance enhancements. In contrast to Redis, ApsaraCache’s performance is independent of data size and depends only on the workload scenario. It also performs better in cases such as short connections, full memory recovery, and time-consuming command execution.

Multi protocol support

ApsaraCache supports both the Redis and Memcached protocols with no client code modifications needed. In Memcached mode, users can persist data with ApsaraCache just as they would with Redis.

Reusing the Redis architecture, we have added new features to the Memcached mode, such as persistence, disaster tolerance, backup and recovery, slow-log auditing, and information statistics.

Ready for production

ApsaraCache has proven to be very stable and efficient during four years of technical refinement and testing across tens of thousands of production deployments.

The major improvements in ApsaraCache are:

  • Deep disaster-recovery reinforcement: the kernel synchronization mechanism was refactored to solve the native kernel’s full-resynchronization problem, triggered when replication is interrupted under weak network conditions.

  • Compatibility with the Memcached protocol: it supports dual copies for Memcached and offers a more reliable Memcached service.

  • In short-connection scenarios, ApsaraCache delivers a 30% performance increase compared with the vanilla version.

  • ApsaraCache’s hot-upgrade function can complete a hot update of an instance within 3ms, solving the problem of frequent kernel upgrades disrupting users.

  • AOF reinforcement, solving the host stability problems caused by frequent AOF rewrites.

  • ApsaraCache health detection mechanism.

This article was sponsored by Alibaba and written by Linux.com.


How to Check Disk Space on Linux from the Command Line

Quick question: How much space do you have left on your drives? A little or a lot? Follow up question: Do you know how to find out? If you happen to use a GUI desktop (e.g., GNOME, KDE, Mate, Pantheon, etc.), the task is probably pretty simple. But what if you’re looking at a headless server, with no GUI? Do you need to install tools for the task? The answer is a resounding no. All the necessary bits are already in place to help you find out exactly how much space remains on your drives. In fact, you have two very easy-to-use options at the ready.

In this article, I’ll demonstrate these tools. I’ll be using Elementary OS, which also includes a GUI option, but we’re going to limit ourselves to the command line. The good news is these command-line tools are readily available for every Linux distribution. On my testing system, there are a number of attached drives (both internal and external). The commands used are agnostic to where a drive is plugged in; they only care that the drive is mounted and visible to the operating system.

With that said, let’s take a look at the tools.

df

The df command is the tool I first used to discover drive space on Linux, way back in the 1990s. It’s very simple in both usage and reporting. To this day, df is my go-to command for this task. This command has a few switches but, for basic reporting, you really only need one. That command is df -H. The -H switch is for human-readable format. The output of df -H will report how much space is used, available, percentage used, and the mount point of every disk attached to your system (Figure 1).

What if your list of drives is exceedingly long and you just want to view the space used on a single drive? With df, that is possible. Let’s take a look at how much space has been used up on our primary drive, located at /dev/sda1. To do that, issue the command:

df -H /dev/sda1

The output will be limited to that one drive (Figure 2).

You can also limit the reported fields shown in the df output. Available fields are:

  • source — the file system source

  • size — total number of blocks

  • used — spaced used on a drive

  • avail — space available on a drive

  • pcent — percent of used space, divided by total size

  • target — mount point of a drive

Let’s display the output of all our drives, showing only the size, used, and avail (or availability) fields. The command for this would be:

df -H --output=size,used,avail

The output of this command is quite easy to read (Figure 3).

The only caveat here is that we don’t know the source of the output, so we’d want to include source like so:

df -H --output=source,size,used,avail

Now the output makes more sense (Figure 4).

du

Our next command is du. As you might expect, that stands for disk usage. The du command is quite different from the df command, in that it reports on directories and not drives. Because of this, you’ll want to know the names of the directories to be checked. Let’s say I have a directory containing virtual machine files on my machine. That directory is /media/jack/HALEY/VIRTUALBOX. If I want to find out how much space is used by that particular directory, I’d issue the command:

du -h /media/jack/HALEY/VIRTUALBOX

The output of the above command will display the size of every file in the directory (Figure 5).

So far, this command isn’t all that helpful. What if we want to know the total usage of a particular directory? Fortunately, du can handle that task. On the same directory, the command would be:

du -sh /media/jack/HALEY/VIRTUALBOX/

Now we know how much total space the files are using up in that directory (Figure 6).

You can also use this command to see how much space is being used on all child directories of a parent, like so:

du -h /media/jack/HALEY

The output of this command (Figure 7) is a good way to find out what subdirectories are hogging up space on a drive.

The du command is also a great tool to use in order to see a list of directories that are using the most disk space on your system. The way to do this is by piping the output of du to two other commands: sort and head. The command to find out the top 10 directories eating space on a drive would look something like this:

du -a /media/jack | sort -n -r | head -n 10

The output would list out those directories, from largest to least offender (Figure 8).

Not as hard as you thought

Finding out how much space is being used on your Linux-attached drives is quite simple. As long as your drives are mounted to the Linux system, both df and du will do an outstanding job of reporting the necessary information. With df you can quickly see an overview of how much space is used on a disk and with du you can discover how much space is being used by specific directories. These two tools in combination should be considered must-know for every Linux administrator.

And, in case you missed it, I recently showed how to determine your memory usage on Linux. Together, these tips will go a long way toward helping you successfully manage your Linux servers.

Learn more about Linux through the free “Introduction to Linux” course from The Linux Foundation and edX.