What is a multi-user operating system ? When an OS allows multiple people to use the computer at the same time without interfering with each other's files, it is a multi-user OS. Linux belongs to this category: there can be multiple users and groups, each with their own personal files and preferences. This article will help you with the following tasks.
Manage users' passwords ( set password policies, expiration, further modifications )
Manage groups ( create/delete user groups )
Throughout this article we will discuss the most useful Linux commands along with their syntax.
How to create a user
1) useradd : Add a user
syntax : useradd <username>
eg : We will create a user named "jesica". The command is useradd jesica . First I switch to the root user with the sudo su command, as I am a sudo user.
You can see that when we created the user as root, the user was simply added without asking for a password for the newly created account. So now we will set a password for the user jesica.
2) passwd : set a password for users
syntax : passwd <username>
Here, I set a password for jesica. I set the password to "jesica" as well; you can use your own. The password you type will not be displayed, for security reasons. Since my password has only 6 characters, we get a message saying the password is shorter than 8 characters. That is a password policy; we will discuss policies later in this article.
* Now we have created a new user with the useradd command and set a password with the passwd command. This was done on CentOS, but on some other Linux distributions the adduser command is used instead of useradd.
* If you are a normal user, you need superuser privileges to add a new user, so you have to run the commands as sudo useradd and sudo passwd .
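Putting the two commands together, a minimal sketch (the username jesica is just an example):
sudo useradd jesica   # create the user
sudo passwd jesica    # prompts twice for the new password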
Where all of these users are residing ?
We discussed this in the "Linux File System Hierarchy" article. Just as /root is the root user's home directory, /home is where normal users' home directories live. All the users' profiles are stored inside /home. You can use the command ls /home to check which users currently exist on your OS. Check the image below, which shows the users on my OS.
What is /etc/passwd file ?
When you create a user with the useradd command without any options, several configuration files are modified. They are listed below.
/etc/passwd
/etc/shadow
/etc/group
/etc/gshadow
The output of the above files on my OS is shown below.
1. /etc/passwd file
2. /etc/shadow file
3. /etc/group file
When we create a new user with the useradd command without any options, reasonable defaults are filled in for every field of that user's entry in the /etc/passwd file. It is just a text file containing useful information about users, such as the username, user ID, group ID, home directory path, shell, etc.
Let's go through the fields of an /etc/passwd entry, eg : student:x:1000:1000:student:/home/student:/bin/bash
1. student : This is the username. To login we use this name.
2. x : This is the password field. The actual encrypted password is stored in the /etc/shadow file. You can see the password record for user student in the /etc/shadow file in the image above.
3. 1000 : This is the user ID. Each and every user must have a UID. It is zero for the root user, 1-99 is for predefined user accounts, and 100-999 is for system/administrative accounts. Normal users have UIDs starting from 1000. Extra - you can also use the id command to view user details.
4. 1000 : Primary group ID ( GID ). See the matching entry in the /etc/group file.
5. student : Comment field
6. /home/student : User's home directory
7. /bin/bash : The shell used by the user
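If you only want one user's record instead of the whole file, a quick sketch (assuming a user named student exists):
grep '^student:' /etc/passwd   # print only student's entry
id student                     # show the UID, GID and group memberships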
* Summary of the above
When a user is created, a new profile is created in /home/username by default
Hidden files like .bashrc , .bash_profile and .bash_logout are copied into the user's home directory. Those hidden files set the user's environment variables; they will be covered in future articles.
A separate group is created for each user, under the user's own name
Useradd command with some options
1.) If the user's home directory was not created automatically, create the user with useradd -m <username> to force it. If you want to create a user without a home directory, use useradd -M panda .
2.) If you want to place the home directory somewhere else
Use the useradd command with the -d option to change the default home directory path, then give the new home directory ( /boo here ) and put the username last: useradd -d /boo boo . In the image below, the /etc/passwd file has a different home directory entry for user boo, because we changed its home directory.
3.) Add a comment for the user while creating it, useradd -c "comment" <username>
In the /etc/passwd file :
4.) Create a user with a UID of your choice, useradd -u <UID> <username>
5.) Create a user with a UID and GID of your choice, useradd -u <UID> -g <GID> <username>
6.) Create a user and add it to different groups, useradd -G <group1>,<group2> <username> . There can be one or more groups, separated by commas (,).
7.) To create a user but disable shell login, useradd -s /sbin/nologin <username> . With this command we disable shell interaction for the user, but the account stays active. A combined sketch of these options follows below.
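As a combined sketch of the options above (the username panda, the UID and the group names are examples, and the groups must already exist):
sudo useradd -m -d /home/panda -c "Panda User" -u 1500 -G wheel,developers -s /bin/bash panda
grep '^panda:' /etc/passwd   # verify the new entry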
How to remove an account
3. userdel : Remove a user
syntax : userdel <username>
eg : userdel -r jesica
* When deleting a user, go with the -r option. Why? With -r, the user is removed together with their home directory. Without -r, the user's home directory is left behind.
How to modify an user account
4. usermod : Modify a user
syntax : usermod <options> <username>
* Here we can use all the options available with the useradd command. Below are some options that were not discussed above.
1.) How to change the user's name
usermod -l <new_username> <old_username>
2.) To lock a user
usermod -L <username>
3.) To unlock a user
usermod -U <username>
4.) To change the group of a user
usermod -G <group> <username>
5.) To append a group to a user
usermod -aG <group> <username>
* Here, appending means adding groups without removing the already existing ones; if we use -G without -a, the user is removed from its existing supplementary groups and joined only to the new ones. This distinction matters for primary versus supplementary groups. See the sketch below.
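For example (the user jesica and the group developers are placeholders, and the group must already exist):
sudo usermod -aG developers jesica   # append: existing supplementary groups are kept
sudo usermod -G developers jesica    # replace: only developers (plus the primary group) remain
id jesica                            # verify the memberships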
What is a group ?
A group is a collection of one or more users in a Linux OS. Just like users, groups have a group name and an ID ( GID ). The group details can be found in the /etc/group file. There are two main types of groups in Linux: primary groups and supplementary groups. Every user, once created, gets a new group named after the user's account name; that is the primary group. Supplementary groups are additional groups that have one or more users as members.
How to create a group
5. groupadd : create a Linux group
syntax : groupadd <groupname>
Few examples
1.) To create a group named "student"
groupadd student
2.) Define a different group id ( GID )
groupadd -g 5000 student
How to modify an existing group
6. groupmod : modify a group
syntax : groupmod <options> <groupname>
To change the name of the group, groupmod -n <new_name> <old_name> . To change the group ID, groupmod -g <GID> <groupname> .
How to delete an existing group
7. groupdel : delete a group
syntax : groupdel <groupname>
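A short lifecycle sketch with a hypothetical group named developers:
sudo groupadd -g 5000 developers   # create the group with a chosen GID
sudo groupmod -n devs developers   # rename developers to devs
sudo groupdel devs                 # delete the group again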
How to manage user passwords using password policy ?
As we discussed above, while the /etc/passwd file stores user details, the /etc/shadow file stores users' password details. I attached an image of the /etc/shadow file above. Here we use a term called password aging, and the chage command is used to edit the password aging policy. Look at the image below.
Referring to the image above, the options are as below.
chage -d 0 <username> : Forcefully request the user to change the password at the next login.
chage -E YYYY-MM-DD <username> : Expire a user account on the given date ( it must be in the format YYYY-MM-DD ).
chage -M 90 <username> : Set a password policy requiring the password to be renewed every 90 days.
chage -m 7 <username> : Require a minimum of 7 days before the password can be changed again.
* Inactive days ( chage -I <days> <username> ) define for how many days the account is kept inactive after password expiration. If the user doesn't change the password within the inactive period, the account expires.
chage -l <username> : Display the user's current password policy settings.
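For example, a sketch for the user jesica (a placeholder name):
sudo chage -m 7 -M 90 jesica   # at least 7 days between changes, renew every 90 days
sudo chage -l jesica           # review the resulting policy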
The default values for all of the above ( password expiration days, inactive days, etc. ) live in the /etc/login.defs text file. The UID and GID range configurations can also be seen there. You can change the values in /etc/login.defs according to your requirements.
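To peek at those defaults (these variable names are present on most distributions, but check your own file):
grep -E '^PASS_(MAX_DAYS|MIN_DAYS|WARN_AGE)' /etc/login.defs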
Now you have learned the most needed things about Linux users and groups. This is not a small topic; there are lots of further commands you can refer to under it.
I get this question quite often, but I struggle explaining it, especially in a few simple words. Anyway, this is a very interesting topic because things are very complicated when it comes to UNIX vs Linux. There are business related things, licenses, policies, government influence etc.
Since Unix is an operating system and Linux is a kernel, they are different in nature and have different purposes, so they aren't easily comparable. You can't summarize the differences in a single sentence. But don't worry: after this lesson on their history, features, and purposes, you will get the "big picture" and everything will be nice and clear. You can jump to the conclusion at the end of the post if you want a quick read-through.
Multix
Let’s jump to the late 1960s. Computers at the time were designed to do single specific tasks. For example, there was a computer for calculating a monthly salary, or a computer to do word processing in a library, etc. Each of them was running a program specifically designed for that particular hardware and the task it was meant to do. Programs written for one computer vendor (or manufacturer, like IBM or Apple) could not be executed on a computer developed by a different vendor. Those computers could not handle the execution of multiple programs at a time, only one. So if a user wanted to listen to some music while writing a document, that was impossible. To overcome those issues the Multics (also known as Multix) operating system was developed, initially as a collaborative project between MIT, General Electric and Bell Labs. This is the root, the OS that laid the fundamentals for every new one, including Windows, MacOS, Android, Linux-based operating systems and many more.
Multics (Multiplexed Information and Computing Service) is a time-sharing operating system. This means that many programs can share the hardware resources and switch on finite time intervals. In other words, the idea behind time-sharing operating systems is the mechanism that works as follows:
One program uses the hardware (CPU, RAM, etc.) for a short time slice, let's say 20ms (milliseconds), then it is stopped.
Now the hardware resources are available to another program for an equal amount of time, 20ms.
Due to the very small intervals (very fast switching) there is an illusion that multiple programs are running concurrently. The very same principle is present in every modern operating system.
In addition to time-sharing capabilities, Multics was designed around the idea of a modular hardware structure and software architecture. It consists of many small "building blocks". Each block can be independently swapped with another one that does the same job, perhaps in a different way. The final result is a system that can grow over time by reusing the blocks instead of reimplementing them. So when there is a hardware change, only a few blocks are updated and the rest are reused. If the very same feature is required by multiple programs, they can share a common implementation. For example, many programs can share the same implementation for transforming a word to lowercase, thus saving time, effort, frustration among developers, and money. Those are the fundamentals of Multics.
Besides all the goodness of Multics, Dennis Ritchie and Ken Thompson (at the time employed at Bell Labs) were not satisfied with all aspects of the project, mostly the size and the complexity introduced to achieve its goals. In their spare time they started working on a similar hobby project (actually a reimplementation of Multics) named Unics (Uniplexed Information and Computing Service), also known as Unix. As you can see, the name Unics is influenced by Multics, the only difference being the swap of the word "multiplexed" for "uniplexed". The reason for this swap is a technical disadvantage of the Unics project at the very beginning: it could not handle the execution of multiple programs simultaneously, only one single program at a time, hence uniplexed. It is worth mentioning that the Unics project was intended only for internal use inside Bell Labs and was developed without any organizational backing.
Since we reached the birth of Unics, it’s time for a small recap:
Multics development started in the late 1960s
Multics' goals, like time-sharing (multi-tasking), are still valuable today
There were complaints about its size and complexity
In the early 1970s, Unics development begins but on a smaller scale to overcome the disadvantages of Multics. It is a hobby project of Dennis Ritchie and Ken Thompson.
Let’s continue with more details about Unics and its development.
Unix
Unics was initially written in assembly language. Because of this, most of the code was hardcoded for specific hardware and not easily portable to other computers. No better alternative was available at the time.
Meanwhile, the C programming language was released (created at Bell Labs by Dennis Ritchie). The intention of this programming language was to be used for writing portable programs. This is achieved by requiring a relatively simple compiler, mapping efficiently to machine instructions, requiring minimal run-time support, etc. For non-technical people, this is truly amazing.
At this moment in time, there is Unics, but it's not portable, and there's a new programming language that offers portability. It sounds like a wise idea for Unics to be rewritten in C. In the mid-1970s, Unics was rewritten in C, introducing portability, but there was a legal issue preventing public release and wide use.
From a business and legal perspective, things are quite interesting. There is a giant telecommunication company named AT&T that owned the previously mentioned Bell Labs research center. Due to the nature of the business and how available the technology was at that time, AT&T was considered a government-controlled monopoly. To simplify things: prices of the telecommunication services were controlled by the government so they could not skyrocket, but AT&T also could not go bankrupt, due to the guaranteed income granted by the government. The point is that Bell Labs had a stable source of income (funded by AT&T) and could afford to allocate resources to whatever task they wanted with little to no worry about the cost. Almost complete freedom, which is quite good for a research center.
Because of the monopoly issues and other legal constraints, AT&T was forbidden from entering the computer market; only telecommunication services were allowed. All they could do was license the source code of Unics. It is worth mentioning that the source code was distributed to other research centers and universities for further research, development, and collaboration, but under the corresponding license terms.
Later, there was a separation between Bell Labs and AT&T. Since the government-controlled monopoly applied to AT&T, Bell Labs was free after the separation, so no legal issues were present anymore.
System V and BSD
By the 1980s, AT&T released a commercial version of Unics named System 5 (known as System V).
In the meantime, while AT&T was working on System 5, at the University of California, Berkeley, the development of the previously shared code from Bell Labs continued, and a very similar Unics operating system was developed and released as BSD (Berkeley Software Distribution).
It’s time for a recap:
Initial development of Unics was done at Bell Labs
Unics source code was shared among universities and other researchers
Separation of Bell Labs and AT&T
AT&T continued with the development of their own version of Unics, named System 5
At the University of California, Berkeley, development of the previously shared source code continued and another operating system was released as BSD (Berkeley Software Distribution)
So by the mid-80s we already had two different Unics distros (System 5 and BSD), each evolving on its own but sharing a common predecessor.
There is no such thing as "the real" or "the genuine" Unics operating system. As time passed, even more variants appeared in each of those two branches.
HP branched out, developing an operating system named HP-UX (Hewlett-Packard Unix). Sun branched out with an operating system named Solaris. IBM branched out and continued developing their version, named AIX.
It is worth mentioning that all of these branch-outs were done to provide some unique features, so that a given vendor could offer a better product on the market. For example, the networking stack was first available on the BSD branch, but later cross-ported to all other branches. Almost every nice feature was cross-ported to all the other branches at some point. To ease cross-porting and to maximize reusability at a higher level, POSIX (Portable Operating System Interface) was introduced by the IEEE Computer Society in 1988. It is a standard that, if followed by the vendors, guarantees compatibility between operating systems, so programs will be executable on other operating systems with no modifications required.
Although reusability was present to some degree, the addition of new features required a lot of work, making development slower and harder. This was due to inheriting the terms and conditions of the AT&T license under which the Unics source code was distributed. To eliminate all the legal issues around sharing the source code, people working on the BSD branch started replacing the original source files inherited from AT&T with their own implementations, released under the BSD license, which is more liberal in terms of reusability, modifications, distribution, etc. The idea was to release the Unics operating system without any restrictions. Today, this is known as free software. Free as in the freedom to study, modify, and distribute the modified version without any legal action against the developer.
This idea was not welcomed by AT&T, so there was a lawsuit. It turned out that there was no violation of any kind. The trend of replacing files continued, and BSD version 4.4 (also known as BSD Lite) was released free of any source code originating from AT&T.
One more recap:
Many branch outs.
POSIX standard
It turns out that many features were cross-ported sooner or later.
It is hard to say what the "root" or "genuine" Unics operating system is anymore. Everything branched from the same predecessor and every feature was cross-ported, thus everything is more or less a variation of the same OS.
Due to legal issues arising from the terms of the AT&T license, development was hard and redundancy was common.
BSD started removing all the files originating from AT&T and providing source files that are free for modification and redistribution.
Now it is time to mention the GNU project.
GNU
GNU (GNU’s Not Unix) is a free-software, mass-collaboration project announced in 1983. Its aim is to give users freedom and control in the use of their computers and electronic devices.
Can you spot the similar idea with what people behind BSD are doing already?
Both are related to the term free software, but with a very big difference in how free software should be treated, which is obvious when comparing the GPL license (released by GNU) and the BSD license. Basically, it comes down to:
The BSD License is less restrictive. It says do whatever you want with the source code. No restrictions of any kind.
The GPL License is more restrictive but in a good way. It puts emphasis on preventing the use of open source code (GPL licensed) in proprietary closed source applications. It states that if any GPL licensed source code is being used, the source of your code must be released under the same license too. Basically, with the GPL license, you can take whatever you want, but you must give back whatever you produce, thus raising the amount of available free software.
As a comparison, the BSD license does not state that whatever is being produced must be released as free software too. It can be released as proprietary closed source software without sharing any of the source code.
In addition to the license, the GNU project is developing a lot of software that is required in order to have a fully functional operating system. Some of their tools are GNU C library, GNU Compiler Collection (GCC), GNOME desktop environment etc. All of which are currently used in popular Linux distros.
Having all this in mind let’s talk about Linux by briefly explaining what it is.
Linux
Linux is not an operating system like BSD. Linux is a Kernel.
But what is the difference between a Kernel and an Operating System?
An operating system is a collection of many things working as a whole, fully functional, complete product.
A kernel is only a piece of the whole operating system.
In terms of the Linux kernel, it can be said that it is nothing more than a bunch of drivers. Even though there is a bit more, for this purpose, we will ignore the rest.
Now, what are drivers? A Driver is a program that can handle the utilization of a specific piece of hardware.
Short recap:
A Driver is a program that handles the utilization of a specific piece of hardware.
Linux is just a bunch of drivers (and something more that will be ignored for now)
Linux is a kernel.
A Kernel is a piece of an Operating System.
I assume we are all clear by now so we can begin with the Linux history lesson.
Its origins are in Finland in the 1990s, about 20 years after Unics. Linus Torvalds was a student at that time and was influenced by the GNU license and Minix (a Unix-like operating system for education). He liked many things about Unics operating systems, but also disliked some of them. As a result, he started working on his own operating system, utilizing many of the GNU tools already available. The end result is that Linus developed only a kernel (the drivers). Linux-based operating systems are sometimes referred to as GNU/Linux operating systems because, without the GNU tools, the Linux kernel is useless in real life.
It can be said that Linux, to some point, is just a reimplementation of what was available as the Unics operating system (BSD, System 5…) but with a license that puts more emphasis on keeping the software free by enforcing modifications to be contributed back, thus available for studying, modifications, and further distribution.
The time-sharing capabilities that allow multitasking, the C programming language providing portability, the modular software design that allows swapping a single piece when needed and reusing the rest, and other things are inherited from Unics. Those are the fundamentals mentioned at the very beginning of this post. But Linux does not share any source code with Unix.
It is worth mentioning that Linux was intended to be a small school project. Many computer scientists were interested in trying it out of curiosity.
While Linux was still young, the lawsuit between BSD and AT&T was ongoing. Due to the legal uncertainty hanging over BSD, many companies that utilized BSD moved to Linux as a very similar alternative with a more stable future. Linux was also one single source of code, while the BSD source was distributed across many independent branches (BSD, Solaris, HP-UX, AIX, etc.).
From the perspective of a company requiring an operating system for their product (Wi-Fi routers, cable TV boxes, etc.), Linux was a better choice. Having a single branch guarantees that every feature merged into that one branch is available right away. A single branch is simpler to maintain, too. On the BSD side, due to the independent development, new features still required some sort of cross-porting, which sometimes broke something else.
This is the historical reason why Linux gained great popularity even in the early stages of its development, while still not being on par with BSD and lacking many features.
Unics vs Unix, Multics vs Multix
Did you notice that sometimes the term Unics is used instead of Unix?
The fun fact is that the original name of the project is Unics, but somehow people started calling it Unix. There are many stories about why Unix became the popular name, but no one can tell for sure. Today the name Unix is accepted as the official name of the project.
The very same happened with Multics: over time, everyone called it Multix even though that was not its official name.
Conclusion – Unix vs Linux
A timeline of Unix-like OSes
At this point we know the history and the evolution of these operating systems, why all the branch-outs occurred, and how government policy can influence things. The whole story can be summarised as:
Unix was an operating system developed at Bell Labs through the 1960s and 1970s. With all the branching mentioned above, and the cross-porting of features between branches, it is simply a chaotic situation and hard to say what the genuine Unix is anymore.
It can be said that the most genuine Unix operating systems are System 5 and BSD.
System 5 is developed by AT&T as a continuation of the work done at Bell Labs after their separation.
The most popular direct descendant of Unix is the BSD project. It took all the source code developed at Bell Labs, then replaced everything released under a restrictive license and continued as a free distribution.
Other popular distributions of today are FreeBSD, OpenBSD and NetBSD, but many more are available.
Linux, on the other hand, does not share any code with Unix (from Bell Labs); it just follows the same principle of utilizing small building blocks to produce something of bigger value. This is mostly known as writing a small program that does one thing and does it well. Later, those programs are combined with mechanisms known as "pipes" and "redirection", so the output of one program becomes the input to another program, and as the data flows, something of bigger value is achieved as a final result.
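A classic one-line illustration of that principle (the program name and the output file are placeholders):
ps aux | grep firefox > firefox-processes.txt
Here ps lists the processes, the pipe feeds that list to grep, which filters it, and the redirection writes the filtered result into a file: three small programs combined into something of bigger value.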
In terms of licenses, Unics had a very restrictive license policy during its development. Later, it was forked under free licenses (BSD). Linux, on the other hand, has used the GPL license from the very beginning.
Both are following the POSIX standard so program compatibility is guaranteed.
Both can be used with the same shells for interfacing with the system. On most Linux distributions Bash is the default, and it is available on the BSDs as well.
BSD is distributed as a whole system.
Linux-based operating systems are made with the Linux Kernel in combination with GNU software and many other smaller utilities that fulfill each other to accomplish the goal.
Popular Linux Distributions: Ubuntu, Mint, CentOS, Debian, Fedora, Red Hat, Arch Linux, and many more. There are hundreds of distros nowadays, some of them even optimized for a specific purpose, like gaming or for old computers.
Even though we stated that there is one single source – the Linux Kernel, there are many Linux Distributions (Linux based operating systems). This may be confusing for someone so I will explain this just in case:
Every Linux distribution (distro) ships different versions of the Linux kernel or the tools, or simply utilizes different building blocks. For example, Ubuntu uses systemd as its init system, while Slackware uses SysV as the equivalent. There is nothing wrong with either; they do the same thing with some differences, and there are use cases where one is better than the other.
Another example: some users prefer to always have the latest version of their software, so they use rolling-release Linux-based operating systems like Arch Linux. Others may prefer a stable environment with no major changes for 5 or more years; the Ubuntu LTS (Long Term Support) releases are ideal for this use case, which is why they are widely used on servers along with CentOS.
As you can see, there are even more similarities between the two. Linux-based operating systems are in the same "chaotic" situation too. There is no such thing as the real or the genuine Linux-based operating system. There are many of them, but at least they all share the same source of the Linux kernel.
It is worth mentioning that programs written for Linux-based operating systems, or bash commands following the POSIX standards, can be executed on any Unix-based operating system too. Thus all major software, like Firefox or the GNOME desktop environment, is available everywhere without requiring any modifications.
Another fun fact not mentioned before is that even Mac OS (used in Apple computers) is considered a BSD derivative. Not every release, but some of them are.
As you can see, in reality, things are even more complicated and interesting.
The passwords for user accounts often need to be changed. This is mostly done by the users themselves, but often they have to be overridden by the administrator to control any illegitimate activity by any of the users. Because of these reasons, Linux provides a wide range of options for user account password management.
We have discussed some of these useful options below:
Self password change:
The password of the user itself can be changed using the passwd command provided by Linux. This is how you change the password of the user you're logged in with. Just open up the command line and type in: passwd
This will open up a prompt asking for the current password, then the new password and its repeated confirmation. The passwords aren't shown in the terminal, so that they are not visible to any person that might be around the system.
Sample output:
pulkit@hostname:~$ passwd
Changing password for pulkit.
(current) UNIX password:
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
pulkit@hostname:~$
Changing the password of another user
This requires root access, as only root can add, remove or change the password of any other user on the system. You will need to know the administrator password. The command becomes:
sudo passwd <username>
Sample output:
pulkit@hostname:~$ sudo passwd testuser
[sudo] password for pulkit:
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
pulkit@hostname:~$
Or if you’re logged in with the “root” user you can just use the command without “sudo”.
Checking password status
Seeing the status of a password is also made easy in Linux. A password has a number of properties ( expiry dates, warning periods, etc. ), and the passwd -S <username> command prints them on a single line. The most notable fields of that output are:
99999 : Password expiry maximum age (99999 basically means never)
7 : Password expiry warning period
-1 : Inactivity period (-1 means never)
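For reference, this is roughly what passwd -S prints (the username and dates will differ on your system):
pulkit@hostname:~$ passwd -S pulkit
pulkit P 04/15/2019 0 99999 7 -1
The fields are: username, password status (P = usable password, L = locked, NP = no password), date of the last change, minimum age, maximum age, warning period, and inactivity period.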
This output is a bit cryptic. There is another command that displays the same information in a more readable way. The syntax is as follows:
chage -l <username>
Sample output:
pulkit@hostname:~$ chage -l pulkit
Last password change : Apr 15, 2019
Password expires : never
Password inactive : never
Account expires : never
Minimum number of days between password change : 0
Maximum number of days between password change : 99999
Number of days of warning before password expires : 7
pulkit@hostname:~$
Deleting a password
This option makes an account essentially password-less, so that anyone can log into it. This is not useful on most shared computers, but for a home PC (which is what most people use), or for a system that needs to be open to anyone who attempts to use it, this option is essential. This command also requires root access. To delete the password of a user, use the following syntax:
sudo passwd -d <username>
Sample output:
pulkit@hostname:~$ sudo passwd -d testuser
[sudo] password for pulkit:
passwd: password expiry information changed.
pulkit@hostname:~$
Force a user to change their password
This is a very useful feature, especially for Linux administrators. What this command basically does is expire the password of the mentioned user, so that the user is forced to change the existing password at their next login. This obviously requires root access as well. The command to be entered is this:
sudo passwd --expire <username>
Sample output:
pulkit@hostname:~$ sudo passwd --expire testuser
passwd: password expiry information changed.
Now, at the next login, the user will be asked to set a new password. su allows you to log in as another user, so we can verify the forced change right away:
pulkit@hostname:~$ su testuser
Password:
You are required to change your password immediately (root enforced)
Changing password for testuser.
(current) UNIX password:
Enter new UNIX password:
Retype new UNIX password:
testuser@hostname:/home/pulkit$
You can test these commands out on a Linux server at Vultr.
Conclusion
That is all for the major operations regarding changing user passwords in Linux. Some of these options are exclusively for the root user, but that makes sense even for personal systems, as user management is often required in case of password loss or something else of the sort. The root user is allowed to overpower any other user, so the root account is not something to be played around with; always use it carefully.
Another common use-case is in servers. The upside to using a server is that in some cases you can still access the server via a web console, even if you locked yourself out of the server.
These instructions work for any Linux distro, including Ubuntu, CentOS, Debian, Fedora, etc.
Linux-based operating systems (often called Linux distributions, or just distros) have been quite popular among programmers and developers since their appearance in the 90s. The Linux kernel itself is designed to be flexible and open for modifications and contributions, so it can run on almost any hardware. The same principle applies to nearly the whole software stack above the kernel that constitutes a Linux distribution as a complete product. In general, it is designed by programmers for programmers and freely available to everyone.
All of the Linux-based distributions share common code – the Linux Kernel itself, but many different methods of software distribution to the end users appeared. Some of them tend to provide a stable environment while others tend to provide the very latest software available all the time.
Stable vs Rolling Distros for Development
Linux-based distributions that are focused on stability achieve it by freezing the software as much as possible. An updated version of some software will be distributed to end users only in critical situations. Most of the time those are security updates and bug fixes, but certainly no new features are added until the end of life of that release is reached. In short: if it is not broken, don't fix it.
There are also Linux distributions that tend to offer the latest of everything. "Latest must be greatest" is in their genes. Usually, the term rolling distribution describes these distros. It is very common to receive an update every hour or two. Although this looks like an awesome concept, in practice it brings some instability with it and, in some cases, even breaks the system.
There are pros and cons to both. Stable distributions prefer safety over features. This is important if you are working on a product that must run 24/7 and must be error-free. They are often used for servers, data centers, and by home users. Developers choose this type of distribution when they need to provide long-term support for a product, or if developing the product requires an extended amount of time, like 5 or more years.
For programmers, each of their programs relies on features offered by other programs; those are called dependencies. A stable environment can guarantee that no bugs will appear overnight due to changes in one of the dependencies. In other words, if there is a bug, don't waste time looking for it anywhere other than your own code. This saves a lot of time and frustration.
The con of a stable environment is that most of the time it lacks all the cool new features. Sometimes this forces developers to take a harder route to reach the desired goal. And sometimes it means the end product will run slower in production due to missing optimizations in some of the dependencies, or will be less secure due to unpatched exploits, etc.
Rolling Distributions prefer and lean towards new features and bleeding-edge software as compared to stable ones. Those are preferred among programmers and developers working on continuous integration. For example, if a program is tightly coupled with many dependencies.
Being informed of every change in your dependencies lets developers resolve minor conflicts on a daily basis. This opens an opportunity to immediately inform third parties that they are breaking your code. Sometimes it is much easier for them to release a fix on their side instead of on yours. Asking them to undo all the patches released in the last few years, because only now are you aware of having a problem with them, is a no-go. They simply won't accept the undo request, so everything must be fixed on your side.
Also, you have a chance to develop your product with the latest and greatest tools available at the time. This can improve performance, reduce cost, introduce a new and easier way of doing things (a new API), etc.
It’s worth noting that a constant stream of updates can break things. Most of the time it is about related packages not being updated simultaneously when needed. Some apps are intended to work with an exact version of another app, so breaking this creates undesired behavior. In such cases both apps must be updated at the same time, which is not guaranteed in a rolling release model.
Best Linux Distros for Programming Compared
Now, to the main part, choosing the best Linux distro for you. This is an overview and comparison of the best Linux distros for programming.
1. Ubuntu
Ubuntu is the most popular Linux Distribution among all of them. It is used by programmers and most of the home users too.
There is one major release every two years, called an LTS (Long Term Support) release. LTS releases are stable, receiving only bug fixes and security updates for the next 5 years. As the release model prefers stability, the underlying layers stay mostly unchanged during this 5-year period. The latest LTS release as of writing is Ubuntu 18.04 LTS.
There is one non-LTS release every 6 months, supported for a period of 9 months. Those releases are not considered stable: big and significant changes can occur in every release. Sometimes those releases carry packages that break dependencies with a previous release. It is like a playground for merging software and continuously searching for incompatibilities, in the desire to provide the best-fit solution for the next LTS release. Developing software in this kind of unpredictable environment is not clever. But in real life it is not so frightening; even the non-LTS releases are far from a crash-prone environment. Many home users use them as a daily driver with no issues at all, and they benefit from having more recent software than what is available in the latest LTS release.
As you can see, it is a mix of everything. You can have 5 years of stability, or stability for 9 months, depending on what fits best. Even mixing packages is possible, but not recommended: a user of an LTS release can obtain a newer version of some software from a more recent non-LTS release. This is handy as a one-time workaround, but it is like a time bomb waiting to break the system. Pulling recent packages will work only until some incompatibility occurs; it is better to switch to a non-LTS version instead.
It is worth mentioning that Ubuntu is the place where developers and home users meet each other. Therefore, Ubuntu is the starting point for a company offering a product or a service on a Linux-based operating system. Here they find an environment that is stable and familiar to the developers, but also many target users. In addition, it is best to develop the software in an environment identical to the one where the product or service is going to be deployed and used in production. Sounds like a perfect balance.
Ubuntu is one of the most popular Linux distros for servers, and most people use it as their main distro with their cloud hosting.
Some of the companies that love Ubuntu and that are offering their products or services on Ubuntu as a first choice are: Nvidia, Google, Dell, STMicroelectronics etc. Most companies that sell Linux laptops offer Ubuntu as the first choice for a pre-installed distro.
Nvidia offers the CUDA toolkit natively on Ubuntu as a first choice. Only the LTS releases are officially tested and supported, so Ubuntu is the best fit if you rely on CUDA for your project. But it is not exclusive: the CUDA toolkit is available on non-LTS releases and many other Linux-based distributions, but without support or guarantees that things will behave as expected.
Google is the company behind Android. They offer developing Android applications on Windows, Mac OS, and Linux-based distributions. Ubuntu is their first choice. Android Studio (IDE) and all other tools are tested on LTS releases of Ubuntu before distribution to end users.
STMicroelectronics is a company producing ARM-based CPUs for embedded devices. Developing software for their CPUs is possible on Windows, Mac OS and Linux-based distributions. They support Open STM32 in developing a free and cross-platform IDE, System Workbench. Again, LTS releases of Ubuntu are their first choice for a Linux distribution.
Dell is known for its laptops, ultrabooks, PCs and monitors. Their products are mostly offered with Windows preinstalled, and Ubuntu on some of them. The Dell XPS 13 Developer Edition is small, light, fast, and beautiful, and runs an Ubuntu LTS release by default.
There are more companies that offer and use Ubuntu, but this should give you an idea of how software and development companies incorporate Ubuntu.
2. Arch Linux
Arch Linux is just the opposite of Ubuntu. It is a rolling-release Linux distribution; there are constantly new updates, with something new arriving in your system every hour or two. It is a perfect working environment for some. As we mentioned earlier, this type of software distribution is best suited to developers working on software that is tightly coupled with some or many dependencies: they receive updated versions of those dependencies with almost no delay. But this comes at a price, for sure. The instability of the system offers no guarantees about the origins of new bugs.
Also, Arch Linux is hard to install. An advanced user can do it in no more than 15 minutes, but it is almost impossible for a newcomer to succeed. It requires a lot of knowledge because nothing is preconfigured; there are no defaults, everything is custom instead. It is a pure mechanism for distributing software and nothing more; it is up to the user to install and configure things according to their personal requirements. This is why many people use Arch Linux as a lightweight Linux distro, by installing a lightweight window manager/desktop environment and only the essential software. As you can see, Arch Linux provides a perfectly configured environment for every developer who knows how to utilize it.
Every Arch Linux installation is unique, so each of them encounters unique obstacles. This is what makes it special and loved among programmers: just by using it on a daily basis, you grow. There is a giant and thorough wiki. It's one of the best wikis you can find, with very detailed and strict explanations and guides for configuring things, encouraging what is said to be good practice. Its necessity becomes obvious the moment you try to install Arch (as we mentioned earlier, it is hard without following the wiki the first time). Reading documentation may seem like wasting time, but it is an essential skill for every programmer. By reading good documentation, developers also learn how to write good documentation.
Tinkering here and there with the operating system itself will teach you how one works, so you can build your own later. It's an important skill to have, especially if you end up working with embedded devices in your career. Every day you can read about some unexpected issue on the forum and very clever workarounds for each of them. Just being aware of what might go wrong makes a developer produce better-quality code, if they pay attention.
The best thing about Arch Linux is its huge repository of available software. Personally, I can't think of anything I need that is unavailable. Although the software is there, because of the very different configurations among users, the quality of the provided software can be lower than expected. It is not unusual for users to need to get their hands dirty with some minor manual intervention. It's brilliant for improving skills, but some can struggle with the maintenance at the beginning.
It’s worth mentioning that no devices come with Arch Linux preinstalled. It is painful to do so, since by the time the device reaches the customer, the software is out of date, and performing one giant update is very likely to break the system (while constant minor updates don't). Even if some vendor did, advanced users would find it uncomfortable and change it anyway.
3. Fedora
Fedora is another Linux distribution popular among programmers. It sits right in the middle between Ubuntu and Arch Linux: it is more stable than Arch Linux, but it rolls faster than Ubuntu does.
There is one major release every 6 months, supported for 13 months. Basically, 13 months of a stable environment is just fine, and a 6-month delay before the next big update is fine too. Hardly any software grows so fast that this becomes a problem, so it is good even for those who want to experience and work with the latest stuff in a stable environment while still doing their integration job without issues. An excellent balance, as with Ubuntu, but with a smaller number of home users.
In terms of software availability, the range is not as broad as in Ubuntu or Arch Linux. If you are looking for proprietary software, the situation is even worse: there isn't any official support for it. But if you are working with open source software instead, Fedora is excellent.
The people behind Fedora embrace free and open software and do their best for it, but it is a big no-go for anything proprietary. You can't find Java, DVD codecs, Flash Player, etc. Of course, all of those are available in some private repositories with weaker license policies, but they are not officially supported, so there are no guarantees against incompatibilities or misbehavior. This is a big issue if you are working on a project that costs money or that is expected to have a big impact, because you don't want to rely on unreliable sources; you want support instead. On Ubuntu, for example, companies do offer support for their proprietary software.
There are several Fedora “Spins”, which are similar to Ubuntu flavors. It’s basically Fedora with software pre-installed for a specific purpose, but the main difference is the desktop environment. We featured the Games Spin of Fedora in our Best Linux Gaming Distros list.
Do not forget that the Fedora project is sponsored by Red Hat, a Linux distribution targeting the enterprise sector with paid support. Fedora is like a playground, but a good one: at some point, a Fedora release becomes the base of a Red Hat release. Everybody benefits. Big companies receive a rock-solid and stable system with years-long support (from Red Hat), while casual users receive a big amount of free software and a stable environment that is more recent than Ubuntu (from Fedora).
Just like with Arch Linux, there are no devices with Fedora preinstalled, because of the very short time between major releases; in 6 months there is no time for manufacturers to produce and sell the device.
There are hundreds of Linux distros out there, each different from the next. Though the 3 distros we mentioned are great for developers, you may find a better fit in a different distro. For example, if you're developing an application that's supposed to run on a server, you may want to use a server distro like Ubuntu or CentOS. So do your research and you may find a better one for you.
Overview of The Best Tools for Programmers on Linux
Out of the box, no distro comes with IDEs and toolkits preinstalled (neither do Windows or Mac OS), so developers have to install them manually. Only a simple text editor like gedit, or nano (a command-line text editor), can be found preinstalled. Some popular IDEs are Eclipse, Qt Creator, NetBeans, etc., but many developers dislike IDEs in general and use simpler text editors like Sublime, Atom or Vim instead.
Eclipse is the most commonly used IDE. It supports multiple programming languages like C, C++, Java etc. Its basic features can be extended by various plugins. This allows a company to develop a complete IDE for their product just by writing a small plugin and relying on Eclipse for everything else.
Until 2016, Eclipse was actively supported by Google as the recommended IDE for Android application development. Later, Google migrated to the IntelliJ IDE and abandoned Eclipse, but users continued developing plugins like gradle-android-eclipse, still providing an easy-to-use Android IDE based on Eclipse.
Eclipse is the recommended IDE for CUDA development on Linux-based operating systems (Visual Studio on Windows). Nvidia distributes a slightly customized version of Eclipse called Eclipse Nsight. For them, it is much easier to provide an IDE by reusing components, but it is also easier for developers when they don't have to endure the hassle of building custom toolchains even for simple "Hello World" examples.
System Workbench is an IDE for programming ARM-based CPUs. This IDE is free and built by the community, but also supported and recommended by STMicroelectronics, a company that produces ARM-based CPUs. The IDE can be downloaded standalone, or added as a plugin on top of an existing Eclipse installation.
The latest stable version of Eclipse is 4.7, named Oxygen. As Arch Linux tends to have the latest of everything, version 4.7 is available in its repositories. On Ubuntu 16.04, the latest stable release at the moment, version 3.18 of Eclipse is available, while Fedora offers version 4.7, just like Arch Linux.
Qt Creator is another very popular IDE. It is developed by the Qt Company as an IDE for the Qt framework. Although it targets a single C++ framework, because of the nature of the language it is commonly used for developing non-Qt applications in plain C or C++. It cannot be extended like Eclipse, but it is much faster because it is written in C++. Also, it provides better theming options than Eclipse and blends well with the native desktop environment.
On Arch Linux, users can obtain the very latest version, 4.6, but on Ubuntu we are stuck with version 3.5, while on Fedora it's version 4.5.
NetBeans is an IDE for developing in C, C++, Java, PHP, Node.js, Angular JS, etc. It is most popular among PHP and web developers. Some also use it for Android application development with the NBAndroid plugin. There are many other plugins that enable better integration with various technologies like WordPress, Ruby, Ruby on Rails, etc., much like Eclipse.
The latest stable version of NetBeans is 8.2, and the same is available in the Arch Linux repositories. Ubuntu users can obtain version 8.1, while Fedora users must do a manual installation by downloading NetBeans from the website and eventually resolving conflicts and dependency issues manually, if any. Just a warning that officially supported software is always of better quality.
Sublime is one of the most famous text editors available. It is mature, supports extensions, and has autocomplete and code highlighting for almost any kind of programming. Even though it is just a text editor, its extensions can easily add every feature that is expected from a modern IDE. Those are the main selling points; once you get familiar with it, you are going to use it for everything.
Ubuntu doesn’t ship Sublime in its repositories, but the Sublime developers offer a packaged version in their private repository; just follow the instructions on their website to obtain the very latest version. Arch Linux also doesn't distribute it in the official repositories, but it is available from the AUR (packaged by users, also unofficially). Fedora requires manual installation too; see their website for instructions.
The Atom text editor is an alternative to Sublime. It is free as in freedom and based on Electron. Although it has the very same feature set and capabilities, it tends to be heavier than Sublime, so some developers simply reject it.
As with Sublime, Atom on Ubuntu and Fedora can be installed manually by following the instructions on the website, while Arch Linux distributes version 1.25.
Developers who do most of their work in a terminal use the Vim text editor, especially on servers. It is a free and extensible command-line text editor, an improved version of the older Vi editor. One can use Vim for developing in any programming language or toolkit. Vim is the most customizable of them all through additional plugins, and the most keyboard-friendly text editor available too. Some developers find that using the mouse hurts their shoulders and muscles, so being able to do anything with only a keyboard is like being blessed.
Vim behaves like a person: standalone it is minimal, but plugins make it evolve into anything. Many actions feel just like having a conversation with the editor. For example, to delete the next three words after the cursor you just type "d3w" (delete 3 words), and done; to move the cursor 4 rows down, type "4j", and done. There are many, many more similar shortcuts that let developers do things faster and easier compared to other text editors or IDEs.
Both Arch Linux and Fedora distribute version 8.0 of Vim, while Ubuntu ships version 7.6.
Introduction to the Linux Shell for Development
Besides IDEs and text editors, Linux-based operating systems are popular because of the shell. The Bourne Again Shell, known as Bash, is most commonly found preinstalled on every major Linux distribution, but it's not the only one available; there are zsh, tcsh, ksh, etc. They all do the same job, with minor differences that are not part of this introduction. The thing about shells is that they are an environment for interacting with the system, and they are often used for automating things.
Some tasks in the development lifecycle are repetitive and require synchronization, in the sense of executing the next task when the previous one is done. This is very common in embedded development: build the kernel, wait until it is done to start building the image, wait again to start transferring the image to the device, and wait one more time until the transfer completes to finally boot the device.
The point is that no one wants to sit in front of the PC or server waiting for a job to finish and then manually executing the next one. It is nicer for developers if they can just write the code and let something else manage task execution in the right order; it's about not hurting your eyes while watching the execution status. A simple five-line shell script can automate all of this, as sketched below. In addition, the shell can even send a notification to your smartphone when all the jobs are done or if there is a crash.
Another use case where the shell is important is automating crash handling. Knowing that each build generates text output, and that the output can be redirected to a shell script, we can automate the crash-handling process. If compiling fails due to a missing header, a shell script can search the file system to find the location of the header and check whether that location is included in our build. If not, it can alter the contents of a single file to add the path to the header. The shell script can then simply inform the developer of the crash and the actions taken, if any, and retry the compilation. This is handy for long-lasting projects.
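As a minimal sketch of that kind of automation (the build targets, file names and device address are all hypothetical):
#!/bin/bash
# run the build steps in order and stop at the first failure
set -e                                          # abort the script if any command fails
trap 'echo "Build failed, see build.log"' ERR   # report where to look on failure
make kernel > build.log 2>&1                    # step 1: build the kernel
make image >> build.log 2>&1                    # step 2: build the image, runs only if step 1 succeeded
scp image.bin root@192.168.1.10:/boot/          # step 3: transfer the image to the device
echo "All jobs finished successfully"
From here, the crash-handling idea is just a matter of grepping build.log for the error and reacting to it.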
Windows vs Linux for Programmers
Nevertheless, developing on Windows requires installing more additional software. For example, for Android development, device drivers are required. Sometimes drivers crash, cannot be installed, or don't work on recent versions of Windows (if it is an older device). But a good programmer must have many devices around and test the program on each one of them. This can complicate the setup of the working environment quite a bit.
On Linux distros, this is a very smooth process. All the drivers are already present in the Linux kernel (with just a few exceptions), so no additional installation is required besides the IDE. Just plug in a device and you are ready to go. As smooth as that.
Another use case is when developers have to maintain support for multiple products at the same time. This is fine until two products require software that cannot coexist, for example version 3.1 of a given program for one project and version 4.2 of the same program for the other.
On Windows, installing a newer version often requires deleting the older version. Even if the older version is not automatically deleted, environment variables are modified automatically, so pulling the wrong dependency, or pulling a dependency twice, can occur.
On Linux distros, this can be resolved quite easily. Just extract one version into one folder and the other version into another folder, and you are halfway done. The second part is either to change the global environment variable to point to only one path, or to alter the variable's contents within a tighter scope, so that the same variable has a different value for different compilations. Isn't this great? Not only does this resolve the coexistence problem; a developer can even run two compiles at the same time without issues, as the sketch below shows.
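A hedged illustration in bash, assuming the two versions were extracted into hypothetical /opt paths:

export PATH=/opt/tool-3.1/bin:$PATH    # global: every later command sees version 3.1
PATH=/opt/tool-4.2/bin:$PATH make      # tight scope: only this one make invocation sees version 4.2

The second line sets the variable only for that single command, which is how two builds with conflicting requirements can run side by side.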
Conclusion – Best Linux Distro for Programmers
In general, Linux-based operating systems offer an excellent environment for developers; it just takes some time to learn the cool stuff. No matter which distribution you choose, you won't regret it. Just pay attention to the method of software distribution, and choose whatever suits you and the projects you are working on best. If you are not sure, just choose Ubuntu; overall, it is the best-balanced Linux distribution.
Users are arranged into different groups on all Linux-based systems. The whole idea behind this is that it makes the administration of the system easier, as the users can be arranged into different groups, and can be granted different permissions as required by the system administrator. The same user can be included in different groups, and the same file can be given different permissions based on the groups.
This article is about how to add a user to a group.
The instructions will work well on most Linux distros, including Ubuntu, CentOS, Debian, Fedora, Linux Mint, etc.
Different scenarios when adding a user to a group
Adding a user to a group has many factors to consider. Some are:
Existence of the user – The commands are usually different depending on whether the user already exists on the system.
Group category – The main group that the user belongs to is called the primary group. Generally, this group has the same name as that of the user. The other groups that the user belongs to are called the secondary groups. There are other groups too, that a user is not a part of, at all.
User permissions – This is a major factor, as only the super users can add any user to any given group. This permission limits the users in terms of which groups and which users they can edit.
Keeping all these factors in mind, we present only two commands for adding users to groups. Note that both assume the user entering them is a superuser/root (or can use sudo). These are the commands:
To add new users to groups
First, and the only exception, to add new users to groups:
sudo useradd -G <groupname> <username>
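For example, to create a user named jack and add him to an existing group named developers (both names are purely illustrative):

sudo useradd -G developers jack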
The id command shows basic information about a user on the system. Therefore, to prove that the user doesn’t exist at first:
pulkit@hostname:~$ id testuser
id: ‘testuser’: no such user
pulkit@hostname:~$
All the remaining scenarios – when a user already exists
Now the only condition is that the user should already exist on the system. All the remaining scenarios can be worked out with this command:
sudo gpasswd -a <username> <groupname>
To try this command out, you can first create a new user to add to a group. This can be done with:
sudo adduser <username>
Sample:
pulkit@hostname:~$ sudo adduser testuser
Adding user `testuser' ...
Adding new group `testuser' (1001) ...
Adding new user `testuser' (1001) with group `testuser' ...
Creating home directory `/home/testuser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for testuser
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] y
pulkit@hostname:~$ sudo gpasswd -a testuser pulkit
Adding user testuser to group pulkit
pulkit@hostname:~$
Now checking if this worked:
pulkit@hostname:~$ id testuser
uid=1001(testuser) gid=1001(testuser) groups=1001(testuser),1000(pulkit)
pulkit@hostname:~$
Removing users from a group
This is not a part of adding users to groups, obviously, but an essential command to know if you’re managing groups. The command uses the same format as seen with the gpasswd command:
sudo gpasswd -d <username> <groupname>
Sample:
pulkit@hostname:~$ sudo gpasswd -d testuser pulkit
[sudo] password for pulkit:
Removing user testuser from group pulkit
pulkit@hostname:~$ id testuser
uid=1001(testuser) gid=1001(testuser) groups=1001(testuser)
pulkit@hostname:~$
This removes the user from the given group, as shown.
Conclusion
Adding users to groups on Linux is an easy task if you know which commands to use. You can do essentially everything with just one command, as long as you know how to use it. The gpasswd command is the simplest one for the task. The useradd or usermod commands can be used as well, but they have a comparatively more complex syntax and are therefore not recommended for beginners.
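For reference, a usermod equivalent of the gpasswd call shown earlier would look like this; note the -a flag, without which the list of the user's secondary groups is replaced instead of appended to:

sudo usermod -aG pulkit testuser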
We hope this article served its purpose. Let us know if you have any questions in the comments below.
Security is extremely important – now more than ever. If you start to do research, however, you'll find a debate going on about which operating system is the safest. These days, more IT professionals and companies are preaching the benefits of Linux systems, and there are definitely some security advantages to the platform. But like everything in the computer world, much of it comes down to user training. Even on a very secure platform, a virus can still be a problem. So let's take a look at Linux and some of the advanced security measures you need to take.
A Less Frequent Target
Linux is safer than macOS for the same reason macOS is safer than Windows. Simply put, Linux has lower market share. So hackers are generally less interested in attacking it. The vast majority of hacks aren’t targeted against one company in particular. Instead, hackers attack a large number of computers and servers hoping that viruses and malware will land in some of them. Since Windows controls 75 percent of the market, it makes sense for hackers to target that operating system.
Creating Malware is Much More Difficult
Linux has never been easier to use; even complete newcomers can learn to operate a modern distribution in minutes. Still, one key difference between Linux and Windows or macOS is that you don't just download and open attachments. Before executing an attachment, you need to explicitly mark it as executable and make sure each user has the right permissions to open that specific file. These extra steps prevent internal users from inadvertently running virus files, as the example below shows.
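For example, a script downloaded from the web typically cannot run until someone with the appropriate permissions marks it executable (the file name here is illustrative):

chmod +x downloaded-script.sh
./downloaded-script.sh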
Linux Has a Strong Community of Developers
Linux is open source. This means there is a huge group of developers always checking for issues and determining fixes. Whereas it may take months for big companies like Apple and Microsoft to identify a problem and then issue a security patch, Linux users can repair it themselves and have the fixed code released in minutes. This minimizes downtime and the potential threat of viruses.
Backups are an incredibly important aspect of a system administrator’s job. Without good backups and a well-planned backup policy and process, it is a near certainty that sooner or later some critical data will be irretrievably lost.
All companies, regardless of how large or small, run on their data. Consider the financial and business cost of losing all of the data you need to run your business. No business today, from the smallest sole proprietorship to the largest global corporation, could survive the loss of all or even a large fraction of its data. Your place of business can be rebuilt using insurance, but your data can never be rebuilt.
By loss, here, I don't mean stolen data; that is an entirely different type of disaster. What I mean here is the complete destruction of the data.
Even if you are an individual and not running a large corporation, backing up your data is very important. I have two decades of personal financial data as well as that for my now closed businesses, including a large number of electronic receipts. I also have many documents, presentations, and spreadsheets of various types that I have created over the years. I really don't want to lose all of that.
So backups are imperative to ensure the long-term safety of my data.
Backup options
There are many options for performing backups. Most Linux distributions are provided with one or more open source programs specially designed to perform backups. There are many commercial options available as well. But none of those directly met my needs so I decided to use basic Linux tools to do the job.
In my article for the Open Source Yearbook last year, Best Couple of 2015: tar and ssh, I showed that fancy and expensive backup programs are not really necessary to design and implement a viable backup program.
Since last year, I have been experimenting with another backup option, the rsync command which has some very interesting features that I have been able to use to good advantage. My primary objectives were to create backups from which users could locate and restore files without having to untar a backup tarball, and to reduce the amount of time taken to create the backups.
This article is intended only to describe my own use of rsync in a backup scenario. It is not a look at all of the capabilities of rsync or the many ways in which it can be used.
The rsync command
The rsync command was written by Andrew Tridgell and Paul Mackerras and first released in 1996. The primary intention for rsync is to remotely synchronize the files on one computer with those on another. Did you notice what they did to create the name there? rsync is open source software and is provided with almost all major distributions.
The rsync command can be used to synchronize two directories or directory trees whether they are on the same computer or on different computers but it can do so much more than that. rsync creates or updates the target directory to be identical to the source directory. The target directory is freely accessible by all the usual Linux tools because it is not stored in a tarball or zip file or any other archival file type; it is just a regular directory with regular files that can be navigated by regular users using basic Linux tools. This meets one of my primary objectives.
One of the most important features of rsync is the method it uses to synchronize preexisting files that have changed in the source directory. Rather than copying the entire file from the source, it uses checksums to compare blocks of the source and target files. If all of the blocks in the two files are the same, no data is transferred. If the data differs, only the block that has changed on the source is transferred to the target. This saves an immense amount of time and network bandwidth for remote sync. For example, when I first used my rsync Bash script to back up all of my hosts to a large external USB hard drive, it took about three hours. That is because all of the data had to be transferred. Subsequent syncs took 3-8 minutes of real time, depending upon how many files had been changed or created since the previous sync. I used the time command to determine this so it is empirical data. Last night, for example, it took just over three minutes to complete a sync of approximately 750GB of data from six remote systems and the local workstation. Of course, only a few hundred megabytes of data were actually altered during the day and needed to be synchronized.
The following simple rsync command can be used to synchronize the contents of two directories and any of their subdirectories. That is, the contents of the target directory are synchronized with the contents of the source directory so that at the end of the sync, the target directory is identical to the source directory.
rsync -aH sourcedir targetdir
The -a option is for archive mode which preserves permissions, ownerships and symbolic (soft) links. The -H is used to preserve hard links. Note that either the source or target directories can be on a remote host.
Now let's assume that yesterday we used rsync to synchronize two directories. Today we want to resync them, but we have deleted some files from the source directory. By default, rsync simply copies all the new or changed files to the target and leaves the deleted files in place there. This may be the behavior you want, but if you would prefer that files deleted from the source also be deleted from the target, you can add the --delete option to make that happen.
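Following the form of the earlier command, that looks like this:

rsync -aH --delete sourcedir targetdir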
Another interesting option, and my personal favorite because it increases the power and flexibility of rsync immensely, is the --link-dest option. The --link-dest option allows a series of daily backups that take up very little additional space for each day and also take very little time to create.
Specify the previous day's target directory with this option, along with a new directory for today. rsync then creates today's directory and, for each file in yesterday's directory, creates a hard link in today's directory. So we now have a bunch of hard links to yesterday's files in today's directory; no new files have been created or duplicated, just a bunch of hard links. (Wikipedia has a very good description of hard links.) After creating today's target directory with this set of hard links to yesterday's files, rsync performs its sync as usual, but when it detects a change in a file, the target hard link is replaced by a copy of yesterday's file and the changes are then copied from the source to the target.
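Based on that description, a daily backup command might take this shape, where yesterdaydir and todaydir stand in for your dated directory names:

rsync -aH --delete --link-dest=yesterdaydir sourcedir todaydir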
There are also times when it is desirable to exclude certain directories or files from being synchronized. For this, there is the --exclude option. Use this option with the pattern for the files or directories you want to exclude. For instance, to skip browser cache files, the command might look like this (the Cache pattern is just an example):
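rsync -aH --delete --exclude Cache --link-dest=yesterdaydir sourcedir todaydir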
Note that each file pattern you want to exclude must have a separate exclude option.
rsync can sync files with remote hosts as either the source or the target. For the next example, let's assume that the source directory is on a remote computer with the hostname remote1 and the target directory is on the local host. Even though SSH is the default communications protocol used when transferring data to or from a remote host, I always add the ssh option explicitly. Keeping the same options as before, the command now looks something like this:
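rsync -aH -e ssh --delete --exclude Cache --link-dest=yesterdaydir remote1:sourcedir targetdir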
This is the final form of my rsync backup command.
rsync has a very large number of options that you can use to customize the synchronization process. For the most part, the relatively simple commands that I have described here are perfect for making backups for my personal needs. Be sure to read the extensive man page for rsync to learn about more of its capabilities as well as the options discussed here.
Performing backups
I automated my backups because I believe in "automate everything." I wrote a Bash script that handles the details of creating a series of daily backups using rsync: ensuring that the backup medium is mounted, generating the names for yesterday's and today's backup directories, creating the appropriate directory structures on the backup medium if they are not already there, performing the actual backups, and unmounting the medium.
I run the script daily, early every morning, as a cron job to ensure that I never forget to perform my backups.
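The crontab entry for that could look something like this; the time and the script path are illustrative, not the actual script described above:

30 01 * * * /usr/local/bin/rsync-backup.sh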
No backup regimen would be complete without testing. You should regularly test recovery of random files or entire directory structures to ensure not only that the backups are working, but that the data in the backups can be recovered for use after a disaster. I have seen too many instances where a backup could not be restored for one reason or another and valuable data was lost because the lack of testing prevented discovery of the problem.
Just select a file or directory to test and restore it to a test location such as /tmp so that you won't overwrite a file that may have been updated since the backup was performed. Verify that the file's contents are as you expect them to be. Restoring files from a backup made using the rsync commands above is simply a matter of finding the file you want to restore in the backup and then copying it to the location where you want it.
I have had a few circumstances where I have had to restore individual files and, occasionally, a complete directory structure. Most of the time this has been self-inflicted when I accidentally deleted a file or directory. At least a few times it has been due to a crashed hard drive. So those backups do come in handy.
The last step
But just creating the backups will not save your business. You need to make regular backups and keep the most recent copies at a remote location, one that is not in the same building or even within a few miles of your business location, if at all possible. This helps ensure that a large-scale disaster does not destroy all of your backups.
A reasonable option for most small businesses is to make daily backups on removable media and take the latest copy home at night. The next morning, take an older backup back to the office. You should have several rotating copies of your backups. Even better would be to take the latest backup to the bank and place it in your safe deposit box, then return with the backup from the day before.
Basic rsync commands are usually enough to manage your Linux backups, but a few extra options add speed and power to large backup sets.
It seems clear that backups are always a hot topic in the Linux world. Back in 2017, David Both offered Opensource.com readers tips on "Using rsync to back up your Linux system," and earlier this year, he published a poll asking us, "What's your primary backup strategy for the /home directory in Linux?" In another poll this year, Don Watkins asked, "Which open source backup solution do you use?"
My response is rsync. I really like rsync! There are plenty of large and complex tools on the market that may be necessary for managing tape drives or storage library devices, but a simple open source command line tool may be all you need.
Basic rsync
I managed the binary repository system for a global organization with roughly 35,000 developers and multiple terabytes of files. I regularly moved or archived hundreds of gigabytes of data at a time, and rsync was the tool I used. This experience gave me confidence in this simple tool. (So, yes, I use it at home to back up my Linux systems.)
The basic rsync command is simple.
rsync -av SRC DST
Indeed, the rsync commands taught in any tutorial will work fine for most general situations. However, suppose we need to back up a very large amount of data. Something like a directory with 2,000 sub-directories, each holding anywhere from 50GB to 700GB of data. Running rsync on this directory could take a tremendous amount of time, particularly if you're using the checksum option, which I prefer.
Performance is likely to suffer if we try to sync large amounts of data or sync across slow network connections. Let me show you some methods I use to ensure good performance and reliability.
Advanced rsync
One of the first lines that appears when rsync runs is: "sending incremental file list." If you do a search for this line, you'll see many questions asking things like: why is it taking forever? or why does it seem to hang up?
Here's an example based on this scenario. Let's say we have a directory called /storage that we want to back up to an external USB device mounted at /media/WDPassport.
If we want to back up /storage to a USB external drive, we could use this command:
rsync -cav /storage /media/WDPassport
The c option tells rsync to use file checksums instead of timestamps to determine changed files, and this usually takes longer. In order to break down the /storage directory, I sync by subdirectory, using the find command. Here's an example:
find /storage -type d -exec rsync -cav {} /media/WDPassport \;
This looks OK, but if there are any files in the /storage directory, they will not be copied. So, how can we sync the files in /storage? There is also a small nuance where certain options will cause rsync to sync the . directory, which is the root of the source directory; this means it will sync the subdirectories twice, and we don't want that.
Long story short, the solution I settled on is a "double-incremental" script. This allows me to break down a directory, for example breaking /home into the individual users' home directories, or to handle cases where you have multiple large directories, such as music or family photos.
Here is an example of my script:
HOMES="alan" DRIVE="/media/WDPassport"
for HOME in $HOMES; do cd /home/$HOME rsync -cdlptgov --delete . /$DRIVE/$HOME find . -maxdepth 1 -type d -not -name "." -exec rsync -crlptgov --delete {} /$DRIVE/$HOME \; done
The first rsync command copies the files and directories that it finds in the source directory. However, it leaves the directories empty so we can iterate through them using the find command. This is done by passing the d argument, which tells rsync not to recurse the directory.
-d, --dirs transfer directories without recursing
The find command then passes each directory to rsync individually. Rsync then copies the directories' contents. This is done by passing the r argument, which tells rsync to recurse the directory.
-r, --recursive recurse into directories
This keeps the incremental file list that rsync builds down to a manageable size.
Most rsync tutorials use the a (or archive) argument for convenience. This is actually a compound argument.
-a, --archive archive mode; equals -rlptgoD (no -H,-A,-X)
The other arguments that I pass would have been included in the a; those are l, p, t, g, and o.
-l, --links     copy symlinks as symlinks
-p, --perms     preserve permissions
-t, --times     preserve modification times
-g, --group     preserve group
-o, --owner     preserve owner (super-user only)
The --delete option tells rsync to remove any files on the destination that no longer exist on the source. This way, the result is an exact duplication. You can also add an exclude for the .Trash directories or perhaps the .DS_Store files created by MacOS.
-not -name ".Trash*" -not -name ".DS_Store"
Be careful
One final recommendation: rsync can be a destructive command. Luckily, its thoughtful creators provided the ability to do "dry runs." If we include the n option, rsync will display the expected output without writing any data.
rsync -cdlptgovn --delete . /$DRIVE/$HOME
This script is scalable to very large storage sizes and large latency or slow link situations. I'm sure there is still room for improvement, as there always is. If you have suggestions, please share them in the comments.
Linux has come a long way since 1991. These events mark its evolution.
1. Linus releases Linux
Linus Torvalds initially released Linux to the world in 1991 as a hobby. It didn't remain a hobby for long!
2. Linux distributions
In 1993, several Linux distributions were founded, notably Debian, Red Hat, and Slackware. These were important because they demonstrated Linux's gains in market acceptance and development that enabled it to survive the tumultuous OS wars, browser wars, and protocol wars of the 1990s. In contrast, many established, commercial, and proprietary products did not make it past the turn of the millennium!
3. IBM's big investment in Linux
In 2000, IBM announced it would invest US$1 billion in Linux. In his CNN Money article about the investment, Richard Richtmyer wrote: "The announcement underscores Big Blue's commitment to Linux and marks significant progress in moving the alternative operating system into the mainstream commercial market."
4. Hollywood adopts Linux
In 2002, it seemed the entire Hollywood movie industry adopted Linux. Disney, Dreamworks, and Industrial Light & Magic all began making movies with Linux that year.
5. Linux for national security
In 2003, another big moment came with the US government's acceptance of Linux. Red Hat Linux was awarded the Department of Defense Common Operating Environment (COE) certification. This is significant because the government—intelligence and military agencies in particular—have very strict requirements for computing systems to prevent attacks and support national security. This opened the door for other agencies to use Linux. Later that year, the National Weather Service announced it would replace outdated systems with new computers running Linux.
6. The systems I managed
This "moment" is really a collection of my personal experiences. As my career progressed in the 2000s, I discovered several types of systems and devices that I managed were all running Linux. Some of the places I found Linux were VMware ESX, F5 Big-IP, Check Point UTM Edge, Cisco ASA, and PIX. This made me realize that Linux was truly viable and here to stay.
7. Ubuntu
In 2004, Canonical was founded by Mark Shuttleworth to provide an easy-to-use Linux desktop—Ubuntu Linux—based on the Debian distribution. I think Ubuntu Linux helped to expand the desktop Linux install base. It put Linux in front of many more people, from casual home users to professional software developers.
8. Google Linux
Google released two operating systems based on the Linux kernel: the Android mobile operating system in mid-2008 and Chrome OS, running on a Chromebook, in 2011. Since then, millions of Android mobile phones and Chromebooks have been sold.
9. The cloud is Linux
In the past 10 years or so, cloud computing has gone from a grandiose vision of computing on the internet to a reinvention of how we use computers personally and professionally. The big players in the cloud space are built on Linux, including Amazon Web Services, Google Cloud Services, and Linode. Even in cases where we aren't certain, such as Microsoft Azure, running Linux workloads is well supported.
10. My car runs Linux
And so will yours! Many automakers began introducing Linux a few years ago. This led to the formation of the collaborative open source project called Automotive Grade Linux. Major car makers, such as Toyota and Subaru, have joined together to develop Linux-based automotive entertainment, navigation, and engine-management systems.
Share your favorite
This is my subjective list pulled from archives of Linux articles and events throughout my career, so there may be other more notable moments that I am overlooking. Share in the comments.
If you’ve come here looking to fix an errant recursive chmod or chown command on an RPM-based Linux system, then here is the quick solution. Run the following commands using root privileges:
rpm --setugids -a
rpm --setperms -a
The --setugids option to the rpm command sets the user/group ownership of the files in a given package. By using the -a option, we're telling rpm to do this for all packages. The --setperms option sets the permissions of the files in the given package.
If this fixes your issue, great! If not, or you want to be thorough, continue reading.
Why Would You Need To Fix the Permissions and User/Group Ownership of Files
The most common reason you’ll need to follow the procedure below is to recover from a chmod or chown command that didn’t do what you initially intended it to do. Using this procedure can save you from having to perform a complete system restore or a complete system reinstall.
Perhaps you or someone else accidentally executed a recursive chmod or chown command on part of the file system, or even all of it. Even if the mistake is noticed and the command is stopped by typing Control-C as quickly as possible, many files could have been changed in that short period of time, and you won't be able to tell immediately which files were affected.
Problems Caused by Incorrect Permissions and Ownerships of Files
Having improper file permissions or ownerships can cause processes and services to behave in unexpected ways, stop working immediately, or prevent them from starting once they’ve been stopped.
For example, if the user running the web server process can’t read the files it’s supposed to serve, then the service it provides is effectively broken.
If a service is already running, it probably doesn’t need to read its configuration file(s) again as it has that information in memory. However, if it can’t read its configuration when it attempts to start, it simply isn’t going to start.
Also, when some services start, they create a lock file to indicate that the service is running. When the service stops, it deletes the lock file. However, if the permissions on that lock file are changed while the service is running such that the service can’t delete the file, then of course the lock file won’t get deleted. This will prevent the service from starting again as it thinks it’s already running due to the presence of the lock file.
Perhaps the file that actually needs to be executed no longer has execute permissions. Needless to say, that will definitely keep a service from starting.
If you have a service such as a database that writes data, it needs the proper permissions to write data to files, create new files, and so on.
Those are some common issues you can run into when file permissions and ownerships are not set properly.
Examples of Errant chmod and chown Commands
A common way a chmod or chown command can go wrong is by using recursion while making a typing mistake or providing an incorrect path. For example, let’s say you’ve created some configuration files in the /var/lib/pgsql directory as the root user. You want to make sure all those files are owned by the postgres user, so you intend to run this command:
chown -R postgres /var/lib/pgsql
However, you accidentally add a space between the leading forward slash and var, making the actual command executed this one:
chown -R postgres / var/lib/pgsql
Oh what a difference a space can make! Now, every file on the system is owned by the postgres user.
The reason is that chown interpreted the first forward slash ("/") as an absolute path to operate on and "var/lib/pgsql" as a separate, relative path. The chown command, like any Linux command, only does what you tell it to do; it can't read your mind, and it doesn't know that you intended to supply only the one path, /var/lib/pgsql.
Fixing File Ownerships and Permissions with the RPM Command
Continuing with our example, you should be able to execute the following command with root privileges and return to a fairly stable state:
rpm --setugids -a
This command will restore the owner and group membership for every file that was installed via an RPM package. Changing the ownership of a file can cause the set-user-ID (SUID) or set-group-ID (SGID) permission bits to be cleared. Because of this, we need to restore the permissions on the files as well:
rpm --setperms -a
Now every file that is known by rpm will have the same permissions as when it was initially installed.
By the way, use this same process to fix an errant chmod command, too. Be sure to run the commands in the same order because of the SUID and SGID issues that could arise; that is, run rpm with the --setperms option last.
Fixing File Ownerships and Permissions for Files Not Known by RPM
Not all the files on the system are going to be part of an RPM package. Most data, either transient or permanent, will live outside of an RPM package. Examples include temporary files, files used to store database data, lock files, web site files, some configuration files, and more depending on the system in question.
At least check the most important services that the system provides. For example, if you are working on a database server, make sure the database service starts correctly. If it’s a web server, make sure the web server service is functioning.
Here is the pattern:
systemctl restart SERVICE_NAME
If the service does not start, determine the reason by looking at the logs and messages:
journalctl -xe
Fix any issues and try again until the service starts.
Example:
systemctl restart postfix
# The service fails to start.
journalctl -xe
# The error message is: "fatal: open lock file /var/lib/postfix/master.lock: cannot open file: Permission denied"
# Fix the obvious error.
rm /var/lib/postfix/master.lock
# Make sure there aren't other files with permissions or ownership issues in that directory.
ls -l /var/lib/postfix
# There are no other files.
# Try to start the service again.
systemctl start postfix
# No errors are reported. The service is working! Let's double-check:
systemctl status postfix
You can check which services are in a failed state by using the following command:
systemctl list-units --failed
Let’s say you reboot the system and want to make sure everything started ok. Then run the above command and troubleshoot each service as needed.
Also, if you have good service monitoring in place, check there. Your monitors should report if any service isn’t functioning appropriately and you can use this information to track down issues and fix them as needed.
A List of Directories that Are Not in the RPM Database
Here are some common places to look for files that live outside of an RPM package:
If user home directories were changed due to a recursive chmod or chown command, they need to be fixed. If the ownership has changed, we can assume that each home directory and all of its contents should be owned by the corresponding user. For example, "/home/jason" should be owned by the "jason" user, and so should any files within it. Here's a quick script to make this happen:
cd /home
for U in *
do
  chown -R ${U} ${U}
done
Be careful with the chown command because we don’t want to create another mess!
It could be the case that some files in a given home directory should not be owned by the user. If you think this might be the case, your best course of action is to restore the home directories from backups. Speaking of which…
Why Not Just Restore from Backup?
If you have a good and recent backup, restoring that backup might be a great option. If the server in question doesn’t actually store data, then it would be a perfect candidate for a restore as you won’t lose any data.
Performing a restore can give you peace of mind that all the files on the system have the proper permissions and ownership. After you've rigorously checked the services, the chances of any missed files causing operational issues are low; nevertheless, there is a possibility of an issue arising at a later date, and a restore reduces this probability even further.
You could also use a hybrid approach where you run through the above process and selectively restore parts of the system.
The downside of performing a restore is that it can be slower than using the process outlined above. It's much quicker to change the permissions on a 1 TB file than it is to restore that file.
Of course, if you don’t have a backup that you can restore then you will have to follow a process like the one outlined above.