



In this article we are going to learn about package installation on Linux systems: how to install and uninstall a package ( a package is roughly the equivalent of a piece of software in a Windows environment ), what a repository is, how to create, enable, and disable a repository, how the package installation commands differ between Linux distributions, and so on.

What is a package?

A package is essentially a software application packaged for a Linux operating system. Just as on Windows and macOS, on Linux we can install software from a GUI environment as well as from the command-line interface.

What is a package manager?

Different Linux distributions use different package managers, so it is important to know which package commands apply to the system you are working on. On Windows, installers typically end with the .exe extension; on Linux, package files use extensions such as .rpm or .deb depending on the distribution. The package manager is the tool that fetches those packages and installs, removes, or updates them.

dpkg is the low-level package tool on Debian and Ubuntu; apt is the higher-level front end built on top of it.
rpm is the low-level package tool on Red Hat and CentOS; yum is the higher-level front end built on top of it.


* Important – You should be a superuser to install packages. 

So, for testing here I'm taking two AWS EC2 Linux servers ( one Ubuntu and one Red Hat ). We will go through them one by one.

To find out which Linux distribution you are using, try the command below:

cat /etc/*-release
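
The exact fields differ between distributions, but on an Ubuntu system, for example, the output contains lines along the lines of:

NAME="Ubuntu"
VERSION_ID="18.04"
PRETTY_NAME="Ubuntu 18.04 LTS"

A Red Hat system shows a Red Hat name and version instead.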




Yum package manager

What is yum? Yum is the command-line package manager used to install packages on Red Hat and CentOS systems. Yum uses repositories to search for and install applications.

What are Repositories ?

Red Hat or third-party repositories are used as the software sources. A repository definition tells the package manager which URLs to search for packages.

A simple repository definition ( a .repo file ) looks like the example below.
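
( The values here are made up for illustration; the fields are explained in item 12 later in this article. )

[examplerepo]
name=Example Repolist
baseurl=https://example.com/repo/rhel7/
enabled=1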




All repository files reside under the path /etc/yum.repos.d/. The main configuration file for yum is /etc/yum.conf.

From here, we will go through the most useful commands to know.

1) yum repolist ( This command will list your active repositories )




2) yum repolist all ( This will list all of your repositories, whether enabled or disabled )




3) yum list installed ( This will list all your installed packages )

4) yum list vim* ( This will list installed and available packages whose names match the given pattern )




5) yum search vim ( This searches package names and summaries for the given keyword )




yum search all vim  ( This also searches other fields, such as descriptions and URLs, so it returns more results than the above )

6) yum info vim* ( This will display information about all the packages that match the given name )

We can identify the different parts of a package name as below.

eg – vim-minimal.x86_64 : A minimal version of the VIM editor

vim-minimal – Package name
x86_64 – Architecture
*Sometimes the package version also appears along with the name.
7) yum provides  ( This shows which package provides the given file path ). Not only paths – we can also use an application name if we are not sure what to search for.
eg – yum provides tree



yum provides /var/www/html




8) yum install httpd ( This will install httpd package into the system )




yum install httpd -y ( The -y option automatically answers yes to the confirmation prompt, so the package installs without asking. )

9) yum update  ( Without arguments this updates all installed packages to their latest versions; give a package name to update only that package. )
10) yum remove  ( This will uninstall the installed package )
11) yum list kernel ( This will display installed and available kernel software versions )




yum update kernel ( This will update the kernel to the latest version )

12) How to create a new repo
Go to the directory – /etc/yum.repos.d
Create your new repo file – example.repo



[examplerepo] – repo id

name=Example Repolist – repo name
baseurl – URL of the package source
enabled – status of the repo ( 1 for enabled, 0 for disabled )
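
After saving the file, you can check that yum picks up the new repository:

yum clean all
yum repolist all

The new repo id should appear in the list.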




Apt package manager

The apt package manager is used on Ubuntu, Debian, and similar Linux distributions. The commands listed below will be helpful.
apt works from a local package database. If that database is not updated, the system will not know whether newer versions of the packages are available, so updating the package database is the essential first step.
1) apt-get update ( This command will update the package database )
2) apt-get upgrade ( This will upgrade all the software to the latest version. )
3) apt-get install  ( This will install the package )
4) apt remove  ( This will remove the package binaries but keep its configuration files )
5) apt purge  ( This will remove the package together with its configuration files )
6) apt show  ( Display information about the package )
7) apt list  ( List the packages matching the given name )
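
As a quick illustration ( using tree as an example package ), a typical session that refreshes the database and then installs, inspects, and removes a package looks like this:

apt-get update
apt-get install -y tree
apt show tree
apt remove tree
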
So, here are some of the most basic and useful commands. You will get to know many more of them with regular use.



Published in GNU/Linux Rules!
Friday, 28 February 2020 13:27

Linux Commands: The easy way



The find command is an essential command in the Linux operating system. We use it to locate files within the system hierarchy.

On a Windows operating system we can simply use the search option in the GUI. Likewise, on Linux we can use the find command to locate exactly the files we need.

Like other Linux commands, find has many options: it can search recursively, find files by modification or access date, by size, by ownership and permissions, and much more.

We can also use pipes and redirection to take the output of find and pass it to another operation. As an example, we can find some files and delete them with a single command, using either the -exec option or the xargs command. We will discuss -exec later. Now let's go through the find command and examples of its options. 



Syntax –

find [location] [options] [what to find]

  • location : The directories where you need to search. This can be a single location or multiple locations.
    eg – if we need to find a file under the root directory, the location should be the root directory.
    So, the command should start like find /
  • options : find has many options to refine the search. We will discuss them below in this article.
  • What to find : The name of the file which you need to find.
    eg – if we need to find all files with the .cpp extension under the root directory, the command is find / -name "*.cpp". Here -name is the option used to match the file name; quoting the pattern stops the shell from expanding it before find sees it. Whenever we search for a file by its name, we must use the -name option. 




How to use find command with examples.


1) find files named "example.txt" in your current location

find . -name "example.txt"


2) find files named "passwd" under the root directory

find / -name "passwd"



 3) find files ignoring case sensitivity

Suppose we have two files named Text_file and text_file. These names differ only in the capital T. If we need to ignore case, we use the -iname option instead of -name. 

find / -name "text_file" ( Case sensitive )

find / -iname "text_file" ( Case insensitive )



 4) find file "example.txt" under your home directory

find /home -iname "example.txt"


5) find files with the .php extension under your home directory

find /home -type f -name "*.php"

Here the -type option specifies the type of object you are searching for: -type f matches regular files, and if you are searching for a directory, the option should be -type d instead.



6) find directories named example under your home directory

find /home -type d -name "example"


7) find files in more than one location

find / /home -iname "student"

The above command finds files named student in both the root directory and the /home directory.


8) find empty files

find / -type f -empty


9) find empty directories

find / -type d -empty


10) find files which have permissions of 777

find / -type f -perm 777 

The -perm option matches files by their permissions. If you want to find files with 644 permissions, it should be -perm 644.


11) find files not having 777 permissions

find / -type f ! -perm 777


The exclamation mark ( ! ) is used to negate a test ( NOT ).

12) find files which are set to SETUID

find / -perm /u=s


13) find files which are set to SETGID

find / -perm /g=s


14) find read-only files ( strictly, files whose owner has read permission )

find / -perm /u=r


15) find files and remove them

find /home -iname "*.cpp" -exec rm -rf {} \;


We will break this command into two parts.

  •  The find part is familiar by now. It finds all files with the .cpp extension under your home directory.
  •  -exec rm -rf {} \; – this part runs the rm -rf command on every file that find matches. The {} placeholder is replaced by each matched file name, and the escaped semicolon \; tells -exec where the command ends.
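
Since xargs was mentioned earlier, here is an equivalent way to do the same deletion with a pipe; -print0 and -0 keep file names containing spaces safe:

find /home -iname "*.cpp" -print0 | xargs -0 rm -f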


16) find files with 755 permissions and change them to 644.

find / -iname "*.cpp" -perm 755 -exec chmod 644 {} \;


17) find files based on users

find / -iname "example" -user student

The above command finds files named example that are owned by the user student.


18) find files based on group

find / -iname "example" -group user


This finds files named example whose group is user.


19) find modified files with given dates

find / -iname "*.txt" -mtime 7

find files with the .txt extension under the root directory that were modified exactly 7 days ago.

find / -iname "*.txt" -mtime +7

find files with the .txt extension under the root directory that were modified more than 7 days ago.

find / -iname "*.txt" -mtime -7

find files with the .txt extension under the root directory that were modified within the last 7 days.

find / -iname "*.txt" -mmin 7

find files with the .txt extension under the root directory that were modified exactly 7 minutes ago.

-mtime measures time in days and -mmin measures it in minutes. 



20) find files that were accessed 10 days ago

find / -iname "*.txt" -atime 10


21) find files whose status changed 10 minutes ago

find / -iname "*.txt" -cmin 10


22) find files with size

find / -size +50M

find files which are more than 50M in size.

find / -size +50M -size -200M 

find files whose size is more than 50M and less than 200M.
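
These tests combine naturally with the earlier ones. For example, the following illustrative command lists .log files under /var/log that are larger than 100M and have not been modified for more than 30 days:

find /var/log -type f -name "*.log" -size +100M -mtime +30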


So, we have discussed a lot of the options that can be used with the find command. You can refer to the man page or the internet to find many more.




Published in GNU/Linux Rules!

DSLs are used for a specific context in a particular domain. Learn more about what they are and why you might want to use one.



A domain-specific language (DSL) is a language meant for use in the context of a particular domain. A domain could be a business context (e.g., banking, insurance, etc.) or an application context (e.g., a web application, database, etc.) In contrast, a general-purpose language (GPL) can be used for a wide range of business problems and applications.

A DSL does not attempt to please all. Instead, it is created for a limited sphere of applicability and use, but it's powerful enough to represent and address the problems and solutions in that sphere. A good example of a DSL is HTML. It is a language for the web application domain. It can't be used for, say, number crunching, but it is clear how widely used HTML is on the web.

A GPL creator does not know where the language might be used or the problems the user intends to solve with it. So, a GPL is created with generic constructs that potentially are usable for any problem, solution, business, or need. Java is a GPL, as it's used on desktops and mobile devices, embedded in the web across banking, finance, insurance, manufacturing, etc., and more.

Classifying DSLs

In the DSL world, there are two types of languages:

  • Domain-specific language (DSL): The language in which a DSL is written or presented
  • Host language: The language in which a DSL is executed or processed

A DSL written in a distinct language and processed by another host language is called an external DSL.

This is a DSL in SQL that can be processed in a host language:

SELECT account
FROM accounts
WHERE account = '123' AND branch = 'abc' AND amount >= 1000



For that matter, a DSL could be written in English with a defined vocabulary and form that can be processed in another host language using a parser generator like ANTLR:

if smokes then increase premium by 10%

If the DSL and host language are the same, then the DSL type is internal, where the DSL is written in the language's semantics and processed by it. These are also referred to as embedded DSLs. Here are two examples.

  • A Bash DSL that can be executed in a Bash engine:
    if today_is_christmas; then apply_christmas_discount; fi
    This is valid Bash that is written like English.
  • A DSL written in a GPL like Java:
    orderValue = orderValue
    This uses a fluent style and is readable like English.

Yes, the boundaries between DSL and GPL sometimes blur.



DSL examples

Some languages used for DSLs include:

  • Web: HTML
  • Shell: sh, Bash, CSH, and the likes for *nix; MS-DOS, Windows Terminal, PowerShell for Windows
  • Markup languages: XML
  • Modeling: UML
  • Data management: SQL and its variants
  • Business rules: Drools
  • Hardware: Verilog, VHDL
  • Build tools: Maven, Gradle
  • Numerical computation and simulation: MATLAB (commercial), GNU Octave, Scilab
  • Various types of parsers and generators: Lex, YACC, GNU Bison, ANTLR


Why DSL?

The purpose of a DSL is to capture or document the requirements and behavior of one domain. A DSL's usage might be even narrower for particular aspects within the domain (e.g., commodities trading in finance). DSLs bring business and technical teams together. This does not imply a DSL is for business use alone. For example, designers and developers can use a DSL to represent or design an application.

A DSL can also be used to generate source code for an addressed domain or problem. Code generation from a DSL is not considered mandatory, however, as its primary purpose is capturing domain knowledge. When it is used, though, code generation is a serious advantage in domain engineering.


DSL pros and cons

On the plus side, DSLs are powerful for capturing a domain's attributes. Also, since DSLs are small, they are easy to learn and use. Finally, a DSL offers a language for domain experts and between domain experts and developers.

On the downside, a DSL is narrowly used within the intended domain and purpose. Also, a DSL has a learning curve, although it may not be very high. Additionally, although there may be advantages to using tools for DSL capture, they are not essential, and the development or configuration of such tools is an added effort. Finally, DSL creators need domain knowledge as well as language-development knowledge, and individuals rarely have both.



DSL software options


Open source DSL software options include:

  • Xtext: Xtext enables the development of DSLs and is integrated with Eclipse. It makes code generation possible and has been used by several open source and commercial products to provide specific functions. MADS (Multipurpose Agricultural Data System) is an interesting idea based on Xtext for "modeling and analysis of agricultural activities" (however, the project seems to be no longer active).
  • JetBrains MPS: JetBrains MPS is an integrated development environment (IDE) to create DSLs. It calls itself a projectional editor that stores a document as its underlying abstract tree structure. (This concept is also used by programs such as Microsoft Word.) JetBrains MPS also supports code generation to Java, C, JavaScript, or XML.


DSL best practices

Want to use a DSL? Here are a few tips:

  • DSLs are not GPLs. Try to address a limited range of problems in the target domain.
  • You do not need to define your own DSL. That would be tedious. Look for an existing DSL that solves your need on sites like DSLFIN, which lists DSLs for the finance domain. If you are unable to find a suitable DSL, you could define your own.
  • It is better to make DSLs "like English" rather than too technical.
  • Code generation from a DSL is not mandatory, but it offers significant and productive advantages when it is done.
  • DSLs are called languages but, unlike GPLs, they need not be executable. Being executable is not the intent of a DSL.
  • DSLs can be written with word processors. However, using a DSL editor makes syntax and semantics checks easier.

If you are using DSL now or plan to do so in the future, please share your experience in the comments.



Published in GNU/Linux Rules!
Tuesday, 15 October 2019 16:36

Benefits of centralizing GNOME in GitLab

The GNOME project's decision to centralize on GitLab is creating benefits across the community—even beyond the developers.



“What’s your GitLab?” is one of the first questions I was asked on my first day working for the GNOME Foundation—the nonprofit that supports GNOME projects, including the desktop environment, GTK, and GStreamer. The person was referring to my username on GNOME’s GitLab instance. In my time with GNOME, I’ve been asked for my GitLab a lot.

We use GitLab for basically everything. In a typical day, I get several issues and reference bug reports, and I occasionally need to modify a file. I don’t do this in the capacity of being a developer or a sysadmin. I’m involved with the Engagement and Inclusion & Diversity (I&D) teams. I write newsletters for Friends of GNOME and interview contributors to the project. I work on sponsorships for GNOME events. I don’t write code, and I use GitLab every day.


The GNOME project has been managed a lot of ways over the past two decades. Different parts of the project used different systems to track changes to code, collaborate, and share information both as a project and as a social space. However, the project made the decision that it needed to become more integrated and it took about a year from conception to completion. There were a number of reasons GNOME wanted to switch to a single tool for use across the community. External projects touch GNOME, and providing them an easier way to interact with resources was important for the project, both to support the community and to grow the ecosystem. We also wanted to better track metrics for GNOME—the number of contributors, the type and number of contributions, and the developmental progress of different parts of the project.

When it came time to pick a collaboration tool, we considered what we needed. One of the most important requirements was that it must be hosted by the GNOME community; being hosted by a third party didn’t feel like an option, so that discounted services like GitHub and Atlassian. And, of course, it had to be free software. It quickly became obvious that the only real contender was GitLab. We wanted to make sure contribution would be easy. GitLab has features like single sign-on, which allows people to use GitHub, Google, GitLab.com, and GNOME accounts.

We agreed that GitLab was the way to go, and we began to migrate from many tools to a single tool. GNOME board member Carlos Soriano led the charge. With lots of support from GitLab and the GNOME community, we completed the process in May 2018.

There was a lot of hope that moving to GitLab would help grow the community and make contributing easier. Because GNOME previously used so many different tools, including Bugzilla and CGit, it’s hard to quantitatively measure how the switch has impacted the number of contributions. We can more clearly track some statistics though, such as the nearly 10,000 issues closed and 7,085 merge requests merged between June and November 2018. People feel that the community has grown and become more welcoming and that contribution is, in fact, easier.

People come to free software from all sorts of different starting points, and it’s important to try to even out the playing field by providing better resources and extra support for people who need them. Git, as a tool, is widely used, and more people are coming to participate in free software with those skills ready to go. Self-hosting GitLab provides the perfect opportunity to combine the familiarity of Git with the feature-rich, user-friendly environment provided by GitLab.

It’s been a little over a year, and the change is really noticeable. Continuous integration (CI) has been a huge benefit for development, and it has been completely integrated into nearly every part of GNOME. Teams that aren’t doing code development have also switched to using the GitLab ecosystem for their work. Whether it’s using issue tracking to manage assigned tasks or version control to share and manage assets, even teams like Engagement and I&D have taken up using GitLab.

It can be hard for a community, even one developing free software, to adapt to a new technology or tool. It is especially hard in a case like GNOME, a project that recently turned 22. After more than two decades of building a project like GNOME, with so many parts used by so many people and organizations, the migration was an endeavor that was only possible thanks to the hard work of the GNOME community and generous assistance from GitLab.

I find a lot of convenience in working for a project that uses Git for version control. It’s a system that feels comfortable and is familiar—it’s a tool that is consistent across workplaces and hobby projects. As a new member of the GNOME community, it was great to be able to jump in and just use GitLab. As a community builder, it’s inspiring to see the results: more associated projects coming on board and entering the ecosystem; new contributors and community members making their first contributions to the project; and increased ability to measure the work we’re doing to know it’s effective and successful.

It’s great that so many teams doing completely different things (such as what they’re working on and what skills they’re using) agree to centralize on any tool—especially one that is considered a standard across open source. As a contributor to GNOME, I really appreciate that we’re using GitLab.


Published in GNU/Linux Rules!

DevSecOps evolves DevOps to ensure security remains an essential part of the process.


DevOps is well-understood in the IT world by now, but it's not flawless. Imagine you have implemented all of the DevOps engineering practices in modern application delivery for a project. You've reached the end of the development pipeline—but a penetration testing team (internal or external) has detected a security flaw and come up with a report. Now you have to re-initiate all of your processes and ask developers to fix the flaw.

This is not terribly tedious in a DevOps-based software development lifecycle (SDLC) system—but it does consume time and affects the delivery schedule. If security were integrated from the start of the SDLC, you might have tracked down the glitch and eliminated it on the go. But pushing security to the end of the development pipeline, as in the above scenario, leads to a longer development lifecycle.

This is the reason for introducing DevSecOps, which consolidates the overall software delivery cycle in an automated way.

In modern DevOps methodologies, where containers are widely used by organizations to host applications, we see greater use of Kubernetes and Istio. However, these tools have their own vulnerabilities. For example, the Cloud Native Computing Foundation (CNCF) recently completed a Kubernetes security audit that identified several issues. All tools used in the DevOps pipeline need to undergo security checks while running in the pipeline, and DevSecOps pushes admins to monitor the tools' repositories for upgrades and patches.


What Is DevSecOps?

Like DevOps, DevSecOps is a mindset or a culture that developers and IT operations teams follow while developing and deploying software applications. It integrates active and automated security audits and penetration testing into agile application development.



To utilize DevSecOps, you need to:

  • Introduce the concept of security right from the start of the SDLC to minimize vulnerabilities in software code.
  • Ensure everyone (including developers and IT operations teams) shares responsibility for following security practices in their tasks.
  • Integrate security controls, tools, and processes at the start of the DevOps workflow. These will enable automated security checks at each stage of software delivery.

DevOps has always been about including security—as well as quality assurance (QA), database administration, and everyone else—in the dev and release process. However, DevSecOps is an evolution of that process to ensure security is never forgotten as an essential part of the process.


Understanding the DevSecOps pipeline

There are different stages in a typical DevOps pipeline; a typical SDLC process includes phases like Plan, Code, Build, Test, Release, and Deploy. In DevSecOps, specific security checks are applied in each phase.


Plan: Execute security analysis and create a test plan to determine scenarios for where, how, and when testing will be done.

Code: Deploy linting tools and Git controls to secure passwords and API keys.

Build: While building code for execution, incorporate static application security testing (SAST) tools to track down flaws in code before deploying to production. These tools are specific to programming languages.

Test: Use dynamic application security testing (DAST) tools to test your application while in runtime. These tools can detect errors associated with user authentication, authorization, SQL injection, and API-related endpoints.

Release: Just before releasing the application, employ security analysis tools to perform thorough penetration testing and vulnerability scanning.

Deploy: After completing the above tests in runtime, send a secure build to production for final deployment.
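
As a rough sketch of how such checks can be wired together, the shell snippet below shows the general shape of a pipeline stage script. It assumes the source lives in src/, and the run-sast-scan and run-dast-scan commands are placeholders for whatever SAST and DAST tools your team actually uses, not specific products.

#!/bin/sh
set -e   # fail the pipeline as soon as any security check fails

# Code stage: reject obvious hard-coded secrets before anything is built
if grep -rnE 'AKIA[0-9A-Z]{16}|password[[:space:]]*=' src/ ; then
    echo "Possible hard-coded credentials found" >&2
    exit 1
fi

# Build stage: static application security testing (placeholder command)
run-sast-scan --source src/ --report sast-report.json

# Test stage: dynamic application security testing against a staging deployment (placeholder command)
run-dast-scan --target https://staging.example.com --report dast-report.json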


DevSecOps tools

Tools are available for every phase of the SDLC. Some are commercial products, but most are open source. In my next article, I will talk more about the tools to use in different stages of the pipeline.

DevSecOps will play a more crucial role as we continue to see an increase in the complexity of enterprise security threats built on modern IT infrastructure. However, the DevSecOps pipeline will need to improve over time, rather than simply relying on implementing all security changes simultaneously. This will eliminate the possibility of backtracking or the failure of application delivery.



Published in GNU/Linux Rules!


Access your Android device from your PC with this open source application based on scrcpy.



In the future, all the information you need will be just one gesture away, and it will all appear in midair as a hologram that you can interact with even while you're driving your flying car. That's the future, though, and until that arrives, we're all stuck with information spread across a laptop, a phone, a tablet, and a smart refrigerator. Unfortunately, that means when we need information from a device, we generally have to look at that device.

While not quite holographic terminals or flying cars, guiscrcpy by developer Srevin Saju is an application that consolidates multiple screens in one location and helps to capture that futuristic feeling.

Guiscrcpy is an open source (GNU GPLv3 licensed) project based on the award-winning scrcpy open source engine. With guiscrcpy, you can cast your Android screen onto your computer screen so you can view it along with everything else. Guiscrcpy supports Linux, Windows, and MacOS.

Unlike many scrcpy alternatives, Guiscrcpy is not a fork of scrcpy. The project prioritizes collaborating with other open source projects, so Guiscrcpy is an extension, or a graphical user interface (GUI) layer, for scrcpy. Keeping the Python 3 GUI separate from scrcpy ensures that nothing interferes with the efficiency of the scrcpy backend. You can screencast up to 1080p resolution and, because it uses ultrafast rendering and surprisingly little CPU, it works even on a relatively low-end PC.


Scrcpy, Guiscrcpy's foundation, is a command-line application, so it doesn't have GUI buttons to handle gestures, it doesn't provide a Back or Home button, and it requires familiarity with the Linux terminal. Guiscrcpy adds GUI panels to scrcpy, so any user can run it—and cast and control their device—without sending any information over the internet. Everything works over USB or WiFi (using only a local network). Guiscrcpy also adds a desktop launcher to Linux and Windows systems and provides compiled binaries for Linux and Windows.


Installing Guiscrcpy

Before installing Guiscrcpy, you must install its dependencies, most notably scrcpy. Possibly the easiest way to install scrcpy is with snap, which is available for most major Linux distributions. If you have snap installed and active, then you can install scrcpy with one easy command:


$ sudo snap install scrcpy


While it's installing, you can install the other dependencies. The Simple DirectMedia Layer (SDL 2.0) toolkit is required to display and interact with the phone screen, and the Android Debug Bridge (adb) command connects your computer to your Android phone.

On Fedora or CentOS:



$ sudo dnf install SDL2 android-tools


On Ubuntu or Debian:


$ sudo apt install SDL2 android-tools-adb


In another terminal, install the Python dependencies ( the requirements.txt file is included with the guiscrcpy source code ):


$ python3 -m pip install -r requirements.txt --user


Setting up your phone


For your phone to accept an adb connection, it must have Developer Mode enabled. To enable Developer Mode on Android, go to Settings and select About phone. In About phone, find the Build number (it may be in the Software information panel). Believe it or not, to enable Developer Mode, tap Build number seven times in a row.




For full instructions on all the many ways you can configure your phone for access from your computer, read the Android developer documentation.

Once that's set up, plug your phone into a USB port on your computer (or ensure that you've configured it correctly to connect over WiFi).
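
If you want to confirm that the computer can see the phone before launching guiscrcpy, a quick check in a terminal is:

$ adb devices

The phone should appear in the device list; if it shows as unauthorized, accept the USB debugging prompt on the phone's screen.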


Using guiscrcpy

When you launch guiscrcpy, you see its main control window. In this window, click the Start scrcpy button. This connects to your phone, as long as it's set up in Developer Mode and connected to your computer over USB or WiFi.



It also includes a configuration-writing system, where you can write a configuration file to your ~/.config directory to preserve your preferences between uses.

The bottom panel of guiscrcpy is a floating window that helps you perform basic controlling actions. It has buttons for Home, Back, Power, and more. These are common functions on Android devices, but an important feature of this module is that it doesn't interact with scrcpy's SDL, so it can function with no lag. In other words, this panel communicates directly with your connected device through adb rather than scrcpy.




The project is in active development and new features are still being added. The latest build has an interface for gestures and notifications.

With guiscrcpy, you not only see your phone on your screen, but you can also interact with it, either by clicking the SDL window itself, just as you would tap your physical phone, or by using the buttons on the panels.



Guiscrcpy is a fun and useful application that provides features that ought to be official features of any modern device, especially a platform like Android. Try it out yourself, and add some futuristic pragmatism to your present-day digital life.



Published in GNU/Linux Rules!
Wednesday, 08 May 2019 23:04

Using rsync to back up your Linux system

Find out how to use rsync in a backup scenario.


Backups are an incredibly important aspect of a system administrator’s job. Without good backups and a well-planned backup policy and process, it is a near certainty that sooner or later some critical data will be irretrievably lost.

All companies, regardless of how large or small, run on their data. Consider the financial and business cost of losing all of the data you need to run your business. There is not a business today ranging from the smallest sole proprietorship to the largest global corporation that could survive the loss of all or even a large fraction of its data. Your place of business can be rebuilt using insurance, but your data can never be rebuilt.

By loss, here, I don't mean stolen data; that is an entirely different type of disaster. What I mean here is the complete destruction of the data.

Even if you are an individual and not running a large corporation, backing up your data is very important. I have two decades of personal financial data as well as that for my now closed businesses, including a large number of electronic receipts. I also have many documents, presentations, and spreadsheets of various types that I have created over the years. I really don't want to lose all of that.

So backups are imperative to ensure the long-term safety of my data.


Backup options


There are many options for performing backups. Most Linux distributions are provided with one or more open source programs specially designed to perform backups. There are many commercial options available as well. But none of those directly met my needs so I decided to use basic Linux tools to do the job.

In my article for the Open Source Yearbook last year, Best Couple of 2015: tar and ssh, I showed that fancy and expensive backup programs are not really necessary to design and implement a viable backup program.

Since last year, I have been experimenting with another backup option, the rsync command which has some very interesting features that I have been able to use to good advantage. My primary objectives were to create backups from which users could locate and restore files without having to untar a backup tarball, and to reduce the amount of time taken to create the backups.

This article is intended only to describe my own use of rsync in a backup scenario. It is not a look at all of the capabilities of rsync or the many ways in which it can be used.


The rsync command

The rsync command was written by Andrew Tridgell and Paul Mackerras and first released in 1996. The primary intention for rsync is to remotely synchronize the files on one computer with those on another. Did you notice what they did to create the name there? rsync is open source software and is provided with almost all major distributions.

The rsync command can be used to synchronize two directories or directory trees whether they are on the same computer or on different computers but it can do so much more than that. rsync creates or updates the target directory to be identical to the source directory. The target directory is freely accessible by all the usual Linux tools because it is not stored in a tarball or zip file or any other archival file type; it is just a regular directory with regular files that can be navigated by regular users using basic Linux tools. This meets one of my primary objectives.

One of the most important features of rsync is the method it uses to synchronize preexisting files that have changed in the source directory. Rather than copying the entire file from the source, it uses checksums to compare blocks of the source and target files. If all of the blocks in the two files are the same, no data is transferred. If the data differs, only the block that has changed on the source is transferred to the target. This saves an immense amount of time and network bandwidth for remote sync. For example, when I first used my rsync Bash script to back up all of my hosts to a large external USB hard drive, it took about three hours. That is because all of the data had to be transferred. Subsequent syncs took 3-8 minutes of real time, depending upon how many files had been changed or created since the previous sync. I used the time command to determine this so it is empirical data. Last night, for example, it took just over three minutes to complete a sync of approximately 750GB of data from six remote systems and the local workstation. Of course, only a few hundred megabytes of data were actually altered during the day and needed to be synchronized.

The following simple rsync command can be used to synchronize the contents of two directories and any of their subdirectories. That is, the contents of the target directory are synchronized with the contents of the source directory so that at the end of the sync, the target directory is identical to the source directory.


rsync -aH sourcedir targetdir


The -a option is for archive mode which preserves permissions, ownerships and symbolic (soft) links. The -H is used to preserve hard links. Note that either the source or target directories can be on a remote host.

Now let's assume that yesterday we used rsync to synchronize two directories. Today we want to resync them, but we have deleted some files from the source directory. The normal way in which rsync would do this is to simply copy all the new or changed files to the target location and leave the deleted files in place on the target. This may be the behavior you want, but if you would prefer that files deleted from the source also be deleted from the target, you can add the --delete option to make that happen.


Another interesting option, and my personal favorite because it increases the power and flexibility of rsync immensely, is the --link-dest option. The --link-dest option allows a series of daily backups that take up very little additional space for each day and also take very little time to create.


Specify the previous day's target directory with this option and a new directory for today. rsync then creates today's new directory and a hard link for each file in yesterday's directory is created in today's directory. So we now have a bunch of hard links to yesterday's files in today's directory. No new files have been created or duplicated. Just a bunch of hard links have been created. Wikipedia has a very good description of hard links. After creating the target directory for today with this set of hard links to yesterday's target directory, rsync performs its sync as usual, but when a change is detected in a file, the target hard link is replaced by a copy of the file from yesterday and the changes to the file are then copied from the source to the target.


So now our command looks like the following.

rsync -aH --delete --link-dest=yesterdaystargetdir sourcedir todaystargetdir


There are also times when it is desirable to exclude certain directories or files from being synchronized. For this, there is the --exclude option. Use this option and the pattern for the files or directories you want to exclude. You might want to exclude browser cache files so your new command will look like this.


rsync -aH --delete --exclude Cache --link-dest=yesterdaystargetdir sourcedir todaystargetdir

Note that each file pattern you want to exclude must have a separate exclude option.

rsync can sync files with remote hosts as either the source or the target. For the next example, let's assume that the source directory is on a remote computer with the hostname remote1 and the target directory is on the local host. Even though SSH is the default communications protocol used when transferring data to or from a remote host, I always add the ssh option. The command now looks like this.


rsync -aH -e ssh --delete --exclude Cache --link-dest=yesterdaystargetdir remote1:sourcedir todaystargetdir


This is the final form of my rsync backup command.

rsync has a very large number of options that you can use to customize the synchronization process. For the most part, the relatively simple commands that I have described here are perfect for making backups for my personal needs. Be sure to read the extensive man page for rsync to learn about more of its capabilities as well as the options discussed here.

Performing backups

I automated my backups because – “automate everything.” I wrote a BASH script that handles the details of creating a series of daily backups using rsync. This includes ensuring that the backup medium is mounted, generating the names for yesterday and today's backup directories, creating appropriate directory structures on the backup medium if they are not already there, performing the actual backups and unmounting the medium.

I run the script daily, early every morning, as a cron job to ensure that I never forget to perform my backups.

My script, rsbu, and its configuration file, rsbu.conf, are available at https://github.com/opensourceway/rsync-backup-script
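
Purely to illustrate the idea ( this is a stripped-down sketch, not the rsbu script itself, and the paths are made up ), a minimal daily --link-dest backup could look something like this:

#!/bin/bash
# Minimal sketch of a daily rsync backup with hard-linked history.
BACKUP_ROOT=/media/backup              # assumed mount point of the backup drive
SOURCE=/home                           # what to back up
TODAY=$(date +%Y-%m-%d)
YESTERDAY=$(date -d yesterday +%Y-%m-%d)

mkdir -p "$BACKUP_ROOT/$TODAY"
rsync -aH --delete --exclude Cache \
      --link-dest="$BACKUP_ROOT/$YESTERDAY" \
      "$SOURCE" "$BACKUP_ROOT/$TODAY"

A matching crontab entry to run it early every morning might look like:

30 1 * * * /usr/local/bin/daily-backup.sh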


Recovery testing

No backup regimen would be complete without testing. You should regularly test recovery of random files or entire directory structures to ensure not only that the backups are working, but that the data in the backups can be recovered for use after a disaster. I have seen too many instances where a backup could not be restored for one reason or another and valuable data was lost because the lack of testing prevented discovery of the problem.

Just select a file or directory to test and restore it to a test location such as /tmp so that you won't overwrite a file that may have been updated since the backup was performed. Verify that the files' contents are as you expect them to be. Restoring files from a backup made using the rsync commands above is simply a matter of finding the file you want to restore from the backup and then copying it to the location you want to restore it to.
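
For example ( the path is made up ), restoring a single document from a given day's backup is just a copy:

cp /media/backup/2019-05-07/home/user/Documents/report.odt /tmp/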

I have had a few circumstances where I have had to restore individual files and, occasionally, a complete directory structure. Most of the time this has been self-inflicted when I accidentally deleted a file or directory. At least a few times it has been due to a crashed hard drive. So those backups do come in handy.


The last step

But just creating the backups will not save your business. You need to make regular backups and keep the most recent copies at a remote location, that is not in the same building or even within a few miles of your business location, if at all possible. This helps to ensure that a large-scale disaster does not destroy all of your backups.

A reasonable option for most small businesses is to make daily backups on removable media and take the latest copy home at night. The next morning, take an older backup back to the office. You should have several rotating copies of your backups. Even better would be to take the latest backup to the bank and place it in your safe deposit box, then return with the backup from the day before.

Source: opensource.com

Marielle Price

Published in GNU/Linux Rules!

Basic rsync commands are usually enough to manage your Linux backups, but a few extra options add speed and power to large backup sets.



It seems clear that backups are always a hot topic in the Linux world. Back in 2017, David Both offered Opensource.com readers tips on "Using rsync to back up your Linux system," and earlier this year, he published a poll asking us, "What's your primary backup strategy for the /home directory in Linux?" In another poll this year, Don Watkins asked, "Which open source backup solution do you use?"

My response is rsync. I really like rsync! There are plenty of large and complex tools on the market that may be necessary for managing tape drives or storage library devices, but a simple open source command line tool may be all you need.

Basic rsync

I managed the binary repository system for a global organization that had roughly 35,000 developers with multiple terabytes of files. I regularly moved or archived hundreds of gigabytes of data at a time. Rsync was used. This experience gave me confidence in this simple tool. (So, yes, I use it at home to back up my Linux systems.)



The basic rsync command is simple.

rsync -av SRC DST

Indeed, the rsync commands taught in any tutorial will work fine for most general situations. However, suppose we need to back up a very large amount of data. Something like a directory with 2,000 sub-directories, each holding anywhere from 50GB to 700GB of data. Running rsync on this directory could take a tremendous amount of time, particularly if you're using the checksum option, which I prefer.

Performance is likely to suffer if we try to sync large amounts of data or sync across slow network connections. Let me show you some methods I use to ensure good performance and reliability.


Advanced rsync

One of the first lines that appears when rsync runs is: "sending incremental file list." If you do a search for this line, you'll see many questions asking things like: why is it taking forever? or why does it seem to hang up?

Here's an example based on this scenario. Let's say we have a directory called /storage that we want to back up to an external USB device mounted at /media/WDPassport.

If we want to back up /storage to a USB external drive, we could use this command:

rsync -cav /storage /media/WDPassport


The c option tells rsync to use file checksums instead of timestamps to determine changed files, and this usually takes longer. In order to break down the /storage directory, I sync by subdirectory, using the find command. Here's an example:

find /storage -type d -exec rsync -cav {} /media/WDPassport \;


This looks OK, but if there are any files in the /storage directory, they will not be copied. So, how can we sync the files in /storage? There is also a small nuance where certain options will cause rsync to sync the . directory, which is the root of the source directory; this means it will sync the subdirectories twice, and we don't want that.

Long story short, the solution I settled on is a "double-incremental" script. This allows me to break down a directory, for example, breaking /home into the individual users' home directories or in cases when you have multiple large directories, such as music or family photos.

Here is an example of my script:


# HOMES and DRIVE are assumed to be defined earlier in the script,
# e.g. HOMES="alan ann" and DRIVE="media/WDPassport" (so /$DRIVE/$HOME is the target).
for HOME in $HOMES; do
     cd /home/$HOME
     rsync -cdlptgov --delete . /$DRIVE/$HOME
     find . -maxdepth 1 -type d -not -name "." -exec rsync -crlptgov --delete {} /$DRIVE/$HOME \;
done


The first rsync command copies the files and directories that it finds in the source directory. However, it leaves the directories empty so we can iterate through them using the find command. This is done by passing the d argument, which tells rsync not to recurse the directory.

-d, --dirs                  transfer directories without recursing


The find command then passes each directory to rsync individually. Rsync then copies the directories' contents. This is done by passing the r argument, which tells rsync to recurse the directory.

-r, --recursive             recurse into directories


This keeps the increment file that rsync uses to a manageable size.

Most rsync tutorials use the a (or archive) argument for convenience. This is actually a compound argument.

-a, --archive               archive mode; equals -rlptgoD (no -H,-A,-X)


The other arguments that I pass would have been included in the a; those are l, p, t, g, and o.

-l, --links                 copy symlinks as symlinks
-p, --perms                 preserve permissions
-t, --times                 preserve modification times
-g, --group                 preserve group
-o, --owner                 preserve owner (super-user only)


The --delete option tells rsync to remove any files on the destination that no longer exist on the source. This way, the result is an exact duplication. You can also add an exclude for the .Trash directories or perhaps the .DS_Store files created by MacOS.

-not -name ".Trash*" -not -name ".DS_Store"


Be careful

One final recommendation: rsync can be a destructive command. Luckily, its thoughtful creators provided the ability to do "dry runs." If we include the n option, rsync will display the expected output without writing any data.

rsync -cdlptgovn --delete . /$DRIVE/$HOME

This script is scalable to very large storage sizes and large latency or slow link situations. I'm sure there is still room for improvement, as there always is. If you have suggestions, please share them in the comments.

Source: opensource.com


Marielle Price 

Published in GNU/Linux Rules!