Filtering files and folders in Linux using find and grep

How to Filter files and folders in the Linux command line?

In this article, you will learn how to filter files and folders on Linux using piping, pattern matching, find, grep, and related tools.

Let’s get started.

The grep utility searches any given input files, selecting lines that match one or more patterns. By default, a pattern matches an input line if the regular expression (RE) in the pattern matches the input line without its trailing newline. An empty expression matches every line. Each input line that matches at least one of the patterns is written to the standard output.

You can run grep --help or man grep to read the description and usage of grep in the terminal itself, without searching the web.

man grep command to get the description on terminal

If you want to search for a pattern or string in a file using the grep command, use the syntax below.

grep [STRING] [PATH]

example:
grep false /etc/passwd

A detailed explanatory video on the Overview of the grep command in Linux

📌 You need to clone the data to practice Linux commands by following this Article

After cloning the data directory, cd into the orders directory by using the following command: cd retail_db/orders

Once you are in the orders folder, you can run the following command 👇🏻

grep COMPLETE part-00000

To count the lines that contain the keyword, pipe the output of the above command to wc -l. To pipe, put the | symbol between two commands: grep COMPLETE part-00000 | wc -l. The command and output look like this👇🏻

Get the count of lines in the file using grep and piping
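As a minimal sketch of the counting step, the following uses a small made-up file in place of part-00000 (the rows and the temporary directory are assumptions for illustration, not the real dataset). It compares piping grep into wc -l with grep's own -c option, which also counts matching lines.

```shell
#!/usr/bin/env bash
# Hypothetical sample rows standing in for retail_db/orders/part-00000.
workdir=$(mktemp -d)
cat > "$workdir/part-00000" <<'EOF'
1,2013-07-25 00:00:00.0,11599,CLOSED
2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT
3,2013-07-25 00:00:00.0,12111,COMPLETE
4,2013-07-25 00:00:00.0,8827,CLOSED
5,2013-07-25 00:00:00.0,11318,COMPLETE
EOF

# Count matching lines by piping grep into wc -l.
piped_count=$(grep COMPLETE "$workdir/part-00000" | wc -l)

# grep -c counts matching lines directly, without a pipe.
direct_count=$(grep -c COMPLETE "$workdir/part-00000")

echo "wc -l: $piped_count, grep -c: $direct_count"
rm -rf "$workdir"
```

Both approaches count lines, not occurrences, so a line that contains the keyword twice is still counted once.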

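If you want to count the lines that contain either of the words COMPLETE or CLOSE, you need an extended regular expression with alternation, so use egrep (or its equivalent, grep -E; recent GNU grep releases mark egrep as obsolescent).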

📌 grep is used for simple patterns and basic regular expressions (BREs); egrep can handle extended regular expressions (EREs).
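The difference between basic and extended patterns can be sketched as follows. The file name orders.txt and its rows are made up for illustration. The example uses grep -E, which behaves like egrep, and shows that plain grep treats | as a literal character.

```shell
#!/usr/bin/env bash
# Hypothetical order statuses, one per line.
workdir=$(mktemp -d)
cat > "$workdir/orders.txt" <<'EOF'
1,CLOSED
2,PENDING_PAYMENT
3,COMPLETE
4,CLOSED
5,PROCESSING
EOF

# Extended regex: | means "either the left or the right pattern".
either_count=$(grep -E 'COMPLETE|CLOSED' "$workdir/orders.txt" | wc -l)

# Basic regex: | is a literal character, so no line matches here.
# grep exits non-zero when nothing matches, hence the "|| true".
literal_count=$(grep -c 'COMPLETE|CLOSED' "$workdir/orders.txt" || true)

echo "grep -E: $either_count, plain grep: $literal_count"
rm -rf "$workdir"
```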

If you want to perform case-insensitive matching, you need to use the -i control argument along with grep.

-i, --ignore-case
Perform case insensitive matching. By default, grep is case sensitive.

grep -i pending part-00000 | wc -l

The picture below shows the grep command with and without the -i control argument.

Using -i control argument with grep to Perform case insensitive matching
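Here is a small sketch of the same comparison on made-up data (the file name and rows are assumptions). It counts matches for the lowercase word pending with and without -i.

```shell
#!/usr/bin/env bash
# Hypothetical rows with the same word in different letter cases.
workdir=$(mktemp -d)
printf '1,PENDING\n2,pending\n3,Pending_Payment\n4,COMPLETE\n' > "$workdir/orders.txt"

# Case sensitive (default): only the all-lowercase row matches.
sensitive_count=$(grep pending "$workdir/orders.txt" | wc -l)

# Case insensitive: PENDING, pending, and Pending_Payment all match.
insensitive_count=$(grep -i pending "$workdir/orders.txt" | wc -l)

echo "without -i: $sensitive_count, with -i: $insensitive_count"
rm -rf "$workdir"
```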

A few basic examples using grep and piping detailed video

You can pipe the output of the ls command to the grep command using | to list only the folders whose names contain a specific string.

For example, you can use the command ls -ltr | grep hr. For this to work, you need to be in the data directory of the repo.

grep the folders which contain the specific string

To count the hr folders in the data directory, you can use the following command: ls -ltr | grep hr | wc -l

Get the count of folders with the specific name using grep and piping ls command
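The three-stage pipeline can be sketched on a throwaway directory. The folder names below are hypothetical stand-ins for the data directory. The sketch uses plain ls rather than ls -ltr so that only names reach grep; with the long format, the owner and group columns are also searched, which can produce unexpected matches.

```shell
#!/usr/bin/env bash
# Hypothetical folder layout standing in for the data directory.
workdir=$(mktemp -d)
mkdir "$workdir/hr_db" "$workdir/hr_db_json" "$workdir/retail_db" "$workdir/nyse_all"

# Stage 1: list names. Stage 2: keep names containing "hr".
hr_folders=$(ls "$workdir" | grep hr)

# Stage 3: count the lines that survived the filter.
hr_count=$(ls "$workdir" | grep hr | wc -l)

echo "$hr_folders"
echo "count: $hr_count"
rm -rf "$workdir"
```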

If you want to recursively go through the folders and search for a specific string pattern you can use the following commands with piping.

ls -lR | grep part-r

The output of the above command will look like this.

recursively search the folder with specific pattern files

An Overview of Piping while running shell commands detailed video

You can use basic pattern matching (shell wildcards such as *) with the ls -ltr command to list only the files that match a given pattern. The example here uses the data/nyse_all/nyse_data directory. Check out the picture below for reference.

If you want to find the files and folders whose names start with a specific pattern in the directory, you can use the following command

ls -ltr nyse*

Find the file or folder which starts with a specific pattern in the directory
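A minimal sketch of this prefix match, using made-up file names in a temporary directory (assumptions, not the real nyse_data contents). The shell expands nyse* into the matching names before ls runs, so ls only ever sees the expanded list.

```shell
#!/usr/bin/env bash
# Hypothetical files: two start with "nyse", one does not.
workdir=$(mktemp -d)
touch "$workdir/nyse_2016.csv" "$workdir/nyse_2017.csv" "$workdir/readme.txt"

# Run inside the directory so the wildcard is matched against its contents.
matches=$(cd "$workdir" && ls nyse*)
match_count=$(printf '%s\n' "$matches" | wc -l)

echo "$matches"
echo "count: $match_count"
rm -rf "$workdir"
```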

Overview of Basic Pattern Matching detailed video

The wc utility displays the number of lines, words, and bytes contained in each input file, or standard input (if no file is specified) to the standard output. A line is defined as a string of characters delimited by a ⟨newline⟩ character. Characters beyond the final ⟨newline⟩ character will not be included in the line count.

A word is defined as a string of characters delimited by white space characters.

You can use man wc or wc --help to learn more about the wc command from the Linux terminal.

If you go inside data/retail_db/orders and run the wc * command, you get the count of lines, words, and bytes for each file in the directory.

If you want the number of words, lines, or bytes separately for a file, you can run wc with the control arguments -w, -l, and -c.

wc [-clmw] [file ...]

-w      The number of words in each input file is written to the standard output.
-l      The number of lines in each input file is written to the standard output.
-c      The number of bytes in each input file is written to the standard output.  This will cancel out any prior usage of the -m option.

The output will look like the following👇🏻

wc command using along with -w, -l, and -c control arguments
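To make the three control arguments concrete, here is a small sketch on a two-line file created for the purpose (the file name and contents are assumptions). Reading the file through < keeps the file name out of the output, so each result is a bare number.

```shell
#!/usr/bin/env bash
# Hypothetical two-line input: 5 words, 24 bytes including both newlines.
workdir=$(mktemp -d)
printf 'hello world\nfoo bar baz\n' > "$workdir/sample.txt"

line_count=$(wc -l < "$workdir/sample.txt")   # newline characters
word_count=$(wc -w < "$workdir/sample.txt")   # whitespace-delimited words
byte_count=$(wc -c < "$workdir/sample.txt")   # bytes

echo "lines=$line_count words=$word_count bytes=$byte_count"
rm -rf "$workdir"
```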

You can also run wc on every file that matches a pattern. For example, from the /data/retail_db directory, wc */* reports the counts for each file one level down in the subdirectories.

Using wc command to find the pattern */*
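The */* pattern can be sketched on a tiny made-up tree (the folder names, file names, and rows are assumptions). When wc receives more than one file, it prints one line per file followed by a total line.

```shell
#!/usr/bin/env bash
# Hypothetical layout: two subfolders, one file each.
workdir=$(mktemp -d)
mkdir "$workdir/orders" "$workdir/customers"
printf '1,CLOSED\n2,COMPLETE\n3,PENDING\n' > "$workdir/orders/part-00000"
printf '1,Ann\n2,Bob\n' > "$workdir/customers/part-00000"

# */* expands to every entry exactly one level below the current directory.
report=$(cd "$workdir" && wc -l */*)

# The last line of the report is the grand total.
total_lines=$(printf '%s\n' "$report" | tail -n 1 | awk '{print $1}')

echo "$report"
rm -rf "$workdir"
```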

You can also use wc on top of other commands by using a pipe. For example, you can pipe the output of the ls command or the grep command to it.

using pipe in Linux to get desired output from multiple commands

Deep Dive into wc command to get word count or line count detailed video

The find utility recursively descends the directory tree for each path listed, evaluating an expression (composed of the “primaries” and “operands” listed below) in terms of each file in the tree.

You can get the syntax and description of the find command by using man find or find --help

man find command to get the manual page for the find command on terminal

You can run the find command to get all the files and folders whose names end with .csv.

find [PATH] [PRIMARIES] [PATTERN]

-name pattern

True if the last component of the pathname being examined matches pattern. Special shell pattern matching characters (“[”, “]”, “*”, and “?”) may be used as part of pattern. These characters may be matched explicitly by escaping them with a backslash (“\”).

example:
find . -name "*.csv"

Quote the pattern so that the shell passes it to find unchanged instead of expanding it against the current directory.
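Why the quotes around the pattern matter can be sketched with a made-up directory (file and folder names are assumptions). When a .csv file exists in the current directory, an unquoted *.csv is expanded by the shell first, and find ends up searching for that one literal name.

```shell
#!/usr/bin/env bash
# Hypothetical layout: one CSV at the top level, one in a subfolder.
workdir=$(mktemp -d)
mkdir "$workdir/sub"
touch "$workdir/a.csv" "$workdir/sub/b.csv"

# Quoted: find receives the pattern *.csv and matches both files.
quoted_count=$(find "$workdir" -name "*.csv" | wc -l)

# Unquoted: the shell rewrites the command to "find . -name a.csv",
# so sub/b.csv is silently missed.
unquoted_count=$(cd "$workdir" && find . -name *.csv | wc -l)

echo "quoted: $quoted_count, unquoted: $unquoted_count"
rm -rf "$workdir"
```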

If you want to learn it with the help of video then you can watch this video timestamp👇🏻

Using Linux find command to find directories or folders

From the home directory, you can list all the files in the local repo by using the following find command

find ~/data -type f

Here, f indicates regular files.

Get all the files in the repo by using the find command in the Linux shell
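As a small sketch of the -type primary, the following builds a made-up tree (names are assumptions) and counts files with -type f and directories with -type d. The starting directory itself is included in the -type d results.

```shell
#!/usr/bin/env bash
# Hypothetical tree: one subfolder, two regular files.
workdir=$(mktemp -d)
mkdir "$workdir/orders"
touch "$workdir/orders/part-00000" "$workdir/notes.txt"

# -type f keeps regular files only.
file_count=$(find "$workdir" -type f | wc -l)

# -type d keeps directories only: the start directory and orders.
dir_count=$(find "$workdir" -type d | wc -l)

echo "files: $file_count, directories: $dir_count"
rm -rf "$workdir"
```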

You can get all the files in the repo ending with the .gz extension by using the find command.

find ~/data -type f -name "*.gz"

How to get all compressed files by using the find command in Linux

You can also get all the files that have the specific pattern in the name by using the find command.

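For example, if you want to get all the files that have a year (YYYY) in their name, you can use the following command on Linux. Each ? matches any single character, so the pattern matches any four characters between the underscore and the dot, not only digits.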

find ~/data -type f -name "*_????.*"

Get the file names with a specific pattern using the find command in Linux
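If only digits should count as a year, a bracket expression is one possible tightening of the pattern. The sketch below uses made-up file names (assumptions) to compare "*_????.*" with a digits-only variant.

```shell
#!/usr/bin/env bash
# Hypothetical names: a real year, four letters, and a two-digit suffix.
workdir=$(mktemp -d)
touch "$workdir/sales_2019.csv" "$workdir/sales_abcd.csv" "$workdir/sales_19.csv"

# ???? accepts any four characters, so sales_abcd.csv matches too.
any_four=$(find "$workdir" -type f -name "*_????.*" | wc -l)

# [0-9] repeated four times accepts exactly four digits.
digits_only=$(find "$workdir" -type f -name "*_[0-9][0-9][0-9][0-9].*" | wc -l)

echo "any four characters: $any_four, four digits: $digits_only"
rm -rf "$workdir"
```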

How to integrate find command with other commands in Linux?

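You can use the find command with other commands by adding -exec to the find command. In the -exec part, {} is replaced by the matched file names, and + tells find to pass as many names as possible to a single invocation of the command.

This is what the integrated command looks like

find ~/data -type f -name "*_????.*" -exec ls -ltr {} +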

Using the find command along with another command on Linux
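The two ways of ending -exec can be compared with a short sketch. It uses echo instead of ls and made-up file names (assumptions), so that each invocation of the command produces exactly one output line.

```shell
#!/usr/bin/env bash
# Hypothetical files that all match the pattern.
workdir=$(mktemp -d)
touch "$workdir/a_2019.csv" "$workdir/b_2020.csv" "$workdir/c_2021.csv"

# {} + : all matched names are appended to one echo call, giving one line.
batched_lines=$(find "$workdir" -type f -name "*_????.*" -exec echo {} + | wc -l)

# {} \; : echo runs once per matched file, giving three lines.
per_file_lines=$(find "$workdir" -type f -name "*_????.*" -exec echo {} \; | wc -l)

echo "with +: $batched_lines line, with \\;: $per_file_lines lines"
rm -rf "$workdir"
```

With ls -ltr, the + form also matters for the output: the time-based sort only spans all files when they are passed to a single ls call.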

Using Linux find command to find files by type and pattern detailed video

These are some of the important directories in Linux that help you troubleshoot issues as a DevOps engineer or SRE (Site Reliability Engineer).

Important Directories in Linux shell to troubleshoot

1. /tmp
2. /etc
3. /var
4. /var/log

Here is a clear explanation of these important folders in Linux. Check out the video timestamp below👇🏻

Important folders in the Linux file system

If you want to get the list of files modified within the last day on your computer, you can use the find command in the following way.

First, log in as the root user on your system with sudo su - root and enter your system password.

Once you enter the password, you will be logged in as the root user. Now you can use the following find command with -mtime -1 to list the files modified within the last 24 hours.

find /var/log -type f -mtime -1 -exec ls -ltr {} +

How to get the list of files modified on my computer from the last day?
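The -mtime -1 test can be tried without root access on a throwaway directory. The sketch below assumes GNU touch, whose -d option accepts relative dates such as "3 days ago"; the file names are made up.

```shell
#!/usr/bin/env bash
# Hypothetical log files: one modified now, one backdated by three days.
workdir=$(mktemp -d)
touch "$workdir/new.log"
touch -d "3 days ago" "$workdir/old.log"   # assumes GNU touch

# -mtime -1 keeps files modified less than 24 hours ago.
recent=$(find "$workdir" -type f -mtime -1 -exec basename {} \;)

echo "modified in the last day: $recent"
rm -rf "$workdir"
```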

Here is a clear explanation of the find command👇🏻

Find command on Linux

You can use -size along with the find command to get the list of files based on their size in Linux.

For example, you can use the following command to get the files with a specific size in the data repository. The c suffix means the size is given in bytes.

find . -type f -size 7477339c -exec ls -ltr {} +

You can get a file with a specific size using the find command

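If you want to get the files with sizes between 5MB and 6MB, you can use the following command. GNU find rounds file sizes up to the next whole unit, so -size 6M matches files larger than 5 MiB and up to 6 MiB.

find . -type f -size 6M -exec ls -ltr {} +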

Get the files with a specific size range using the find command on Linux
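The rounding behaviour can be checked with a scaled-down sketch that uses kibibytes instead of mebibytes, so the test files stay tiny. It assumes GNU find; the file names and sizes are made up. It also shows an explicit byte range built from two -size tests, which is one way to state the bounds without relying on rounding.

```shell
#!/usr/bin/env bash
# Hypothetical files of 1024, 1500, 2048, and 3000 bytes.
workdir=$(mktemp -d)
head -c 1024 /dev/zero > "$workdir/d.bin"
head -c 1500 /dev/zero > "$workdir/a.bin"
head -c 2048 /dev/zero > "$workdir/b.bin"
head -c 3000 /dev/zero > "$workdir/c.bin"

# -size 2k: sizes are rounded up to whole KiB, so this matches
# files larger than 1024 bytes and up to 2048 bytes.
rounded=$(find "$workdir" -type f -size 2k -exec basename {} \; | sort | paste -sd ' ' -)

# Explicit range in bytes: more than 1024 and fewer than 2049.
exact=$(find "$workdir" -type f -size +1024c -size -2049c -exec basename {} \; | sort | paste -sd ' ' -)

echo "-size 2k: $rounded"
echo "byte range: $exact"
rm -rf "$workdir"
```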

Get the list of files based on size using the Linux find command detailed video