In this article, you will learn how to filter Linux files and folders using piping, pattern matching, find, grep, and related commands.
Let’s get started.
The grep utility searches any given input files, selecting lines that match one or more patterns. By default, a pattern matches an input line if the
regular expression (RE) in the pattern matches the input line without its trailing newline. An empty expression matches every line. Each input line that matches at least one of the patterns is written to the standard output.
You can use grep --help or man grep to read the description and usage of grep from the terminal itself, without searching for it on Google.
To search for a pattern or string in a file with the grep command, use the syntax below.
grep [STRING] [PATH]
example: grep false /etc/passwd
📌 You need to clone the data to practice Linux commands by following this Article
After cloning the data directory, cd into the orders directory with the following command:
cd retail_db/orders
Once you are in the orders folder, you can run the following command 👇🏻
grep COMPLETE part-00000
To count the lines that contain the keyword, pipe the output of the above command to wc -l. To pipe, put the | symbol between the two commands. The command looks like this 👇🏻
grep COMPLETE part-00000 | wc -l
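As a minimal sketch of the count pipeline, the snippet below builds a small stand-in for part-00000 in a temporary directory and counts the matching lines. The rows are made up for illustration and are not the real retail_db data. It also shows grep -c, which gives the same count without the pipe.

```shell
# Build a tiny stand-in for part-00000 (rows are made up for illustration)
workdir=$(mktemp -d)
cat > "$workdir/part-00000" <<'EOF'
1,2013-07-25 00:00:00.0,11599,CLOSED
2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT
3,2013-07-25 00:00:00.0,12111,COMPLETE
4,2013-07-25 00:00:00.0,8827,CLOSED
5,2013-07-25 00:00:00.0,11318,COMPLETE
EOF

# Count the lines that contain COMPLETE by piping grep into wc -l
piped_count=$(grep COMPLETE "$workdir/part-00000" | wc -l)

# grep -c reports the same line count without a second process
direct_count=$(grep -c COMPLETE "$workdir/part-00000")

echo "piped: $piped_count, direct: $direct_count"
rm -r "$workdir"
```

Both approaches count matching lines, not occurrences: a line that contains the keyword twice is still counted once.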
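If you want to count the lines that contain either of the words COMPLETE or CLOSE, you need to use egrep. For example: egrep "COMPLETE|CLOSE" part-00000 | wc -l
📌 grep is used for simple patterns and basic regular expressions (BREs); egrep can handle extended regular expressions (EREs). egrep is equivalent to grep -E, and recent GNU grep releases mark egrep as obsolescent in favor of grep -E.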
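The either-or match can be sketched with grep -E, which is the same extended-regex mode that egrep provides. The sample rows below are invented for the example. Note that the pattern CLOSE also matches the longer word CLOSED.

```shell
# Small made-up sample standing in for part-00000
workdir=$(mktemp -d)
printf '%s\n' \
  '1,2013-07-25 00:00:00.0,11599,CLOSED' \
  '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT' \
  '3,2013-07-25 00:00:00.0,12111,COMPLETE' \
  '4,2013-07-25 00:00:00.0,8827,PROCESSING' > "$workdir/part-00000"

# Extended regex: | means "either pattern"
either_count=$(grep -E 'COMPLETE|CLOSE' "$workdir/part-00000" | wc -l)

# Same result in basic mode, with one -e option per pattern
multi_e_count=$(grep -e COMPLETE -e CLOSE "$workdir/part-00000" | wc -l)

echo "grep -E: $either_count, grep -e ... -e ...: $multi_e_count"
rm -r "$workdir"
```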
For case-insensitive matching, use the -i control argument with grep.
-i, --ignore-case
Perform case insensitive matching. By default, grep is case sensitive.
grep -i pending part-00000 | wc -l
Without -i, grep pending matches only the lowercase spelling. With -i, it also matches PENDING and any other mix of upper and lower case.
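To see the effect of -i on a known input, here is a short sketch with a made-up file that spells the status in three different cases.

```shell
# Made-up rows with three spellings of the same status
workdir=$(mktemp -d)
printf '%s\n' \
  '1,PENDING' \
  '2,pending' \
  '3,Pending' \
  '4,COMPLETE' > "$workdir/part-00000"

# Case-sensitive (default): only the exact lowercase spelling matches
sensitive_count=$(grep pending "$workdir/part-00000" | wc -l)

# Case-insensitive: all three spellings match
insensitive_count=$(grep -i pending "$workdir/part-00000" | wc -l)

echo "without -i: $sensitive_count, with -i: $insensitive_count"
rm -r "$workdir"
```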
You can pipe the output of the ls command to the grep command with | to list only the folders whose names contain a specific string.
For example, you can use the following command:
ls -ltr | grep hr
For this to work, you need to be in the data directory of the repo.
To count the hr folders in the data directory, you can use the following command:
ls -ltr | grep hr | wc -l
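One caveat worth sketching: grep sees the whole long-listing line, so with ls -l the string hr can also match another column, such as an owner name that contains it. The example below uses plain ls so that only names are searched. The folder names are hypothetical stand-ins created in a temporary directory.

```shell
# Hypothetical folder layout in a scratch directory
workdir=$(mktemp -d)
mkdir "$workdir/hr_db" "$workdir/hr_archive" "$workdir/retail_db" "$workdir/nyse_all"

# Plain ls prints one name per line when piped, so grep only sees names
hr_count=$(ls "$workdir" | grep hr | wc -l)

# Anchoring with ^ keeps only names that start with hr
hr_prefix_count=$(ls "$workdir" | grep '^hr' | wc -l)

echo "contains hr: $hr_count, starts with hr: $hr_prefix_count"
rm -r "$workdir"
```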
To go through the folders recursively and search for a specific string pattern, you can use the following command with piping.
ls -lR | grep part-r
You can use basic pattern matching (globbing) with the ls -ltr command to list only the files that match a given pattern. The example here uses the data/nyse_all/nyse_data directory.
To list the files and folders whose names start with a specific pattern, you can use the following command:
ls -ltr nyse*
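A small sketch of how the glob works: the shell, not ls, expands nyse* into the matching names before ls runs. The file names below are hypothetical stand-ins created in a temporary directory.

```shell
# Hypothetical files in a scratch directory
workdir=$(mktemp -d)
touch "$workdir/nyse_2016.txt" "$workdir/nyse_2017.txt" "$workdir/readme.md"

# Run in a subshell so the caller's working directory is unchanged
nyse_files=$(cd "$workdir" && ls nyse*)
nyse_count=$(printf '%s\n' "$nyse_files" | wc -l)

echo "$nyse_files"
rm -r "$workdir"
```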
The wc utility displays the number of lines, words, and bytes contained in each input file, or standard input (if no file is specified) to the standard output. A line is defined as a string of characters delimited by a ⟨newline⟩ character. Characters beyond the final ⟨newline⟩ character will not be included in the line count.
A word is defined as a string of characters delimited by white space characters.
You can use man wc or wc --help to learn more about the wc command from the Linux terminal.
If you go inside the data/retail_db/orders directory and run the wc * command, you get the count of lines, words, and bytes for each file in the directory.
wc *
If you want the number of words, lines, and bytes separately for a file, run wc with the control arguments -w, -l, and -c.
wc [-clmw] [file ...]
-w The number of words in each input file is written to the standard output.
-l The number of lines in each input file is written to the standard output.
-c The number of bytes in each input file is written to the standard output. This will cancel out any prior usage of the -m option.
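As a quick sketch of the three flags, the snippet below runs each one against a two-line sample file whose contents are made up for the example. Reading the file through standard input is a convenience chosen here so that wc prints only the number.

```shell
# Two lines, three words, fourteen bytes (newlines included)
workdir=$(mktemp -d)
printf 'one two\nthree\n' > "$workdir/sample.txt"

# Reading from standard input makes wc print only the number, without the file name
line_count=$(wc -l < "$workdir/sample.txt")
word_count=$(wc -w < "$workdir/sample.txt")
byte_count=$(wc -c < "$workdir/sample.txt")

echo "lines=$line_count words=$word_count bytes=$byte_count"
rm -r "$workdir"
```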
You can also pass wc a pattern so that it counts every matching file in the directory /data/retail_db.
You can also use wc on the output of another command through a pipe. For example, you can pipe the output of the ls command or the grep command to wc.
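Both uses can be sketched together. The directory layout below mimics retail_db with two hypothetical tables and invented rows, built in a temporary directory rather than under /data.

```shell
# Hypothetical retail_db layout with made-up rows
workdir=$(mktemp -d)
mkdir -p "$workdir/retail_db/orders" "$workdir/retail_db/customers"
printf '1,CLOSED\n2,COMPLETE\n3,PENDING\n' > "$workdir/retail_db/orders/part-00000"
printf '1,Mary\n2,Ann\n' > "$workdir/retail_db/customers/part-00000"

# wc with a glob: one row per matching file, plus a "total" row
wc -l "$workdir"/retail_db/*/part-*

# wc at the end of a pipe: how many files matched, and lines across all of them
file_count=$(ls "$workdir"/retail_db/*/part-* | wc -l)
total_lines=$(cat "$workdir"/retail_db/*/part-* | wc -l)

echo "files=$file_count lines=$total_lines"
rm -r "$workdir"
```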
The find utility recursively descends the directory tree for each path listed, evaluating an expression (composed of the “primaries” and “operands” listed below) in terms of each file in the tree.
You can get the syntax and description of the find command with man find or find --help.
You can run the find command to get all the files whose names end with .csv.
find [PATH] [PRIMARIES] [PATTERN]
-name pattern
True if the last component of the pathname being examined matches pattern. Special shell pattern matching characters (“[”, “]”, “*”, and “?”) may be used as part of pattern. These characters may be matched explicitly by escaping them with a backslash (“\”).
example:
find . -name "*.csv"
Quote the pattern so that the shell passes it to find unexpanded.
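Whether the pattern is quoted changes what find receives. The sketch below, using hypothetical file names in a temporary directory, shows the unquoted form silently missing a file in a subdirectory because the shell expands the glob first.

```shell
# One CSV at the top level, one in a subdirectory
workdir=$(mktemp -d)
mkdir "$workdir/sub"
touch "$workdir/a.csv" "$workdir/sub/b.csv"

# Unquoted: the shell expands *.csv to a.csv first, so find only looks for that name
unquoted_count=$(cd "$workdir" && find . -name *.csv | wc -l)

# Quoted: find receives the pattern itself and matches at every depth
quoted_count=$(cd "$workdir" && find . -name "*.csv" | wc -l)

echo "unquoted: $unquoted_count, quoted: $quoted_count"
rm -r "$workdir"
```

If the current directory held two or more .csv files, the unquoted form would expand to several names and find would reject the command instead.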
From the home directory, you can list all the files in the local repo with the following find command:
find ~/data -type f
Here f indicates regular files.
You can get all the files in the repo ending with the .gz extension by using the find command.
find ~/data -type f -name "*.gz"
You can also get all the files that have the specific pattern in the name by using the find command.
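For example, if you want all the files that have a year (YYYY) in their name, you can use the following command on Linux. Each ? matches any single character, so this pattern accepts any four characters between the underscore and the dot, not only digits.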
find ~/data -type f -name "*_????.*"
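Because ? matches any character, a stricter variant can use bracket expressions to require digits. This is a sketch with hypothetical file names, not files from the data repo.

```shell
# Hypothetical names: one with a year, one with four letters after the underscore
workdir=$(mktemp -d)
touch "$workdir/nyse_2016.txt.gz" "$workdir/nyse_data.csv" "$workdir/readme.md"

# ???? accepts any four characters, so nyse_data.csv matches too
loose_count=$(find "$workdir" -type f -name "*_????.*" | wc -l)

# [0-9] four times accepts only a four-digit run
strict_count=$(find "$workdir" -type f -name "*_[0-9][0-9][0-9][0-9].*" | wc -l)

echo "loose: $loose_count, strict: $strict_count"
rm -r "$workdir"
```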
How to integrate the find command with other commands in Linux?
You can use the find command with other commands by adding -exec to it.
This is how the integrated command looks:
find ~/data -type f -name "*_????.*" -exec ls -ltr {} +
Here {} is replaced by the matched paths, and the closing + tells find to pass as many paths as possible to a single ls invocation.
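The two ways of ending -exec can be compared with a short sketch. It uses echo in place of ls so that the number of invocations is visible as the number of output lines. The file names are hypothetical.

```shell
# Three scratch files to search for
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/c.txt"

# "+" batches the paths: echo runs once and prints a single line
batched_lines=$(find "$workdir" -type f -name "*.txt" -exec echo {} + | wc -l)

# "\;" runs echo once per file: one line per path
per_file_lines=$(find "$workdir" -type f -name "*.txt" -exec echo {} \; | wc -l)

echo "with +: $batched_lines line(s), with \\;: $per_file_lines line(s)"
rm -r "$workdir"
```

With ls -ltr, the batched form matters for more than speed: ls can only sort the files by time when it receives them all in one invocation.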
These are some of the important directories in Linux that help you troubleshoot issues as a DevOps engineer or SRE (Site Reliability Engineer):
1. /tmp
2. /etc
3. /var
4. /var/log
If you want the list of files modified within the last day on your computer, you can use the find command in the following way.
First, log in as the root user with sudo su - root and enter your system password.
Once you enter the password, you are logged in as root. Now you can use the following find command with -mtime -1 to list the files modified in the last 24 hours.
find /var/log -type f -mtime -1 -exec ls -ltr {} +
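The -mtime -1 test can be tried without root access on a scratch directory. In this sketch, touch -t backdates one hypothetical log file so that only the other one counts as recently modified.

```shell
# One fresh file and one backdated file
workdir=$(mktemp -d)
touch "$workdir/fresh.log"
# touch -t sets an explicit timestamp (format: [[CC]YY]MMDDhhmm)
touch -t 202001010000 "$workdir/stale.log"

# -mtime -1 keeps files modified less than one day (24 hours) ago
recent_files=$(find "$workdir" -type f -mtime -1)
recent_count=$(printf '%s\n' "$recent_files" | wc -l)

echo "$recent_files"
rm -r "$workdir"
```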
You can use -size with the find command to list files based on their size in Linux.
For example, you can use the following command to get the files with a specific size in the data repository. The c suffix means that the size is given in bytes.
find . -type f -size 7477339c -exec ls -ltr {} +
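If you want the files with sizes between 5MB and 6MB, you can use the following command. GNU find rounds each file size up to the next whole unit before comparing, so -size 6M matches files larger than 5 MiB and up to 6 MiB.
find . -type f -size 6M -exec ls -ltr {} +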
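An explicit range can also be written in bytes, which avoids depending on how a particular find implementation rounds the M unit. This is a sketch under that assumption, with hypothetical files of known sizes.

```shell
# Three scratch files: about 1 KB, 5.5 MB, and 7 MB
workdir=$(mktemp -d)
head -c 1000    /dev/zero > "$workdir/small.bin"
head -c 5500000 /dev/zero > "$workdir/mid.bin"
head -c 7000000 /dev/zero > "$workdir/big.bin"

# 5 MiB = 5242880 bytes, 6 MiB = 6291456 bytes
# +N means "more than N", -N means "less than N"; the two tests are ANDed
in_range=$(find "$workdir" -type f -size +5242880c -size -6291456c)

echo "$in_range"
rm -r "$workdir"
```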