Searching text with basic regular expressions

3.2 Searching and Extracting Data from Files (Weight: 3)

📘Linux Essentials (LPI 010-160)


1. What Are Regular Expressions?

A regular expression is a pattern used to match text.

Instead of searching only for exact words, regular expressions allow you to search for patterns of characters.

This makes it possible to:

  • Search for words that follow a specific structure
  • Find configuration settings in files
  • Extract important data from logs
  • Filter command output

Example:

A pattern could search for:

  • All lines containing the word error
  • All lines beginning with user
  • All lines ending with .conf

Regular expressions are most commonly used with commands such as:

  • grep
  • sed
  • awk
  • less
  • vi / vim

For the Linux Essentials exam, the most important command used with regular expressions is:

grep

2. Using grep with Regular Expressions

The grep command searches files for lines that match a pattern.

Basic syntax:

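grep 'PATTERN' filename

Single quotes around the pattern are optional for plain words like error. They do stop the shell from interpreting characters such as *, [ ], and \ before grep sees them.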

Example:

grep error system.log

This command searches system.log and prints all lines containing the word error.

Example output:

Mar 10 login error detected
Mar 10 disk error reported

This is a simple text search.
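To try this search yourself, here is a minimal sketch that assumes a Linux shell with GNU grep. It recreates a small system.log in a scratch directory and runs the same command. The log lines are invented for illustration.

```shell
# Work in a throwaway directory so no real files are touched
cd "$(mktemp -d)" || exit 1

# Hypothetical log contents, invented for this demo
printf '%s\n' \
  'Mar 10 login error detected' \
  'Mar 10 user alice logged in' \
  'Mar 10 disk error reported' > system.log

# Print every line that contains the text "error"
grep error system.log
# Mar 10 login error detected
# Mar 10 disk error reported
```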

However, regular expressions allow more advanced pattern matching.


3. Basic Regular Expression (BRE)

Linux tools like grep use Basic Regular Expressions (BRE) by default.

BRE allows the use of special pattern characters, called metacharacters, to define search rules.

These characters do not represent normal text. Instead, they describe how text should be matched.

Common metacharacters include:

.   *   ^   $   [ ]   \   \( \)

Understanding these symbols is essential for the Linux Essentials exam.


4. The Dot (.) Character

The dot . matches any single character.

Example file users.txt:

user1
user2
userA
admin

Command:

grep user. users.txt

Matches:

user1
user2
userA

Explanation:

  • . stands for exactly one character (a letter, digit, or symbol)
  • user. means user followed by one more character, so a line containing only user would not match
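The following sketch reproduces the example. It assumes GNU grep and adds a line containing only user, so you can see that . needs one real character to match.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory

# Sample file from the text plus an extra line "user" (added for this demo)
printf '%s\n' user1 user2 userA admin user > users.txt

# "." must consume exactly one character after "user"
grep 'user.' users.txt
# user1
# user2
# userA
```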

5. The Asterisk (*) Character

The * symbol means:

Match the previous character zero or more times

Example file:

log
loog
looog
lg

Command:

grep 'lo*g' file.txt

Matches:

log
loog
looog
lg

Explanation:

o* means:

  • zero o
  • one o
  • many o
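Here is a runnable sketch of the same search, assuming GNU grep. The extra line leg is invented. It shows that o* can match zero o's, but it cannot stand in for some other letter. The pattern is quoted so the shell does not treat * as a filename wildcard.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory

# Lines from the text plus "leg" (hypothetical, for contrast)
printf '%s\n' log loog looog lg leg > file.txt

# l, then zero or more o, then g
grep 'lo*g' file.txt
# log
# loog
# looog
# lg
```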

6. Start of Line Anchor (^)

The ^ symbol matches the beginning of a line.

Example file:

admin user login
user admin login
admin system start

Command:

grep ^admin file.txt

Matches:

admin user login
admin system start

Explanation:

^admin means the line must start with “admin”.

This is useful for finding configuration lines in files.

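Example:

grep '^PermitRootLogin' /etc/ssh/sshd_config

This finds lines that start with the configuration key PermitRootLogin and skips commented-out lines such as #PermitRootLogin, because those begin with #.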
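The sketch below recreates the example file and compares the anchored search with the unanchored one. It assumes GNU grep and uses the sample lines from the text.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory
printf '%s\n' 'admin user login' 'user admin login' 'admin system start' > file.txt

# ^ ties the match to the beginning of the line
grep '^admin' file.txt
# admin user login
# admin system start

# Without the anchor, "admin" in the middle of a line also counts
grep -c 'admin' file.txt
# 3
```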


7. End of Line Anchor ($)

The $ symbol matches the end of a line.

Example file:

server.conf
client.conf
server.log

Command:

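grep '\.conf$' files.txt

Matches:

server.conf
client.conf

Explanation:

\.conf$ means the line must end with .conf. The backslash makes the dot literal (see section 11). Without it, .conf$ would also match a line such as myconf.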

This can be used when searching lists of configuration files.
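This sketch assumes GNU grep. It adds two invented lines, server.conf.bak and myconf, so you can see what the $ anchor and the escaped dot each filter out.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory

# Sample list from the text plus two hypothetical extra names
printf '%s\n' server.conf client.conf server.log server.conf.bak myconf > files.txt

# Literal ".conf" at the very end of the line
grep '\.conf$' files.txt
# server.conf
# client.conf

# Unescaped dot: any character before "conf", so myconf matches too
grep -c '.conf$' files.txt
# 3
```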


8. Character Classes [ ]

Square brackets [ ] match any one character inside the brackets.

Example file:

user1
user2
user3
userA

Command:

grep 'user[12]' users.txt

Matches:

user1
user2

Explanation:

[12] means match 1 or 2.
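Here is a runnable version of the example, assuming GNU grep. The pattern is quoted because an unquoted [12] is also a shell wildcard and could be expanded if matching filenames exist.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory
printf '%s\n' user1 user2 user3 userA > users.txt

# [12] matches one character that is either 1 or 2
grep 'user[12]' users.txt
# user1
# user2
```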


9. Character Ranges

You can define ranges inside brackets.

Example:

[a-z]

Matches any lowercase letter.

Example file:

user1
userA
userb

Command:

grep 'user[a-z]' users.txt

Matches:

userb

Other common ranges:

[0-9]   digits
[A-Z]   uppercase letters
[a-z]   lowercase letters

Example:

grep 'error[0-9]' logfile.txt

This matches lines containing error immediately followed by a digit, for example:

error1
error2
error5
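The sketch below runs the three common ranges against the example file, assuming GNU grep. It sets LC_ALL=C on each command. That is an assumption made so that ranges follow plain byte order: in some non-C locales a range like [a-z] can behave unexpectedly.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory
printf '%s\n' user1 userA userb > users.txt

# LC_ALL=C makes each range cover only the ASCII characters listed
LC_ALL=C grep 'user[a-z]' users.txt   # userb
LC_ALL=C grep 'user[A-Z]' users.txt   # userA
LC_ALL=C grep 'user[0-9]' users.txt   # user1
```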

10. Negated Character Classes

A negated class matches any character not listed.

Syntax:

[^characters]

Example:

grep 'user[^0-9]' file.txt

Matches lines where user is followed by one character that is not a digit.

Example matches:

userA
userB

But not:

user1
user2
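A negated class still has to match exactly one character. This sketch (GNU grep assumed) adds an invented line containing only user to show that.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory

# Example lines plus a bare "user" (hypothetical, for contrast)
printf '%s\n' userA userB user1 user2 user > file.txt

# One character after "user" that is NOT 0-9
grep 'user[^0-9]' file.txt
# userA
# userB
# (the bare "user" line has no character after it, so it is not matched)
```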

11. Escaping Special Characters

Some characters have special meaning in regular expressions.

To search them literally, use the backslash \.

Example:

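Searching for the character .

grep '\.conf' file.txt

Without the backslash:

grep '.conf' file.txt

Here . means any character, not a dot, so a line such as myconf would also match.

The single quotes matter as well. Without them, the shell removes the backslash, and grep receives .conf instead of \.conf.

Escaping ensures the correct search.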
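The sketch below compares the three spellings side by side, assuming GNU grep and a POSIX shell. The file contents are invented. The last command shows the shell stripping an unquoted backslash.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory
printf '%s\n' app.conf myconf notes.txt > file.txt

grep '\.conf' file.txt   # literal dot: app.conf only
grep '.conf' file.txt    # any character: app.conf and myconf

# Unquoted: the shell turns \.conf into .conf before grep runs,
# so this behaves like the previous command
grep \.conf file.txt     # app.conf and myconf
```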


12. Grouping Expressions

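Basic regular expressions allow grouping using:

\( pattern \)

Example:

grep '\(error\|fail\)' logfile.txt

This matches lines containing:

error
fail

Here \( \) groups the alternatives and \| separates them. In a basic regular expression, \| is a GNU grep extension, not part of the POSIX standard.

However, grouping is more common in extended regular expressions.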
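This sketch assumes GNU grep, which accepts \| in basic regex, and uses invented sample lines. It shows alternation inside a group. It also shows that * after \( \) repeats the whole group, not only a single character.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory
printf '%s\n' 'error: disk full' 'fail: no route' 'ok: started' > logfile.txt

# Either "error" or "fail" (GNU extension \| inside a BRE group)
grep '\(error\|fail\)' logfile.txt
# error: disk full
# fail: no route

# \(ab\)* repeats the two-letter group: x, then "", "ab", "abab", ..., then y
printf '%s\n' xababy xaby xy xay > groups.txt
grep 'x\(ab\)*y' groups.txt
# xababy
# xaby
# xy
```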


13. Repetition with Curly Braces

Curly braces allow specifying the number of repetitions.

However, in Basic Regular Expressions, they must be escaped.

Example:

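grep 'a\{3\}' file.txt

Matches:

aaa

Meaning:

a repeated exactly three times in a row. Because the pattern is not anchored, a line such as aaaa also matches, since it contains aaa. Use grep '^a\{3\}$' file.txt to match lines consisting of exactly three a characters. A range such as \{2,3\} means two to three times.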
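Here is a small sketch of interval repetition, assuming GNU grep. It shows how anchors change which lines count as "exactly three". The pattern is quoted so the shell keeps the backslashes.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory
printf '%s\n' a aa aaa aaaa > file.txt

grep 'a\{3\}' file.txt       # aaa and aaaa: both contain three a's in a row
grep '^a\{3\}$' file.txt     # aaa only: the whole line must be three a's
grep '^a\{2,3\}$' file.txt   # aa and aaa: two to three a's
```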


14. Case-Sensitive Searching

By default, searches are case-sensitive.

Example:

grep error logfile.txt

Matches:

error

But not:

Error
ERROR

To ignore case:

grep -i error logfile.txt

Now it matches all case variations.
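The difference is easy to check with a three-line file. This is a minimal sketch assuming GNU grep, and the log lines are invented.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory
printf '%s\n' 'error: disk' 'Error: network' 'ERROR: memory' > logfile.txt

grep error logfile.txt      # only "error: disk"
grep -i error logfile.txt   # all three lines
```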


15. Useful grep Options for Regex Searching

Important options often used with regular expressions:

-i (ignore case)

grep -i warning logfile.txt

Matches:

Warning
WARNING
warning

-v (invert match)

Shows lines not matching the pattern.

grep -v error logfile.txt

Displays lines without “error”.


-n (show line numbers)

grep -n error logfile.txt

Output example:

12:error detected
45:error reading disk

-r (recursive search)

Search through directories.

grep -r error /var/log

Searches for error in all files under /var/log.

This is commonly used for analyzing system logs.
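The sketch below builds a tiny directory tree with invented logs and exercises -v, -n, and -r together. It assumes GNU grep. With -r, the order in which files are visited is not guaranteed, so no fixed output is shown for that command.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory
mkdir -p logs/app

# Hypothetical log files
printf '%s\n' 'boot ok' 'error detected' 'Warning: low disk' > logs/system.log
printf '%s\n' 'started' 'error reading disk' > logs/app/app.log

grep -v error logs/system.log   # boot ok / Warning: low disk
grep -n error logs/system.log   # 2:error detected
grep -r error logs              # each hit is prefixed with its file name
grep -ri warning logs           # options can be combined
```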


16. Example IT Use Cases

Regular expressions are heavily used in IT tasks.

Searching System Logs

grep error /var/log/syslog

Finds error messages.


Finding Failed Login Attempts

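grep -i failed /var/log/auth.log

Finds failed login messages. The -i option also catches entries that start with a capital F, such as "Failed password".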

Searching Configuration Settings

grep ^Port /etc/ssh/sshd_config

Finds SSH port configuration lines. Commented-out defaults such as #Port 22 are skipped, because those lines start with #.


Finding IP Addresses

Example pattern:

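grep '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' access.log

Each part matches one to three digits, and \. matches a literal dot, so the pattern finds addresses such as 192.168.1.10. It does not check that each number is 255 or less.

Used when analyzing web server logs.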
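Both searches can be tried on small sample files. This sketch assumes GNU grep, and the sshd_config and access.log contents are invented. It also uses grep -o, which prints only the matched text. That fits the "extracting data" part of this topic.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory

# Hypothetical sshd_config: the commented default is skipped by ^Port
printf '%s\n' '#Port 22' 'Port 2222' 'PermitRootLogin no' > sshd_config
grep '^Port' sshd_config
# Port 2222

# Hypothetical access log
printf '%s\n' '192.168.1.10 - GET /index.html' 'localhost - GET /' \
  '10.0.0.5 - GET /about' > access.log

# 1-3 digits and a literal dot, four times (no 0-255 range check)
ip='[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}'
grep "$ip" access.log      # the two lines that contain an address
grep -o "$ip" access.log   # only the addresses themselves
# 192.168.1.10
# 10.0.0.5
```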


17. Difference Between Basic and Extended Regex

Linux provides two main regex types.

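Type                            Command
Basic Regular Expressions       grep
Extended Regular Expressions    grep -E

Extended regex supports:

  • +    one or more of the previous item
  • ?    zero or one of the previous item
  • |    alternation (this OR that)
  • ()   grouping

With grep -E these do not need escaping, and the same is true of { } for repetition counts. In basic regex, grouping is written \( \) and counts are written \{ \}.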

However, Linux Essentials mostly focuses on basic regex.
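This sketch puts equivalent basic and extended patterns side by side. It assumes GNU grep, which is what allows \| in the BRE line, and uses invented sample lines. It also shows that without -E, ( | ) are ordinary characters.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory
printf '%s\n' 'error: disk' 'fail: net' color colour lg log loog > file.txt

grep '\(error\|fail\)' file.txt   # BRE (GNU): error and fail lines
grep -E '(error|fail)' file.txt   # ERE: same lines, no backslashes
grep -c '(error|fail)' file.txt   # 0: in BRE, ( | ) are literal characters

grep -E 'colou?r' file.txt        # ? = zero or one u: color, colour
grep -E 'lo+g' file.txt           # + = one or more o: log, loog (not lg)
```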


18. Common Mistakes

Forgetting to escape special characters

Wrong:

grep .conf file.txt

Correct:

grep '\.conf' file.txt

Not using anchors when needed

Searching for:

admin

Matches anywhere in line.

But:

^admin

Matches only at the start.


Misunderstanding *

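* applies to the previous character, not the entire word.

Example:

lo*g

matches lg, log, loog, and so on. But:

log*

means lo followed by zero or more g characters, so it matches any line containing lo. Unlike the shell wildcard *, the regex * does not mean "anything". To match log followed by any text, use log.* (the . matches any character and * repeats it).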
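A quick way to see the difference is shown below. The sketch assumes GNU grep, the sample lines are invented, and the patterns are quoted so the shell does not expand * as a filename wildcard.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory
printf '%s\n' lo log logg logfile > file.txt

# "lo" followed by zero or more g: every line contains "lo"
grep -c 'log*' file.txt
# 4

# "log" followed by any characters: the bare "lo" line drops out
grep 'log.*' file.txt
# log
# logg
# logfile
```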

19. Key Exam Points to Remember

For the Linux Essentials exam, remember:

  • Regular expressions are used for pattern matching
  • grep is the most common command for regex searching
  • Basic regex metacharacters include:  .  *  ^  $  [ ]  \

Important concepts:

  • . matches any single character
  • * repeats the previous character zero or more times
  • ^ start of line
  • $ end of line
  • [ ] character classes
  • [^ ] negation
  • \ escapes special characters

You should be able to:

  • Understand simple regex patterns
  • Use regex with grep
  • Recognize anchors and character classes
  • Perform searches in files and directories