/* Jerrell's Technical Notes */: Unix Commands: grep/sed/awk

1. grep

The grep command globally searches for regular expressions in files and prints all lines that contain the expression.

The name of grep comes from the "globally search and print" command in the ex editor:

    :g/pattern/p

Since patterns are also called "regular expressions", the above command becomes:

    :g/re/p

Thus grep.

Exit Status
The exit status of grep is 0 if at least one line matched, 1 if no line matched, and 2 if an error occurred (for example, the file was not found).
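
The three exit statuses can be checked with a short script. This is a minimal sketch: the scratch file and the variable names (found, missing, error) are made up for the demonstration, and -q is used only to silence the normal output.

```shell
# Hypothetical scratch file, created only for this demonstration.
tmp=$(mktemp)
printf 'northwest NW\nnortheast NE\n' > "$tmp"

# Each variable starts at 0 and is overwritten only if grep exits nonzero.
found=0;   grep -q 'north' "$tmp" || found=$?      # pattern present
missing=0; grep -q 'south' "$tmp" || missing=$?    # pattern absent
error=0;   grep -q 'north' "$tmp.no-such-file" 2>/dev/null || error=$?  # file does not exist

echo "found=$found missing=$missing error=$error"
# prints: found=0 missing=1 error=2
rm -f "$tmp"
```

In a script the status is usually tested directly, as in: if grep -q pattern file; then ...; fi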

Regular Expression Metacharacters for grep (also for vi, ex, sed, and awk)

^	beginning of line
$	end of line
.	any single character
*	zero or more of the preceding character
[]	any one character in the set
[^]	any one character not in the set
\<	beginning of word, e.g. \<love matches words that begin with love
\>	end of word, e.g. love\> matches words that end with love
\(pattern\)	remember the matched pattern; up to 9 can be remembered; recall with \1, \2, etc.
x\{m\}	repetition of character x for exactly m times
x\{m,\}	repetition of character x for at least m times
x\{m,n\}	repetition of character x for at least m times and no more than n times

Example:

% cat sample
northwest NW
northeast NE
north N
  
% grep '\<north' sample   # match all lines containing words
                          # starting with "north"
northwest NW
northeast NE
north N
  
% grep '\<north\>' sample
   # match all lines containing the word "north"
   # this is the same as grep -w north sample
north N
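
The repetition and remembered-pattern metacharacters from the table can be tried the same way. A small sketch, assuming a made-up five-line input; the variable names exact and repeat are only for illustration.

```shell
# Hypothetical input, one candidate string per line.
tmp=$(mktemp)
printf 'aa\naaa\naaaa\nabab\nabba\n' > "$tmp"

# x\{m\}: lines that consist of exactly three a characters
exact=$(grep '^a\{3\}$' "$tmp")

# \(..\) remembers two characters; \1 requires the same two again
repeat=$(grep '^\(..\)\1$' "$tmp")

echo "$exact"     # prints: aaa
echo "$repeat"    # prints: aaaa and abab, one per line
rm -f "$tmp"
```

The line abba is not printed by the second command because its last two characters (ba) differ from the remembered ones (ab).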

2. sed (Streamlined Editor)

The sed editor is not an interactive editor like vi. It executes editing commands on a file and prints the result to standard output. It is nondestructive, meaning the original file is not changed in any way. sed commands can also be put in a script file.

Template:

sed 'command' filename

sed processes one line at a time. The line being processed is stored in a temporary buffer called a pattern space.

Exit Status
The exit status of sed is zero for success, and nonzero otherwise.

Line Addresses
Line addressing determines which lines will be processed. An address consists of one number or two numbers separated by a comma (,), or a regular expression.

Example:

% sed '1,3d' file       # delete line 1 to 3 in file
% sed '/good/d' file    # delete all lines containing "good"
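
The three address forms can be compared on one small input. A sketch only: the four-line scratch file and the variable names are assumptions made for the demonstration.

```shell
# Hypothetical four-line input.
tmp=$(mktemp)
printf 'one\ntwo\nthree\nfour\n' > "$tmp"

single=$(sed '2d' "$tmp")        # one number: delete line 2 only
range=$(sed -n '2,3p' "$tmp")    # two numbers: print lines 2 through 3
regex=$(sed '/^t/d' "$tmp")      # regular expression: delete lines starting with t

echo "$single"    # prints: one, three, four (one per line)
echo "$range"     # prints: two, three
echo "$regex"     # prints: one, four
rm -f "$tmp"
```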

Regular Expression Metacharacters
sed uses the same metacharacters as grep, plus "&", which stands for the matched text in the replacement part of a substitution.

Example:

s/love/**&**/    # changes love to **love**
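
As a runnable sketch of the same substitution (the input strings and variable names are made up for illustration):

```shell
# & in the replacement stands for whatever the pattern matched.
starred=$(echo 'I love you' | sed 's/love/**&**/')
echo "$starred"    # prints: I **love** you

# The matched text need not be fixed: here & is the run of digits found.
wrapped=$(echo 'price 42' | sed 's/[0-9][0-9]*/(&)/')
echo "$wrapped"    # prints: price (42)
```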

sed Commands

a\	appends one or more lines of text after the current line
c\	replaces the current line with new text
d	deletes lines
i\	inserts text above the current line
h	copies the contents of the pattern space to a holding buffer
H	appends the contents of the pattern space to a holding buffer
g	gets what is in the holding buffer and copies it into the pattern space, overwriting what was there
G	gets what is in the holding buffer and copies it into the pattern space, appending to what was there
l	lists nonprinting characters
p	prints lines
n	reads the next input line and starts processing the new line with the next command rather than the first command
q	quits or exits
r	reads lines from a file
s	substitutes one string for another
w	writes lines to a file
x	exchanges contents of the holding buffer with the pattern space
y	translates one character to another
!	applies the command to all lines except the selected ones

Substitution Flags (used with the s command)

g	global substitution on one line
p	prints line
w	writes lines to a file

sed Options

-e	allows multiple edits
-n	suppresses default output
-f	reads a sed script file

Examples:

The p Command

% sed '/north/p' file    # prints all lines matching north, then prints
                        # all the lines in the file since by default
                        # sed prints all lines.
% sed -n '/north/p' file    # prints lines matching north
% sed -n '/west/, /east/p' file   # prints every block that starts at a
                                 # line matching west and ends at the
                                 # next line matching east

The d Command

% sed '/north/d' file    # deletes all lines matching north

The -e option allows multiple edits:

% sed -e '1,3d' -e 's/ABC/XYZ/' file   # deletes lines 1 to 3, then
                                     # replaces ABC with XYZ
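
The same two edits can also be kept in a script file and run with -f. A sketch under stated assumptions: the data file, the script file, and the variable names are all made up for the demonstration.

```shell
# Hypothetical data file and sed script file.
data=$(mktemp)
script=$(mktemp)
printf 'l1\nl2\nl3\nABC 4\nABC 5\n' > "$data"
printf '1,3d\ns/ABC/XYZ/\n' > "$script"    # one sed command per line

with_e=$(sed -e '1,3d' -e 's/ABC/XYZ/' "$data")    # edits given on the command line
with_f=$(sed -f "$script" "$data")                 # same edits read from the script

echo "$with_e"    # prints: XYZ 4 and XYZ 5, one per line
echo "$with_f"    # prints the same two lines
rm -f "$data" "$script"
```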

The r Command and the w Command

% sed '/ABC/r file2' file    # read the contents of file2 and insert
                            # it after each of the lines in file
                            # that matches ABC
% sed -n '/ABC/w file2' file    # writes all lines in file that match
                               # ABC to file2

The a\ Command and the i\ Command

% sed '/ABC/a\This is a new line.' file   # appends the new line after
                                         # each line matching ABC
% sed '/ABC/i\This is a new line.' file   # inserts the new line before
                                         # each line matching ABC

The n Command

% sed '/ABC/{n; s/bad/good/;}' file    # find the line that matches ABC,
                        # then replace bad with good on the next line

The q Command

% sed '5q' file    # print the first 5 lines and then quit

Holding and Getting

% sed -e '/ABC/h' -e '$G' file
                 # copy the line matching ABC into the
                 # holding buffer, and then paste it to the end of
                 # the file. This is similar to the yank and paste
                 # commands of vi.
% sed -e '/ABC/{h;d;}' -e '/XYZ/{G;}' file
                 # copy the line matching ABC into the
                 # holding buffer, and then delete it;
                 # then paste it after the line matching XYZ.
                 # This is similar to the move command of vi.
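
Both commands can be traced on a small input. A minimal sketch, assuming a made-up five-line file in which the ABC line comes before the XYZ line; the variable names copied and moved are illustrative.

```shell
# Hypothetical input with one ABC line and one XYZ line.
tmp=$(mktemp)
printf 'first\nABC line\nmiddle\nXYZ line\nlast\n' > "$tmp"

# Copy: hold the ABC line, append the held text after the last line.
copied=$(sed -e '/ABC/h' -e '$G' "$tmp")

# Move: hold and delete the ABC line, append it after the XYZ line.
moved=$(sed -e '/ABC/{h;d;}' -e '/XYZ/{G;}' "$tmp")

echo "$copied"    # first, ABC line, middle, XYZ line, last, ABC line
echo "$moved"     # first, middle, XYZ line, ABC line, last
rm -f "$tmp"
```

If the XYZ line came first, the holding buffer would still be empty when G runs, so only an empty line would be appended there.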

3. awk

3.1 Basics

awk is a utility used for manipulating text and generating reports. There are a number of versions of awk: the old awk (awk), the new awk (nawk), and the GNU awk (gawk).

Syntax

awk 'command' file

The most general form of the command is as follows:

awk 'BEGIN           {initializations}
    search_pattern1 {actions}
    search_pattern2 {actions}
    ...
    END             {final actions}'  file

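In a Bourne-type shell the quoted program can span several lines as shown; it can also be typed on a single line, or put in a script file and run with the -f option.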

Commands in the initialization part are run once, before any input is read. Then, for each line of the input, the search patterns are tested in order, and the corresponding actions are taken for every pattern that matches. Finally, the actions in the final actions part are run once, after the last line.
Regular-expression search patterns are put between a pair of slashes (/).
If a search pattern is missing, the corresponding actions are taken for every line.
If the action is missing, the default is to print the entire line.
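
The order of execution can be seen in a one-line program. A sketch only: the three-line input and the counter n are made up, and awk is used in place of nawk on the assumption that a POSIX awk is installed.

```shell
# Hypothetical three-line input; two of the lines contain ABC.
tmp=$(mktemp)
printf 'ABC 1\nxyz 2\nABC 3\n' > "$tmp"

# BEGIN runs once, the /ABC/ action runs for each matching line, END runs once.
out=$(awk 'BEGIN { n = 0 } /ABC/ { n++; print "hit:", $0 } END { print "matches:", n }' "$tmp")

echo "$out"
# prints:
# hit: ABC 1
# hit: ABC 3
# matches: 2
rm -f "$tmp"
```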

The following simple commands are used most often.

nawk 'pattern' file   # print lines matching pattern
nawk '{action}' file  # take action for every line
nawk 'pattern {action}' file

Records and fields

A line is considered to have a number of fields. By default, fields are separated by white space (spaces and tabs).

awk sees lines of input files as records. By default, both the input and output record separators are a newline, stored in the built-in variables RS and ORS, respectively.

$0 refers to the entire record (line). The line (record) number is stored in the built-in variable NR.

% nawk '{print NR, $0}' file  # precede each line with line number

Each record consists of fields which, by default, are separated by white space. The number of fields is kept in the built-in variable NF. The fields are represented by $1, $2, etc.

Field separators can be reset with the -F option.

% nawk -F: '/ABC/{print $1, $2}' file   # set field separator to :
% nawk -F'[ :\t]' '{print $1, $2}' file
   # set field separator to space, colon, or tab
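
The effect of -F shows up in NF. A small sketch with a made-up input line; awk stands in for nawk on the assumption that a POSIX awk is available.

```shell
line='ABC:XYZ 123'    # hypothetical record with a colon and a space

default=$(echo "$line" | awk '{ print NF, $2 }')          # split on white space
colon=$(echo "$line" | awk -F: '{ print NF, $2 }')        # split on colons only
either=$(echo "$line" | awk -F'[ :]' '{ print NF, $2 }')  # split on space or colon

echo "$default"   # prints: 2 123
echo "$colon"     # prints: 2 XYZ 123
echo "$either"    # prints: 3 XYZ
```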

$1, $2, $3, ... represent the 1st, 2nd, 3rd field, and so on.
$0 represents the entire line.

Examples

% nawk '/ABC/' file   # prints lines matching ABC
% nawk '{print $1}' file  # prints the first field of file

% date
Sat Oct 24 12:34 EDT 2003

% date | nawk '{print "Month: " $2 "\nYear: ", $6}'
Month: Oct
Year:  2003

% date | nawk '{print "Month: " $2 "\nYear: " $6}'
Month: Oct
Year: 2003

3.2 Search patterns

A pattern consists of a regular expression, a boolean expression, or a combination of both. A regular expression is put between a pair of "/".

Regular expressions

Regular expressions can be used as patterns.

Regular Expression Metacharacters

`^`	beginning of line
`$`	end of line
`.`	single character
`*`	zero or more of preceding character
`+`	one or more of preceding character
`?`	zero or one of preceding character
`[ABC]`	any one in the set
`[^ABC]`	any one not in the set
`[A-Z]`	any one in the range
`A|B`	A or B
`(AB)+`	one or more sets of AB
`\*`	literal *
`&`	in the replacement string of sub() and gsub(), stands for the matched text

The Match Operator ~ and the Not Operator !

Search can be constrained to a single field with the match operator. For example, $1 ~ /ABC/ matches 1st field against the pattern ABC.

Examples

% nawk '$1 ~ /ABC/' file  # print lines whose 1st field matches ABC
% nawk '$1 !~ /ABC/' file # print lines whose 1st field doesn't
                         # match ABC

It is possible to search for a block of consecutive lines using one search pattern to match the first line in the block and another pattern to match the last line in the block. For example:

    /ABC/,/XYZ/

matches a block whose first line contains the pattern ABC and whose last line contains XYZ.
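
A quick way to see the range pattern at work, as a sketch with made-up input lines (awk is assumed to be a POSIX awk):

```shell
# No action is given, so every line in the block is printed.
block=$(printf 'before\nABC start\ninside\nXYZ end\nafter\n' | awk '/ABC/,/XYZ/')

echo "$block"
# prints:
# ABC start
# inside
# XYZ end
```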

Relational and Logical Operations

awk fields can serve as operands of relational, arithmetic, and logical operators (same as C operators).

Example

% nawk '$2 == 1234' file
     # print lines whose 2nd field is equal to 1234
% nawk '{max=($1 > $2) ? $1 : $2; print max}' file
     # for each line print the max of the 1st and 2nd fields
% nawk '$2 > 5 && $2 < 15' file
     # print lines whose 2nd field is between 5 and 15 (exclusive)
% nawk 'NR==2, NR==4 {print $0}' file
     # print lines 2 to 4

Here is a trick to test if field 1 is a numeric value.

(( $1 + 0 ) == $1 )

If it is a string, the left-hand side would be 0, and the equality would be false.
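
The test can be used as a pattern like any other expression. A minimal sketch, assuming made-up input whose first field is sometimes a number; awk stands in for nawk.

```shell
# The first rule fires only when field 1 looks like a number; "next" skips
# the second rule for those lines.
out=$(printf '42 a\nabc b\n3.5 c\n' |
      awk '(($1 + 0) == $1) { print $1, "is numeric"; next } { print $1, "is not numeric" }')

echo "$out"
# prints:
# 42 is numeric
# abc is not numeric
# 3.5 is numeric
```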

3.3 Variables

Variables don't require declaration; actually there's no variable declaration, although variables should probably be initialized in the BEGIN part.

Variables don't have a data type and can store either string or numeric values. A text string that doesn't look like a number is treated as 0 in a numeric operation. In arithmetic, the following 2 assignments behave the same.

    v = 100
    v = "100"

An uninitialized variable has the value 0 in a numeric context and the empty string in a string context, so printing it prints nothing.
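
These conversions can be checked in a BEGIN block. A sketch only; the variable names v, s, and u are made up, and u is deliberately never assigned.

```shell
out=$(awk 'BEGIN {
    v = "100"; s = "abc"
    print v + 1        # numeric-looking string is used as 100
    print s + 1        # non-numeric string is used as 0
    print "[" u "]"    # uninitialized variable is empty as a string
    print u + 0        # and 0 as a number
}')

echo "$out"
# prints:
# 101
# 1
# []
# 0
```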

Built-in variables

`$0, $1, $2,...`	Field variables
`FILENAME`	Name of input file
`FS`	Field separator, by default white space (spaces and tabs). Can be modified.
`NF`	Number of fields in the current record
`NR`	Number of the current record (line number)
`OFMT`	Output format (default: "%.6g")

Field variables are not read-only.

% nawk 'BEGIN{OFMT="%.2f"; print 1.23456, 12E-2}' file
1.23 0.12

Arrays

Like variables, arrays require no declaration. Arrays are one-dimensional and associative: any number or string can be used as an index, essentially creating a dictionary. Arrays filled by split() start at index 1.
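
A short sketch of an array used as a dictionary; the array name age and its keys are made up for illustration.

```shell
out=$(awk 'BEGIN {
    age["alice"] = 30              # string index, no declaration needed
    age["bob"]   = 25
    total = age["alice"] + age["bob"]
    known = ("carol" in age)       # 1 if the index exists, 0 otherwise
    print total, known
}')

echo "$out"    # prints: 55 0
```

Testing with "in" does not create the element, whereas merely referring to age["carol"] would.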

3.4 Operators

awk supports C arithmetic and assignment operators. For strings, it supports concatenation.

% awk 'BEGIN { x=3; y=x+2; print y}'   # prints 5
% awk 'BEGIN { s="hello " "world"; print s}'  # prints "hello world"

3.5 Built-in functions

length()

The length() function returns the length of its parameter as a string. If no parameter is given (in this case, () are not necessary), it returns the length of the input line.

sqrt(), exp(), log(), int() (get the integer part of the parameter)

substr()

The substr(string, start, max size of substring) function returns a substring. Character positions are counted from 1.

split()

The split(string, array, [field separator]) function splits a string into fields, stores the fields in an array starting at index 1, and returns the number of fields.

index()

The index(string, substring) function returns the starting position (counted from 1) of the substring in the string; 0 if the substring does not occur. The second argument is a plain string, not a regular expression.
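
The string functions above can be tried together in one BEGIN block. A sketch with made-up strings; the array name parts is illustrative.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    print length(s)                  # number of characters in s
    print substr(s, 7, 5)            # 5 characters starting at position 7
    print index(s, "world")          # position where "world" starts
    n = split("a:b:c", parts, ":")   # n is the number of pieces
    print n, parts[1], parts[3]
}')

echo "$out"
# prints:
# 11
# world
# 7
# 3 a c
```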

print()

print() by itself prints the input line.
print() with one argument prints the argument.
print() with multiple arguments prints all of them. Arguments separated by commas are printed with OFS (the output field separator, by default a single space) between them; arguments separated only by spaces are concatenated.

% cat file
ABC:XYZ

% nawk -F: '{print $1, $2}' file
ABC XYZ
% nawk -F: '{print $1 $2}' file
ABCXYZ

printf()

The printf() function is similar to the function in C.

% echo "UNIX" | nawk '{printf "|%15s|\n", $1}'
|           UNIX|

3.6 Control structures

awk supports C control structures: if/else, for-loop, while-loop.
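
A brief sketch of the three structures in one program. The input numbers and the variable names sign and sum are made up, and the program is written across several lines, which a Bourne-type shell allows inside single quotes.

```shell
out=$(printf '3\n-2\n0\n' | awk '
{
    # if/else chain: classify the first field of every input line
    if ($1 > 0)      sign = "positive"
    else if ($1 < 0) sign = "negative"
    else             sign = "zero"
    print $1, sign
}
END {
    # for loop: add 1 + 2 + 3
    for (i = 1; i <= 3; i++) sum += i
    # while loop: subtract 4 until the value is 4 or less
    while (sum > 4) sum -= 4
    print "sum:", sum
}')

echo "$out"
# prints:
# 3 positive
# -2 negative
# 0 zero
# sum: 2
```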

3.7 Recipes

Precede each line with a line number

% nawk '{print NR, $0}' file

Add numbers in a column

% cat file
0.1
0.2
0.3

% nawk '{t += $1} END {print t}' file
0.6

Add numbers with same index
In the following file, the first column is an index, and we want to add up the numbers that share the same index.

% cat file
1 0.1
2 0.2
3 0.3
1 40
2 50
3 60

% nawk '{a[$1] += $2} END {for(i=1; i<=3; i++) print a[i]}' file
40.1
50.2
60.3
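
The loop above assumes the indices are exactly 1 to 3. When the indices are not known in advance, a for-in loop can visit whatever indices exist. This is a sketch rather than part of the original recipe: the order of a for-in loop is unspecified, so the output is piped through sort, and the index is printed next to each sum.

```shell
sums=$(printf '1 0.1\n2 0.2\n3 0.3\n1 40\n2 50\n3 60\n' |
       awk '{ a[$1] += $2 } END { for (k in a) print k, a[k] }' |
       sort -n)

echo "$sums"
# prints:
# 1 40.1
# 2 50.2
# 3 60.3
```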

Print the line before each line that matches a pattern

% cat file
some line
line before pattern
the pattern ABC here
some line
another line before pattern
pattern ABC again
more line

% nawk '/ABC/ {print pre} {pre=$0}' file
line before pattern
another line before pattern

August 19, 2008
