Basics in awk
Wednesday, August 13th, 2008
awk is a very, very useful command-line program that any Linux/Unix ninja should be familiar with. Awk is geared specifically towards processing text, and awk and sed were, in fact, among the inspirations for Perl.
To start with, awk has three major elements that you need to be aware of when you’re working with it. These are the field separator, the pattern, and the action for the pattern.
Your field separator is whatever sits in between the text elements you want to work with. If you open up a terminal and type ‘ps -elf’, you’ll see that the columns are separated by spaces, which is also what awk splits on by default. Some files, like CSV files, use commas as the separator instead. You can tell awk which separator to use with the -F option on the command line, or by setting the FS variable inside the program itself. For one-off piping, I prefer the -F option.
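Here is a minimal sketch of the two ways to set the separator, plus the whitespace default. The sample line and its values are made up for illustration, and the input is piped in with printf rather than read from a file.

```shell
# Made-up sample line with three comma-separated fields.
csv_line='alice,1001,/home/alice'

# 1. Separator given on the command line with -F.
printf '%s\n' "$csv_line" | awk -F',' '{ print $2 }'                # prints 1001

# 2. Same separator set inside the program. The BEGIN block runs
#    before any input is read, so FS is in place for the first line.
printf '%s\n' "$csv_line" | awk 'BEGIN { FS = "," } { print $2 }'   # prints 1001

# 3. No separator at all: awk splits on runs of spaces and tabs,
#    which suits column-aligned output such as ps -elf.
printf 'root    1    0\n' | awk '{ print $2 }'                      # prints 1
```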
The pattern is much like an ‘if … then’ statement in other programming languages: the action runs only on lines where the pattern is true. If there isn’t a pattern, the action is applied to every line of input.
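To make the pattern/action split concrete, the sketch below runs the three combinations over a small, made-up two-column input. One detail worth knowing: when you give a pattern with no action, awk falls back to printing the whole matching line.

```shell
# Made-up input: a name and a number on each line.
input='root 0
daemon 1
alice 1001'

# Action only: no pattern, so the action runs on every line.
printf '%s\n' "$input" | awk '{ print $1 }'          # root, daemon, alice

# Pattern only: no action, so awk prints each whole line
# for which the pattern is true.
printf '%s\n' "$input" | awk '$2 > 0'                # daemon 1, alice 1001

# Pattern and action together.
printf '%s\n' "$input" | awk '$2 > 0 { print $1 }'   # daemon, alice
```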
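What makes awk handy is that it gives you capabilities that the `cut` command simply can’t provide, such as reordering fields, filtering lines by a field’s value, and splitting on runs of whitespace. It is also convenient for plain column selection: if I have a twenty-column CSV and I would like to spit out the third and eleventh columns, I can execute the following: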
awk -F',' '{print $3 FS $11}' file.input
The -F',' tells awk that the input fields are separated by commas. The part enclosed in the braces is the action I talked about earlier. I didn’t specify a pattern before the action, so the action is applied to every line of input. “print $3 FS $11” tells awk to print the third field, the field separator (which we set to a comma with -F','), and the eleventh field to standard output.
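A related detail: print also accepts a comma between its arguments, and that comma inserts OFS, the output field separator, rather than FS. The sketch below (made-up input again) compares the forms. OFS defaults to a single space, so the comma form only produces commas once you set it, here with -v.

```shell
# Made-up CSV line; only fields 1 and 3 are used.
line='id,name,price'

# Concatenating FS between the fields, as in the command above.
printf '%s\n' "$line" | awk -F',' '{ print $1 FS $3 }'             # prints id,price

# A comma in print inserts OFS, which defaults to a single space.
printf '%s\n' "$line" | awk -F',' '{ print $1, $3 }'               # prints id price

# Setting OFS makes the comma form emit comma-separated output too.
printf '%s\n' "$line" | awk -F',' -v OFS=',' '{ print $1, $3 }'    # prints id,price
```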
If I wanted to do the same, but only print lines where the third field is greater than some number, say 110, I could execute the following:
awk -F',' '$3 > 110 {print $3 FS $11}' file.input
The pattern before the braces functions much like an “if … then”. If the third field is over 110, awk prints out the third field, the field separator, and the eleventh field.
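Here is a small, self-contained run of the same idea against a made-up four-column CSV, printing fields 1 and 3 instead of 3 and 11 to keep it short. It also shows a trap worth knowing about: if the file has a header row, its text is not numeric, so awk compares it as a string and the header may sneak through. Adding NR > 1 (NR is the current line number) skips it.

```shell
# Made-up CSV with a header row.
data='item,qty,price,note
bolt,10,95,small
nut,20,120,large
gear,5,300,spare'

# Filter on the third field; the header line may also be printed,
# because "price" is compared to 110 as a string.
printf '%s\n' "$data" | awk -F',' '$3 > 110 { print $1 FS $3 }'

# Skip the first line, then filter: prints nut,120 and gear,300.
printf '%s\n' "$data" | awk -F',' 'NR > 1 && $3 > 110 { print $1 FS $3 }'
```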
There is much, much more you can do with awk, but this should be enough to point you in the right direction. I use awk daily for all sorts of command-line mischief. One common use is manipulating /etc/passwd, where some user account information is stored.
Don’t count on awk to work out the field separator for you, though. Without -F, awk splits on whitespace, so for /etc/passwd, which is separated by colons (“:”), you need -F':' (or FS = ":" in a BEGIN block), GNU awk included. It’s also worth noting that on systems without the GNU utilities, awk may behave in ways that you don’t anticipate.
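As a sketch of the /etc/passwd work mentioned above, the hypothetical shell function regular_users below lists the username and login shell of accounts with a UID of 1000 or more. The cutoff of 1000 is an assumption: it is a common default for regular users on many Linux distributions, but not all. The separator is passed explicitly with -F':' so the result doesn't depend on which awk you have.

```shell
# Hypothetical helper. /etc/passwd lines look like
#   name:password:UID:GID:GECOS:home:shell
# so $3 is the UID and $7 is the login shell.
# The 1000 cutoff is an assumed convention, not a universal rule.
# Reads the file named in $1, defaulting to /etc/passwd.
regular_users() {
    awk -F':' '$3 >= 1000 { print $1, $7 }' "${1:-/etc/passwd}"
}
```

Run it as `regular_users` to read /etc/passwd, or point it at a copy of the file first if you want to experiment.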
That’s it for the moment: just some small tips to get you moving. I’d recommend picking up a copy of “The AWK Programming Language” by Aho, Kernighan and Weinberger. It only makes sense, since they are the creators of AWK. I have also been told that the O’Reilly AWK book is very good. In addition, GNU awk is well documented all over the Internet, so you shouldn’t be lacking in study material if you put some effort into it.
Until next time!
-LightningCrash
