Archive for the ‘awk’ Category

basics in awk

Wednesday, August 13th, 2008

awk is a very, very useful command-line program that any Linux/Unix ninja should be familiar with. Awk is specifically geared towards processing text, and the combination of awk and sed was in fact one of the inspirations for Perl.

To start with, awk has three major elements that you need to be aware of when you’re working with it. These are the field separator, the pattern, and the action for the pattern.

Your field separator is simply whatever sits in between the text elements you want to work with. If you open up a terminal and type ‘ps -elf’, you’ll see that the columns there are separated by spaces. Some files, like CSV files, have commas as the separator. Awk can be told what to look for via the -F option on the command-line, or in the program itself by setting the FS variable. For one-off piping, I prefer to do it via the -F option.
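As a small sketch of those two ways to set the separator, the same program can take it from -F or set FS in a BEGIN block. The sample data here is made up for illustration.

```shell
# Made-up comma-separated input: name, age, role.
sample='alice,30,admin
bob,25,user'

# 1) Separator given on the command line with -F.
printf '%s\n' "$sample" | awk -F',' '{ print $1 }'

# 2) Same result, with the separator set inside the program via FS.
printf '%s\n' "$sample" | awk 'BEGIN { FS = "," } { print $1 }'
```

Both commands print the first column (alice, then bob). The BEGIN form is handy once the program lives in its own file.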

The pattern is much like an ‘if … then’ statement in other programming languages. If there isn’t a pattern, the action specified will be applied to all rows of input.

What makes awk handy is that it gives you capabilities that the `cut` command simply can’t provide, such as filtering rows on a condition or printing columns in any order you like. For instance, if I have a twenty-column CSV and I would like to spit out the third and eleventh column, I can execute the following:

awk -F',' '{print $3 FS $11}' file.input

The -F',' tells awk that the input fields will be separated by commas. The area enclosed in the braces is the action I talked about earlier. I didn’t specify a pattern before the action, so the action was applied to every line of input. “print $3 FS $11” tells awk to print to the screen the third field of input, the field separator (which we defined as a comma with -F','), and the eleventh field of input.
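One concrete difference from cut is column order. Here is a minimal sketch, using a made-up three-field line, of asking each tool for the third field followed by the first:

```shell
# One made-up CSV line with three fields.
line='alpha,beta,gamma'

# cut always emits fields in their original order, even when asked for 3,1.
printf '%s\n' "$line" | cut -d',' -f3,1                  # prints: alpha,gamma

# awk prints fields in whatever order the action lists them.
printf '%s\n' "$line" | awk -F',' '{ print $3 FS $1 }'   # prints: gamma,alpha
```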

If I wanted to do the same, but only print lines where the third field was over a number, say, 110, I could execute the following:

awk -F',' '$3 > 110 {print $3 FS $11}' file.input

The pattern before the braces functions much like an “if … then”. If the third field is over 110, awk prints out the third field, the field separator, and the eleventh field.
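To see the pattern at work on something small, here is a sketch with made-up three-column data (host, region, value) instead of a twenty-column file. It also shows the reverse case: a pattern with no action, where awk prints the matching lines unchanged.

```shell
# Made-up three-column data: host, region, value.
data='web01,us-east,95
web02,us-west,140
db01,us-east,230'

# Pattern plus action: only rows whose third field exceeds 110.
printf '%s\n' "$data" | awk -F',' '$3 > 110 { print $1 FS $3 }'
# prints:
#   web02,140
#   db01,230

# Pattern with no action: awk prints each matching line unchanged.
printf '%s\n' "$data" | awk -F',' '$2 == "us-east"'
# prints:
#   web01,us-east,95
#   db01,us-east,230
```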

There is much, much more that you can do with awk, but this should be enough to point you in the right direction. I use awk daily for various tasks related to command-line mischief. A common thing I use awk for is to manipulate /etc/passwd, where some user account information is stored.

Keep in mind that awk, GNU or otherwise, won’t guess the field separator for you. By default it splits on whitespace (spaces and tabs), so for /etc/passwd, which is separated by a colon “:”, you need to pass -F':'. It’s also worth noting that on some other systems without GNU utilities, awk may behave in ways that you don’t anticipate.
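A quick way to check how awk is actually splitting a line is to print NF, the number of fields it found. This is a minimal sketch that uses made-up lines in /etc/passwd format rather than the real file:

```shell
# Made-up lines in /etc/passwd format (name:password:UID:GID:comment:home:shell).
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# Without -F, awk splits on whitespace, so each line here is a single field.
printf '%s\n' "$passwd_sample" | awk '{ print NF }'        # prints 1 three times

# With -F':', the seven colon-separated fields become $1 through $7.
printf '%s\n' "$passwd_sample" | awk -F':' '$3 >= 1000 { print $1, $7 }'
# prints: alice /bin/bash
```

To run the second command against a real system, drop the printf and pass /etc/passwd as the file argument. The 1000 cutoff for regular user IDs is an assumption that varies between distributions.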

That’s it for the moment, just some small tips to get you moving. I’d recommend picking up a copy of “The AWK Programming Language” by Aho, Kernighan and Weinberger. It only makes sense, since they are the creators of AWK. I have also been told that the O’Reilly AWK book is very good. In addition, GNU awk is well-documented all over the Internet, so you shouldn’t be lacking in study material if you put some effort into it.

Until next time!

-LightningCrash

De-RIAAing my music collection

Friday, October 5th, 2007

I recently decided that I won’t own any music from an artist that is represented by the RIAA. Now, how do I go about De-RIAAing my ripped albums?

RIAA Radar has a website that lets you search for artists, albums, keywords, etc., and it tells you whether or not an album was released under the RIAA.

So I did a view-source on their search page and determined that there are only three variables that you need to POST in order to search: searchtype, keyword, and submit.

I can use wget to grab the file, like so:
wget http://www.riaaradar.com/search.asp --post-data 'searchtype=ArtistSearch&keyword=Audioslave&submit=Go!' -O Audioslave

This saves the file as Audioslave. Audioslave IS represented by the RIAA, by the way.

Now, how do I take my ripped albums and compare them to the RIAA Radar site?

(more…)

How I loathe regexps….but wait….

Wednesday, September 12th, 2007

Well, I got frustrated with having to refer to documentation every time I wanted to do something with regexps, so I decided to find a cheat sheet. I hate regexps, but I love them too, you know?

Thankfully, www.ilovejackdaniels.com has a cheat sheet I don’t mind having. It’s more thorough than the others I’ve found and it comes in PDF and PNG formats. I had to print the PNG one, since evince printed the greys as solid black in the PDF version.

Anyway, check it out here.

I’ve been using regexps so much lately that I contemplated taping it over my second monitor. I settled for hanging it from the cube wall right next to it.

Until next time!

-LightningCrash