coreutils

Counting and averaging

Suppose you have a big log file with lines like:

 07-06-2005 13:45:28 - [12013] script.pl: Sending XML
 07-06-2005 13:45:28 - [12013] script.pl: Sending XML
 07-06-2005 13:45:28 - [12013] script.pl: Sending XML
 07-06-2005 13:45:28 - [12013] script.pl: Sending XML
 07-06-2005 13:45:29 - [12013] script.pl: Sending XML
 07-06-2005 13:45:29 - [12013] script.pl: Sending XML
 07-06-2005 13:45:30 - [12013] script.pl: Sending XML
 07-06-2005 13:45:30 - [12013] script.pl: Sending XML
 07-06-2005 13:45:30 - [12013] script.pl: Sending XML
 07-06-2005 13:45:30 - [12013] script.pl: Sending XML
 07-06-2005 13:45:30 - [12013] script.pl: Sending XML

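You see that the number of XML datagrams sent per second varies quite a bit, and you want to know the average rate. Because all lines logged within the same second are identical, uniq -c collapses each second into a single line prefixed with its count, and awk then averages those counts:

  $ uniq -c very_big_log | awk '{ s += $1 } END { print s / NR }'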
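The pipeline above only groups correctly when every line within a second is byte-for-byte identical. A minimal sketch of a slightly more tolerant variant follows, assuming the timestamp always occupies the first two space-separated fields; the sample data and the variable names (log, avg) are made up for illustration:

```shell
# Build a small sample log (hypothetical data in the format shown above):
# 4 lines in second :28, 2 lines in :29, 5 lines in :30.
log=$(mktemp)
for ts in 28 28 28 28 29 29 30 30 30 30 30; do
  echo "07-06-2005 13:45:$ts - [12013] script.pl: Sending XML"
done > "$log"

# Keep only the date and time fields, so lines with different messages
# that were logged in the same second still end up in one group.
cut -d' ' -f1,2 "$log" | uniq -c

# Average the per-second counts: sum of field 1 over the number of groups.
avg=$(cut -d' ' -f1,2 "$log" | uniq -c | awk '{ s += $1 } END { print s / NR }')
echo "average lines per second: $avg"

rm -f "$log"
```

With the sample data the three groups have counts 4, 2 and 5, so the reported average is 3.66667. Seconds in which nothing was logged do not appear at all and are therefore not counted as zeros.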

Directory listings

Sort on modified time, most recent first:

  $ ls -lt
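As a small illustration, the sketch below creates two files with known modification times in a scratch directory and picks the newest and the oldest one. The file names are hypothetical; -r reverses the sort order, so the oldest file comes first.

```shell
dir=$(mktemp -d)

# Give the two files fixed modification times (format: CCYYMMDDhhmm).
touch -t 202001010000 "$dir/older.log"
touch -t 202101010000 "$dir/newer.log"

# -t sorts by modification time, most recent first.
newest=$(ls -t "$dir" | head -n 1)

# Adding -r reverses the order, so the oldest entry is listed first.
oldest=$(ls -tr "$dir" | head -n 1)

echo "newest: $newest, oldest: $oldest"
rm -rf "$dir"
```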

Large log files

Need to hunt for a bug in an 800 MB log file? Want to attach a particular piece of the log to an e-mail or to a ticket in your issue tracker? Files of that size tend to choke your editor.

One solution is to locate the occurrence with grep, then cut out that line and the events leading up to it with head and tail. Here head keeps everything up to and including line 55000, and tail keeps the last 100 lines of that, so excerpt.txt holds lines 54901 through 55000:

  $ grep -n NullPointerException giant_log_file.txt
  55000: java.lang.NullPointerException:
  $ head -n 55000 giant_log_file.txt | tail -n 100 > excerpt.txt

This is much easier than waiting for your editor to load the file, or using less or a similar pager to find the piece and then copying and pasting the lines by hand.
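The same cut can be made with sed, which takes an explicit line range. The following is a minimal sketch on a generated sample file; the file contents, the 10-line window, and the variable names are assumptions for illustration, and it assumes the match is not within the first 9 lines.

```shell
log=$(mktemp)

# Hypothetical sample: 200 lines, with the "exception" on line 150.
awk 'BEGIN {
  for (i = 1; i <= 200; i++)
    print (i == 150 ? "java.lang.NullPointerException:" : "event " i)
}' > "$log"

# Line number of the first match (grep -n prints "NUMBER:text").
line=$(grep -n NullPointerException "$log" | head -n 1 | cut -d: -f1)

# Print the 10 lines ending at the match, then quit so that sed does not
# read the rest of a huge file.
start=$(( line - 9 ))
excerpt=$(sed -n "${start},${line}p;${line}q" "$log")

printf '%s\n' "$excerpt"
rm -f "$log"
```

The trailing q command matters on very large files: without it sed keeps reading to the end even though nothing more will be printed.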

Alternatively, if you can reproduce the issue then you can just follow the logfile on your screen and at the same time write it to a separate file:

  $ tail -f server.log | tee ~/issue_1234_logfile.txt
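To see what tee does without a long-running tail -f, here is a small sketch that uses a finite input instead. The file names are hypothetical. tee copies its standard input both to standard output and to the named file.

```shell
dir=$(mktemp -d)
printf 'line 1\nline 2\nline 3\n' > "$dir/server.log"

# Without -f, tail exits at end of file, so this pipeline terminates.
# "shown" captures what would have appeared on the screen.
shown=$(tail -n 2 "$dir/server.log" | tee "$dir/issue_copy.txt")

# The file written by tee holds the same two lines.
saved=$(cat "$dir/issue_copy.txt")

echo "$shown"
rm -rf "$dir"
```

Note that tee overwrites its output file by default; tee -a appends instead, which is useful when several reproduction attempts should end up in the same file.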

The tail utility can also count bytes instead of lines. To get the last megabyte (1024 x 1024 bytes) of a logfile, do:

  $ tail -c 1048576 giant_log_file > last_megabyte.txt
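A byte-based cut usually starts in the middle of a line. The sketch below, run on a generated sample file with made-up contents, shows one way to drop that partial first line by piping through tail -n +2, which starts output at the second line.

```shell
log=$(mktemp)

# Hypothetical sample: 1000 short lines.
awk 'BEGIN { for (i = 1; i <= 1000; i++) print "log entry " i }' > "$log"

# The last 120 bytes, exactly, regardless of where the line breaks fall.
bytes=$(tail -c 120 "$log" | wc -c | tr -d ' ')

# Skip the (probably truncated) first line and show the first whole one.
first_whole=$(tail -c 120 "$log" | tail -n +2 | head -n 1)

echo "bytes: $bytes, first complete line: $first_whole"
rm -f "$log"
```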

Stripping off headers

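To strip off a header of a given number of lines from a file, create the following script (naming it, say, stripheader) and place it in your path:

 #!/bin/sh
 # Usage: stripheader HEADER_SIZE FILE
 HEADER_SZ=$1
 # Read the file on stdin so that wc prints only the count, not the file name
 N_LINES=`wc -l < "$2"`
 # Print everything except the first HEADER_SZ lines
 tail -n $(( $N_LINES - $HEADER_SZ )) "$2"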

Usage is:

 $ stripheader 5 file_with_5-line_header.txt > newfile_without_header.txt
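tail can also start its output at a given line when the count is written with a leading plus sign, which avoids counting the lines first. A minimal sketch of that variant follows; the function name stripheader_alt and the sample data are hypothetical.

```shell
# Hypothetical alternative to the script above: print FILE starting at
# line HEADER_SIZE + 1, so the file is read only once.
stripheader_alt() {
  header_sz=$1
  file=$2
  tail -n "+$(( header_sz + 1 ))" "$file"
}

# Sample file with a 3-line header followed by two data lines.
sample=$(mktemp)
printf 'h1\nh2\nh3\ndata 1\ndata 2\n' > "$sample"

body=$(stripheader_alt 3 "$sample")
printf '%s\n' "$body"

rm -f "$sample"
```

Because tail -n +K also works on standard input, the same idea can be used in the middle of a pipeline, where counting lines up front is not possible.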