Parsing apache logs …

Seems I’m not alone in the world in wanting to parse Apache log files. Googling turned up lots of people bitterly complaining about it. Some folks wanted to write a grammar and a flex/yacc/bison thingy. I am sure that there are some Java programmers who’ve been working on this … oh … 6 or 7 years or so, and may be approaching a solution, with Java bytecode only slightly below 1 PB in size.
But I digress. This is the core of the code I’ve mentioned before, and darn it, I wanted to get the logging in shape. So I looked at the horrible morass of terrible … ancient code. Really horrible stuff, that. And I looked at the logs.
And thought to myself … dammit, I can make a regex that handles this.
So I tried, and … sure enough, it works.

@column = ($line =~ /(\d+\.\d+\.\d+\.\d+)\s+(\S+)\s+(\S+)\s+\[(\d+\/\S+\/\d+):(\d+:\d+:\d+)\s+([-+]?\d+)\]\s+"(.*?)\s+HTTP\/\d+\.\d+"\s+(\d+)\s+(\d+|-)\s+"(.*?)"\s+"(.*?)"/);
	# parsed it BABY!!!
	# c[0] = IP address (dots escaped so they match literal dots only)
	# c[1] = remote logname from identd (almost always -)
	# c[2] = authenticated user name (or - for none)
	# c[3] = date
	# c[4] = time
	# c[5] = timezone (relative to GMT)
	# c[6] = incoming request (GET, PUT, HEAD, ... with relative URI part)
	# c[7] = return code (200, 404, ...)
	# c[8] = size of returned data in bytes (or - when no body was sent)
	# c[9] = referrer (or - for none)
	# c[10]= User Agent string
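
For anyone who wants to try this outside my loop, here is a minimal standalone sketch of the same idea. It assumes the logs are in Apache's combined format with IPv4 client addresses. The names parse_line and $log_re are made up for this example and are not from the real code, and the two sample lines are illustrative rather than taken from my logs. The pattern is the same one as above, just spread out with the /x modifier so each field gets its own line and comment.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Combined Log Format pattern. With /x, whitespace and comments inside
# the pattern are ignored, so each capture can sit on its own line.
my $log_re = qr{
    ^(\d+\.\d+\.\d+\.\d+) \s+      # [0] client IPv4 address
    (\S+) \s+                      # [1] identd logname, usually "-"
    (\S+) \s+                      # [2] authenticated user, or "-"
    \[ (\d+/\S+/\d+) :             # [3] date, e.g. 10/Oct/2000
       (\d+:\d+:\d+) \s+           # [4] time
       ([-+]?\d+) \] \s+           # [5] offset from GMT
    "(.*?) \s+ HTTP/\d+\.\d+" \s+  # [6] method and relative URI
    (\d+) \s+                      # [7] status code
    (\d+|-) \s+                    # [8] bytes sent, "-" when none
    "(.*?)" \s+                    # [9] referrer, or "-"
    "(.*?)"                        # [10] user agent string
}x;

# Hypothetical helper: returns an array ref of the 11 fields,
# or undef when the line does not match the pattern.
sub parse_line {
    my ($line) = @_;
    my @column = ($line =~ $log_re);
    return @column ? \@column : undef;
}

# Illustrative sample lines (not real traffic).
my @samples = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"',
    '10.0.0.5 - - [11/Oct/2000:08:01:02 +0000] "GET /index.html HTTP/1.1" 304 - "-" "curl/8.0"',
);

for my $line (@samples) {
    my $c = parse_line($line);
    if ($c) {
        print join(' | ', @$c[0, 6, 7, 8]), "\n";
    } else {
        warn "unparsed line: $line\n";
    }
}
```

Lines that do not match (IPv6 clients, requests without an HTTP version, and so on) come back as undef here, so the caller can count or log them instead of silently working with an empty list.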

As Chris pointed out, there’s an XKCD for that.
Yeah, baby! My inner loop just lost 80% of its lines, and it is much easier to understand. (Is it wrong that I can parse some subset of regexes in my head? The recursive ones give me a headache, and I have to start banging my head against the wall to stop them.)
There was a minor error in my edits to the loop; will fix now. Nice that this works so well …

2 thoughts on “Parsing apache logs …”

    • @Tyler
      The system isn’t Windows, so this isn’t something we can use. LogParser looks like it comes in .MSI format (a Microsoft Installer package) rather than source, so we can’t even look at porting it.
      We tend to focus on open development and sources here. If LogParser became open source, we could be interested.
