Updating a design to modern concepts …

So in order to (really) bring my monitoring app into the modern age, I want to change its flow from a synchronous, on-demand, event-driven analysis and reporting tool to an asynchronous monitoring and analysis tool, with an on-demand “report” function which is basically a presentation core atop the data set.
There are many reasons for this, not the least of which is that this should be far more efficient at handling what I want to do … not to mention more responsive. I also don’t really want to do this as many independent processes … I have past history with debugging many independent but functionally interdependent processes.
What we are fundamentally doing is parsing logs. Right now, it’s Apache logs, but a well-designed system should be able to parse any logs, with the addition of a basic parser code (no, not a grammar … but something nice and simple).
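To make the “basic parser code” idea concrete, here is a minimal sketch in Python (not the Perl the app is written in). It assumes the Apache Common Log Format; the names `PARSERS`, `parse_apache`, and `parse_line` are hypothetical and only illustrate one parser function per log type behind a common lookup.

```python
import re

# Regex for the Apache Common Log Format (an assumption about the log layout).
COMMON_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_apache(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMMON_LOG.match(line)
    return match.groupdict() if match else None


# Hypothetical registry: supporting a new log type means adding one function here.
PARSERS = {"apache": parse_apache}


def parse_line(kind, line):
    """Dispatch a raw line to the parser registered for this kind of log."""
    return PARSERS[kind](line)
```

Adding another log source would then mean writing one more small function and registering it, with no change to the code that reads the files.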
So what if we wanted to run the parser when the log gets updated? Ok, I know … there are some codes that are smart enough to trigger an event upon an action. Assume for the moment that we are dealing with something where this isn’t true.
Let me go far afield from Apache and look at Gluster. Its logs are (at best) a horrible … horrible mess. Extracting anything useful from them is very hard. And unfortunately, with many more people depending upon it, we have to parse the output, and at least generate some sort of signal when dejecta impacts the rotating air movement system.
But the same is true of other servers as well. The issue is that there really is no good standard for this right now. Something with one of the message queues and a nice standard format? Would be nice. Until then, we have to ()*&*(&^%&% parse ()*&*&^%$%$%$ logs.
Apache is my stand-in for a good test case.
So, rather than wait for an external query event to look at stuff, why not set up a nice asynchronous inotify-based log reader? Maintain local state only during program execution. Read till the end of the file on startup, calculate the offset, turn on an inotify listener, and only scan the changes from the offset to the end of the file on the write event … updating the internal offset, and doing whatever needful thing we need to do after parsing the data?
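The offset bookkeeping described above can be sketched roughly as follows, in Python rather than Perl. This is only the “scan from the saved offset to the end of the file” part; the inotify listener itself is not shown and is assumed to call `read_new()` on each write event. The class name `LogTail` and its methods are hypothetical.

```python
class LogTail:
    """Remember a byte offset into a log file and return only newly added lines."""

    def __init__(self, path):
        self.path = path
        self.offset = 0       # bytes already consumed from the file
        self._partial = b""   # tail of a line that has no newline yet

    def read_new(self):
        """Read from the saved offset to the end of the file.

        The first call (at startup) reads the whole file; later calls, made
        from the write-event handler, read only what was appended since.
        """
        with open(self.path, "rb") as fh:
            fh.seek(self.offset)
            chunk = fh.read()
            self.offset = fh.tell()  # update the internal offset
        data = self._partial + chunk
        lines = data.split(b"\n")
        # The last element is either empty or an incomplete line; hold it back.
        self._partial = lines.pop()
        return [line.decode("utf-8", errors="replace") for line in lines]
```

Holding back the unterminated tail matters because a write event can fire while the server is halfway through writing a line. State lives only in the object, which matches keeping local state only during program execution.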
Yeah, it’s more complex. But it gives us far more power.

First step in this process is getting Mojolicious to run in a thread environment. The web side of this is “easy”. Not quite a SMOP. But we have this part working. So if we can run it in a thread, yeah, we are moving in the right direction.
Second step is getting the inotify listener running in the same thread environment. We already use this code for another product (in our Tiburon system), so I am hoping it works as well here.
After that is a little glue logic, and thinking up more efficient data storage (don’t need to keep all this data in RAM, and I want to efficiently/quickly serialize/deserialize it). I smell some CSV “files” living in /dev/shm/ … we do that in our manometer.pl code now. And that’s where we lifted our thread design.
So right now, I am assembling the pieces, and seeing what I can do with them. All of this in a nice multi-threaded Perl code. With integrated web servers, and all manner of other nice features.
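As a rough illustration of the CSV-in-/dev/shm idea, here is a small Python sketch (not the manometer.pl code, which is not shown here). The column names in `FIELDS` and the helper names are hypothetical, and the fallback to the system temp directory is an assumption for machines without /dev/shm.

```python
import csv
import os
import tempfile

# Hypothetical columns for parsed log records.
FIELDS = ["timestamp", "host", "status", "bytes"]


def shm_dir():
    """Prefer the RAM-backed /dev/shm; fall back to the temp dir if it is absent."""
    return "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()


def save_records(path, records):
    """Write records (a list of dicts) as CSV, replacing the old file atomically."""
    tmp = path + ".tmp"
    with open(tmp, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)
    os.replace(tmp, path)  # rename, so a reader never sees a half-written file


def load_records(path):
    """Read the CSV back as a list of dicts; every value comes back as a string."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))
```

Writing to a temporary name and renaming means the report side can read the file at any moment without locking against the monitoring side.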

2 thoughts on “Updating a design to modern concepts …”

  1. Hi Joe,
    I’ve never parsed the gluster logs, but have parsed logs for various other software that have been problematic. Do you have any example of logs that are done well?

  2. Beware of log rotation, particularly rotation of nearly empty logs. The new log may be larger than the old one, so just checking size isn’t enough. A hash of the log’s first N bytes may suffice for some N long enough to capture a date/time.
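A minimal sketch of the check suggested in the comment above, in Python: fingerprint the first N bytes of the log and compare later. The function names and the default N of 64 are hypothetical. The sketch also records how many bytes were actually hashed, so that appending to a file shorter than N is not mistaken for a rotation.

```python
import hashlib


def head_fingerprint(path, n=64):
    """Return (bytes_hashed, sha256 hex digest) for the first n bytes of the file."""
    with open(path, "rb") as fh:
        head = fh.read(n)
    return len(head), hashlib.sha256(head).hexdigest()


def was_rotated(path, saved):
    """Compare the file's current head against a saved fingerprint.

    Only the number of bytes hashed originally is re-read, so a file that has
    merely grown still matches. A shorter or different head means a new file.
    """
    length, digest = saved
    with open(path, "rb") as fh:
        head = fh.read(length)
    return len(head) < length or hashlib.sha256(head).hexdigest() != digest
```

A fingerprint taken from an empty file hashes zero bytes and can never signal a rotation, so it should be retaken once the first data arrives.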
