How I processed a log file 20x SLOWER than before
Sometimes you have to sit back and laugh at yourself. Having recently written Starfish to help speed up slow tasks, I tried to find as many uses as I could for it. At MOG we have to parse huge log files so I thought I would be clever and try to use Starfish for the task. After running it for a while, I looked at the stats only to find that it had been processing my file 20x slower than it would have without distributing it. At first I was puzzled, until I realized a very important thing about distributing processes. You have to make sure that the task you distribute takes longer than the distribution process.
if overhead_time > processing_time then puts "Don't use Starfish" end
It turns out that I could process 10,000 lines of the log in about a second... so to send each one of those lines over the network to have them processed was just silly. Even sending 10,000 lines at a time is relatively unnecessary.
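The comparison above can be sketched in plain Ruby. Everything here is illustrative: the trivial per-line parse, and the assumed 1 ms of network overhead per line, are stand-in numbers rather than anything measured from Starfish itself.

```ruby
require 'benchmark'

# Fake log lines standing in for a real log file (illustrative data).
lines = Array.new(10_000) { |i| "127.0.0.1 - GET /page/#{i} 200" }

# Time a trivial local parse of every line.
local_time = Benchmark.realtime do
  lines.each { |line| line.split(' ') }
end

# Assume 1 ms of network overhead for each line shipped to a worker.
# This figure is a hypothetical, not a Starfish measurement.
overhead_per_line = 0.001
distributed_overhead = overhead_per_line * lines.size

if distributed_overhead > local_time
  puts "Don't distribute: overhead (#{distributed_overhead}s) " \
       "exceeds local processing time (#{local_time.round(4)}s)"
end
```

With numbers like these, shipping each line over the network costs orders of magnitude more than just parsing it locally, which is exactly the trap described above.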
I share this story so that you might not make the same mistake that I did. However, I realized that since Starfish can know when the overhead makes a task not worth distributing, it can actually warn people when it is and is not a good use of resources. I will be adding this check to the next release, which shall come out shortly.