Search, Cron and the Internet Archive - Oh My!

By Markus Sandy at 04/28/2007 - 11:58

A SpinXpress user posted a video to the Internet Archive, but it did not show up at Ourmedia. It turns out the person had never logged into Ourmedia before, and the "ia" module quietly does nothing in that case. No error message, no nothing.
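To illustrate the alternative to failing silently, here is a minimal sketch, written in Python rather than Drupal's PHP so it stands alone. It logs a warning when an upload cannot be matched to a local account instead of just returning. The names `import_upload` and `known_users` and the `uploader` and `identifier` fields are hypothetical and are not taken from ia.module.

```python
import logging

logger = logging.getLogger("ia_import")


def import_upload(upload, known_users):
    """Attach an upload to its local user; return False (and warn) if skipped.

    `upload` is a dict with hypothetical "uploader" and "identifier" keys.
    `known_users` maps a user name to the list of identifiers already imported.
    """
    uploader = upload.get("uploader")
    identifier = upload.get("identifier")
    if uploader not in known_users:
        # The important part: say why nothing happened.
        logger.warning(
            "IA: skipping %r, no local account for uploader %r",
            identifier,
            uploader,
        )
        return False
    known_users[uploader].append(identifier)
    return True
```

With something like this in place, the missing video would at least have left a trace in the log.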

So this got me looking over ia.module in general and its hook_cron() in particular. Then I looked at the hook_cron() implementations in all our modules and found lots of issues.

I had been wanting to investigate these issues for a long time, so it is fun to finally get into them. This also relates to my recent work with Ourmedia's search block. I had been inspired at OSCMSS by Robert Douglass to look into Drupal search, which also implements hook_cron() and so is very much related. Another reason I'm interested is that during every cron run we get a watchdog error message from the ia module that says "IA: unable to fetch headers from OAI interface".

To make things worse, I can't find the cron job that is powering Ourmedia on our servers, so I am guessing the requests are coming from somewhere else. A quick rename of cron.php should take care of that if I need it. Cron is running every minute. I suppose that is to ensure quick updates on new Archive.org uploads, but it may also mean executing a lot of expensive queries far more often than necessary.
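One way to keep a once-a-minute cron from running expensive work every time is to have each task check when it last ran. The sketch below is in Python, not Drupal code, and `ThrottledTask` and `run_if_due` are hypothetical names. It also keeps the timestamp in memory, whereas a real module would have to persist it between requests.

```python
import time


class ThrottledTask:
    """Run a task only if at least `min_interval_seconds` have passed."""

    def __init__(self, task, min_interval_seconds):
        self.task = task
        self.min_interval = min_interval_seconds
        self.last_run = None  # timestamp of the last successful run

    def run_if_due(self, now=None):
        """Call the task if it is due; return True if it ran."""
        now = time.time() if now is None else now
        if self.last_run is not None and now - self.last_run < self.min_interval:
            return False  # called too soon, skip the expensive work
        self.task()
        self.last_run = now
        return True
```

Cron could then keep firing every minute for the cheap checks, while something like a search reindex runs only once an hour.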

So I'm looking at each module. More info as I discover it...
