Wednesday, October 17, 2007

Trying OmniFind Yahoo Search

While looking for search utilities for Windows, I stumbled upon OmniFind, whose feature list seemed too good to be true:
Install it in 3 clicks, configure it in minutes.
Free, searches up to 500,000 documents.
Search both the enterprise and the Internet from a single interface.
Incorporates open source Apache Lucene technology to deliver the best of community innovation with IBM's enterprise features.
But OmniFind was exactly like that! Downloading, installing, configuring, and test-indexing a website and a filesystem location took 15 minutes in all!

The server OS requirements are not my favorites, but they make sense for the enterprise and are to be expected from IBM, whose favorites are of course Red Hat and SUSE. Too bad for me: my favorite Linux is Debian, and of course I always vouch for FreeBSD.

32-bit Red Hat Enterprise Linux Version 4, Update 3
32-bit SUSE Linux Enterprise 10
32-bit Windows XP SP2
32-bit Windows 2003 Server SP1

Some notes from the testing so far:

Indexing filesystems containing .doc and .xls files works like a charm, and the search results can be viewed "as html" and "cached". Very useful!

OmniFind installs as its own web service, on a port of your choice. I changed the appearance of the search page to show the company logo and disabled all the Yahoo links. All very simple from the OmniFind admin control panel!

To find a string inside longer words, you should add a wildcard. For example, search for "regression*" to make sure you also locate occurrences of "regressions".
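
To illustrate why the trailing wildcard matters, here is a minimal Python sketch. It assumes whole-word matching where "*" stands for any run of characters, which is only my guess at how OmniFind treats search terms. The helper names with_trailing_wildcard and term_matches are made up for this example, and fnmatch from the Python standard library stands in for the real matcher.

```python
from fnmatch import fnmatchcase


def with_trailing_wildcard(query: str) -> str:
    """Append '*' to every term that does not already end with one."""
    terms = query.split()
    return " ".join(t if t.endswith("*") else t + "*" for t in terms)


def term_matches(pattern: str, word: str) -> bool:
    """Case-insensitive whole-word match; '*' matches any run of characters.

    This is an illustration only, not OmniFind's actual matching logic.
    """
    return fnmatchcase(word.lower(), pattern.lower())


if __name__ == "__main__":
    print(with_trailing_wildcard("regression"))
    print(term_matches("regression", "regressions"))
    print(term_matches("regression*", "regressions"))
```

Under that assumption the bare term misses the plural form, while the wildcard version catches it.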

Reindexing seems to be something you have to wrap in your own scripts and schedule yourself, e.g. with at jobs.
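
A wrapper of that kind could look roughly like the Python sketch below. It only shows the stop-then-start pattern: I am not reproducing the real names or arguments of the OmniFind crawler scripts here, so stop_cmd and start_cmd are placeholders you would have to fill in yourself, and run_step and recrawl are names invented for this example.

```python
import subprocess
import sys
from typing import Sequence


def run_step(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def recrawl(stop_cmd: Sequence[str], start_cmd: Sequence[str]) -> bool:
    """Stop the crawler, then start it again.

    Returns True if the start command succeeded. A failing stop command
    is ignored, on the assumption that the crawler was simply not running.
    """
    run_step(stop_cmd)
    return run_step(start_cmd) == 0


if __name__ == "__main__":
    # Placeholder commands that do nothing; replace them with the real
    # crawler management scripts from your installation.
    noop = [sys.executable, "-c", "pass"]
    print("recrawl started:", recrawl(stop_cmd=noop, start_cmd=noop))
```

Saved as a script, this can then be scheduled with at or any other job scheduler.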

You can use scripts to start or stop a crawler.
Crawler management scripts allow you to schedule and execute start and stop crawler actions, or start and stop a crawler from the command line.

Cleaning the index of documents that should not have been crawled is not so friendly. It seems you have to delete the entire source, e.g. a website, and then crawl it again. That can be tiresome if it is a big website.

The language pack should be installed before you start crawling your big sources, as you will have to crawl them all over again once the language pack has been installed.

Crawling protected websites is possible. I tested an https:// site protected by basic authentication, and it worked fine. Crawling behind form-based authentication, as used by a company portal or document handling system, should also be possible:

HTML form-based authentication
Form name (optional)
Example: loginPage
Form action
Example: http://www.example.org/authentication/login.do
HTTP method: POST or GET
Example: POST
Form parameters (optional)
Example: userid and myuserID
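
To make those fields more concrete, here is a small Python sketch of the HTTP request a crawler would presumably send for such a login form. It is not OmniFind code. I am reading "userid and myuserID" as a parameter name and its value, which is my interpretation, and build_login_request is a made-up helper name. The sketch only builds the request with urllib from the standard library and never sends it.

```python
from typing import Mapping
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(action_url: str, method: str,
                        params: Mapping[str, str]) -> Request:
    """Build (but do not send) a form login request.

    action_url -- the "Form action" field
    method     -- the "HTTP method" field, POST or GET
    params     -- the "Form parameters" field as name/value pairs
    """
    method = method.upper()
    if method not in ("GET", "POST"):
        raise ValueError("HTTP method must be POST or GET")

    encoded = urlencode(params)
    if method == "POST":
        # POST sends the parameters in the request body.
        return Request(
            action_url,
            data=encoded.encode("ascii"),
            headers={"Content-Type": "application/x-www-form-urlencoded"},
            method="POST",
        )
    if not encoded:
        return Request(action_url, method="GET")
    # GET appends the parameters to the URL as a query string.
    separator = "&" if "?" in action_url else "?"
    return Request(action_url + separator + encoded, method="GET")


if __name__ == "__main__":
    req = build_login_request(
        "http://www.example.org/authentication/login.do",
        "POST",
        {"userid": "myuserID"},
    )
    print(req.get_method(), req.full_url, req.data)
```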


So far I am very pleased with OmniFind, and I recommend everyone give it a try. OmniFind might be the single point of entry for knowledge search that your organization needs to bring knowledge from many sources to life and into use!
