Tuesday, October 16, 2007

Search tools, challenges and non-trivial requirements

I have listed some key challenges for my current usage of search tools:

  • Create a point of entry for search.
  • Link to a relevant search query from a portal (e.g. an operation status website).
  • Some knowledge should only be available to some people. This seems to be the biggest hurdle!

Limiting knowledge/search only to some people could be solved in at least 2 ways:

  1. Set up different indexer/crawler configurations, each searchable from a different search prompt. The problem could be multiple crawls of the same info (load, storage, resources).
  2. Index/crawl everything once, and let the search box/website/frontend control who can see what. This would be preferred.
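Option 2 could look roughly like the sketch below: everything is indexed once, and the frontend drops hits the current user is not allowed to see. This is only an illustration under assumptions. The names `Document`, `allowed_groups` and `search` are made up for the example, and it assumes the crawler can record which groups may read each document.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    url: str
    text: str
    # Groups allowed to see this document; assumed to be recorded at crawl time.
    allowed_groups: set = field(default_factory=set)


def search(index, query, user_groups):
    """Return URLs of documents that match the query and that the user may see."""
    terms = query.lower().split()
    groups = set(user_groups)
    hits = []
    for doc in index:
        body = doc.text.lower()
        # Naive matching: every query term must occur in the document text.
        if not all(term in body for term in terms):
            continue
        # The frontend filter: keep the hit only if the user shares a group with it.
        if doc.allowed_groups & groups:
            hits.append(doc.url)
    return hits
```

The crawl happens once, so there is no extra load or storage, but the filter is only as good as the group information stored with each document.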

Listing non-trivial requirements which are not always available:

  • Parse OpenOffice Writer and Calc documents (.odt and .ods), which are basically zip files with XML (unzip and parse e.g. content.xml).
  • Crawling/indexing file systems (shares/hard drives), setting a base URL for how the search results will become browsable.
  • Reindexing must be automated, e.g. scheduled or cron'd.
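The first two requirements can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not how any particular search tool does it: `extract_text` and `to_url` are hypothetical helper names, and the paths and base URL are whatever your own setup uses.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath
from urllib.parse import quote


def extract_text(path_or_file):
    """Return the plain text found in content.xml of an .odt or .ods file."""
    # The document is an ordinary zip archive; content.xml holds the body.
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    # itertext() yields every text node, so the markup itself is dropped.
    return " ".join(chunk.strip() for chunk in root.itertext() if chunk.strip())


def to_url(file_path, crawl_root, base_url):
    """Map a crawled file path to a browsable URL under base_url."""
    relative = PurePosixPath(file_path).relative_to(crawl_root)
    # quote() escapes spaces and similar characters but keeps the slashes.
    return base_url.rstrip("/") + "/" + quote(str(relative))
```

A scheduled job (for example a crontab entry calling a reindex script of your own) would then walk the share, call `extract_text` on each document and store the result together with the URL from `to_url`.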
