I have listed some key challenges for my current usage of search tools:
- Create a point of entry for search.
- Link to a relevant search query from a portal (e.g. an operational status website).
- Some knowledge should only be available to some people. This seems to be the biggest hurdle!
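Linking from a portal to a prepared search query mostly comes down to building a URL with the query encoded in it. A minimal sketch, assuming a hypothetical search frontend at `https://search.example.org/search` that takes the query in a `q` parameter (both the address and the parameter name are made up for illustration):

```python
from urllib.parse import urlencode

# Hypothetical address of the search frontend; replace with the real one.
SEARCH_BASE = "https://search.example.org/search"


def search_link(query, **filters):
    """Return a URL that opens the search frontend with `query` pre-filled.

    Extra keyword arguments become additional query-string parameters;
    their names are assumptions about what the frontend understands.
    """
    params = {"q": query, **filters}
    return f"{SEARCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # A status page could embed this link next to an incident.
    print(search_link("mail server outage"))
```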
Limiting knowledge/search only to some people could be solved in at least 2 ways:
- Set up different indexer/crawler configurations, each searchable from a different search prompt. The problem could be multiple crawls of the same information (load, storage, resources).
- Index/crawl everything once, and let the search box/website/frontend control who can see what. This would be preferred.
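The second option could look roughly like this: every indexed document carries a list of groups allowed to see it, and the frontend drops hits the current user has no group in common with. This is only a sketch; the `Hit` type, the `allowed_groups` field, and the group names are assumptions, not features of any particular search engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search result, as the (hypothetical) index would return it."""
    url: str
    title: str
    allowed_groups: frozenset  # groups permitted to see this document


def visible_hits(hits, user_groups):
    """Keep only the hits that share at least one group with the user."""
    user_groups = set(user_groups)
    return [hit for hit in hits if hit.allowed_groups & user_groups]


if __name__ == "__main__":
    hits = [
        Hit("https://intra.example.org/hr/salaries", "Salaries", frozenset({"hr"})),
        Hit("https://intra.example.org/it/howto", "Howto", frozenset({"staff", "hr"})),
    ]
    for hit in visible_hits(hits, {"staff"}):
        print(hit.title, hit.url)
```

Filtering after the search has run is the simplest variant. With many hits, the group restriction would probably have to be part of the query itself so that hit counts and paging stay correct.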
Non-trivial requirements which are not always available:
- Parse OpenOffice text documents and spreadsheets (.odt and .ods), which are basically zip files containing XML (unzip and parse e.g. content.xml).
- Crawl/index file systems (shares/hard drives), setting a base URL that makes the search results browsable.
- Reindexing must be automated, e.g. scheduled or cron'd.
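The first two requirements can be sketched with nothing but the Python standard library. The function names, the share path, and the base URL below are assumptions for illustration, and the text extraction is deliberately crude: it collects all text nodes from content.xml without interpreting the document structure.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import PurePosixPath
from urllib.parse import quote


def extract_odf_text(path):
    """Return the plain text found in content.xml of an .odt/.ods file."""
    with zipfile.ZipFile(path) as archive:
        with archive.open("content.xml") as content:
            root = ET.parse(content).getroot()
    # itertext() walks every text node, whatever element it belongs to.
    pieces = (text.strip() for text in root.itertext())
    return " ".join(piece for piece in pieces if piece)


def path_to_url(file_path, share_root, base_url):
    """Map a crawled file path to a browsable URL under base_url."""
    relative = PurePosixPath(file_path).relative_to(share_root)
    return base_url.rstrip("/") + "/" + quote(relative.as_posix())


if __name__ == "__main__":
    # Hypothetical mount point and web address for the same share.
    print(path_to_url("/mnt/share/docs/plan 2.odt",
                      "/mnt/share",
                      "https://files.example.org/share/"))
```

For the third requirement, a crontab entry such as `0 2 * * * /usr/local/bin/reindex.sh` would run a reindexing script every night at 02:00; the script path is hypothetical.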