HP OpenVMS Systems

ask the wizard

Seeking OpenVMS Web Search Engine?

The Question is:

 
Hello,
I have just installed the Secure Web Server 1.0-1 and I am looking
for a search engine for my intranet.
Thanks and regards,
JL Rossier
 
 


The Answer is:

 
Search engine alternatives for Compaq Secure Web Server (see
URLs below):
 
- Commercial Product
- Open Source Solution
- Remote Search Service
 
Which option you choose will depend not only on your search
engine requirements (feature set, support, and maintenance),
but also on OpenVMS platform availability.

 
Commercially available search engines for OpenVMS are scarce,
but a few companies offer pure Java search engine
implementations that may run on OpenVMS unmodified (talk to the
vendor).
 
Open source solutions exist, but primarily for the UNIX
platform.
 
The original SWISH (Simple Web Indexing System for Humans)
source code has been ported to OpenVMS, but its more powerful
successor, SWISH-E, has not. Compaq may include the source
code for SWISH on the OpenVMS Freeware CD since it is no
longer readily available on the Internet.
 
ht://Dig is probably the most popular open source search engine.
It has not yet been ported to OpenVMS.
 
Remote search services do not (usually) require software to
be installed on the host system. The remote service simply
indexes your site and hosts a search interface. This reduces
maintenance costs.
 
Search engines differ fundamentally in the indexing method
they use: filesystem or spider. The filesystem method simply
scans the site's local filesystem for files to index. This
method is fast, but it has three disadvantages: the index is
restricted to one host (multiple sites cannot be indexed), the
indexing occurs on the raw files so the output of server-side
includes and JSP pages is not indexed, and obsolete files are
included in the index (even though no URL links to them).
In contrast, spider indexing works by starting at a given URL
and following all of the links it finds. This is slower, but
does not suffer the disadvantages of filesystem indexing.
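
The difference between the two indexing methods can be sketched in a few lines of Python. This is only an illustration, not the implementation of any product named here: the SITE dictionary is a hypothetical stand-in for files on disk, and the spider reads pages from that dictionary where a real one would fetch them over HTTP.

```python
from html.parser import HTMLParser

# Hypothetical site: path -> raw HTML as stored on disk (an assumption
# made for illustration; a real spider would fetch pages over HTTP).
SITE = {
    "/index.html": '<a href="/docs.html">Docs</a> <a href="/about.html">About</a>',
    "/docs.html": '<a href="/index.html">Home</a>',
    "/about.html": "<p>About us</p>",
    "/old.html": "<p>Obsolete page that nothing links to</p>",
}


class LinkParser(HTMLParser):
    """Collects the href value of every anchor tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def filesystem_index(site):
    """Filesystem method: every stored file is indexed, linked or not."""
    return sorted(site)


def spider_index(site, start):
    """Spider method: only pages reachable by links from start are indexed."""
    seen, queue = set(), [start]
    while queue:
        path = queue.pop(0)
        if path in seen or path not in site:
            continue
        seen.add(path)
        parser = LinkParser()
        parser.feed(site[path])
        queue.extend(parser.links)
    return sorted(seen)
```

With this data, filesystem_index(SITE) lists /old.html, while spider_index(SITE, "/index.html") does not, because no page links to it.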
 
Finally, search engines differ in their search capabilities
and whether a front-end user interface is included. Features
range from simple boolean searches to fuzzy searches, synonyms,
and meta data.
 
Here are some criteria to consider when selecting a search engine:

 
- How many web pages are served?
- What document formats must be supported (HTML, PDF, XML, DOC, etc.)?
- What indexing features are required?
	+ Duplicate page detection
	+ Indexing control and scheduling
	+ Robot indexing
	+ Indexing secure pages
	+ Indexing meta data
	+ Multiple character sets
- What search features are required?
	+ Phrase searching
	+ Boolean searching
	+ Wild card searching
	+ Field and meta data options (title, URL, etc.)
	+ Date-range searching
	+ Relevance-ranking customization
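
To make a few of the search features above concrete, here is a minimal Python sketch of boolean, prefix wild card, and phrase searching over a small inverted index. The DOCS data and all function names are hypothetical and chosen for illustration; real search engines add ranking, stemming, and persistent storage on top of this idea.

```python
import re
from collections import defaultdict

# Hypothetical document collection: id -> text (illustration only).
DOCS = {
    1: "OpenVMS web server installation guide",
    2: "Secure web server release notes",
    3: "OpenVMS freeware search tools",
}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: word -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search_and(index, *terms):
    """Boolean AND: documents containing every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, *terms):
    """Boolean OR: documents containing at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def search_prefix(index, prefix):
    """Prefix wild card: documents with any word starting with prefix."""
    prefix = prefix.lower()
    return set().union(*(ids for w, ids in index.items() if w.startswith(prefix)))


def search_phrase(docs, phrase):
    """Phrase search: documents containing the words consecutively."""
    words = tokenize(phrase)
    if not words:
        return set()
    n = len(words)
    hits = set()
    for doc_id, text in docs.items():
        toks = tokenize(text)
        if any(toks[i:i + n] == words for i in range(len(toks) - n + 1)):
            hits.add(doc_id)
    return hits
```

For example, after index = build_index(DOCS), the call search_and(index, "openvms", "web") matches only document 1, while search_phrase(DOCS, "web server") matches documents 1 and 2.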
 
Feature Comparison of SWISH/SWISH-E and ht://Dig:
 
SWISH (old)
 
- Ported to OpenVMS: Yes
- Implementation language: C
- Local file system indexing only (shows obsolete pages,
  does not scan multiple sites, does not process server-side
  includes or JSP output)
- Smaller sites
- No HTML front end
- Search capabilities: boolean, field
- File formats: text, HTML
 
SWISH-E (new)
 
- Ported to OpenVMS: No
- Implementation language: C
- Spider-based indexing beginning with version 1.2 (skips
  obsolete pages, scans multiple servers, processes server-side
  includes or JSP output)
- Smaller sites
- No HTML front end
- Search capabilities: boolean, field, meta-data support, word
  stemming, prefix wild-carding
- File formats: text, HTML
 
ht://Dig

- Ported to OpenVMS: No
- Implementation language: C/C++
- Spider-based indexing (skips obsolete pages, scans multiple
  servers, processes server-side includes or JSP output)
- Larger sites
- Includes HTML front end
- Search capabilities: boolean, field, meta-data support, word
  stemming, prefix wild-carding, synonym, soundex, metaphone
- File formats: text, HTML, PDF/Postscript/Word (with separate
  external converters)
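
As an illustration of the "soundex" style of fuzzy matching listed above, the following Python sketch implements the standard American Soundex code, which maps names that sound alike to the same four-character key. It is not taken from ht://Dig's source, and it assumes plain ASCII input; the function name is hypothetical.

```python
# Letter-to-digit table for American Soundex. Vowels, y, h, and w have
# no digit and are therefore absent from the table.
CODES = {}
for digit, letters in {
    "1": "bfpv",
    "2": "cgjkqsxz",
    "3": "dt",
    "4": "l",
    "5": "mn",
    "6": "r",
}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex key for word (ASCII assumed)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            # h and w are skipped and do not separate equal codes.
            continue
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        # A vowel resets prev, so a repeated code after it is kept.
        prev = code
    # Pad with zeros and truncate to one letter plus three digits.
    return (result + "000")[:4]
```

A search engine using this approach would index each word under its key, so that a query for "Rupert" could also find pages containing "Robert", since both map to the same key.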
 
Commercial Software:

ASTAware SearchKey Pro (pure Java) (http://www.astaware.com/r_prod_info.html)
Trident Search Site Server (pure Java) (http://www.noviforum.si/)
 
Open Source Search Tools:

VMS

WWWVMSINDEX (http://www.sil.org/ftp/pub/software/vms/)
SWISH 1.1 - replaced by SWISH-E (see below)

Non-VMS

ht://Dig (http://htdig.sourceforge.net/)
Harvest (http://www.tardis.ed.ac.uk/harvest/)
PLweb Turbo (http://www.pls.com/plweb.htm)
SWISH-E (http://sunsite.berkeley.edu/SWISH-E/)
SWISH++ (http://homepage.mac.com/pauljlucas/software/swish/)
Isearch (http://www.cnidr.org/ir/isearch.html)
mnoGoSearch (was UdmSearch) (http://search.mnogo.ru/)

 
Remote Search Services:

Atomz (http://www.atomz.com/)
FreeFind (http://www.freefind.com/)
PicoSearch (http://www.picosearch.com/)
SearchButton (http://www.SearchButton.com/)
SiteMiner (http://siteminer.mycomputer.com/)

answer written or last revised on ( 26-JAN-2001 )
