Document search
Revision as of 21:04, 30 September 2009
This is my first attempt at a HowTo so please bear with me and help improve it ...
I needed a document search facility for my users, so that they could search through the various notes, memos, etc. available on the web server. I found a usable script at www.kscripts.com and adjusted it to suit SME Server better. The resulting file package is available here: http://ibsgaardenprivat.dk/ksearch1.5b.tgz
Here is a copy of my new README, part of the file package:
GENERAL INSTALLATION INSTRUCTIONS:
You will need a text editor and access to your server to edit and run scripts. See FAQs.html for details.
The contents of the directory "search" will be copied to a newly created directory, "/opt/ksearch", on the web server.
- If you want to index PDF files, install xpdf:
$ sudo yum install xpdf
- Open search_form.html
- In line 14, change "../index.html" to the URL of the web page you want the user to return to after searching
- In line 19, change "/ksearch/ksearch.cgi" to the URL of the script ksearch.cgi
- Open search_tips.html
- In line 18, change "../index.html" to the URL of the web page you want the user to return to after searching
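The edits to search_form.html and search_tips.html can also be scripted. The following is a minimal sketch, assuming GNU sed and assuming the line numbers given above match your copy of the files. The function name set_return_url and the example URL are hypothetical and not part of the package.

```shell
# Hypothetical helper: replace "../index.html" on one line of a file.
# Usage: set_return_url FILE LINE_NUMBER NEW_URL
set_return_url() {
    file=$1
    line=$2
    url=$3
    # '|' is used as the sed delimiter, so NEW_URL must not contain
    # '|' or '&' (both are special in a sed replacement).
    sed -i "${line}s|\.\./index\.html|${url}|" "$file"
}

# Example calls (placeholder URL, adjust to your site):
#   set_return_url search_form.html 14 http://www.MyWebsite.com/index.html
#   set_return_url search_tips.html 18 http://www.MyWebsite.com/index.html
```

Restricting the substitution to a single line number keeps any other occurrence of "../index.html" in the file untouched.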
- Open configuration/configuration.pl, necessary changes:
- Line 13: $INDEXER_START is the path to the directory in which files will be searched, including sub-directories. The directory may be the ibay's html directory or any sub-directory of it. All files in this directory must of course be accessible from the web.
- Line 17: $BASE_URL is the URL pointing to the directory set in line 13
- Line 20: $SEARCH_URL is the absolute URL to ksearch.cgi
- Line 23: $KSEARCH_DIR is the file path to the ksearch directory
- Line 26: $KSEARCH_URL is the URL to the ksearch directory
- Line 31: If you want to restrict access to indexer.cgi (and hence ability to initiate the indexing process) to certain domains, set @VALID_REFERERS to a list of acceptable domains. NOTE: There is a difference between http://www.mydomain.com and http://mydomain.com. An empty list means that all domains are accepted.
- Line 32: $INDEXER_URL is the absolute URL to indexer.cgi
- Line 33: $PASSWORD is a self-chosen password required to access indexer.cgi
- Line 72: $LOG_SEARCH is the path to search_log.txt, used for logging searches
- All other configuration.pl changes are optional. If you don't know what they are, then don't change them.
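After editing, it can help to list the relevant settings in one go and compare them with the list above. This is a minimal sketch, assuming each setting is assigned at the start of a line in the form `$NAME = ...` or `@NAME = ...`; the function name show_ksearch_settings is hypothetical.

```shell
# Hypothetical helper: print, with line numbers, the lines of a
# configuration file that assign one of the settings described above.
# Usage: show_ksearch_settings FILE
show_ksearch_settings() {
    grep -nE '^[[:space:]]*(\$|@)(INDEXER_START|BASE_URL|SEARCH_URL|KSEARCH_DIR|KSEARCH_URL|VALID_REFERERS|INDEXER_URL|PASSWORD|LOG_SEARCH)[[:space:]]*=' "$1"
}

# Example call:
#   show_ksearch_settings configuration/configuration.pl
```

If a setting does not show up, it may be declared differently in your copy of the file (for example with a leading `my` or `our`), in which case the pattern needs adjusting.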
- Ignore Files and Folders: ignore_files.txt
- Add the full path of files/folders you do NOT want to index to the ignore files list, on separate lines. =NOTE=: After indexing, you may discover files/folders you don't want to include in your search engine. You may later come back and add files/folders -- however, you'll need to re-index your website using indexer.cgi
- Stop Terms: stop_terms.txt
- Add terms you want to IGNORE to the search engine stop terms list, on separate lines. =NOTE=: After indexing, you may discover terms you don't want to include in your search engine. You may later come back and add terms to the file -- however, you'll need to re-index your website using indexer.cgi
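Both lists are plain text files with one entry per line, so entries can be appended from the shell. The sketch below adds an entry only if it is not already present; the function name add_line_once and the example entries are hypothetical, and the files are assumed to end with a newline.

```shell
# Hypothetical helper: append LINE to FILE unless an identical line exists.
# Usage: add_line_once FILE LINE
add_line_once() {
    file=$1
    line=$2
    # -x matches whole lines only, -F treats LINE as a fixed string,
    # -q suppresses output; '--' protects entries starting with '-'.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example calls (placeholder entries):
#   add_line_once ignore_files.txt /path/to/folder/you/want/skipped
#   add_line_once stop_terms.txt memo
```

Remember that changes to either file only take effect after re-indexing.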
- Copy the contents of the directory "search" to /opt/ksearch:
$ sudo mkdir /opt/ksearch
$ sudo cp -R search/* /opt/ksearch/
The 5 files not included in directory "search" (CHANGELOG.txt, GNU.txt, HISTORY.txt, README.txt, and FAQs.html) are for personal reference, troubleshooting, and future use, and need not be copied.
- Change the ownership of all copied files to user www and group www:
$ sudo chown -R www:www /opt/ksearch
- Using the chmod command, set permissions for each copied file and directory as follows:
$ sudo chmod 755 /opt/ksearch/*.cgi /opt/ksearch/indexer.pl
$ sudo chmod 744 /opt/ksearch/configuration/*
$ sudo chmod 755 /opt/ksearch/ks_images
$ sudo chmod 644 /opt/ksearch/ks_images/*
$ sudo chmod 644 /opt/ksearch/*html
$ sudo chmod 644 /opt/ksearch/templates/*
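To confirm that the permissions ended up as intended, something like the following could be used. It is a minimal sketch, assuming GNU stat (for the `-c '%a'` format); the function name check_mode is hypothetical.

```shell
# Hypothetical helper: report files whose permission bits differ from
# the expected octal mode. Returns non-zero if any file differs.
# Usage: check_mode EXPECTED_MODE FILE...
check_mode() {
    expected=$1
    shift
    status=0
    for f in "$@"; do
        actual=$(stat -c '%a' "$f")   # GNU stat: octal permission bits
        if [ "$actual" != "$expected" ]; then
            echo "$f: mode $actual, expected $expected"
            status=1
        fi
    done
    return $status
}

# Example calls:
#   check_mode 755 /opt/ksearch/*.cgi /opt/ksearch/indexer.pl
#   check_mode 644 /opt/ksearch/*html /opt/ksearch/templates/*
```

No output and a zero exit status mean every listed file has the expected mode.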
- Make an addition to httpd.conf by creating the file
/etc/e-smith/templates-custom/etc/httpd/conf/httpd.conf/98Ksearch
With the following contents:
Alias /ksearch /opt/ksearch
<Directory /opt/ksearch>
    Options +ExecCGI
    order deny,allow
    deny from all
    allow from { "$localAccess $externalSSLAccess"; }
</Directory>
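If you prefer to create the template fragment from the shell instead of a text editor, a here-document can be used. This is a sketch only; the function name write_ksearch_fragment is hypothetical, and it must be run with sufficient privileges to write to the target directory.

```shell
# Hypothetical helper: write the template fragment into DIR/98Ksearch.
# Usage: write_ksearch_fragment DIR
write_ksearch_fragment() {
    dir=$1
    mkdir -p "$dir"
    # The quoted 'EOF' stops the shell from expanding $localAccess and
    # $externalSSLAccess; they must reach the template file literally so
    # that the template expansion can fill them in later.
    cat > "$dir/98Ksearch" <<'EOF'
Alias /ksearch /opt/ksearch
<Directory /opt/ksearch>
    Options +ExecCGI
    order deny,allow
    deny from all
    allow from { "$localAccess $externalSSLAccess"; }
</Directory>
EOF
}

# Example call (as root):
#   write_ksearch_fragment /etc/e-smith/templates-custom/etc/httpd/conf/httpd.conf
```

An unquoted here-document delimiter would replace both variables with empty strings, leaving a broken "allow from" line.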
- Expand the template:
$ sudo /sbin/e-smith/expand-template /etc/httpd/conf/httpd.conf
- Restart httpd:
$ sudo /etc/init.d/httpd-e-smith restart
- Run the INDEXER:
Open your browser and run the indexer script, e.g.: http://www.MyWebsite.com/ksearch/indexer.cgi
The time required will depend on the size of your site and your server's CPU. =NOTE=: You need to use the same URL path as specified in @VALID_REFERERS (configuration.pl, line 31).
- Test it out:
Open search_form.html (e.g. http://www.MyWebsite.com/ksearch/search_form.html) and run a search. If you have questions or problems, FIRST read the enclosed FAQs.html file.
- As an alternative to indexing via a browser and the indexer.cgi script, you can index from the command line with indexer.pl. For this to work, you will probably need to edit the line in indexer.pl that starts with "my $configuration_file" so that it points to the correct configuration file.
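The line in question can be located quickly with grep before editing it. A minimal sketch follows; the function name show_config_line is hypothetical, and the suggested way of starting the indexer assumes that perl is on the PATH and that the files are owned by the user www as set up earlier.

```shell
# Hypothetical helper: show, with line numbers, where a script
# declares $configuration_file.
# Usage: show_config_line FILE
show_config_line() {
    grep -n 'my \$configuration_file' "$1"
}

# Example call:
#   show_config_line /opt/ksearch/indexer.pl
#
# After adjusting that line, indexing might be started as the web
# server user (an assumption, not taken from the package docs):
#   sudo -u www perl /opt/ksearch/indexer.pl
```

Running the indexer as www rather than root keeps the generated index files readable and writable by the web server.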