Document search

From SME Server
Revision as of 07:42, 1 October 2009 by Holck (How to install a web-accessible document search facility)

This is my first attempt at a HowTo, so please bear with me and help improve it ...

I needed a document search facility that would let my users search through the various notes, memos, etc. available on the web server. I found a usable script at www.kscripts.com and adjusted it a bit to make it better suited to SME Server. The result is a new file package, which you can get here: http://ibsgaardenprivat.dk/ksearch1.5b.tgz

Here is a copy of my new README, part of the file package:

General Installation Instructions:

You will need a text editor and access to your server to edit and run scripts. See FAQs.html for details.

The contents of the package's "search" directory will be copied to a newly created directory on the web server, /opt/ksearch.

  1. Install xpdf (only needed if you want to index PDF files):
    $ sudo yum install xpdf
  2. Open search_form.html:
    • In line 14, change "../index.html" to the URL of the web page you want users to return to after searching
    • In line 19, change "/ksearch/ksearch.cgi" to the URL of the script ksearch.cgi
  3. Open search_tips.html:
    • In line 18, change "../index.html" to the URL of the web page you want users to return to after searching
  4. Open configuration/configuration.pl and make the following required changes:
    • Line 13: $INDEXER_START is the path to the directory whose files, including those in sub-directories, will be searched. It may be an ibay's html directory or any sub-directory of it. All files in this directory must, of course, be accessible from the web.
    • Line 17: $BASE_URL is the URL pointing to the directory set on line 13
    • Line 20: $SEARCH_URL is the absolute URL to ksearch.cgi
    • Line 23: $KSEARCH_DIR is the file path to the ksearch directory
    • Line 26: $KSEARCH_URL is the URL to the ksearch directory
    • Line 31: To restrict access to indexer.cgi (and hence the ability to start the indexing process) to certain domains, set @VALID_REFERERS to a list of acceptable domains. NOTE: http://www.mydomain.com and http://mydomain.com count as different domains. An empty list means that all domains are accepted.
    • Line 32: $INDEXER_URL is the absolute URL to indexer.cgi
    • Line 33: $PASSWORD is a password of your choice, required to access indexer.cgi
    • Line 72: $LOG_SEARCH is the path to search_log.txt, which is used for logging searches
    • All other changes to configuration.pl are optional. If you don't know what a setting does, leave it unchanged.
  5. Ignore Files and Folders: ignore_files.txt
    • Add the full paths of files/folders you do NOT want indexed to the ignore files list, one per line. NOTE: After indexing, you may discover files/folders you don't want included in your search engine. You can add them to the list later, but you will then need to re-index your website using indexer.cgi.
  6. Stop Terms: stop_terms.txt
    • Add terms you want the search engine to IGNORE to the stop terms list, one per line. NOTE: After indexing, you may discover terms you don't want included in your search engine. You can add them to the file later, but you will then need to re-index your website using indexer.cgi.
  7. Copy the contents of the directory "search" to /opt/ksearch:
    $ sudo mkdir /opt/ksearch
    $ sudo cp -R search/* /opt/ksearch/
    The five files outside the directory "search" (CHANGELOG.txt, GNU.txt, HISTORY.txt, README.txt and FAQs.html) are for personal reference, troubleshooting and future use, and need not be copied.
  8. Change the ownership of all copied files to user and group www:
    $ sudo chown -R www:www /opt/ksearch
  9. Set the permissions of the copied files and directories with chmod as follows:
    $ sudo chmod 755 /opt/ksearch/*.cgi /opt/ksearch/indexer.pl
    $ sudo chmod 744 /opt/ksearch/configuration/*
    $ sudo chmod 755 /opt/ksearch/ks_images
    $ sudo chmod 644 /opt/ksearch/ks_images/*
    $ sudo chmod 644 /opt/ksearch/*html
    $ sudo chmod 644 /opt/ksearch/templates/*
  10. Extend httpd.conf by creating the custom template fragment
    /etc/e-smith/templates-custom/etc/httpd/conf/httpd.conf/98Ksearch
    (create the directory first if it does not already exist) with the following contents:
    Alias /ksearch /opt/ksearch
    <Directory /opt/ksearch>
    Options +ExecCGI
    order deny,allow
    deny from all
    allow from { "$localAccess $externalSSLAccess"; }
    </Directory>
  11. Expand the template:
    $ sudo /sbin/e-smith/expand-template /etc/httpd/conf/httpd.conf
  12. Restart httpd:
    $ sudo /etc/init.d/httpd-e-smith restart
  13. Run the indexer: open your browser and load the indexer script, e.g. http://www.MyWebsite.com/ksearch/indexer.cgi. The time required depends on the size of your site and your server's CPU. NOTE: If @VALID_REFERERS in configuration.pl (see step 4) is not empty, the domain in this URL must match one of its entries exactly.
  14. Test it:
    Open search_form.html (e.g. http://www.MyWebsite.com/ksearch/search_form.html) and run a search. If you have questions or problems, FIRST read the enclosed FAQs.html file.
  15. As an alternative to indexing via a browser with indexer.cgi, you can index from the command line with indexer.pl. For this to work, you will probably need to edit the line in indexer.pl that starts with "my $configuration_file" so that it points to the correct configuration file.
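For steps 5 and 6, entries can also be appended from the shell instead of an editor. This is a minimal sketch, not part of the ksearch package: add_entries is a hypothetical helper that adds each argument as its own line and skips entries already present. The README does not say where ignore_files.txt and stop_terms.txt live inside /opt/ksearch, so the path in the usage line below is an assumption to adjust.

```shell
#!/bin/sh
# Hypothetical helper for steps 5 and 6: append entries, one per line,
# to ignore_files.txt or stop_terms.txt, skipping entries already listed.
set -eu

add_entries() {
    file=$1
    shift
    touch "$file"

    # Make sure the last existing line ends with a newline before appending
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi

    for entry in "$@"; do
        # -x: match whole lines only, -F: treat the entry as plain text
        if ! grep -qxF -- "$entry" "$file"; then
            printf '%s\n' "$entry" >> "$file"
        fi
    done
}

# Usage: add_entries.sh FILE ENTRY...
if [ "$#" -ge 2 ]; then
    add_entries "$@"
fi
```

For example, $ sudo sh add_entries.sh /opt/ksearch/configuration/stop_terms.txt memo draft (assumed path). As steps 5 and 6 note, re-index with indexer.cgi afterwards.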
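Steps 7 to 9 can be collected into one small script. The sketch below is not part of the ksearch package: the function name install_ksearch is hypothetical, and the chown is skipped (with a warning) unless the script runs as root and a www user exists, so the same function can be tried in a scratch directory before running it on the server.

```shell
#!/bin/sh
# Hypothetical helper repeating steps 7-9: copy the package's "search"
# directory to the target, set ownership and set the README's permissions.
set -eu

install_ksearch() {
    src=$1    # the package's "search" directory
    dest=$2   # /opt/ksearch in this HowTo

    # Step 7: create the target directory and copy the files
    mkdir -p "$dest"
    cp -R "$src"/* "$dest"/

    # Step 8: hand the files to the web server user (root only)
    if [ "$(id -u)" -eq 0 ] && id www >/dev/null 2>&1; then
        chown -R www:www "$dest"
    else
        echo "warning: ownership of $dest not changed" >&2
    fi

    # Step 9: permissions as listed in the README
    chmod 755 "$dest"/*.cgi "$dest"/indexer.pl
    chmod 744 "$dest"/configuration/*
    chmod 755 "$dest"/ks_images
    chmod 644 "$dest"/ks_images/*
    chmod 644 "$dest"/*html
    chmod 644 "$dest"/templates/*
}

# Usage: install_ksearch.sh SOURCE_DIR TARGET_DIR
if [ "$#" -eq 2 ]; then
    install_ksearch "$1" "$2"
fi
```

On the server, from the unpacked package directory, this might be run as $ sudo sh install_ksearch.sh search /opt/ksearch (the script file name is just an example).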
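Steps 10 to 12 can likewise be scripted. This is a sketch under stated assumptions: write_ksearch_fragment and the --apply switch are hypothetical names, and the optional root prefix exists only so the fragment can be written to a test directory first. The quoted 'EOF' heredoc keeps the shell from expanding $localAccess and $externalSSLAccess, which the SME Server template engine fills in during expand-template. The two variables are separated by a space because Apache's allow from directive expects a space-separated list of hosts and networks.

```shell
#!/bin/sh
# Hypothetical helper for steps 10-12: write the custom httpd.conf
# template fragment, then (with --apply) expand it and restart httpd.
set -eu

write_ksearch_fragment() {
    root=${1:-}   # optional prefix; empty means the real /etc tree
    dir="$root/etc/e-smith/templates-custom/etc/httpd/conf/httpd.conf"

    # The templates-custom directory tree may not exist yet
    mkdir -p "$dir"

    # 'EOF' is quoted, so the shell leaves $localAccess and
    # $externalSSLAccess alone for the template engine to fill in
    cat > "$dir/98Ksearch" <<'EOF'
Alias /ksearch /opt/ksearch
<Directory /opt/ksearch>
    Options +ExecCGI
    order deny,allow
    deny from all
    allow from { "$localAccess $externalSSLAccess"; }
</Directory>
EOF
}

# Only touch the live system when explicitly asked to
if [ "${1:-}" = "--apply" ]; then
    write_ksearch_fragment ""
    /sbin/e-smith/expand-template /etc/httpd/conf/httpd.conf   # step 11
    /etc/init.d/httpd-e-smith restart                           # step 12
fi
```

Running $ sudo sh ksearch_fragment.sh --apply (example file name) writes the fragment, expands httpd.conf and restarts httpd in one go.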
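Before running indexer.pl from the command line (step 15), it can help to check which configuration file the script will actually load. The sketch below assumes the line in indexer.pl has the form my $configuration_file = '...'; with single or double quotes. configured_path and check_indexer_config are hypothetical helper names. A relative path is checked against the current directory, so run the check from the directory you intend to run indexer.pl from.

```shell
#!/bin/sh
# Hypothetical helper for step 15: show which configuration file
# indexer.pl will load and check that it exists. Assumes a line like
#   my $configuration_file = '/opt/ksearch/configuration/configuration.pl';
set -eu

configured_path() {
    # Split on either quote character and print the quoted value from the
    # first line that declares $configuration_file
    awk -F"['\"]" '/^[ \t]*my[ \t]+[$]configuration_file/ { print $2; exit }' "$1"
}

check_indexer_config() {
    script=$1
    path=$(configured_path "$script")
    if [ -z "$path" ]; then
        echo "no \$configuration_file line found in $script" >&2
        return 1
    fi
    if [ -f "$path" ]; then
        echo "ok: $path"
    else
        echo "missing: $path (edit the line in $script)" >&2
        return 1
    fi
}

# Usage: check_config.sh /opt/ksearch/indexer.pl
if [ "$#" -eq 1 ]; then
    check_indexer_config "$1"
fi
```

For example: $ cd /opt/ksearch && sh check_config.sh indexer.pl && sudo perl indexer.pl. If the indexer runs as root, the index files it writes may afterwards need the same chown as in step 8 so that the web server can still update them.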