Document search

From SME Server

Revision as of 21:04, 30 September 2009

This is my first attempt at a HowTo, so please bear with me and help improve it.

I needed a document search facility so that my users could search through the notes, memos, etc. published on the web server. I found a usable script at www.kscripts.com and adapted it to suit SME Server. The resulting package is available here: http://ibsgaardenprivat.dk/ksearch1.5b.tgz
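The README below refers to a directory called "search" inside the package but does not say where in the archive it sits. The following is a minimal sketch, assuming only standard tar (with gzip support) and find. It unpacks the tarball and prints the path of every directory named "search". The function name unpack_ksearch is hypothetical; it is not part of the package.

```shell
# Hypothetical helper (not part of the KSearch package): unpack the
# tarball into a destination directory and print where "search" ended up.
# Assumes tar with gzip support (-z) and find are available.
unpack_ksearch() {
    local tarball="$1"
    local dest="${2:-.}"   # defaults to the current directory

    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" || return 1

    # Print the path(s) of any directory named "search"; its contents
    # are what gets copied to /opt/ksearch later on.
    find "$dest" -type d -name search -print
}
```

Usage, after downloading the package (for example with wget):
              $ unpack_ksearch ksearch1.5b.tgz ~/ksearch-src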

Below is a copy of my new README, which is included in the package:

GENERAL INSTALLATION INSTRUCTIONS:

You will need a text editor and access to your server to edit and run scripts. See FAQs.html for details.

The contents of the "search" directory will be copied to a newly created directory on the web server, /opt/ksearch.

  1. If you want to index PDF files, install xpdf: $ sudo yum install xpdf
  2. Open search_form.html
    • In line 14, change "../index.html" to the URL of the page users should return to after searching.
    • In line 19, change "/ksearch/ksearch.cgi" to the URL of the script ksearch.cgi.
  3. Open search_tips.html
    • In line 18, change "../index.html" to the URL of the page users should return to after searching.
  4. Open configuration/configuration.pl and make the following required changes:
    • Line 13: $INDEXER_START is the path to the directory whose files, including those in sub-directories, will be searched. It may be the ibay's html directory or any sub-directory of it. All files in this directory must, of course, be accessible from the web.
    • Line 17: $BASE_URL is the URL pointing to the directory in line 13
    • Line 20: $SEARCH_URL is the absolute URL to ksearch.cgi
    • Line 23: $KSEARCH_DIR is the file path to the ksearch directory
    • Line 26: $KSEARCH_URL is the URL to the ksearch directory
    • Line 31: If you want to restrict access to indexer.cgi (and hence the ability to start the indexing process) to certain domains, set @VALID_REFERERS to a list of acceptable domains. NOTE: http://www.mydomain.com and http://mydomain.com are treated as different domains. An empty list means that all domains are accepted.
    • Line 32: $INDEXER_URL is the absolute URL to indexer.cgi
    • Line 33: $PASSWORD is a self-chosen password required to access indexer.cgi
    • Line 72: $LOG_SEARCH is the path to search_log.txt, used for logging searches
    • All other configuration.pl changes are optional. If you don't know what they are, then don't change them.
  5. Ignore Files and Folders: ignore_files.txt
    • List the full path of each file or folder you do NOT want to index, one per line. NOTE: After indexing, you may discover files or folders you don't want to include in your search engine. You can add them to the list later, but you will then need to re-index your website using indexer.cgi.
  6. Stop Terms: stop_terms.txt
    • List the terms you want the search engine to IGNORE, one per line. NOTE: After indexing, you may discover terms you don't want to include in your search engine. You can add them to the file later, but you will then need to re-index your website using indexer.cgi.
  7. Copy the contents of the "search" directory to /opt/ksearch:
              $ sudo mkdir /opt/ksearch
              $ sudo cp -R search/* /opt/ksearch/
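Before indexing, it can help to confirm that every setting from step 4 is present and holds the value you intended, since the line numbers quoted may not match every copy of configuration.pl. The following is a minimal sketch that prints each of those settings with its actual line number. It assumes each setting is assigned at the start of a line (for example $INDEXER_START = '/path/to/your/ibay/html';); adjust the pattern if your copy differs. The function name show_ksearch_settings and its default path are hypothetical.

```shell
# Hypothetical helper (not part of KSearch): print the settings from
# step 4 with their line numbers; report any that cannot be found.
# Assumes lines of the form "$NAME = ..." or "@NAME = ..." in the file.
show_ksearch_settings() {
    local config="${1:-search/configuration/configuration.pl}"
    local missing=0
    local name

    for name in INDEXER_START BASE_URL SEARCH_URL KSEARCH_DIR KSEARCH_URL \
                VALID_REFERERS INDEXER_URL PASSWORD LOG_SEARCH; do
        # Match "$NAME =" or "@NAME =", allowing leading whitespace.
        if ! grep -nE "^[[:space:]]*[\$@]${name}[[:space:]]*=" "$config"; then
            echo "not found: $name" >&2
            missing=1
        fi
    done

    # Non-zero exit status if any setting was missing.
    return "$missing"
}
```

Usage, for example on the copy installed in step 7:
              $ show_ksearch_settings /opt/ksearch/configuration/configuration.pl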

The five files outside the "search" directory (CHANGELOG.txt, GNU.txt, HISTORY.txt, README.txt and FAQs.html) are for reference, troubleshooting and future use, and need not be copied.

  8. Change the ownership of all copied files to www.www:
              $ sudo chown -R www.www /opt/ksearch
  9. Using the chmod command, set the permissions for the copied files and directories as follows:
              $ sudo chmod 755 /opt/ksearch/*.cgi /opt/ksearch/indexer.pl
              $ sudo chmod 744 /opt/ksearch/configuration/*
              $ sudo chmod 755 /opt/ksearch/ks_images
              $ sudo chmod 644 /opt/ksearch/ks_images/*
              $ sudo chmod 644 /opt/ksearch/*html
              $ sudo chmod 644 /opt/ksearch/templates/*
  10. Add to httpd.conf by creating the custom template fragment
              /etc/e-smith/templates-custom/etc/httpd/conf/httpd.conf/98Ksearch

Give it the following contents:

              Alias /ksearch /opt/ksearch
              <Directory /opt/ksearch >
                       Options +ExecCGI
                       order deny,allow
                       deny from all
                       allow from { "$localAccess $externalSSLAccess"; }
              </Directory>
  11. Expand the template:
              $ sudo /sbin/e-smith/expand-template /etc/httpd/conf/httpd.conf
  12. Restart httpd:
              $ sudo /etc/init.d/httpd-e-smith restart
  13. Run the indexer:
              Open your browser and run the indexer script, e.g. http://www.MyWebsite.com/ksearch/indexer.cgi, entering the password you set in $PASSWORD.
              The time required depends on the size of your site and your server's CPU.
              NOTE: Use a URL on the same domain as listed in @VALID_REFERERS in configuration.pl (see step 4).
  14. Test it out:
              Open search_form.html (e.g. http://www.MyWebsite.com/ksearch/search_form.html) and run a search.
              If you have questions or problems, FIRST read the enclosed FAQs.html file.
  15. As an alternative to indexing via a browser and the indexer.cgi script, you can index from the command line with indexer.pl. For this to work, you will probably need to edit the line in indexer.pl that starts with "my $configuration_file" so that it points to the correct configuration file.
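The command-line alternative in the last step leaves the edit to indexer.pl up to you. The following is a minimal sketch of that edit, assuming GNU sed (for -i) and a configuration path containing no "|", "&", "'" or backslash characters, which would need escaping. It replaces the whole "my $configuration_file" line with an absolute path in a single-quoted Perl string, so Perl does not interpolate it. The function name point_indexer_at_config and its default paths are hypothetical. sed -i edits the file in place, so copy indexer.pl first if you want a backup.

```shell
# Hypothetical helper (not part of KSearch): rewrite the line in
# indexer.pl that starts with "my $configuration_file" so it points at
# an absolute configuration path. Requires GNU sed for in-place editing.
point_indexer_at_config() {
    local indexer="${1:-/opt/ksearch/indexer.pl}"
    local config="${2:-/opt/ksearch/configuration/configuration.pl}"

    [ -f "$indexer" ] || { echo "no such file: $indexer" >&2; return 1; }

    # Replace the whole assignment line (leading whitespace allowed).
    # [$] matches a literal dollar sign; the new value is single-quoted
    # so Perl will not interpolate anything in the path.
    sed -i "s|^[[:space:]]*my [\$]configuration_file.*|my \$configuration_file = '$config';|" "$indexer" || return 1

    # Show the resulting line; fails if no such line exists.
    grep -n 'my [$]configuration_file' "$indexer"
}
```

After running it (from a shell that may write /opt/ksearch/indexer.pl), you could start indexing as the www user, for example, so that any files the indexer writes stay owned by www.www as set in step 8:
              $ sudo -u www perl /opt/ksearch/indexer.pl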