Document search

From SME Server

Latest revision as of 22:20, 1 October 2009

Note:
This is my first attempt at a HowTo, so please bear with me and help improve it ...

I needed a document search facility so that my users could search through the various notes, memos, etc. available on the web server. I found a usable script at www.kscripts.com and adjusted it to suit SME Server better, and have produced a new package you can get here: http://ibsgaardenprivat.dk/ksearch1.5b.tgz

Here is a copy of my new README, part of the file package:


General Installation Instructions:

You will need a text editor and access to your server to edit files and run scripts. See FAQs.html for details.

The contents of the directory "search" will be copied to a newly created directory on the web server, /opt/ksearch.

  1. If you want to index PDF files, install xpdf:
    $sudo yum install xpdf
  2. Open search_form.html
    • In line 14, change "../index.html" to the URL of the web page users should return to after searching
    • In line 19, change "/ksearch/ksearch.cgi" to the URL of the ksearch.cgi script
  3. Open search_tips.html
    • In line 18, change "../index.html" to the URL of the web page users should return to after searching
  4. Open configuration/configuration.pl and make the following required changes:
    • Line 13: $INDEXER_START is the path to the directory whose files, including those in sub-directories, will be searched. It may be the ibay's html directory or any sub-directory of it. All files in this directory must be accessible from the web.
    • Line 17: $BASE_URL is the URL of the directory set in line 13
    • Line 20: $SEARCH_URL is the absolute URL to ksearch.cgi
    • Line 23: $KSEARCH_DIR is the file path to the ksearch directory
    • Line 26: $KSEARCH_URL is the URL to the ksearch directory
    • Line 31: If you want to restrict access to indexer.cgi (and hence the ability to start the indexing process) to certain domains, set @VALID_REFERERS to a list of acceptable domains. NOTE: http://www.mydomain.com and http://mydomain.com count as different domains. An empty list means that all domains are accepted.
    • Line 32: $INDEXER_URL is the absolute URL to indexer.cgi
    • Line 33: $PASSWORD is a password of your choice, required to access indexer.cgi
    • Line 72: $LOG_SEARCH is the path to search_log.txt, used for logging searches
    • All other configuration.pl changes are optional. If you don't know what a setting does, leave it unchanged.
  5. Ignore Files and Folders: ignore_files.txt
    • Add the full path of each file or folder you do NOT want indexed, one per line. =NOTE=: After indexing, you may discover files or folders you don't want to include in your search engine. You can add them to the file later; however, you will then need to re-index your website using indexer.cgi
  6. Stop Terms: stop_terms.txt
    • Add the terms you want the search engine to IGNORE, one per line. =NOTE=: After indexing, you may discover terms you don't want to include in your search engine. You can add them to the file later; however, you will then need to re-index your website using indexer.cgi
  7. Copy the contents of the directory "search" to /opt/ksearch:
    $sudo mkdir /opt/ksearch
    $sudo cp -R search/* /opt/ksearch/
    The five files outside the "search" directory (CHANGELOG.txt, GNU.txt, HISTORY.txt, README.txt, and FAQs.html) are for personal reference, troubleshooting, and future use, and need not be copied.
  8. Change the ownership of all copied files to www.www:
    $sudo chown -R www.www /opt/ksearch
  9. Using the chmod command, set permissions for the copied files and directories as follows:
    $sudo chmod 755 /opt/ksearch/*.cgi /opt/ksearch/indexer.pl
    $sudo chmod 744 /opt/ksearch/configuration/*
    $sudo chmod 755 /opt/ksearch/ks_images
    $sudo chmod 644 /opt/ksearch/ks_images/*
    $sudo chmod 644 /opt/ksearch/*html
    $sudo chmod 644 /opt/ksearch/templates/*
  10. Add a custom fragment to httpd.conf by creating the file
    /etc/e-smith/templates-custom/etc/httpd/conf/httpd.conf/98Ksearch
    with the following contents:
    Alias /ksearch /opt/ksearch
    <Directory /opt/ksearch>
        Options +ExecCGI
        order deny,allow
        deny from all
        allow from { "$localAccess $externalSSLAccess"; }
    </Directory>
  11. Expand the template:
    $sudo /sbin/e-smith/expand-template /etc/httpd/conf/httpd.conf
  12. Restart httpd:
    $sudo /etc/init.d/httpd-e-smith restart
  13. Run the INDEXER: Open your browser and run the indexer script, e.g. http://www.MyWebsite.com/ksearch/indexer.cgi. The time required depends on the size of your site and your server's CPU. =NOTE=: You need to use the same URL as specified in @VALID_REFERERS in configuration.pl (see step 4).
  14. Test it out:
    Open search_form.html (e.g. http://www.MyWebsite.com/ksearch/search_form.html) and run a search. If you have questions or problems, FIRST read the enclosed FAQs.html file.
  15. As an alternative to indexing through a browser with the indexer.cgi script, you can index from the command line with indexer.pl. For this to work, you will probably need to edit the line in indexer.pl that starts with "my $configuration_file" so that it points to the correct configuration file.
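Steps 2 and 3 can also be done without opening an editor. The following is a minimal sketch, assuming GNU sed (as shipped with SME Server) and that the line numbers given in the README (14 in search_form.html, 18 in search_tips.html) match your copy. The function name set_return_url is hypothetical and not part of ksearch. It refuses to touch a line that does not contain "../index.html" and keeps a .bak copy of the original file.

```shell
#!/usr/bin/env bash
# Sketch for steps 2 and 3: replace "../index.html" on one given line of
# search_form.html or search_tips.html. Assumes GNU sed; set_return_url is
# a hypothetical helper, not part of the ksearch package.

# Usage: set_return_url FILE LINE_NUMBER NEW_URL
set_return_url() {
    local file=$1 line=$2 url=$3 esc
    # Refuse to edit if the expected link is not on that line
    # (your copy may be numbered differently from the README).
    if ! sed -n "${line}p" "$file" | grep -qF '../index.html'; then
        echo "line $line of $file does not contain ../index.html" >&2
        return 1
    fi
    # Escape \, & and the | delimiter so the URL is inserted literally.
    esc=$(printf '%s' "$url" | sed 's/[\\&|]/\\&/g')
    # Change only that line; -i.bak keeps the original as FILE.bak.
    sed -i.bak "${line}s|\.\./index\.html|${esc}|" "$file"
}
```

For example, running set_return_url search_form.html 14 /index.html and then set_return_url search_tips.html 18 /index.html inside the unpacked "search" directory would point both pages back to the site's front page (the target URL here is only an illustration).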
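Since the line numbers quoted in step 4 may not match every copy of configuration.pl, it can help to list the required settings by name, together with the line they are actually on. This is a minimal sketch, assuming each setting is a plain Perl assignment at the start of a line (optionally preceded by my or our); the function name show_ksearch_settings is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch for step 4: print the required settings with their line numbers.
# Assumes lines such as  $BASE_URL = "...";  or  my $PASSWORD = "...";
# show_ksearch_settings is a hypothetical helper, not part of ksearch.

# Usage: show_ksearch_settings [CONFIG_FILE]
show_ksearch_settings() {
    local conf=${1:-configuration/configuration.pl}
    # -n prefixes each match with its line number; -E enables the alternation.
    grep -nE '^[[:space:]]*((my|our)[[:space:]]+)?(\$(INDEXER_START|BASE_URL|SEARCH_URL|KSEARCH_DIR|KSEARCH_URL|INDEXER_URL|PASSWORD|LOG_SEARCH)|@VALID_REFERERS)[[:space:]]*=' "$conf"
}
```

Running it from the "search" directory before and after editing gives a quick way to confirm that each value in step 4 has been set.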
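Steps 5 and 6 both come down to adding one entry per line to a text file. A minimal sketch of doing that from the shell is shown below; it skips entries that are already present and makes sure an existing last line is newline-terminated before appending. The function name add_entries is hypothetical, and the same re-indexing note from steps 5 and 6 still applies afterwards.

```shell
#!/usr/bin/env bash
# Sketch for steps 5 and 6: append entries to ignore_files.txt or
# stop_terms.txt, one per line, skipping duplicates.
# add_entries is a hypothetical helper, not part of ksearch.

# Usage: add_entries FILE ENTRY...
add_entries() {
    local file=$1 entry
    shift
    touch "$file"
    # If the file does not end in a newline, add one so entries don't merge.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    for entry in "$@"; do
        # -x: whole-line match, -F: fixed string (paths and terms are not regexes).
        if ! grep -qxF -- "$entry" "$file"; then
            printf '%s\n' "$entry" >> "$file"
        fi
    done
}
```

For example, add_entries ignore_files.txt /home/e-smith/files/ibays/docs/html/private would exclude a folder in a hypothetical ibay named "docs", and add_entries stop_terms.txt memo draft would add two stop terms.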
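Steps 7 to 9 can be collected into one function so they are easy to repeat after editing the files. This is a sketch, not part of the package: install_ksearch is a hypothetical name, the destination and owner default to /opt/ksearch and www.www as in the README, and they are parameters only so the function can be tried in a scratch directory first. It uses mkdir -p instead of mkdir so it can be re-run, and it must be run as root (or via sudo) for the real paths.

```shell
#!/usr/bin/env bash
# Sketch of steps 7 to 9: copy, set ownership, set permissions.
# install_ksearch is a hypothetical helper; defaults match the README.

# Usage: install_ksearch SRC_DIR [DEST] [OWNER]
install_ksearch() {
    local src=$1 dest=${2:-/opt/ksearch} owner=${3:-www.www}
    if [ ! -d "$src" ]; then
        echo "source directory $src not found" >&2
        return 1
    fi
    # Step 7: create the target and copy the contents of "search" into it.
    mkdir -p "$dest" || return 1
    cp -R "$src"/* "$dest"/ || return 1
    # Step 8: hand everything to the web server user.
    chown -R "$owner" "$dest" || return 1
    # Step 9: the permissions listed in the README; stop at the first failure.
    chmod 755 "$dest"/*.cgi "$dest"/indexer.pl &&
    chmod 744 "$dest"/configuration/* &&
    chmod 755 "$dest"/ks_images &&
    chmod 644 "$dest"/ks_images/* &&
    chmod 644 "$dest"/*html &&
    chmod 644 "$dest"/templates/*
}
```

Run as root from the directory containing "search", install_ksearch search would reproduce steps 7 to 9 with the README's defaults.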
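The template fragment in step 10 can be written from the shell with a quoted here-document (<<'EOF'), so the shell leaves $localAccess and $externalSSLAccess untouched and SME Server's template engine expands them later. This is a sketch; write_ksearch_fragment is a hypothetical name. The allow line separates the two variables with a space because Apache's allow from directive takes a space-separated list of hosts, assuming (as in SME Server's own httpd.conf templates) that each variable expands to a space-separated list of networks.

```shell
#!/usr/bin/env bash
# Sketch of step 10: create the 98Ksearch custom template fragment.
# The quoted 'EOF' stops the shell from expanding $localAccess etc.
# write_ksearch_fragment is a hypothetical helper, not part of ksearch.

# Usage: write_ksearch_fragment [TEMPLATE_DIR]
write_ksearch_fragment() {
    local dir=${1:-/etc/e-smith/templates-custom/etc/httpd/conf/httpd.conf}
    mkdir -p "$dir" || return 1
    cat > "$dir/98Ksearch" <<'EOF'
Alias /ksearch /opt/ksearch
<Directory /opt/ksearch>
    Options +ExecCGI
    order deny,allow
    deny from all
    allow from { "$localAccess $externalSSLAccess"; }
</Directory>
EOF
}
```

After calling write_ksearch_fragment as root, steps 11 and 12 still apply: /sbin/e-smith/expand-template /etc/httpd/conf/httpd.conf followed by /etc/init.d/httpd-e-smith restart.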