Swish-e
Revision as of 12:17, 15 March 2009
Description
Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files.
Forum link
http://forums.contribs.org/index.php/topic,43486.0.html
Please add comments there so I can merge them here later!
Installation
Download the RPMs from http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/
wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-2.4.5-4.i386.rpm
wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-debuginfo-2.4.5-4.i386.rpm
wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-devel-2.4.5-4.i386.rpm
wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-perl-2.4.5-4.i386.rpm
wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-perl-api-2.4.5-4.i386.rpm
Install with dependencies from the SME Contribs repository by issuing the following command on the SME Server shell.
How to enable Dag's repository: http://wiki.contribs.org/Dag
yum --enablerepo=dag localinstall swish-e-2.4.5-4.i386.rpm swish-e-d* swish-e-p*
There is no need to reboot. Test:
swish-e -h
Setup Part 2
To have swish-e index .doc, .xls and .pdf files, we need:
yum install --enablerepo=dag perl-Spreadsheet-ParseExcel perl-MIME-Types xpdf catdoc
Test filter:
swish-filter-test
swish-filter-test -man
swish-filter-test -headers /path/to/xlsfile.xls
swish-filter-test -headers /path/to/docfile.doc
swish-filter-test -headers /path/to/pdffile.pdf
Configuration
As I was not interested in indexing web pages, just files in ibays, I used the following spider: /usr/libexec/swish-e/DirTree.pl
I modified its check_path function so that it only accepts .doc, .xls and .pdf files:
sub check_path {
    my $path = shift;
    return 1 if $path =~ /\.doc$/;  # return true if the path ends in .doc
    return 1 if $path =~ /\.xls$/;  # return true if the path ends in .xls
    return 1 if $path =~ /\.pdf$/;  # return true if the path ends in .pdf
    return 0;                       # otherwise return false
}
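To preview which files those three suffix tests would let through, a rough shell equivalent can be run against an ibay before indexing. This is only a sketch: the function name list_indexable is made up for illustration, and it mirrors the case-sensitive suffix matching with find rather than calling DirTree.pl itself.

```shell
#!/bin/sh
# list_indexable DIR (hypothetical helper, not part of swish-e):
# print every regular file under DIR whose name ends in .doc, .xls or .pdf.
# -name is case-sensitive, so REPORT.PDF is skipped, just as the
# case-sensitive Perl patterns would skip it.
list_indexable() {
    find "$1" -type f \( -name '*.doc' -o -name '*.xls' -o -name '*.pdf' \) | sort
}
```

For example, list_indexable /home/e-smith/files/ibays/ibayname/files prints the candidate files one per line. If files with upper-case extensions should be indexed as well, both this sketch and the Perl patterns would need a case-insensitive match.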
Next, create a config file, ibay.cfg:
# ibay.cfg, a swish-e config file
#
IndexDir /usr/libexec/swish-e/DirTree.pl
#
SwishProgParameters /home/e-smith/files/ibays/ibayname/files
#
StoreDescription HTML <body> 20000
#
# replace to make links to UNC
# works in IE, needs fix for Firefox
ReplaceRules remove /home/e-smith/files/ibays
ReplaceRules prepend //smeservername
# Next line will not work if you have dir's called "files"...
ReplaceRules replace /files/ /
#
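The effect of the three ReplaceRules lines is easier to see on a concrete path. The sketch below imitates them with sed; it is an illustration, not swish-e's own code, and it assumes each rule acts on every occurrence of its search string, which is what the warning about directories called "files" suggests. The function name rewrite_path and the sample paths are made up.

```shell
#!/bin/sh
# rewrite_path PATH (hypothetical helper): imitate the three ReplaceRules
# from ibay.cfg with sed, in the same order as in the config file.
rewrite_path() {
    printf '%s\n' "$1" |
        sed -e 's|/home/e-smith/files/ibays||g' \
            -e 's|^|//smeservername|' \
            -e 's|/files/|/|g'
}

# The ibay's own "files" directory is dropped, giving a UNC-style path:
rewrite_path /home/e-smith/files/ibays/ibayname/files/docs/report.pdf
# prints //smeservername/ibayname/docs/report.pdf

# A nested directory that is also called "files" is dropped too,
# so the resulting link points at the wrong place:
rewrite_path /home/e-smith/files/ibays/ibayname/files/projects/files/x.doc
# prints //smeservername/ibayname/projects/x.doc
```

The second call shows why the last rule is flagged in the config: the document really lives in projects/files, but that directory disappears from the rewritten path.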
Next, run swish-e. The index files will be placed in the current dir.
swish-e -c ibay.cfg -S prog -v 9
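This should create both index.swish-e and index.swish-e.prop in the current dir. The swish.cgi setup described next reads the index from /home/e-smith/files/ibays/Primary/cgi-bin, so either run the command from that directory or copy both files there afterwards.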
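Before moving on to the CGI, it can help to confirm that both index files exist and that a search from the shell returns something. The following is a minimal sketch: index_ready is a made-up helper name, and "report" is only a placeholder search word to replace with a term you expect in your documents.

```shell
#!/bin/sh
# index_ready DIR (hypothetical helper): succeed only if both index files
# exist in DIR and are non-empty.
index_ready() {
    [ -s "$1/index.swish-e" ] && [ -s "$1/index.swish-e.prop" ]
}

# Run a test search only when the index is in place and swish-e is installed.
if index_ready . && command -v swish-e >/dev/null 2>&1; then
    # -f names the index file to search, -w gives the search word(s)
    swish-e -f ./index.swish-e -w report
fi
```

If the search prints no hits for a word that certainly occurs in the indexed documents, rerunning the indexer with -v 9 and reading its output is a reasonable next step.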
swish.cgi
As a proof of concept, I have set up this basic configuration in /home/e-smith/files/ibays/Primary/cgi-bin
Copy (or symlink) swish.cgi. I prefer a copy, as I can modify the script without losing the original.
cp /usr/libexec/swish-e/swish.cgi /home/e-smith/files/ibays/Primary/cgi-bin/
Create /home/e-smith/files/ibays/Primary/cgi-bin/.swishcgi.conf:
return {
    swish_index => '/home/e-smith/files/ibays/Primary/cgi-bin/index.swish-e',
    title => 'Just a Sample Title',
    # Not required, but recommended
    #
    # Next line to make it clickable
    #
    prepend_path => 'file:////',
    #
    link_property => 'swishdocpath',
    title_property => 'swishtitle',
};
Options
Under construction
Usage
Search should now be available at http://smeservername/cgi-bin/swish.cgi