
htdig-general@lists.sourceforge.net • 24 March 2011 • 11:22AM -0400

[htdig] htdig windows7
by John Dorapalli

Hi,

I am trying to get ht://Dig working on Windows 7 for PDF indexing.
All the database files are being generated, but I see the following issues:

1) The db.docsdb file is generated with the PDF ID but not with the title.
2) The excerpt (H) attribute is missing from the db.docs file.
3) The db.worddump file is generated with junk characters.

I tried using the db.docs and db.worddump files generated on Linux, and those
worked fine, but the Linux-generated db.docsdb and db.docs.index files did not.

Please let me know what options I have.

I tested running the Perl scripts doc2html and pdf2html, and they parse my
PDFs, but only local ones. They do not parse a PDF when I pass its URL.
pdftotext and pdfinfo are working fine.

Also, how can I index the PDFs in a directory on my local system?
I tried these options, but they didn't work:

start_url:             http://localhost/pdf/
#local_urls:   http://localhost/pdf/ = C:/cygwin/var/www/htdocs/pdf/
#local_urls_only: true
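Two things may be worth checking in the configuration above. First, the `#` prefix comments out both the `local_urls` and `local_urls_only` lines, so only `start_url` is active as written. Second, as I understand the ht://Dig attribute documentation, `local_urls` is a whitespace-separated list of `url=path` entries, so the spaces around `=` may split the value into three separate, unusable entries. The minimal Python sketch below only illustrates that assumed parsing rule; `parse_local_urls` and `map_url` are hypothetical helpers, not ht://Dig code, and whether Cygwin ht://Dig wants a `C:/...` path or a POSIX-style path is not established here.

```python
def parse_local_urls(value):
    """Hypothetical helper: split a local_urls-style value into
    (url_prefix, local_path) pairs, ASSUMING entries are separated by
    whitespace and each entry has the form url=path (no spaces)."""
    pairs, malformed = [], []
    for entry in value.split():
        # Split on the first '=' only; the URL part itself has no '='.
        prefix, sep, path = entry.partition("=")
        if sep and prefix and path:
            pairs.append((prefix, path))
        else:
            malformed.append(entry)  # no '=' or an empty side
    return pairs, malformed


def map_url(url, pairs):
    """Hypothetical helper: map a URL to a local file path using the first
    matching prefix, or return None if no prefix matches."""
    for prefix, path in pairs:
        if url.startswith(prefix):
            return path + url[len(prefix):]
    return None


if __name__ == "__main__":
    # With spaces around '=', every entry is malformed under this rule.
    print(parse_local_urls("http://localhost/pdf/ = C:/cygwin/var/www/htdocs/pdf/"))
    # Without spaces, the prefix maps cleanly to the local directory.
    pairs, _ = parse_local_urls("http://localhost/pdf/=C:/cygwin/var/www/htdocs/pdf/")
    print(map_url("http://localhost/pdf/a.pdf", pairs))
```

Under that assumption, the uncommented form `local_urls: http://localhost/pdf/=C:/cygwin/var/www/htdocs/pdf/` (no spaces around `=`) would be the one to try.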


Thanks for your help.

John