
htdig-general@lists.sourceforge.net 20 August 2008 • 3:48AM -0400

[htdig] How can I prevent htdig from crawling and/or reporting Apache indexes?
by Mark Bartlett

Hi Everyone!

I am using htdig to search a RoboHelp-generated website hosted on Apache.
htdig crawls the site, but it also crawls the Apache index pages and
returns them in search results.
If I turn off the Apache indexes, htdig does not crawl the site.

An example Apache index page is at:
http://proddoc.groundworkopensource.com/Bookshelf_RoboHelp/Maintaining_GroundWork_Monitor/

I have tried adding entries to the exclude_urls directive in htdig.conf, but I am
unsure how to use it properly.

fwiw: I have read the FAQ 4.20 thru 4.23...

So, how can I prevent htdig from crawling and/or reporting Apache indexes?

Thanks,
Mark

_______________________________________________
ht://Dig general mailing list: <htdig-general@list...>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
