public class StandardSitemapResolverFactory extends Object implements ISitemapResolverFactory, IXMLConfigurable
Factory used to create StandardSitemapResolver instances.
Refer to StandardSitemapResolver for resolution logic.
<sitemapResolverFactory ignore="[false|true]" lenient="[false|true]"
    class="com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory">
  <tempDir>(where to store temp files)</tempDir>
  <path>
    (Optional path relative to URL root for a sitemap. Use a single empty
     "path" tag to rely instead on any sitemaps specified as start URLs or
     defined in robots.txt, if enabled. Not specifying any path tags falls
     back to trying to locate sitemaps using default paths.)
  </path>
  (... repeat path tag as needed ...)
</sitemapResolverFactory>
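The path rules described in the configuration above can be sketched in plain Java. This is only an illustration of the three cases (no path tags, a single empty path tag, explicit paths), not the resolver's actual code. The class and method names are hypothetical, and the two default paths are an assumption, since this page only says "default paths".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Norconex API.
final class SitemapPathSketch {

    // Assumed defaults; this page does not list the default paths.
    static final String[] ASSUMED_DEFAULT_PATHS = {
        "/sitemap.xml", "/sitemap_index.xml"
    };

    /**
     * Builds the candidate sitemap URLs to try for a URL root.
     * No configured paths: fall back to the assumed defaults.
     * A single empty path: no path-based lookup at all.
     */
    static List<String> candidateUrls(String urlRoot, String... configuredPaths) {
        String[] paths = (configuredPaths == null || configuredPaths.length == 0)
                ? ASSUMED_DEFAULT_PATHS
                : configuredPaths;
        // Drop a trailing slash so the root and path join cleanly.
        String root = urlRoot.endsWith("/")
                ? urlRoot.substring(0, urlRoot.length() - 1)
                : urlRoot;
        List<String> urls = new ArrayList<>();
        for (String path : paths) {
            if (path == null || path.trim().isEmpty()) {
                continue; // empty "path" tag: nothing to try
            }
            String p = path.trim();
            urls.add(root + (p.startsWith("/") ? p : "/" + p));
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(candidateUrls("https://example.com"));
        System.out.println(candidateUrls("https://example.com", ""));
        System.out.println(candidateUrls("https://example.com", "/maps/sitemap.xml"));
    }
}
```

With a single empty path the list is empty, which corresponds to relying only on sitemaps given as start URLs or found in robots.txt.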
The following ignores sitemap files present on web sites.
<sitemapResolverFactory ignore="true"/>
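To show how the attributes in such a fragment could be read, here is a minimal sketch using only the JDK's DOM parser. It is not the class's loadFromXML implementation. The class and method names are hypothetical, and treating a missing attribute as false is an assumption based on "false" being listed first in the usage above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the Norconex API.
final class SitemapConfigSketch {

    /**
     * Reads a boolean attribute from the root element of an XML fragment.
     * A missing attribute is treated as false (an assumption).
     */
    static boolean readFlag(String xml, String attribute) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            // getAttribute returns "" when absent, which parses to false.
            return Boolean.parseBoolean(root.getAttribute(attribute));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML fragment", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<sitemapResolverFactory ignore=\"true\"/>";
        System.out.println("ignore=" + readFlag(xml, "ignore"));
        System.out.println("lenient=" + readFlag(xml, "lenient"));
    }
}
```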
See also: StandardSitemapResolver
Constructor and Description |
---|
StandardSitemapResolverFactory() |
Modifier and Type | Method and Description |
---|---|
ISitemapResolver | createSitemapResolver(HttpCrawlerConfig config, boolean resume) |
boolean | equals(Object other) |
long | getFrom() |
String[] | getSitemapLocations() Deprecated. Since 2.3.0, use HttpCrawlerConfig.getStartSitemapURLs() |
String[] | getSitemapPaths() Gets the URL paths, relative to the URL root, from which to try to locate and resolve sitemaps. |
File | getTempDir() Gets the directory where sitemap files are temporarily stored before they are parsed. |
int | hashCode() |
boolean | isEscalateErrors() |
boolean | isLenient() |
void | loadFromXML(Reader in) |
void | saveToXML(Writer out) |
void | setEscalateErrors(boolean escalateErrors) |
void | setFrom(long from) |
void | setLenient(boolean lenient) |
void | setSitemapLocations(String... sitemapLocations) Deprecated. Since 2.3.0, use HttpCrawlerConfig.setStartSitemapURLs(String[]) |
void | setSitemapPaths(String... sitemapPaths) Sets the URL paths, relative to the URL root, from which to try to locate and resolve sitemaps. |
void | setTempDir(File tempDir) Sets the directory where sitemap files are temporarily stored before they are parsed. |
String | toString() |
public ISitemapResolver createSitemapResolver(HttpCrawlerConfig config, boolean resume)
Specified by: createSitemapResolver in interface ISitemapResolverFactory
public String[] getSitemapPaths()
public void setSitemapPaths(String... sitemapPaths)
Parameters: sitemapPaths - sitemap paths.
@Deprecated public String[] getSitemapLocations()
Deprecated. Since 2.3.0, use HttpCrawlerConfig.getStartSitemapURLs()
public void setSitemapLocations(String... sitemapLocations)
Deprecated. Since 2.3.0, use HttpCrawlerConfig.setStartSitemapURLs(String[])
Parameters: sitemapLocations - sitemap locations
public boolean isLenient()
public void setLenient(boolean lenient)
public long getFrom()
public void setFrom(long from)
public boolean isEscalateErrors()
public void setEscalateErrors(boolean escalateErrors)
public File getTempDir()
Gets the directory where sitemap files are temporarily stored before they are parsed. When null (default), temporary files are created directly under AbstractCrawlerConfig.getWorkDir(). If the crawler working directory is also undefined, the system temporary directory is used, as returned by FileUtils.getTempDirectory().
public void setTempDir(File tempDir)
Parameters: tempDir - directory where temporary files are written
public void loadFromXML(Reader in) throws IOException
Specified by: loadFromXML in interface IXMLConfigurable
Throws: IOException
public void saveToXML(Writer out) throws IOException
Specified by: saveToXML in interface IXMLConfigurable
Throws: IOException
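The temporary directory fallback described for getTempDir() can be illustrated with a short sketch. It assumes the order stated above: the configured directory first, then the crawler working directory, then the system temporary directory. The class and method names are hypothetical, and the "java.io.tmpdir" system property stands in for FileUtils.getTempDirectory() so that the example needs only the JDK.

```java
import java.io.File;

// Hypothetical helper, not part of the Norconex API.
final class TempDirFallbackSketch {

    /**
     * Picks the directory for temporary sitemap files.
     * tempDir: the value configured on the factory, may be null.
     * workDir: the crawler working directory, may be null.
     */
    static File resolveTempDir(File tempDir, File workDir) {
        if (tempDir != null) {
            return tempDir; // explicit setting wins
        }
        if (workDir != null) {
            return workDir; // default: directly under the working directory
        }
        // Last resort: the system temporary directory.
        return new File(System.getProperty("java.io.tmpdir"));
    }

    public static void main(String[] args) {
        System.out.println(resolveTempDir(new File("sitemaps-tmp"), new File("work")));
        System.out.println(resolveTempDir(null, new File("work")));
        System.out.println(resolveTempDir(null, null));
    }
}
```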
Copyright © 2009–2020 Norconex Inc. All rights reserved.