public class SegmentCountURLFilter extends AbstractOnMatchFilter implements IReferenceFilter, IDocumentFilter, IMetadataFilter, IXMLConfigurable
Filters URLs based on the number of URL segments. A URL with a number of segments equal to or greater than the specified count will either be included or excluded, as specified.
By default, segments are obtained by breaking the URL text at each forward slash (/), starting after the host name. You can define different or additional segment separators with the separator regular expression.
When duplicate is true, the filter instead counts the maximum number of duplicate segments found, that is, the highest number of times any single segment value occurs in the URL.
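The counting rule can be sketched as follows. This is a minimal, self-contained approximation, not the filter's actual source: SegmentCountSketch, segments, and matches are hypothetical names. It assumes that only the URL path is split (the query string is ignored) and that empty segments, such as the one before the leading slash, are not counted.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SegmentCountSketch {

    // Splits the URL path (the part after the host) at each match of the
    // separator regex and drops empty segments, such as the one produced
    // by the leading slash. Assumption: the query string is not considered.
    static List<String> segments(String url, String separatorRegex) {
        String path = URI.create(url).getRawPath();
        List<String> result = new ArrayList<>();
        if (path == null) {
            return result; // opaque URIs (e.g. mailto:) have no path
        }
        for (String seg : path.split(separatorRegex)) {
            if (!seg.isEmpty()) {
                result.add(seg);
            }
        }
        return result;
    }

    // Returns true when the URL reaches the threshold ("matches").
    // duplicate == false: total number of segments >= count.
    // duplicate == true: some segment value occurs >= count times.
    static boolean matches(String url, String separatorRegex,
            int count, boolean duplicate) {
        List<String> segs = segments(url, separatorRegex);
        if (!duplicate) {
            return segs.size() >= count;
        }
        Map<String, Integer> occurrences = new HashMap<>();
        for (String seg : segs) {
            int seen = occurrences.merge(seg, 1, Integer::sum);
            if (seen >= count) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String url = "http://example.com/a/b/a/c/a";
        System.out.println(matches(url, "/", 5, false)); // true: 5 segments
        System.out.println(matches(url, "/", 6, false)); // false
        System.out.println(matches(url, "/", 3, true));  // true: "a" occurs 3 times
        System.out.println(matches(url, "/", 4, true));  // false
    }
}
```

In duplicate mode the threshold applies to the most frequent segment ("a", seen three times) rather than to the total of five segments.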
XML configuration usage:
```xml
<filter class="com.norconex.collector.http.filter.impl.SegmentCountURLFilter"
    onMatch="[include|exclude]"
    count="(numeric value)"
    duplicate="[false|true]"
    separator="(a regex identifying the segment separator)" />
```
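A minimal sketch, using only the JDK's DOM parser, of how the four documented attributes could be read into typed values. FilterConfigSketch, Config, parse, and attr are hypothetical names; this is not the library's loadFromXML. Because the filter's actual default values are not listed on this page, the defaults are supplied by the caller.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class FilterConfigSketch {

    // Hypothetical typed view of the four documented attributes.
    record Config(String onMatch, int count, boolean duplicate, String separator) {}

    // Reads the <filter> element; absent attributes fall back to the
    // caller-supplied defaults (the library's own defaults are not assumed).
    static Config parse(String xml, Config defaults) throws Exception {
        Element filter = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return new Config(
                attr(filter, "onMatch", defaults.onMatch()),
                Integer.parseInt(attr(filter, "count",
                        String.valueOf(defaults.count())).trim()),
                Boolean.parseBoolean(attr(filter, "duplicate",
                        String.valueOf(defaults.duplicate()))),
                attr(filter, "separator", defaults.separator()));
    }

    // Element.getAttribute returns "" for a missing attribute, so check first.
    static String attr(Element e, String name, String fallback) {
        return e.hasAttribute(name) ? e.getAttribute(name) : fallback;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<filter class=\"com.norconex.collector.http.filter.impl.SegmentCountURLFilter\""
                + " onMatch=\"exclude\" count=\"5\" />";
        // Illustrative fallback values only, not the filter's real defaults.
        Config defaults = new Config("include", 3, false, "/");
        System.out.println(parse(xml, defaults));
        // prints Config[onMatch=exclude, count=5, duplicate=false, separator=/]
    }
}
```

Passing the defaults in keeps the sketch from asserting values for DEFAULT_SEGMENT_COUNT or DEFAULT_SEGMENT_SEPARATOR_PATTERN.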
The following rejects URLs with 5 or more segments after the domain (for example, http://example.com/a/b/c/d/e).
```xml
<filter class="com.norconex.collector.http.filter.impl.SegmentCountURLFilter"
    onMatch="exclude" count="5" />
```
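The role of onMatch in the onMatch="exclude" count="5" example can be sketched as a small truth table: a URL "matches" once it reaches the segment count, and onMatch decides whether a match means accept or reject. OnMatchSketch, its local OnMatch enum, and accept are hypothetical stand-ins (not the library's types), and the sketch assumes the default "/" separator with empty segments ignored.

```java
public class OnMatchSketch {

    // Hypothetical stand-in for the filter's include/exclude setting.
    enum OnMatch { INCLUDE, EXCLUDE }

    // Counts non-empty path segments split on "/".
    static int segmentCount(String path) {
        int n = 0;
        for (String seg : path.split("/")) {
            if (!seg.isEmpty()) {
                n++;
            }
        }
        return n;
    }

    // INCLUDE accepts only matching URLs; EXCLUDE accepts only non-matching ones.
    static boolean accept(String path, int count, OnMatch onMatch) {
        boolean matches = segmentCount(path) >= count;
        return onMatch == OnMatch.INCLUDE ? matches : !matches;
    }

    public static void main(String[] args) {
        System.out.println(accept("/a/b/c/d", 5, OnMatch.EXCLUDE));   // true: 4 segments
        System.out.println(accept("/a/b/c/d/e", 5, OnMatch.EXCLUDE)); // false: 5 segments
    }
}
```

Because the comparison is "equal to or greater than", a URL with exactly 5 segments is already rejected by this configuration.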
| Modifier and Type | Field and Description |
|---|---|
| static int | DEFAULT_SEGMENT_COUNT: Default segment count. |
| static String | DEFAULT_SEGMENT_SEPARATOR_PATTERN: Default segment separator pattern. |
| Constructor and Description |
|---|
| SegmentCountURLFilter(): Constructor. |
| SegmentCountURLFilter(int count): Constructor. |
| SegmentCountURLFilter(int count, OnMatch onMatch): Constructor. |
| SegmentCountURLFilter(int count, OnMatch onMatch, boolean duplicate): Constructor. |
| Modifier and Type | Method and Description |
|---|---|
| boolean | acceptDocument(ImporterDocument document) |
| boolean | acceptMetadata(String reference, Properties metadata) |
| boolean | acceptReference(String url) |
| boolean | equals(Object obj) |
| int | getCount() |
| String | getSeparator(): Gets the segment separator pattern. |
| int | hashCode() |
| boolean | isDuplicate() |
| void | loadFromXML(Reader in) |
| void | saveToXML(Writer out) |
| void | setCount(int count) |
| void | setDuplicate(boolean duplicate) |
| void | setSeparator(String separator) |
| String | toString() |
Methods inherited from class AbstractOnMatchFilter: getOnMatch, loadFromXML, saveToXML, setOnMatch

public static final String DEFAULT_SEGMENT_SEPARATOR_PATTERN

public static final int DEFAULT_SEGMENT_COUNT
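
public SegmentCountURLFilter()

public SegmentCountURLFilter(int count)
Parameters: count - the segment count threshold (a URL matches when it has at least this many segments)

public SegmentCountURLFilter(int count, OnMatch onMatch)
Parameters: count - the segment count threshold; onMatch - what to do on match (include or exclude)

public SegmentCountURLFilter(int count, OnMatch onMatch, boolean duplicate)
Parameters: count - the segment count threshold; onMatch - what to do on match (include or exclude); duplicate - whether to count the maximum number of duplicate segments instead of all segments

public String getSeparator()
Gets the segment separator pattern.

public final void setSeparator(String separator)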
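The separator is a regular expression (the XML option describes it as "a regex identifying the segment separator"), so characters with a regex meaning must be escaped. The sketch below uses Java's String.split to show the effect; SeparatorSketch and segments are hypothetical names, and dropping empty segments is an assumption about how the filter counts.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SeparatorSketch {

    // Splits on the separator regex and discards empty segments.
    static String[] segments(String path, String separatorRegex) {
        return Arrays.stream(path.split(separatorRegex))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String path = "/docs/v1.2/user-guide.html";
        // Forward slash only: [docs, v1.2, user-guide.html]
        System.out.println(Arrays.toString(segments(path, "/")));
        // Slash or hyphen: [docs, v1.2, user, guide.html]
        System.out.println(Arrays.toString(segments(path, "[/-]")));
        // An unescaped "." matches every character, leaving no segments: 0
        System.out.println(segments(path, ".").length);
        // Pattern.quote makes the dot literal: [docs, v1, 2, user-guide, html]
        System.out.println(Arrays.toString(segments(path, "/|" + Pattern.quote("."))));
    }
}
```

Adding characters to the separator raises the segment count of the same URL, so the count threshold may need adjusting when the separator changes.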

public int getCount()

public final void setCount(int count)

public boolean isDuplicate()

public final void setDuplicate(boolean duplicate)

public boolean acceptDocument(ImporterDocument document)
Specified by: acceptDocument in interface IDocumentFilter

public boolean acceptMetadata(String reference, Properties metadata)
Specified by: acceptMetadata in interface IMetadataFilter

public boolean acceptReference(String url)
Specified by: acceptReference in interface IReferenceFilter

public void loadFromXML(Reader in)
Specified by: loadFromXML in interface IXMLConfigurable

public void saveToXML(Writer out) throws IOException
Specified by: saveToXML in interface IXMLConfigurable
Throws: IOException

public String toString()
Overrides: toString in class AbstractOnMatchFilter

public int hashCode()
Overrides: hashCode in class AbstractOnMatchFilter

public boolean equals(Object obj)
Overrides: equals in class AbstractOnMatchFilter

Copyright © 2009–2020 Norconex Inc. All rights reserved.