Index (1.2 API)

A B C D E F G H I K L M N P R S T U

A

addLabel(String) - Method in class de.l3s.boilerpipe.document.TextBlock: Adds an arbitrary String label to this TextBlock.
addLabelAction(LabelAction) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
addLabels(Set<String>) - Method in class de.l3s.boilerpipe.document.TextBlock: Adds a set of labels to this TextBlock.
addLabels(String...) - Method in class de.l3s.boilerpipe.document.TextBlock: Adds a set of labels to this TextBlock.
addLabelsTo(TextBlock) - Method in class de.l3s.boilerpipe.labels.LabelAction
AddPrecedingLabelsFilter - Class in de.l3s.boilerpipe.filters.heuristics: Adds the labels of the preceding block to the current block, optionally adding a prefix.
AddPrecedingLabelsFilter(String) - Constructor for class de.l3s.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter: Creates a new AddPrecedingLabelsFilter instance.
addTagAction(String, TagAction) - Method in class de.l3s.boilerpipe.sax.TagActionMap: Adds a particular TagAction for a given tag.
addTextBlock(TextBlock) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
addTo(TextBlock) - Method in class de.l3s.boilerpipe.labels.ConditionalLabelAction
addTo(TextBlock) - Method in class de.l3s.boilerpipe.labels.LabelAction
addWhitespaceIfNecessary() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
ARTICLE_EXTRACTOR - Static variable in class de.l3s.boilerpipe.extractors.CommonExtractors: Works very well for most types of Article-like HTML.
ARTICLE_METADATA - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
ArticleExtractor - Class in de.l3s.boilerpipe.extractors: A full-text extractor which is tuned towards news articles.
ArticleExtractor() - Constructor for class de.l3s.boilerpipe.extractors.ArticleExtractor
ArticleMetadataFilter - Class in de.l3s.boilerpipe.filters.heuristics
ArticleSentencesExtractor - Class in de.l3s.boilerpipe.extractors: A full-text extractor which is tuned towards extracting sentences from news articles.
ArticleSentencesExtractor() - Constructor for class de.l3s.boilerpipe.extractors.ArticleSentencesExtractor
avgNumWords() - Method in class de.l3s.boilerpipe.document.TextDocumentStatistics: Returns the average number of words at block-level (= overall number of words divided by the number of blocks).
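
The avgNumWords() entry above describes a plain ratio. The following is a minimal stand-alone sketch of that arithmetic and does not call the boilerpipe classes. The class name AvgNumWordsSketch, the int[] input, and the choice to return 0 for a document without blocks are assumptions made for illustration.

```java
/** Stand-alone illustration; not part of the boilerpipe API. */
final class AvgNumWordsSketch {

    private AvgNumWordsSketch() {}

    /**
     * Average number of words per block: overall number of words
     * divided by the number of blocks.
     * Returning 0 for an empty document is an assumption of this sketch.
     */
    static double avgNumWords(int[] wordsPerBlock) {
        if (wordsPerBlock.length == 0) {
            return 0.0;
        }
        long total = 0;
        for (int words : wordsPerBlock) {
            total += words;
        }
        return (double) total / wordsPerBlock.length;
    }
}
```

For example, three blocks with 12, 3 and 45 words hold 60 words in total, so the block-level average is 20.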

B

BlockProximityFusion - Class in de.l3s.boilerpipe.filters.heuristics: Fuses adjacent blocks if their distance (in blocks) does not exceed a certain limit.
BlockProximityFusion(int, boolean, boolean) - Constructor for class de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion: Creates a new BlockProximityFusion instance.
BoilerpipeDocumentSource - Interface in de.l3s.boilerpipe: Something that can be represented as a TextDocument.
BoilerpipeExtractor - Interface in de.l3s.boilerpipe: Describes a complete filter pipeline.
BoilerpipeFilter - Interface in de.l3s.boilerpipe: A generic BoilerpipeFilter.
BoilerpipeHTMLContentHandler - Class in de.l3s.boilerpipe.sax: A simple SAX ContentHandler, used by BoilerpipeSAXInput.
BoilerpipeHTMLContentHandler() - Constructor for class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler: Constructs a BoilerpipeHTMLContentHandler using the DefaultTagActionMap.
BoilerpipeHTMLContentHandler(TagActionMap) - Constructor for class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler: Constructs a BoilerpipeHTMLContentHandler using the given TagActionMap.
BoilerpipeHTMLParser - Class in de.l3s.boilerpipe.sax: A simple SAX Parser, used by BoilerpipeSAXInput.
BoilerpipeHTMLParser() - Constructor for class de.l3s.boilerpipe.sax.BoilerpipeHTMLParser: Constructs a BoilerpipeHTMLParser using a default HTML content handler.
BoilerpipeHTMLParser(BoilerpipeHTMLContentHandler) - Constructor for class de.l3s.boilerpipe.sax.BoilerpipeHTMLParser: Constructs a BoilerpipeHTMLParser using the given BoilerpipeHTMLContentHandler.
BoilerpipeHTMLParser(boolean) - Constructor for class de.l3s.boilerpipe.sax.BoilerpipeHTMLParser
BoilerpipeInput - Interface in de.l3s.boilerpipe: A source that returns TextDocuments.
BoilerpipeProcessingException - Exception in de.l3s.boilerpipe: Exception for signaling failure in the processing pipeline.
BoilerpipeProcessingException() - Constructor for exception de.l3s.boilerpipe.BoilerpipeProcessingException
BoilerpipeProcessingException(String, Throwable) - Constructor for exception de.l3s.boilerpipe.BoilerpipeProcessingException
BoilerpipeProcessingException(String) - Constructor for exception de.l3s.boilerpipe.BoilerpipeProcessingException
BoilerpipeProcessingException(Throwable) - Constructor for exception de.l3s.boilerpipe.BoilerpipeProcessingException
BoilerpipeSAXInput - Class in de.l3s.boilerpipe.sax: Parses an InputSource using SAX and returns a TextDocument.
BoilerpipeSAXInput(InputSource) - Constructor for class de.l3s.boilerpipe.sax.BoilerpipeSAXInput: Creates a new instance of BoilerpipeSAXInput for the given InputSource.
BoilerplateBlockFilter - Class in de.l3s.boilerpipe.filters.simple: Removes TextBlocks which have explicitly been marked as "not content".
BoilerplateBlockFilter() - Constructor for class de.l3s.boilerpipe.filters.simple.BoilerplateBlockFilter
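
BlockProximityFusion is described above as fusing adjacent blocks whose distance, counted in blocks, stays within a limit. The sketch below shows one way such a rule could work on a simplified block type. Block, fuse, the gap formula, and the newline used to join texts are all hypothetical; the library's own filter takes additional boolean options that are not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-alone illustration; not part of the boilerpipe API. */
final class ProximityFusionSketch {

    /** Hypothetical stand-in for a text block: block offsets plus text. */
    static final class Block {
        final int offsetStart;
        final int offsetEnd;
        final String text;

        Block(int offsetStart, int offsetEnd, String text) {
            this.offsetStart = offsetStart;
            this.offsetEnd = offsetEnd;
            this.text = text;
        }
    }

    private ProximityFusionSketch() {}

    /**
     * Merges each block into its predecessor when the number of blocks
     * lying between them is at most maxDistance (assumed gap definition).
     */
    static List<Block> fuse(List<Block> blocks, int maxDistance) {
        List<Block> result = new ArrayList<>();
        for (Block current : blocks) {
            if (!result.isEmpty()) {
                Block prev = result.get(result.size() - 1);
                int gap = current.offsetStart - prev.offsetEnd - 1;
                if (gap <= maxDistance) {
                    // Replace the predecessor with the fused block.
                    result.set(result.size() - 1, new Block(
                            prev.offsetStart, current.offsetEnd,
                            prev.text + "\n" + current.text));
                    continue;
                }
            }
            result.add(current);
        }
        return result;
    }
}
```

The input list is left untouched and a new list is returned, which keeps the sketch easy to test; an in-place variant would be equally plausible.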

C

CANOLA_EXTRACTOR - Static variable in class de.l3s.boilerpipe.extractors.CommonExtractors: Trained on krdwrd Canola (different definition of "boilerplate").
CanolaExtractor - Class in de.l3s.boilerpipe.extractors: A full-text extractor trained on krdwrd Canola.
CanolaExtractor() - Constructor for class de.l3s.boilerpipe.extractors.CanolaExtractor
changesTagLevel() - Method in class de.l3s.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
changesTagLevel() - Method in class de.l3s.boilerpipe.sax.CommonTagActions.Chained
changesTagLevel() - Method in class de.l3s.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
changesTagLevel() - Method in class de.l3s.boilerpipe.sax.MarkupTagAction
changesTagLevel() - Method in interface de.l3s.boilerpipe.sax.TagAction
characters(char[], int, int) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
CLASSIFIER - Static variable in class de.l3s.boilerpipe.extractors.CanolaExtractor: The actual classifier, exposed.
classify(TextBlock, TextBlock, TextBlock) - Method in class de.l3s.boilerpipe.filters.english.DensityRulesClassifier
classify(TextBlock, TextBlock, TextBlock) - Method in class de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier
clone() - Method in class de.l3s.boilerpipe.document.TextBlock
CommonExtractors - Class in de.l3s.boilerpipe.extractors: Provides quick access to common BoilerpipeExtractors.
CommonTagActions - Class in de.l3s.boilerpipe.sax: Defines an action that is to be performed whenever a particular tag occurs during HTML parsing.
CommonTagActions.BlockTagLabelAction - Class in de.l3s.boilerpipe.sax: CommonTagActions for block-level elements, which triggers some LabelAction on the generated TextBlock.
CommonTagActions.BlockTagLabelAction(LabelAction) - Constructor for class de.l3s.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
CommonTagActions.Chained - Class in de.l3s.boilerpipe.sax
CommonTagActions.Chained(TagAction, TagAction) - Constructor for class de.l3s.boilerpipe.sax.CommonTagActions.Chained
CommonTagActions.InlineTagLabelAction - Class in de.l3s.boilerpipe.sax: CommonTagActions for inline elements, which triggers some LabelAction on the generated TextBlock.
CommonTagActions.InlineTagLabelAction(LabelAction) - Constructor for class de.l3s.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
ConditionalLabelAction - Class in de.l3s.boilerpipe.labels: Adds labels to a TextBlock if the given criteria are met.
ConditionalLabelAction(TextBlockCondition, String...) - Constructor for class de.l3s.boilerpipe.labels.ConditionalLabelAction
ContentFusion - Class in de.l3s.boilerpipe.filters.heuristics
ContentFusion() - Constructor for class de.l3s.boilerpipe.filters.heuristics.ContentFusion: Creates a new ContentFusion instance.
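
ConditionalLabelAction is listed above as adding labels to a TextBlock only if given criteria are met. A minimal sketch of that idea, using java.util.function.Predicate in place of the library's TextBlockCondition. The Block type, its numWords field, and the boolean return value of addTo are assumptions made for this example.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;

/** Stand-alone illustration; not part of the boilerpipe API. */
final class ConditionalLabelSketch {

    /** Hypothetical stand-in for a text block that carries labels. */
    static final class Block {
        final int numWords;
        final Set<String> labels = new LinkedHashSet<>();

        Block(int numWords) {
            this.numWords = numWords;
        }
    }

    private final Predicate<Block> condition;
    private final String[] labels;

    ConditionalLabelSketch(Predicate<Block> condition, String... labels) {
        this.condition = condition;
        this.labels = labels.clone();
    }

    /** Adds the configured labels only when the condition holds; reports whether it did. */
    boolean addTo(Block block) {
        if (!condition.test(block)) {
            return false;
        }
        for (String label : labels) {
            block.labels.add(label);
        }
        return true;
    }
}
```

A caller could, for instance, construct the action with the predicate b -> b.numWords >= 10 and the label "LONG", then apply it to every block of a document.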

D

de.l3s.boilerpipe - package de.l3s.boilerpipe: The Boilerpipe top-level package.
de.l3s.boilerpipe.conditions - package de.l3s.boilerpipe.conditions
de.l3s.boilerpipe.document - package de.l3s.boilerpipe.document: The classes in this package represent the simple Boilerpipe document model.
de.l3s.boilerpipe.estimators - package de.l3s.boilerpipe.estimators
de.l3s.boilerpipe.extractors - package de.l3s.boilerpipe.extractors: This package contains some standard extractors (i.e., completely piped BoilerpipeFilters).
de.l3s.boilerpipe.filters.english - package de.l3s.boilerpipe.filters.english: The BoilerpipeFilters in this package have only been tested on English text.
de.l3s.boilerpipe.filters.heuristics - package de.l3s.boilerpipe.filters.heuristics: The BoilerpipeFilters in this package are pure heuristics.
de.l3s.boilerpipe.filters.simple - package de.l3s.boilerpipe.filters.simple: The BoilerpipeFilters in this package are straightforward and probably not specific to English.
de.l3s.boilerpipe.labels - package de.l3s.boilerpipe.labels
de.l3s.boilerpipe.sax - package de.l3s.boilerpipe.sax: Classes related to parsing and producing HTML from/to Boilerpipe TextDocuments.
de.l3s.boilerpipe.util - package de.l3s.boilerpipe.util: Some helper classes.
debugString() - Method in class de.l3s.boilerpipe.document.TextDocument: Returns detailed debugging information about the contained TextBlocks.
DEFAULT_EXTRACTOR - Static variable in class de.l3s.boilerpipe.extractors.CommonExtractors: Usually worse than ArticleExtractor, but simpler/no heuristics.
DEFAULT_INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
DEFAULT_INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.MinFulltextWordsFilter
DefaultExtractor - Class in de.l3s.boilerpipe.extractors: A quite generic full-text extractor.
DefaultExtractor() - Constructor for class de.l3s.boilerpipe.extractors.DefaultExtractor
DefaultLabels - Class in de.l3s.boilerpipe.labels: Some pre-defined labels which can be used in conjunction with TextBlock.addLabel(String) and TextBlock.hasLabel(String).
DefaultLabels() - Constructor for class de.l3s.boilerpipe.labels.DefaultLabels
DefaultTagActionMap - Class in de.l3s.boilerpipe.sax: Default TagActions.
DefaultTagActionMap() - Constructor for class de.l3s.boilerpipe.sax.DefaultTagActionMap
DensityRulesClassifier - Class in de.l3s.boilerpipe.filters.english: Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features", particularly using text densities and link densities.
DensityRulesClassifier() - Constructor for class de.l3s.boilerpipe.filters.english.DensityRulesClassifier
DocumentTitleMatchClassifier - Class in de.l3s.boilerpipe.filters.heuristics: Marks TextBlocks which contain parts of the HTML <TITLE> tag, using some heuristics which are quite specific to the news domain.
DocumentTitleMatchClassifier(String) - Constructor for class de.l3s.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier

E

EMPTY_END - Static variable in class de.l3s.boilerpipe.document.TextBlock
EMPTY_START - Static variable in class de.l3s.boilerpipe.document.TextBlock
end(BoilerpipeHTMLContentHandler, String, String) - Method in class de.l3s.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
end(BoilerpipeHTMLContentHandler, String, String) - Method in class de.l3s.boilerpipe.sax.CommonTagActions.Chained
end(BoilerpipeHTMLContentHandler, String, String) - Method in class de.l3s.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
end(BoilerpipeHTMLContentHandler, String, String) - Method in class de.l3s.boilerpipe.sax.MarkupTagAction
end(BoilerpipeHTMLContentHandler, String, String) - Method in interface de.l3s.boilerpipe.sax.TagAction
endDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
endElement(String, String, String) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
endPrefixMapping(String) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
ExpandTitleToContentFilter - Class in de.l3s.boilerpipe.filters.heuristics: Marks as "content" all TextBlocks that lie between the headline and the part that has already been marked content, provided they are marked DefaultLabels.MIGHT_BE_CONTENT.
ExpandTitleToContentFilter() - Constructor for class de.l3s.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
ExtractorBase - Class in de.l3s.boilerpipe.extractors: The base class of Extractors.
ExtractorBase() - Constructor for class de.l3s.boilerpipe.extractors.ExtractorBase

F

fetch(URL) - Static method in class de.l3s.boilerpipe.sax.HTMLFetcher: Fetches the document at the given URL, using URLConnection.
flushBlock() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler

G

getCharset() - Method in class de.l3s.boilerpipe.sax.HTMLDocument
getContainedTextElements() - Method in class de.l3s.boilerpipe.document.TextBlock: Returns the containedTextElements BitSet, or null.
getContent() - Method in class de.l3s.boilerpipe.document.TextDocument: Returns the TextDocument's content.
getData() - Method in class de.l3s.boilerpipe.sax.HTMLDocument
getDefaultInstance() - Static method in class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter: Returns the singleton instance for IgnoreBlocksAfterContentFilter.
getDefaultInstance() - Static method in class de.l3s.boilerpipe.filters.english.MinFulltextWordsFilter
getExtraStyleSheet() - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter: Returns the extra stylesheet definition that will be inserted in the HEAD element.
getInstance() - Static method in class de.l3s.boilerpipe.extractors.ArticleExtractor: Returns the singleton instance for ArticleExtractor.
getInstance() - Static method in class de.l3s.boilerpipe.extractors.ArticleSentencesExtractor: Returns the singleton instance for ArticleSentencesExtractor.
getInstance() - Static method in class de.l3s.boilerpipe.extractors.CanolaExtractor: Returns the singleton instance for CanolaExtractor.
getInstance() - Static method in class de.l3s.boilerpipe.extractors.DefaultExtractor: Returns the singleton instance for DefaultExtractor.
getInstance() - Static method in class de.l3s.boilerpipe.extractors.LargestContentExtractor: Returns the singleton instance for LargestContentExtractor.
getInstance() - Static method in class de.l3s.boilerpipe.extractors.NumWordsRulesExtractor: Returns the singleton instance for NumWordsRulesExtractor.
getInstance() - Static method in class de.l3s.boilerpipe.filters.english.DensityRulesClassifier: Returns the singleton instance for DensityRulesClassifier.
getInstance() - Static method in class de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier: Returns the singleton instance for NumWordsRulesClassifier.
getInstance() - Static method in class de.l3s.boilerpipe.filters.english.TerminatingBlocksFinder: Returns the singleton instance for TerminatingBlocksFinder.
getInstance() - Static method in class de.l3s.boilerpipe.filters.heuristics.ExpandTitleToContentFilter: Returns the singleton instance for ExpandTitleToContentFilter.
getInstance() - Static method in class de.l3s.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor: Returns the singleton instance for SimpleBlockFusionProcessor.
getInstance() - Static method in class de.l3s.boilerpipe.filters.simple.BoilerplateBlockFilter: Returns the singleton instance for BoilerplateBlockFilter.
getInstance() - Static method in class de.l3s.boilerpipe.filters.simple.SplitParagraphBlocksFilter: Returns the singleton instance for SplitParagraphBlocksFilter.
getLabels() - Method in class de.l3s.boilerpipe.document.TextBlock: Returns the labels associated with this TextBlock, or null if no such labels exist.
getLinkDensity() - Method in class de.l3s.boilerpipe.document.TextBlock
getNumWords() - Method in class de.l3s.boilerpipe.document.TextBlock
getNumWords() - Method in class de.l3s.boilerpipe.document.TextDocumentStatistics: Returns the overall number of words in all blocks.
getNumWordsInAnchorText() - Method in class de.l3s.boilerpipe.document.TextBlock
getOffsetBlocksEnd() - Method in class de.l3s.boilerpipe.document.TextBlock
getOffsetBlocksStart() - Method in class de.l3s.boilerpipe.document.TextBlock
getPostHighlight() - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter: Returns the string that will be inserted after any highlighted HTML block.
getPotentialTitles() - Method in class de.l3s.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
getPreHighlight() - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter: Returns the string that will be inserted before any highlighted HTML block.
getTagLevel() - Method in class de.l3s.boilerpipe.document.TextBlock
getText(String) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor: Extracts text from the HTML code given as a String.
getText(InputSource) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor: Extracts text from the HTML code available from the given InputSource.
getText(Reader) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor: Extracts text from the HTML code available from the given Reader.
getText(TextDocument) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor: Extracts text from the given TextDocument object.
getText() - Method in class de.l3s.boilerpipe.document.TextBlock
getText(boolean, boolean) - Method in class de.l3s.boilerpipe.document.TextDocument: Returns the TextDocument's content, non-content, or both.
getText(String) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase: Extracts text from the HTML code given as a String.
getText(InputSource) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase: Extracts text from the HTML code available from the given InputSource.
getText(URL) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase: Extracts text from the HTML code available from the given URL.
getText(Reader) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase: Extracts text from the HTML code available from the given Reader.
getText(TextDocument) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase: Extracts text from the given TextDocument object.
getTextBlocks() - Method in class de.l3s.boilerpipe.document.TextDocument: Returns the TextBlocks of this document.
getTextDensity() - Method in class de.l3s.boilerpipe.document.TextBlock
getTextDocument() - Method in interface de.l3s.boilerpipe.BoilerpipeInput: Returns (somehow) a TextDocument.
getTextDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeSAXInput: Retrieves the TextDocument using a default HTML parser.
getTextDocument(BoilerpipeHTMLParser) - Method in class de.l3s.boilerpipe.sax.BoilerpipeSAXInput: Retrieves the TextDocument using the given HTML parser.
getTitle() - Method in class de.l3s.boilerpipe.document.TextDocument: Returns the "main" title for this document, or null if no such title has been set.
getTitle() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
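
The getText(boolean, boolean) entry for TextDocument above selects content, non-content, or both. The sketch below shows how two flags could drive that selection over a list of blocks. Block, the parameter names includeContent and includeNonContent, and the trailing newline after each block are assumptions, not the library's documented behavior.

```java
import java.util.List;

/** Stand-alone illustration; not part of the boilerpipe API. */
final class GetTextSketch {

    /** Hypothetical stand-in for a text block with a content flag. */
    static final class Block {
        final String text;
        final boolean isContent;

        Block(String text, boolean isContent) {
            this.text = text;
            this.isContent = isContent;
        }
    }

    private GetTextSketch() {}

    /** Concatenates the blocks selected by the two flags, one block per line. */
    static String getText(List<Block> blocks,
                          boolean includeContent,
                          boolean includeNonContent) {
        StringBuilder sb = new StringBuilder();
        for (Block block : blocks) {
            boolean wanted = block.isContent ? includeContent : includeNonContent;
            if (wanted) {
                sb.append(block.text).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Passing true for both flags returns every block in document order, which makes it easy to compare an extraction result against the full input.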

H

hasLabel(String) - Method in class de.l3s.boilerpipe.document.TextBlock: Checks whether this TextBlock has the given label.
HR - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
HTMLDocument - Class in de.l3s.boilerpipe.sax: An InputSourceable for HTMLFetcher.
HTMLDocument(byte[], Charset) - Constructor for class de.l3s.boilerpipe.sax.HTMLDocument
HTMLDocument(String) - Constructor for class de.l3s.boilerpipe.sax.HTMLDocument
HTMLFetcher - Class in de.l3s.boilerpipe.sax: A very simple HTTP/HTML fetcher, really just for demo purposes.
HTMLHighlighter - Class in de.l3s.boilerpipe.sax: Highlights text blocks in an HTML document that have been marked as "content" in the corresponding TextDocument.

I

ignorableWhitespace(char[], int, int) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
IgnoreBlocksAfterContentFilter - Class in de.l3s.boilerpipe.filters.english: Marks all blocks as "non-content" that occur after blocks that have been marked DefaultLabels.INDICATES_END_OF_TEXT.
IgnoreBlocksAfterContentFilter(int) - Constructor for class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
IgnoreBlocksAfterContentFromEndFilter - Class in de.l3s.boilerpipe.filters.english: Marks all blocks as "non-content" that occur after blocks that have been marked DefaultLabels.INDICATES_END_OF_TEXT, and after any content block.
INDICATES_END_OF_TEXT - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
InputSourceable - Interface in de.l3s.boilerpipe.sax: An InputSourceable can return an arbitrary number of new InputSources for a given document.
INSTANCE - Static variable in class de.l3s.boilerpipe.estimators.SimpleEstimator: The singleton instance of SimpleEstimator.
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.ArticleExtractor
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.ArticleSentencesExtractor
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.CanolaExtractor
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.DefaultExtractor
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.KeepEverythingExtractor
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.LargestContentExtractor
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.NumWordsRulesExtractor
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.DensityRulesClassifier
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFromEndFilter
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.TerminatingBlocksFinder
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.ArticleMetadataFilter
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.ContentFusion
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.KeepLargestBlockFilter
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.LabelFusion
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.simple.BoilerplateBlockFilter
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.simple.InvertedFilter
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.simple.MarkEverythingContentFilter
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.simple.MinClauseWordsFilter
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.simple.SplitParagraphBlocksFilter
INSTANCE - Static variable in class de.l3s.boilerpipe.sax.DefaultTagActionMap
INSTANCE_200 - Static variable in class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
INSTANCE_EXPAND_TO_SAME_TAGLEVEL - Static variable in class de.l3s.boilerpipe.filters.heuristics.KeepLargestBlockFilter
INSTANCE_PRE - Static variable in class de.l3s.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
INSTANCE_STRICTLY_NOT_CONTENT - Static variable in class de.l3s.boilerpipe.filters.simple.LabelToBoilerplateFilter
INSTANCE_TEXT - Static variable in class de.l3s.boilerpipe.filters.simple.SurroundingToContentFilter
InvertedFilter - Class in de.l3s.boilerpipe.filters.simple: Inverts the "isContent" flag for all TextBlocks.
isContent() - Method in class de.l3s.boilerpipe.document.TextBlock
isLowQuality(TextDocumentStatistics, TextDocumentStatistics) - Method in class de.l3s.boilerpipe.estimators.SimpleEstimator: Given the statistics of the document before and after applying the BoilerpipeExtractor, can we regard the extraction quality (too) low?
isOutputHighlightOnly() - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter: If true, only HTML enclosed within highlighted content will be returned.
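
IgnoreBlocksAfterContentFilter is described above as marking blocks "non-content" once a block labeled DefaultLabels.INDICATES_END_OF_TEXT has been seen. A minimal sketch of that scan follows. The Block type and the placeholder label string are hypothetical, the marker block itself is left unchanged here (an assumption based on the wording "occur after"), and the int constructor argument of the real filter is not modeled.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Stand-alone illustration; not part of the boilerpipe API. */
final class IgnoreAfterEndSketch {

    /** Placeholder label value; the library's actual constant value may differ. */
    static final String END_OF_TEXT = "INDICATES_END_OF_TEXT";

    /** Hypothetical stand-in for a labeled text block. */
    static final class Block {
        boolean isContent;
        final Set<String> labels = new HashSet<>();

        Block(boolean isContent, String... labels) {
            this.isContent = isContent;
            Collections.addAll(this.labels, labels);
        }
    }

    private IgnoreAfterEndSketch() {}

    /**
     * Demotes every content block that follows an end-of-text marker.
     * Returns true if at least one block was changed.
     */
    static boolean process(List<Block> blocks) {
        boolean seenEnd = false;
        boolean changed = false;
        for (Block block : blocks) {
            if (seenEnd && block.isContent) {
                block.isContent = false;
                changed = true;
            }
            if (block.labels.contains(END_OF_TEXT)) {
                seenEnd = true;
            }
        }
        return changed;
    }
}
```

Running the sketch a second time on the same list changes nothing and returns false, since all trailing blocks are already non-content.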

K

KEEP_EVERYTHING_EXTRACTOR - Static variable in class de.l3s.boilerpipe.extractors.CommonExtractors: Dummy Extractor; should return the input text.
KeepEverythingExtractor - Class in de.l3s.boilerpipe.extractors: Marks everything as content.
KeepEverythingWithMinKWordsExtractor - Class in de.l3s.boilerpipe.extractors: A full-text extractor which extracts the largest text component of a page.
KeepEverythingWithMinKWordsExtractor(int) - Constructor for class de.l3s.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor
KeepLargestBlockFilter - Class in de.l3s.boilerpipe.filters.heuristics: Keeps the largest TextBlock only (by the number of words).
KeepLargestBlockFilter(boolean) - Constructor for class de.l3s.boilerpipe.filters.heuristics.KeepLargestBlockFilter
KeepLargestFulltextBlockFilter - Class in de.l3s.boilerpipe.filters.english: Keeps the largest TextBlock only (by the number of words).
KeepLargestFulltextBlockFilter() - Constructor for class de.l3s.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
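
KeepLargestBlockFilter is listed above as keeping only the largest TextBlock by number of words. The sketch below illustrates that selection on a simplified block type. Block, the restriction to blocks already marked as content, and the tie-breaking rule (the first block with the maximum count wins) are assumptions of this example.

```java
import java.util.List;

/** Stand-alone illustration; not part of the boilerpipe API. */
final class KeepLargestSketch {

    /** Hypothetical stand-in for a text block with a word count. */
    static final class Block {
        final int numWords;
        boolean isContent;

        Block(int numWords, boolean isContent) {
            this.numWords = numWords;
            this.isContent = isContent;
        }
    }

    private KeepLargestSketch() {}

    /**
     * Keeps only the content block with the most words and demotes all other
     * content blocks. Returns true if at least one block was changed.
     */
    static boolean process(List<Block> blocks) {
        Block largest = null;
        for (Block block : blocks) {
            // Strict comparison: on a tie the earlier block is kept.
            if (block.isContent
                    && (largest == null || block.numWords > largest.numWords)) {
                largest = block;
            }
        }
        boolean changed = false;
        for (Block block : blocks) {
            if (block != largest && block.isContent) {
                block.isContent = false;
                changed = true;
            }
        }
        return changed;
    }
}
```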

L

LabelAction - Class in de.l3s.boilerpipe.labels: Helps add labels to TextBlocks.
LabelAction(String...) - Constructor for class de.l3s.boilerpipe.labels.LabelAction
LabelFusion - Class in de.l3s.boilerpipe.filters.heuristics: Fuses adjacent blocks if their labels are equal.
LabelFusion(String) - Constructor for class de.l3s.boilerpipe.filters.heuristics.LabelFusion: Creates a new LabelFusion instance.
labels - Variable in class de.l3s.boilerpipe.labels.LabelAction
LabelToBoilerplateFilter - Class in de.l3s.boilerpipe.filters.simple: Marks all blocks that contain a given label as "boilerplate".
LabelToBoilerplateFilter(String...) - Constructor for class de.l3s.boilerpipe.filters.simple.LabelToBoilerplateFilter
LabelToContentFilter - Class in de.l3s.boilerpipe.filters.simple: Marks all blocks that contain a given label as "content".
LabelToContentFilter(String...) - Constructor for class de.l3s.boilerpipe.filters.simple.LabelToContentFilter
LARGEST_CONTENT_EXTRACTOR - Static variable in class de.l3s.boilerpipe.extractors.CommonExtractors: Like DefaultExtractor, but keeps the largest text block only.
LargestContentExtractor - Class in de.l3s.boilerpipe.extractors: A full-text extractor which extracts the largest text component of a page.

M

MarkEverythingContentFilter - Class in de.l3s.boilerpipe.filters.simple: Marks all blocks as content.
MARKUP_PREFIX - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
MarkupTagAction - Class in de.l3s.boilerpipe.sax: Assigns labels for element CSS classes and ids to the corresponding TextBlock.
MarkupTagAction(boolean) - Constructor for class de.l3s.boilerpipe.sax.MarkupTagAction
MAX_DISTANCE_1 - Static variable in class de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion
MAX_DISTANCE_1_CONTENT_ONLY - Static variable in class de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion
MAX_DISTANCE_1_CONTENT_ONLY_SAME_TAGLEVEL - Static variable in class de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion
MAX_DISTANCE_1_SAME_TAGLEVEL - Static variable in class de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion
meetsCondition(TextBlock) - Method in interface de.l3s.boilerpipe.conditions.TextBlockCondition: Returns true iff the given TextBlock tb meets the defined condition.
mergeNext(TextBlock) - Method in class de.l3s.boilerpipe.document.TextBlock
MIGHT_BE_CONTENT - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
MinClauseWordsFilter - Class in de.l3s.boilerpipe.filters.simple: Keeps only blocks that have at least one segment fragment ("clause") with at least k words (default: 5).
MinClauseWordsFilter(int) - Constructor for class de.l3s.boilerpipe.filters.simple.MinClauseWordsFilter
MinClauseWordsFilter(int, boolean) - Constructor for class de.l3s.boilerpipe.filters.simple.MinClauseWordsFilter
MinFulltextWordsFilter - Class in de.l3s.boilerpipe.filters.english: Keeps only those content blocks which contain at least k full-text words (measured by HeuristicFilterBase.getNumFullTextWords(TextBlock)). k is 30 by default.
MinFulltextWordsFilter(int) - Constructor for class de.l3s.boilerpipe.filters.english.MinFulltextWordsFilter
MinWordsFilter - Class in de.l3s.boilerpipe.filters.simple: Keeps only those content blocks which contain at least k words.
MinWordsFilter(int) - Constructor for class de.l3s.boilerpipe.filters.simple.MinWordsFilter
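
MinClauseWordsFilter is described above as keeping blocks that contain at least one clause with at least k words (default: 5). A minimal sketch of such a test on plain strings follows. The delimiter set used to split clauses and the whitespace-based word count are assumptions; the library's own clause detection may differ.

```java
import java.util.regex.Pattern;

/** Stand-alone illustration; not part of the boilerpipe API. */
final class MinClauseWordsSketch {

    // Assumed clause delimiters: runs of common punctuation marks.
    private static final Pattern CLAUSE_DELIMITER = Pattern.compile("[,.:;!?]+");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private MinClauseWordsSketch() {}

    /** Returns true if any clause of the text has at least minWords words. */
    static boolean hasClauseWithMinWords(String text, int minWords) {
        for (String clause : CLAUSE_DELIMITER.split(text)) {
            String trimmed = clause.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (WHITESPACE.split(trimmed).length >= minWords) {
                return true;
            }
        }
        return false;
    }
}
```

With k = 5, a navigation-style line such as "Home, News, Sports, Contact us" fails the test because its longest clause has two words, while an ordinary sentence usually passes.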

N

newExtractingInstance() - Static method in class de.l3s.boilerpipe.sax.HTMLHighlighter: Creates a new HTMLHighlighter, which is set up to return only the extracted HTML text, including enclosed markup.
newHighlightingInstance() - Static method in class de.l3s.boilerpipe.sax.HTMLHighlighter: Creates a new HTMLHighlighter, which is set up to return the full HTML text, with the extracted text portion highlighted.
NumWordsRulesClassifier - Class in de.l3s.boilerpipe.filters.english: Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features" (WSDM 2010), particularly using number of words per block and link density per block.
NumWordsRulesClassifier() - Constructor for class de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier
NumWordsRulesExtractor - Class in de.l3s.boilerpipe.extractors: A quite generic full-text extractor solely based upon the number of words per block (the current, the previous and the next block).
NumWordsRulesExtractor() - Constructor for class de.l3s.boilerpipe.extractors.NumWordsRulesExtractor

P

process(TextDocument) - Method in interface de.l3s.boilerpipe.BoilerpipeFilter: Processes the given document doc.
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.ArticleExtractor
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.ArticleSentencesExtractor
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.CanolaExtractor
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.DefaultExtractor
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.KeepEverythingExtractor
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.LargestContentExtractor
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.NumWordsRulesExtractor
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.DensityRulesClassifier
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFromEndFilter
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.MinFulltextWordsFilter
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.TerminatingBlocksFinder
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.ArticleMetadataFilter
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.ContentFusion
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.KeepLargestBlockFilter
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.LabelFusion
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.BoilerplateBlockFilter
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.InvertedFilter
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.LabelToBoilerplateFilter
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.LabelToContentFilter
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.MarkEverythingContentFilter
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.MinClauseWordsFilter
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.MinWordsFilter
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.SplitParagraphBlocksFilter
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.SurroundingToContentFilter
process(TextDocument, String) - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter: Processes the given TextDocument and the original HTML text (as a String).
process(TextDocument, InputSource) - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter: Processes the given TextDocument and the original HTML text (as an InputSource).
process(URL, BoilerpipeExtractor) - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
processingInstruction(String, String) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler

R

recycle() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler: Recycles this instance.
removeLabel(String) - Method in class de.l3s.boilerpipe.document.TextBlock

S

setContentHandler(BoilerpipeHTMLContentHandler) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLParser
setContentHandler(ContentHandler) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLParser
setDocumentLocator(Locator) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
setExtraStyleSheet(String) - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter: Sets the extra stylesheet definition that will be inserted in the HEAD element.
setIsContent(boolean) - Method in class de.l3s.boilerpipe.document.TextBlock
setOutputHighlightOnly(boolean) - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter: Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML document.
setPostHighlight(String) - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter: Sets the string that will be inserted after any highlighted HTML block.
setPreHighlight(String) - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter: Sets the string that will be inserted prior to any highlighted HTML block.
setTagAction(String, TagAction) - Method in class de.l3s.boilerpipe.sax.TagActionMap: Sets a particular TagAction for a given tag.
setTagLevel(int) - Method in class de.l3s.boilerpipe.document.TextBlock
setTitle(String) - Method in class de.l3s.boilerpipe.document.TextDocument: Updates the "main" title for this document.
setTitle(String) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
SimpleBlockFusionProcessor - Class in de.l3s.boilerpipe.filters.heuristics: Merges two subsequent blocks if their text densities are equal.
SimpleBlockFusionProcessor() - Constructor for class de.l3s.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
SimpleEstimator - Class in de.l3s.boilerpipe.estimators: Estimates the "goodness" of a BoilerpipeExtractor on a given document.
skippedEntity(String) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
SplitParagraphBlocksFilter - Class in de.l3s.boilerpipe.filters.simple: Splits TextBlocks at paragraph boundaries.
SplitParagraphBlocksFilter() - Constructor for class de.l3s.boilerpipe.filters.simple.SplitParagraphBlocksFilter
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class de.l3s.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class de.l3s.boilerpipe.sax.CommonTagActions.Chained
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class de.l3s.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class de.l3s.boilerpipe.sax.MarkupTagAction
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in interface de.l3s.boilerpipe.sax.TagAction
startDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
startElement(String, String, String, Attributes) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
startPrefixMapping(String, String) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
STRICTLY_NOT_CONTENT - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
SurroundingToContentFilter - Class in de.l3s.boilerpipe.filters.simple
SurroundingToContentFilter(TextBlockCondition) - Constructor for class de.l3s.boilerpipe.filters.simple.SurroundingToContentFilter
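
The SimpleBlockFusionProcessor entry above describes merging two subsequent blocks when their text densities are equal. The following is a minimal standalone sketch of that idea, not the library's implementation: the `Block` class, its `text` and `density` fields, and the `fuse` method are hypothetical stand-ins for illustration, and the merged block simply keeps the shared density rather than recomputing it.

```java
import java.util.ArrayList;
import java.util.List;

public class EqualDensityFusionSketch {

    // Hypothetical stand-in for a text block; this is NOT de.l3s.boilerpipe.document.TextBlock.
    public static final class Block {
        public final String text;
        public final double density;

        public Block(String text, double density) {
            this.text = text;
            this.density = density;
        }
    }

    // Merges each block into its predecessor when both have exactly the same density.
    public static List<Block> fuse(List<Block> blocks) {
        List<Block> out = new ArrayList<>();
        for (Block current : blocks) {
            int last = out.size() - 1;
            if (last >= 0 && out.get(last).density == current.density) {
                Block prev = out.get(last);
                // Simplification: keep the shared density instead of recomputing it.
                out.set(last, new Block(prev.text + "\n" + current.text, prev.density));
            } else {
                out.add(current);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Block> input = new ArrayList<>();
        input.add(new Block("First paragraph.", 9.5));
        input.add(new Block("Second paragraph.", 9.5));
        input.add(new Block("Home | About | Contact", 1.0));
        for (Block b : fuse(input)) {
            System.out.println(b.density + " -> " + b.text.replace("\n", " / "));
        }
    }
}
```

Because the comparison is applied against the last block already in the output, a run of three or more equal-density blocks collapses into one in a single pass.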

T

TA_ANCHOR_TEXT - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions: Marks this tag as "anchor" (this should usually only be set for the <A> tag).
TA_BLOCK_LEVEL - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions: Explicitly marks this tag as a simple "block-level" element, which always generates whitespace.
TA_BODY - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions: Marks this tag as the body element (this should usually only be set for the <BODY> tag).
TA_FONT - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions: Special TagAction for the <FONT> tag, which keeps track of the absolute and relative font size.
TA_IGNORABLE_ELEMENT - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions: Marks this tag as "ignorable", i.e. all its inner content is silently skipped.
TA_INLINE - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions: Deprecated. Use CommonTagActions.TA_INLINE_WHITESPACE instead.
TA_INLINE_NO_WHITESPACE - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions: Marks this tag as a simple "inline" element, which generates neither whitespace nor a new block.
TA_INLINE_WHITESPACE - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions: Marks this tag as a simple "inline" element, which generates whitespace, but no new block.
TagAction - Interface in de.l3s.boilerpipe.sax: Defines an action that is to be performed whenever a particular tag occurs during HTML parsing.
TagActionMap - Class in de.l3s.boilerpipe.sax: Base class for defining a set of TagActions that are to be used for the HTML parsing process.
TagActionMap() - Constructor for class de.l3s.boilerpipe.sax.TagActionMap
TerminatingBlocksFinder - Class in de.l3s.boilerpipe.filters.english: Finds blocks that potentially indicate the end of an article text and marks them with DefaultLabels.INDICATES_END_OF_TEXT.
TerminatingBlocksFinder() - Constructor for class de.l3s.boilerpipe.filters.english.TerminatingBlocksFinder
TextBlock - Class in de.l3s.boilerpipe.document: Describes a block of text.
TextBlock(String) - Constructor for class de.l3s.boilerpipe.document.TextBlock
TextBlock(String, BitSet, int, int, int, int, int) - Constructor for class de.l3s.boilerpipe.document.TextBlock
TextBlockCondition - Interface in de.l3s.boilerpipe.conditions: Evaluates whether a given TextBlock meets a certain condition.
TextDocument - Class in de.l3s.boilerpipe.document: A text document, consisting of one or more TextBlocks.
TextDocument(List<TextBlock>) - Constructor for class de.l3s.boilerpipe.document.TextDocument: Creates a new TextDocument with given TextBlocks, and no title.
TextDocument(String, List<TextBlock>) - Constructor for class de.l3s.boilerpipe.document.TextDocument: Creates a new TextDocument with given TextBlocks and given title.
TextDocumentStatistics - Class in de.l3s.boilerpipe.document: Provides shallow statistics on a given TextDocument.
TextDocumentStatistics(TextDocument, boolean) - Constructor for class de.l3s.boilerpipe.document.TextDocumentStatistics: Computes statistics on a given TextDocument.
TITLE - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
toInputSource() - Method in class de.l3s.boilerpipe.sax.HTMLDocument
toInputSource() - Method in interface de.l3s.boilerpipe.sax.InputSourceable
tokenize(CharSequence) - Static method in class de.l3s.boilerpipe.util.UnicodeTokenizer: Tokenizes the text and returns an array of tokens.
toString() - Method in class de.l3s.boilerpipe.document.TextBlock
toString() - Method in class de.l3s.boilerpipe.labels.LabelAction
toTextDocument() - Method in interface de.l3s.boilerpipe.BoilerpipeDocumentSource
toTextDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler: Returns a TextDocument containing the extracted TextBlocks.
toTextDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLParser: Returns a TextDocument containing the extracted TextBlocks.

U

UnicodeTokenizer - Class in de.l3s.boilerpipe.util: Tokenizes text according to Unicode word boundaries and strips off non-word characters.
UnicodeTokenizer() - Constructor for class de.l3s.boilerpipe.util.UnicodeTokenizer
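
The UnicodeTokenizer entry above describes tokenizing text at Unicode word boundaries and stripping non-word characters. As a rough illustration of that behavior, and not the library's own code, the sketch below uses the JDK's `java.text.BreakIterator`; the class name `WordBoundarySketch` and its `tokenize` method are hypothetical, and the rule "keep a segment only if it starts with a letter or digit" is an assumption made for this example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBoundarySketch {

    // Splits text at word boundaries and keeps only segments that start with a letter or digit.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Punctuation and whitespace segments are dropped (assumed filtering rule).
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, world! 42"));
    }
}
```

The real UnicodeTokenizer.tokenize(CharSequence) returns an array of tokens, per the index entry above; a list is used here only for brevity.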
