java.lang.Object
  info.informatica.doc.DocumentFragment
    info.informatica.html.HTMLFragment
public class HTMLFragment
HTML fragment, with API methods for parsing and basic manipulation.
This class provides simple and fast parsing capabilities, written with these ideas in mind:
… HTMLEventParser instead.
… getTags method), but is also useful for performance tuning if, say, one knows that the desired tag will not occur before the first 1000 bytes of the document.
This class provides a fast and small-footprint approach to parsing.
See Also: HTMLEventParser

| Nested Class Summary |
|---|

| Nested classes/interfaces inherited from class info.informatica.doc.DocumentFragment |
|---|
| info.informatica.doc.DocumentFragment.FragmentComp, info.informatica.doc.DocumentFragment.NotFragmentComp |
| Constructor Summary |
|---|
| HTMLFragment(info.informatica.doc.DocumentFragment fragment) |
| HTMLFragment(HTMLTag tag) |
| HTMLFragment(String html) |
| HTMLFragment(String html, info.informatica.doc.FragmentPosition pos) |
| Method Summary | | |
|---|---|---|
| String | eraseComments() | Gets a version of this fragment with all the comments erased (replaced by spaces). |
| String | eraseTags() | Gets a version of this fragment in which every tag, including comments, is replaced by a blank space. |
| static String | eraseTags(String html) | Replaces each tag of the given HTML text by a blank space. |
| info.informatica.doc.FragmentPosition | findBlockByName(String tagname) | Returns the position of a start tag, its end tag and the enclosed fragment (together called a block). |
| info.informatica.doc.FragmentPosition | findTagByName(String tagname) | Gets the position of a tag of a given type. |
| info.informatica.doc.FragmentPosition | findTagByName(String tagname, int inipos) | Gets the position of a tag of a given type, starting the search at a given place. |
| TagIterator | getAllTags() | |
| CharData | getCharData(info.informatica.doc.FragmentPosition pos) | Gets the character data at the given position. |
| TagFinder | getFinder() | Gets a finder of all tags. |
| IdTagFinder | getIdFinder(String tagid) | Gets a finder of tags of the given ID. |
| NameTagFinder | getNameFinder(String tagname) | Gets a finder of tags of the given type. |
| HTMLTag | getTag(info.informatica.doc.FragmentPosition pos) | Gets a tag by its position in the document. |
| HTMLTag | getTagBlockByName(String tagname, int inipos) | Gets the tag block consisting of the tag named tagname and the enclosed character data. |
| HTMLTag | getTagById(String tagid) | Convenience method that gets the tag of ID tagid. |
| HTMLTag | getTagByName(String tagname) | Convenience method that gets the tag of type tagname. |
| CharData | getTagDataById(String tagid) | Gets the character data enclosed by the tag of ID tagid. |
| CharData | getTagDataByName(String tagname) | Gets the character data enclosed by the tag named tagname. |
| CharData | getTagDataByName(String tagname, int inipos) | Gets the character data enclosed by the tag named tagname, starting the search at position inipos. |
| TagParser | getTagParser() | Gets the tag parser that will be used to parse this fragment's tags. |
| TagIterator | getTagsById(String tagid) | Gets all the tags with ID tagid in the document. |
| TagIterator | getTagsByName(String tagname) | Gets all the tags of type tagname in the document. |
| void | insertAfter(info.informatica.doc.FragmentPosition pos, HTMLFragment newel) | Inserts a fragment after the given position. |
| void | insertAfter(info.informatica.doc.FragmentPosition pos, String newstr) | Inserts a string after the given position. |
| void | insertBefore(info.informatica.doc.FragmentPosition pos, info.informatica.doc.DocumentFragment newel) | Inserts an element before the given position. |
| void | insertBefore(info.informatica.doc.FragmentPosition pos, String newstr) | Inserts a string before the given position. |
| int | length() | |
| void | remove(info.informatica.doc.FragmentPosition pos) | Removes an HTML fragment. |
| void | removeBlock(HTMLTag tag) | Removes a tag and all the enclosed fragments, if any. |
| void | removePair(HTMLTag tag) | Removes both the start tag and its end tag (if any). |
| void | replace(info.informatica.doc.FragmentPosition pos, HTMLFragment newel) | Replaces a subfragment with a new one. |
| void | replace(info.informatica.doc.FragmentPosition pos, String newstr) | Replaces a subfragment with a string. |
| void | setTagParser(TagParser tagParser) | Sets the tag parser that will be used to parse this fragment's tags. |
| String | toPureText() | Gets a text version of the fragment, obtained by erasing all comments and all tags. |
| static String | toPureText(String s) | Gets the plain-text version of a String containing HTML. |
| String | toString() | |
| void | update(info.informatica.doc.DocumentFragment e) | Updates the given subfragment in the document. |
| Methods inherited from class info.informatica.doc.DocumentFragment |
|---|
| adjustWidth, compareTo, getCurrentPosition, getPosition, setPosition |

| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Constructor Detail |
|---|
public HTMLFragment(String html)
public HTMLFragment(String html,
info.informatica.doc.FragmentPosition pos)
public HTMLFragment(info.informatica.doc.DocumentFragment fragment)
public HTMLFragment(HTMLTag tag)
| Method Detail |
|---|
public void remove(info.informatica.doc.FragmentPosition pos)
throws HTMLDocumentException
Removes an HTML fragment.
Throws:
HTMLParsingException - if removal could not be done.
HTMLDocumentException
public void replace(info.informatica.doc.FragmentPosition pos,
HTMLFragment newel)
throws HTMLParsingException
Replaces a subfragment with a new one.
Parameters:
newel - the new subfragment, which replaces the old one.
pos - the position of the old fragment.
Throws:
HTMLParsingException - if the replacement was not successful.
public void replace(info.informatica.doc.FragmentPosition pos,
String newstr)
throws HTMLParsingException
Replaces a subfragment with a string.
Parameters:
newstr - the String which replaces the subfragment.
pos - the position of the old fragment.
Throws:
HTMLParsingException - if the replacement was not successful.
public void update(info.informatica.doc.DocumentFragment e)
Updates the given subfragment in the document. To keep the document consistent, call this method immediately each time an element is modified.
Parameters:
e - the element to update.
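The need for update is easiest to see with plain character offsets: once a subfragment changes length, every position recorded after it is stale. The sketch below assumes positions are simple int offsets into the text; OffsetEdit and shiftAfterReplace are hypothetical names, and this page does not say how DocumentFragment actually tracks positions.

```java
import java.util.Arrays;

/** Hypothetical sketch: keep recorded offsets valid after an in-place replacement. */
final class OffsetEdit {

    /**
     * Returns a copy of offsets adjusted for replacing text[start, end) with
     * newLength characters. Offsets at or after end move by the length change.
     * Offsets before end are left alone: that is correct for those at or before
     * start, while offsets strictly inside the replaced range lose their meaning.
     */
    static int[] shiftAfterReplace(int[] offsets, int start, int end, int newLength) {
        int delta = newLength - (end - start);
        int[] shifted = offsets.clone();
        for (int i = 0; i < shifted.length; i++) {
            if (shifted[i] >= end) {
                shifted[i] += delta;
            }
        }
        return shifted;
    }

    public static void main(String[] args) {
        StringBuilder html = new StringBuilder("<p>old</p><b>x</b>");
        int[] tagStarts = {0, 6, 10, 14}; // where <p>, </p>, <b> and </b> begin

        // Replace "old" (offsets 3..6) with a longer string.
        String replacement = "brand new";
        html.replace(3, 6, replacement);
        tagStarts = shiftAfterReplace(tagStarts, 3, 6, replacement.length());

        System.out.println(Arrays.toString(tagStarts)); // [0, 12, 16, 20]
        System.out.println(html.substring(tagStarts[2], tagStarts[2] + 3)); // <b>
    }
}
```

Without the shift, the stale offset 10 would now point into the middle of "brand new" instead of at the <b> tag.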
public void removePair(HTMLTag tag)
throws HTMLDocumentException
Removes both the start tag and its end tag (if any).
Parameters:
tag - the tag to be removed.
Throws:
HTMLDocumentException
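public void removeBlock(HTMLTag tag)
throws info.informatica.doc.DocumentException
Removes a tag and all the enclosed fragments, if any.
Parameters:
tag - the tag to be removed.
Throws:
info.informatica.doc.DocumentException
public info.informatica.doc.FragmentPosition findBlockByName(String tagname)
Returns the position of a start tag, its end tag and the enclosed fragment (together called a block).
Parameters:
tagname - name of the tag.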
public void insertBefore(info.informatica.doc.FragmentPosition pos,
info.informatica.doc.DocumentFragment newel)
throws HTMLParsingException
Inserts an element before the given position.
Parameters:
pos - the position before which the element must be inserted.
newel - the element to be inserted.
Throws:
HTMLParsingException - if the element cannot be inserted at the given position.
public void insertBefore(info.informatica.doc.FragmentPosition pos,
String newstr)
throws HTMLParsingException
Inserts a string before the given position.
Parameters:
pos - the position before which the string must be inserted.
newstr - the string to be inserted.
Throws:
HTMLParsingException - if the string cannot be inserted at the given position.
public void insertAfter(info.informatica.doc.FragmentPosition pos,
HTMLFragment newel)
throws HTMLParsingException
Inserts a fragment after the given position.
Parameters:
pos - the position after which the element must be inserted.
newel - the element to be inserted.
Throws:
HTMLParsingException - if the element cannot be inserted at the given position.
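public void insertAfter(info.informatica.doc.FragmentPosition pos,
String newstr)
throws HTMLParsingException
Inserts a string after the given position.
Parameters:
pos - the position after which the string must be inserted.
newstr - the string to be inserted.
Throws:
HTMLParsingException - if the string cannot be inserted at the given position.
public String eraseTags()
Gets a version of this fragment in which every tag, including comments, is replaced by a blank space. This fragment itself is left unaltered; only the erased copy is returned.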
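public static String eraseTags(String html)
Replaces each tag of the given HTML text by a blank space. Unlike the instance method eraseTags(), this method leaves comments untouched.
Parameters:
html - the HTML to be processed.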
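A minimal sketch of erasing tags from a String while leaving comments alone. It assumes a simplified tag syntax (no '>' inside quoted attribute values) and assumes each tag becomes exactly one space; TagEraser is a hypothetical name, not part of this library. Because comments begin with "<!", the pattern never matches them, although tags written inside a comment would still be erased by this sketch.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: replace each start or end tag with one space, keeping comments. */
final class TagEraser {
    // '<', an optional '/', a letter, then everything up to the next '>'.
    // Comments ("<!--") and declarations ("<!DOCTYPE") start with "<!" and do not match.
    private static final Pattern TAG = Pattern.compile("</?[A-Za-z][^>]*>");

    static String eraseTags(String html) {
        return TAG.matcher(html).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Brackets make the leading and trailing spaces visible.
        System.out.println("[" + eraseTags("<p>Hi <b>there</b></p><!-- kept -->") + "]");
    }
}
```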
public String eraseComments()
Gets a version of this fragment with all the comments erased (replaced by spaces). The returned String has the same length as the fragment, because each comment is simply filled with spaces.
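Because comments are overwritten with spaces rather than removed, offsets into the erased string still line up with the original text. A minimal sketch of that length-preserving idea on a plain String; CommentEraser is hypothetical, and the library's own implementation is not shown on this page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: blank out HTML comments without changing the string length. */
final class CommentEraser {
    // Shortest run from "<!--" to the next "-->"; DOTALL lets a comment span several lines.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    static String eraseComments(String html) {
        StringBuilder out = new StringBuilder(html);
        Matcher m = COMMENT.matcher(html);
        while (m.find()) {
            // Overwrite each comment character with a space so later offsets stay valid.
            for (int i = m.start(); i < m.end(); i++) {
                out.setCharAt(i, ' ');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>a<!-- note -->b</p>";
        String erased = eraseComments(html);
        System.out.println("[" + erased + "]");
        System.out.println(erased.length() == html.length()); // true
    }
}
```

An unterminated "<!--" is left as-is by this sketch, since the pattern requires a closing "-->".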
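public static String toPureText(String s)
Gets the plain-text version of a String containing HTML.
Parameters:
s - the string containing the HTML.
public String toPureText()
Gets a text version of the fragment, obtained by erasing all comments and all tags.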
public CharData getCharData(info.informatica.doc.FragmentPosition pos)
Gets the character data at the given position.
Parameters:
pos - the position.
public HTMLTag getTagBlockByName(String tagname,
int inipos)
Gets the tag block consisting of the tag named tagname and the enclosed character data. Starts the search at position inipos.
Be careful when using blocks for tags whose end tag is optional: you may get a block enclosed by the current start tag and the end tag of ANOTHER tag. As a rule, use tag blocks only when you know something about the tag layout of the document in advance.
Parameters:
tagname - tag name.
inipos - position to start the search.
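The warning about optional end tags can be reproduced with a deliberately naive block search that pairs a start tag with the next end tag of the same name. NaiveBlock is a hypothetical illustration, not the library's algorithm. In the input below the first p element has no end tag (HTML closes it implicitly when the second p starts), so its "block" swallows the second paragraph.

```java
/** Hypothetical, deliberately naive block search used to show the optional-end-tag pitfall. */
final class NaiveBlock {

    /**
     * Returns the text from the first "<name" at or after inipos through the next
     * "</name>", or null if either is missing. Case-sensitive, and it does not
     * check that the name ends after "<name" (so "p" would also match "<pre").
     */
    static String naiveBlock(String html, String name, int inipos) {
        int start = html.indexOf("<" + name, inipos);
        if (start < 0) {
            return null;
        }
        String close = "</" + name + ">";
        int end = html.indexOf(close, start);
        return end < 0 ? null : html.substring(start, end + close.length());
    }

    public static void main(String[] args) {
        String html = "<p>one<p>two</p>";
        System.out.println(naiveBlock(html, "p", 0)); // <p>one<p>two</p> (both paragraphs)
        System.out.println(naiveBlock(html, "p", 1)); // <p>two</p>
    }
}
```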
public CharData getTagDataByName(String tagname)
Gets the character data enclosed by the tag named tagname.
Be careful when using tag data for tags whose end tag is optional.
Parameters:
tagname - tag name.
public CharData getTagDataByName(String tagname,
int inipos)
Gets the character data enclosed by the tag named tagname, starting the search at position inipos.
Be careful when using tag data for tags whose end tag is optional.
Parameters:
tagname - tag name.
inipos - position to start the search.
public CharData getTagDataById(String tagid)
Gets the character data enclosed by the tag of ID tagid.
Be careful when using tag data for tags whose end tag is optional.
Parameters:
tagid - tag ID.
public HTMLTag getTagById(String tagid)
Convenience method that gets the tag of ID tagid.
Parameters:
tagid - tag ID.
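To show what matching "the tag of ID tagid" involves on raw text, here is a minimal regex-based sketch. It assumes simple markup (no '>' inside attribute values, no "id=" text inside other attribute values) and returns the tag text rather than an HTMLTag; IdLookup and findTagById are hypothetical names. The attribute name is matched case-insensitively, as in HTML, while the ID value is compared exactly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: find the first start tag whose id attribute equals a given value. */
final class IdLookup {

    static String findTagById(String html, String tagid) {
        String v = Pattern.quote(tagid); // treat the ID literally, even with regex metacharacters
        Pattern p = Pattern.compile(
                "<[A-Za-z][^>]*?\\s(?i:id)\\s*=\\s*"                      // tag name, then an "id =" attribute
                + "(?:\"" + v + "\"|'" + v + "'|" + v + "(?=[\\s/>]))"    // double-, single- or unquoted value
                + "[^>]*>");                                              // rest of the tag
        Matcher m = p.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // data-id is not an id attribute, so the span is skipped.
        String html = "<div class=a><span data-id=\"main\">x</span><p ID='main' class=b>y</p>";
        System.out.println(findTagById(html, "main")); // <p ID='main' class=b>
    }
}
```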
public HTMLTag getTagByName(String tagname)
Convenience method that gets the tag of type tagname.
Parameters:
tagname - tag name.
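public final TagParser getTagParser()
Gets the tag parser that will be used to parse this fragment's tags.
public void setTagParser(TagParser tagParser)
Sets the tag parser that will be used to parse this fragment's tags.
Parameters:
tagParser - the tag parser.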
public HTMLTag getTag(info.informatica.doc.FragmentPosition pos)
throws TagParsingException
Gets a tag by its position in the document.
Parameters:
pos - the position of the tag.
Throws:
TagParsingException
public TagIterator getTagsByName(String tagname)
Gets all the tags of type tagname in the document.
Parameters:
tagname - the name of the tags to be retrieved.
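getTagsByName returns a TagIterator, whose interface is not documented on this page. As a rough, hypothetical analogue, the sketch below walks a String lazily with a java.util.regex.Matcher and yields the text of each start tag of the requested type; TagsByName is not a library class. Matching is case-insensitive, and the name must be followed by whitespace, '/' or '>' so that "b" does not also match "<br>".

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: lazily iterate over the start tags of one type in an HTML string. */
final class TagsByName implements Iterable<String> {
    private final String html;
    private final Pattern pattern;

    TagsByName(String html, String tagname) {
        this.html = html;
        // "<name" followed (but not consumed) by whitespace, '/' or '>', then the rest of the tag.
        this.pattern = Pattern.compile("<" + Pattern.quote(tagname) + "(?=[\\s/>])[^>]*>",
                Pattern.CASE_INSENSITIVE);
    }

    @Override
    public Iterator<String> iterator() {
        Matcher m = pattern.matcher(html);
        return new Iterator<>() {
            private boolean advanced; // true once find() has run for the upcoming element
            private boolean found;    // result of that find()

            @Override
            public boolean hasNext() {
                if (!advanced) {
                    found = m.find();
                    advanced = true;
                }
                return found;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                advanced = false;
                return m.group();
            }
        };
    }

    public static void main(String[] args) {
        for (String tag : new TagsByName("<B>1</b><br><b id=x>2</b>", "b")) {
            System.out.println(tag); // <B>, then <b id=x>
        }
    }
}
```

Searching lazily means a caller who stops after the first match never scans the rest of the text.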
public TagIterator getTagsById(String tagid)
Gets all the tags with ID tagid in the document.
Parameters:
tagid - the ID of the tags to be retrieved.
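public info.informatica.doc.FragmentPosition findTagByName(String tagname)
Gets the position of a tag of a given type.
Parameters:
tagname - the name (type) of the tag.
public info.informatica.doc.FragmentPosition findTagByName(String tagname,
int inipos)
Gets the position of a tag of a given type, starting the search at a given place.
Parameters:
tagname - the name (type) of the tag.
inipos - the first place in the document to start searching.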
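The inipos variant lets a caller skip the part of the document where the tag cannot occur, as the class description notes for the first 1000 bytes. A minimal sketch of such a search on a String, assuming only start tags are wanted and returning [start, end) offsets in a hypothetical Span record instead of a FragmentPosition (records need Java 16 or later):

```java
/** Hypothetical sketch of a start-tag search that begins at a given offset. */
final class TagSearch {

    /** Stand-in for FragmentPosition: the tag occupies html[start, end). */
    record Span(int start, int end) {}

    static Span findTagByName(String html, String tagname, int inipos) {
        int n = tagname.length();
        for (int i = Math.max(0, inipos); i < html.length(); i++) {
            // Look for '<' immediately followed by the tag name, ignoring case.
            if (html.charAt(i) != '<' || !html.regionMatches(true, i + 1, tagname, 0, n)) {
                continue;
            }
            int after = i + 1 + n;
            // The name must end here, so searching for "b" skips "<br>" and "<body>".
            if (after < html.length()) {
                char c = html.charAt(after);
                if (c == '>' || c == '/' || Character.isWhitespace(c)) {
                    int end = html.indexOf('>', after);
                    if (end >= 0) {
                        return new Span(i, end + 1);
                    }
                }
            }
        }
        return null; // no matching start tag at or after inipos
    }

    public static void main(String[] args) {
        String html = "<body><br><b class=x>bold</b></body>";
        Span s = findTagByName(html, "b", 0);
        System.out.println(s.start() + ".." + s.end() + " " + html.substring(s.start(), s.end())); // 10..21 <b class=x>
    }
}
```

Comparing with regionMatches instead of lowercasing the whole document keeps offsets exact, since lowercasing some non-ASCII characters can change the string length.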
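public TagIterator getAllTags()
public TagFinder getFinder()
Gets a finder of all tags.
public NameTagFinder getNameFinder(String tagname)
Gets a finder of tags of the given type.
Parameters:
tagname - the name (type) of the tags to look for.
public IdTagFinder getIdFinder(String tagid)
Gets a finder of tags of the given ID.
Parameters:
tagid - the ID of the tags to look for.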
public String toString()
Overrides:
toString in class Object
public final int length()
length in class info.informatica.doc.DocumentFragment