java.lang.Object
  extended by toxi.data.feeds.util.EntityStripper
public class EntityStripper

Strips HTML entities such as `&quot;` from a string, replacing them with their Unicode equivalents.
| Field Summary | | |
|---|---|---|
| static int | LONGEST_ENTITY | The longest an entity can be is 10 characters (at least in our tables), including the leading & and trailing ;. |
| static int | SHORTEST_ENTITY | The shortest an entity can be is 4 characters (at least in our tables), including the leading & and trailing ;. |
| static char | UNICODE_NBSP_160_0x0a | The Unicode non-breaking space (NBSP) character: decimal 160, hex 0xa0. (The 0x0a in the field name is a slip; 0x0a is the line-feed code.) |
| Constructor Summary |
|---|
| EntityStripper() |
| Method Summary | | |
|---|---|---|
| static char | bareHTMLEntityToChar(java.lang.String bareEntity, char howToTranslateNbsp) | Converts an entity to a single char. |
| static java.lang.String | flattenHTML(java.lang.String text, char translateNbspTo) | Strips tags and entities from HTML. |
| static java.lang.String | flattenXML(java.lang.String text) | Strips tags and entities from XML. |
| static char | possEntityToChar(java.lang.String possBareEntityWithSemicolon) | Checks a number of gauntlet conditions to ensure this is a valid entity. |
| static java.lang.String | stripHTMLEntities(java.lang.String text, char translateNbspTo) | Converts HTML to text, converting entities such as `&quot;` back to `"` and `&lt;` back to `<`. Ordinary text passes unchanged. |
| static java.lang.String | stripHTMLTags(java.lang.String html) | Removes tags from HTML, leaving just the raw text. |
| static java.lang.String | stripXMLEntities(java.lang.String text) | Converts XML to text, converting entities such as `&quot;` back to `"` and `&lt;` back to `<`. Ordinary text passes unchanged. |
| static java.lang.String | stripXMLTags(java.lang.String xml) | Removes tags from XML, leaving just the raw text. |
| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
| public static final char UNICODE_NBSP_160_0x0a |
| public static final int LONGEST_ENTITY |
| public static final int SHORTEST_ENTITY |
| Constructor Detail |
|---|
| public EntityStripper() |
| Method Detail |
|---|

public static char bareHTMLEntityToChar(java.lang.String bareEntity, char howToTranslateNbsp)

Parameters:
- bareEntity - the entity to convert. The leading & and trailing ; must already be stripped. It may have the form #x12ff, #123, or a named entity such as lt or nbsp. Conversion is faster if the entity is in lower case.
- howToTranslateNbsp - the char you would like `&nbsp;` translated to, usually ' ' or (char) 160.
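
To make the accepted forms concrete, here is a minimal, hypothetical sketch of converting a bare entity to a char. The class BareEntitySketch, its tiny NAMED table and the 0 return value for unrecognised input are assumptions, not the toxi implementation. Only the accepted forms (#x12ff, #123, lt, nbsp) and the nbsp-translation parameter come from the documentation. The lower-case fallback is one way to read the note that lookups are faster when the entity is already lower case.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class BareEntitySketch {
    static final char NBSP = '\u00a0'; // decimal 160, hex 0xa0

    // Hypothetical stand-in for the real entity tables: only a handful of names.
    private static final Map<String, Character> NAMED = new HashMap<>();
    static {
        NAMED.put("lt", '<');
        NAMED.put("gt", '>');
        NAMED.put("amp", '&');
        NAMED.put("quot", '"');
        NAMED.put("nbsp", NBSP);
    }

    /**
     * Converts a bare entity (lead & and trailing ; already stripped) such as
     * "#x12ff", "#123", "lt" or "nbsp" to a char. Returns 0 if not recognised.
     */
    static char bareEntityToChar(String bare, char nbspTo) {
        char result;
        if (bare.startsWith("#")) {
            int code;
            try {
                code = (bare.startsWith("#x") || bare.startsWith("#X"))
                        ? Integer.parseInt(bare.substring(2), 16)
                        : Integer.parseInt(bare.substring(1), 10);
            } catch (NumberFormatException e) {
                return 0; // e.g. "#" or "#xzz"
            }
            // A char holds one UTF-16 unit, so reject values outside the BMP.
            if (code <= 0 || code > 0xFFFF) {
                return 0;
            }
            result = (char) code;
        } else {
            // Exact lookup first; fall back to lower case for inputs like "LT".
            Character c = NAMED.get(bare);
            if (c == null) {
                c = NAMED.get(bare.toLowerCase(Locale.ROOT));
            }
            if (c == null) {
                return 0;
            }
            result = c;
        }
        // Apply the caller's choice for non-breaking spaces, named or numeric.
        return result == NBSP ? nbspTo : result;
    }

    public static void main(String[] args) {
        System.out.println(bareEntityToChar("lt", ' '));         // <
        System.out.println(bareEntityToChar("#65", ' '));        // A
        System.out.println(bareEntityToChar("#x42", ' '));       // B
        System.out.println((int) bareEntityToChar("nbsp", ' ')); // 32
    }
}
```

Because the return type is a single char, code points above U+FFFF cannot be represented. This sketch rejects them rather than silently truncating.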

public static java.lang.String flattenHTML(java.lang.String text, char translateNbspTo)

Parameters:
- text - the HTML to flatten.
- translateNbspTo - the char you would like `&nbsp;` translated to, usually ' ' or (char) 160.
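
flattenHTML is documented as stripping both tags and entities, but the order of the two passes is not stated. The hypothetical sketch below shows why removing tags before decoding entities is the safer order. It uses a class called FlattenOrderSketch with deliberately simplistic helpers that handle only four entities and ignore the nbsp parameter. Decoding first turns escaped text such as `&lt;b&gt;` into something that looks like a tag, and the tag pass then deletes it.

```java
public class FlattenOrderSketch {
    // Hypothetical helper: removes anything from '<' to the next '>'.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    // Hypothetical helper: decodes only four named entities. &amp; goes last
    // so that "&amp;lt;" becomes the literal text "&lt;", not "<".
    static String decodeBasicEntities(String text) {
        return text.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
    }

    static String flattenTagsFirst(String html) {
        return decodeBasicEntities(stripTags(html));
    }

    static String flattenEntitiesFirst(String html) {
        return stripTags(decodeBasicEntities(html));
    }

    public static void main(String[] args) {
        String html = "<p>Use &lt;b&gt; for bold</p>";
        System.out.println(flattenTagsFirst(html));     // Use <b> for bold
        System.out.println(flattenEntitiesFirst(html)); // Use  for bold
    }
}
```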

public static java.lang.String flattenXML(java.lang.String text)

Parameters:
- text - the XML to flatten.

public static char possEntityToChar(java.lang.String possBareEntityWithSemicolon)

Parameters:
- possBareEntityWithSemicolon - a string that may hold an entity. The leading & must already be stripped, but the string may contain text past the ;.
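
The gauntlet conditions themselves are not listed. Below is a sketch of one plausible set, assuming the bounds from the field summary (4 and 10 characters, counting the & and ;). It finds the first ; within range, rejects names that are too short, and accepts only ASCII letters and digits with an optional leading #. PossEntitySketch.extractBareEntity is a hypothetical helper. It returns the bare name (or null) instead of a char, so it could be paired with a converter like bareHTMLEntityToChar.

```java
public class PossEntitySketch {
    // Values taken from the field summary; both counts include the & and the ;.
    static final int SHORTEST_ENTITY = 4;
    static final int LONGEST_ENTITY = 10;

    /**
     * Hypothetical gauntlet. The input has its lead & already stripped and may
     * continue past the ;. Returns the bare name, or null if it cannot be an entity.
     */
    static String extractBareEntity(String s) {
        // With & and ; added back, the bare name must be 2..8 chars long,
        // so the ; must appear at index SHORTEST_ENTITY - 2 .. LONGEST_ENTITY - 2.
        int limit = Math.min(s.length(), LONGEST_ENTITY - 1);
        int semi = -1;
        for (int i = 0; i < limit; i++) {
            if (s.charAt(i) == ';') {
                semi = i;
                break;
            }
        }
        if (semi < SHORTEST_ENTITY - 2) {
            return null; // no ; in range (-1), or the name is too short
        }
        String bare = s.substring(0, semi);
        for (int i = 0; i < bare.length(); i++) {
            char c = bare.charAt(i);
            boolean asciiAlnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (!(asciiAlnum || (i == 0 && c == '#'))) {
                return null; // e.g. a space or another &
            }
        }
        return bare;
    }

    public static void main(String[] args) {
        System.out.println(extractBareEntity("lt; and more")); // lt
        System.out.println(extractBareEntity("T is great;"));  // null
    }
}
```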

public static java.lang.String stripHTMLEntities(java.lang.String text, char translateNbspTo)

Parameters:
- text - the raw text to be processed. Must not be null.
- translateNbspTo - the char you would like `&nbsp;` translated to, usually ' ' or (char) 160.
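
Here is a hedged sketch of the scanning loop a method like stripHTMLEntities needs. It copies characters through, and at each & it looks ahead at most LONGEST_ENTITY characters for a ;. The class EntityScanSketch and its pluggable decoder are assumptions made to keep the example short. The documented method takes translateNbspTo instead and uses its own tables.

```java
import java.util.function.Function;

public class EntityScanSketch {
    static final int LONGEST_ENTITY = 10; // from the field summary; counts & and ;

    /**
     * Copies text, replacing each "&name;" run that the decoder recognises.
     * The decoder receives the bare name and returns null for unknown names,
     * in which case the original characters are copied through unchanged.
     */
    static String stripEntities(String text, Function<String, Character> decoder) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '&') {
                // Look for ';' no further than the longest possible entity allows.
                int limit = Math.min(text.length(), i + LONGEST_ENTITY);
                int semi = -1;
                for (int j = i + 1; j < limit; j++) {
                    if (text.charAt(j) == ';') {
                        semi = j;
                        break;
                    }
                }
                if (semi > 0) {
                    Character decoded = decoder.apply(text.substring(i + 1, semi));
                    if (decoded != null) {
                        out.append(decoded.charValue());
                        i = semi + 1; // skip past the whole entity
                        continue;
                    }
                }
            }
            out.append(c); // ordinary text, or an & that did not start a known entity
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A deliberately tiny decoder; a real one would consult full entity tables
        // and map nbsp to the caller's translateNbspTo char.
        Function<String, Character> tiny = bare -> {
            switch (bare) {
                case "lt":   return '<';
                case "amp":  return '&';
                case "quot": return '"';
                default:     return null;
            }
        };
        System.out.println(stripEntities("a &lt; b &amp;&amp; AT&T &bogus;", tiny));
        // a < b && AT&T &bogus;
    }
}
```

Bounding the look-ahead stops a stray & (as in `AT&T`) from triggering a scan to the end of a long document. Unknown names such as `&bogus;` pass through unchanged, which matches the documented rule that ordinary text is not altered.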

public static java.lang.String stripHTMLTags(java.lang.String html)

Parameters:
- html - the input HTML.
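
The documentation does not say how stripHTMLTags handles edge cases. The following hypothetical sketch, TagStripSketch, removes everything between < and >. A > inside a quoted attribute value is treated as part of the tag. A quote only counts as opening an attribute value when it directly follows =, so an apostrophe inside a comment does not derail the scan.

```java
public class TagStripSketch {
    /**
     * Removes everything from '<' to the matching '>'. A '>' inside a quoted
     * attribute value (a quote that directly follows '=') does not end the tag.
     */
    static String stripTags(String html) {
        StringBuilder out = new StringBuilder(html.length());
        boolean inTag = false;
        char quote = 0;           // open attribute quote inside a tag, or 0 if none
        char lastSignificant = 0; // last non-whitespace char seen inside the tag
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (!inTag) {
                if (c == '<') {
                    inTag = true;
                    lastSignificant = 0;
                } else {
                    out.append(c);
                }
                continue;
            }
            if (quote != 0) {
                if (c == quote) {
                    quote = 0; // attribute value closed
                }
            } else if ((c == '"' || c == '\'') && lastSignificant == '=') {
                quote = c;     // attribute value opened
            } else if (c == '>') {
                inTag = false; // end of tag
            }
            if (!Character.isWhitespace(c)) {
                lastSignificant = c;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<a title=\"x>y\">link</a> text")); // link text
    }
}
```

Like any stripper that does not fully parse HTML, this sketch assumes every < in the text begins a tag, so a bare comparison such as `a < b` will swallow the rest of the input.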

public static java.lang.String stripXMLEntities(java.lang.String text)

Parameters:
- text - the raw XML text to be processed. Must not be null.
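
XML 1.0 predefines exactly five named entities (amp, lt, gt, quot, apos) plus numeric character references (`&#123;` and `&#x7B;`). Any other name needs a DTD. The documentation does not say whether stripXMLEntities targets exactly that set. Assuming it does, a minimal sketch with java.util.regex looks like this. XmlEntitySketch is a hypothetical name. It returns a String so that references above U+FFFF survive as surrogate pairs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlEntitySketch {
    // The five predefined XML entities plus decimal and hex character references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#x[0-9a-fA-F]+|#[0-9]+|amp|lt|gt|quot|apos);");

    static String decodeXmlEntities(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuffer out = new StringBuffer(text.length());
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            switch (body) {
                case "amp":  replacement = "&";  break;
                case "lt":   replacement = "<";  break;
                case "gt":   replacement = ">";  break;
                case "quot": replacement = "\""; break;
                case "apos": replacement = "'";  break;
                default:     replacement = numericToString(body, m.group()); break;
            }
            // quoteReplacement stops '$' or '\' in the output being read as group syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Decodes "#123" or "#x7B"; keeps the original text if it is not a valid code point.
    private static String numericToString(String body, String original) {
        try {
            int cp = body.startsWith("#x")
                    ? Integer.parseInt(body.substring(2), 16)
                    : Integer.parseInt(body.substring(1), 10);
            return new String(Character.toChars(cp));
        } catch (IllegalArgumentException e) { // also covers NumberFormatException
            return original;
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeXmlEntities("a &lt; b &amp;&amp; &#65;&#x42; &nbsp;"));
        // a < b && AB &nbsp;
    }
}
```

Because Matcher.find resumes after each match, `&amp;lt;` decodes once to the literal text `&lt;` rather than to `<`.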

public static java.lang.String stripXMLTags(java.lang.String xml)

Parameters:
- xml - the input XML.