Re: HTML Parsers

From: Mike Schrag (mschra..dimension.com)
Date: Tue Mar 24 2009 - 09:32:44 EDT

    > I am curious, what parser is used by you all for parsing HTML
    > templates in WOLips?
    The parser in WOLips is a VERY heavily modified FuzzyXMLParser, changed to
    be incredibly tolerant of terrible HTML (tolerant in that it will give
    back a reasonable parse tree even when your HTML is nasty). I looked
    at a lot of options before going with this one, but we also had some
    very particular requirements, so I needed something that was pretty
    close out of the box and looked like it could be molded into exactly
    what we needed. In particular, we have to support craziness like
    <tr class="<webobject name ="whatever"></webobject>"> and still return
    a parse tree that makes some sort of sense. I haven't tried, but I
    suspect that code is actually pretty independent of the rest of WOLips,
    if you wanted to try to pull it out.
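
    To make the <tr class="<webobject ...>"> case concrete, here is a
    minimal sketch in Python of one way a tolerant parser could cope with it.
    This is not the WOLips or FuzzyXMLParser code. The names
    read_attribute_value and parse_start_tag are made up for illustration, and
    the rule "treat an embedded <webobject>...</webobject> as one opaque
    chunk while reading an attribute value" is an assumption about a
    workable approach, not a description of what WOLips actually does.

```python
import re

# Assumption: an embedded element always looks like <webobject ...>...</webobject>.
EMBEDDED = re.compile(r"<webobject\b.*?</webobject\s*>", re.IGNORECASE | re.DOTALL)

# Attribute name followed by '=' and an opening quote (quote is not consumed).
ATTRIBUTE = re.compile(r"\s*([^\s=<>/]+)\s*=\s*(?=[\"'])")


def read_attribute_value(text, pos):
    """Read a quoted attribute value; text[pos] must be the opening quote.

    Returns (value, index just past the closing quote). An embedded
    <webobject> element is consumed whole, so quotes inside it do not
    terminate the outer attribute value.
    """
    quote = text[pos]
    i = pos + 1
    parts = []
    while i < len(text):
        embedded = EMBEDDED.match(text, i)
        if embedded:
            parts.append(embedded.group(0))
            i = embedded.end()
        elif text[i] == quote:
            return "".join(parts), i + 1
        else:
            parts.append(text[i])
            i += 1
    # Unterminated value: stay tolerant and return what was collected.
    return "".join(parts), i


def parse_start_tag(text):
    """Parse one start tag into (tag_name, {attribute_name: raw_value})."""
    start = re.match(r"<\s*([A-Za-z][\w:-]*)", text)
    if not start:
        raise ValueError("not a start tag")
    name, i, attrs = start.group(1), start.end(), {}
    while True:
        attribute = ATTRIBUTE.match(text, i)
        if not attribute:
            break
        value, i = read_attribute_value(text, attribute.end())
        attrs[attribute.group(1)] = value
    return name, attrs


name, attrs = parse_start_tag('<tr class="<webobject name ="whatever"></webobject>">')
```

    With this sketch, name is "tr" and attrs["class"] holds the raw
    <webobject name ="whatever"></webobject> text, which a second pass could
    then parse as a child node. A strict parser would instead end the
    class value at the quote before "whatever".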

    ms


