Best general-purpose Java HTML parser

Real-life HTML, and even XHTML, is far from being well-formed and valid; it is usually quite dirty. Therefore you cannot use the "javax.xml.parsers" package to parse real-life HTML: its strict XML parser throws an exception as soon as it hits malformed markup, which happens on most pages. So I looked for a good general-purpose HTML parser that is still under active development and has not been dumped into a source code repository and forgotten years ago. As a result of my investigation I found "NekoHTML" (org.cyberneko.*), an HTML parser written in Java that is well suited for extracting tag content from HTML/XHTML documents, e.g. the title of an HTML/XHTML document.
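To illustrate why the standard package is not enough, here is a minimal sketch using only the JDK. It tries to pull the title out of a string with the `javax.xml.parsers` DOM API and returns `null` when the strict XML parser rejects the input. The class and method names (`StrictParseDemo`, `titleIfWellFormed`) and the sample markup are made up for illustration; NekoHTML itself is not used here, since it is a separate library that has to be on the classpath (its DOM entry point is `org.cyberneko.html.parsers.DOMParser`).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class StrictParseDemo {

    /** Returns the text of the first <title> element, or null if the markup is not well-formed XML. */
    public static String titleIfWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow errors instead of letting the default handler print "[Fatal Error]" to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignore warnings */ }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            Document doc = builder.parse(new InputSource(new StringReader(markup)));
            NodeList titles = doc.getElementsByTagName("title");
            return titles.getLength() > 0 ? titles.item(0).getTextContent() : "";
        } catch (SAXException e) {
            // The strict XML parser gave up on the markup.
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String clean = "<html><head><title>Hello</title></head><body><p>ok</p></body></html>";
        // Typical real-life HTML: unquoted attribute, unclosed <p>, bare ampersand, unclosed <br>.
        String dirty = "<html><head><title>Hello</title></head>"
                + "<body><p class=x>one<p>Tom & Jerry<br></body></html>";
        System.out.println(titleIfWellFormed(clean)); // prints Hello
        System.out.println(titleIfWellFormed(dirty)); // prints null
    }
}
```

The unquoted attribute `class=x` alone is enough to make the strict parser stop, before it even reaches the unclosed `<p>` or the bare `&`. A forgiving HTML parser such as NekoHTML is meant to accept exactly this kind of input and still hand back a DOM from which the title can be read.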



