jsoup is a Java library that makes it easy to work with real-world HTML and XML. It offers an easy-to-use API for URL fetching, data parsing, extraction, and manipulation using DOM API methods, CSS, ...
Source code from the tutorial at http://www.jamesmolloy.co.uk/tutorial_html/index.html, with an improved build system and some simplifications. Behaviour is very close to the tutorial so ...