Reading HTML web pages with Java - Tutorial

Lars Vogel

Version 0.6

22.04.2010

Revision History
Revision 0.1, 03.10.2008, Lars Vogel
First Version
Revision 0.2, 01.05.2009, Lars Vogel
How to read a webpage with Java
Revision 0.4, 25.07.2009, Lars Vogel
Re-worked
Revision 0.5, 21.12.2009, Lars Vogel
How to use TinyUrl or Tr.im
Revision 0.6, 22.04.2010, Lars Vogel
How to set the proxy via command line

Reading webpages with Java

This article describes how to read HTML webpages with the standard Java libraries. It also describes how to set a proxy.


Table of Contents

1. HTML Webpages and Java
2. Proxy
2.1. How to set the proxy in Java code
2.2. How to set the proxy in Java code
3. Examples
3.1. Read web page via Java
3.2. Getting the return code from a webpage
3.3. Content Type / MIME Type
4. Using Http get services
5. Thank you
6. Questions and Discussion
7. Resources

1. HTML Webpages and Java

Java provides APIs to access resources over the network, for example to read webpages. The main classes used to read web resources are "java.net.URL" and "java.net.HttpURLConnection". "URL" defines the address of a web resource, while "HttpURLConnection" is used to access that resource.
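
As a minimal sketch of how these two classes work together, the following example opens a connection for a URL and reads the resource line by line. The class name "ReadWebPageSketch", the helper method "read", the UTF-8 charset and the placeholder address are assumptions made for this illustration; they are not taken from the example project.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, chosen only for this sketch.
class ReadWebPageSketch {

    // Reads the resource behind the given URL and returns its content as text.
    public static String read(URL url) throws IOException {
        URLConnection connection = url.openConnection();

        // For http and https URLs the connection is an HttpURLConnection,
        // which also gives access to HTTP details such as the response code.
        if (connection instanceof HttpURLConnection) {
            HttpURLConnection http = (HttpURLConnection) connection;
            http.setRequestMethod("GET"); // GET is already the default
            System.out.println("Response code: " + http.getResponseCode());
        }

        StringBuilder content = new StringBuilder();
        // Assumption: the resource is encoded as UTF-8. A real program
        // should check the charset announced by the server.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line).append('\n');
            }
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address; pass the page you want to read as first argument.
        String address = args.length > 0 ? args[0] : "http://www.example.com";
        // URI.create(...).toURL() yields the java.net.URL described above.
        URL url = URI.create(address).toURL();
        System.out.println(read(url));
    }
}
```

The check for "HttpURLConnection" keeps the helper usable for other URL schemes as well; for a "file:" URL, for example, "openConnection()" returns a different connection type and no response code is available.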

Tip

Please note that the Apache Software Foundation provides HttpClient, a powerful framework for sending and receiving HTTP messages. While I would certainly recommend using this framework, this article uses only the standard Java libraries to explain the basics.

The examples for this article are contained in the project "de.vogella.web.html".