Version 0.8
Copyright © 2008 - 2011 Lars Vogel
05.01.2011
| Revision History | ||
|---|---|---|
| Revision 0.1 | 03.10.2008 | Lars Vogel |
| First Version | ||
| Revision 0.2 - 0.8 | 01.05.2009 - 05.01.2011 | Lars Vogel |
| bug fixes and enhancements | ||
Java provides APIs to access resources over the network, for example to read web pages. The main classes used to read web resources are "java.net.URL" and "java.net.HttpURLConnection". A URL identifies a web resource, while an "HttpURLConnection" is used to access it.
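To illustrate how a "java.net.URL" describes a web resource, the following minimal sketch splits a URL into its components. The class name UrlParts and the path and query in the sample URL are made up for this illustration; only the host comes from this article.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlParts {

    // Returns the components of a URL as one readable line.
    static String describe(String spec) {
        try {
            // Creating the URL object only parses the text, it does not access the network
            URL url = new URL(spec);
            // getPort() returns -1 when the URL does not name a port explicitly
            int port = url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
            return url.getProtocol() + " | " + url.getHost() + " | " + port
                    + " | " + url.getPath() + " | " + url.getQuery();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical path and query, chosen only for illustration
        System.out.println(describe("http://www.vogella.de/some/page.html?lang=en"));
        // prints: http | www.vogella.de | 80 | /some/page.html | lang=en
    }
}
```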
The Apache Software Foundation also provides a powerful framework to transmit and receive HTTP messages, the Apache HttpClient.
Create a Java project "de.vogella.web.html". The following code reads a web page from a server and prints it to the console.
    package de.vogella.web.html;

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class ReadWebPage {
        public static void main(String[] args) throws IOException {
            String urltext = "http://www.vogella.de";
            URL url = new URL(urltext);
            BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                // Process each line.
                System.out.println(inputLine);
            }
            in.close();
        }
    }
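The example above decodes the received bytes with the default character set and closes the reader only if no exception occurs. As a minimal sketch, assuming the page is encoded in UTF-8 (an assumption, the server may use a different encoding), the following variant names the charset explicitly, closes the stream via try-with-resources and returns the page as one String. The class PageReader and its methods are hypothetical names for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

class PageReader {

    // Joins all lines of the reader into one String, separated by '\n'.
    static String readAll(BufferedReader reader) {
        return reader.lines().collect(Collectors.joining("\n"));
    }

    // Opens the URL, decodes the bytes as UTF-8 (assumed) and returns the whole content.
    static String fetch(URL url) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            return readAll(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // readAll works on any reader, so it can be tried without network access
        System.out.println(readAll(new BufferedReader(new StringReader("<html>\n</html>"))));
        // With network access the page from the text could be fetched like this:
        // System.out.println(fetch(new URL("http://www.vogella.de")));
    }
}
```

Separating readAll from fetch keeps the line handling testable without a server.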
HTTP status codes (also called return codes) are standardized codes which a web server returns to indicate the outcome of a request. For example, the status code "200" means that the HTTP request succeeded and the server performed the requested action, e.g. serving the web page.
The code after the following table accesses a web page and prints the HTTP status code of the response.
The most important HTTP status codes are:
Table 1.
| Return Code | Explanation |
|---|---|
| 200 | Ok |
| 301 | Permanent redirect to another webpage |
| 400 | Bad request |
| 404 | Not found |
    package de.vogella.web.html;

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ReadReturnCode {
        public static void main(String[] args) throws IOException {
            String urltext = "http://www.vogella.de";
            URL url = new URL(urltext);
            int responseCode = ((HttpURLConnection) url.openConnection()).getResponseCode();
            System.out.println(responseCode);
        }
    }
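A program usually has to react to the returned code rather than just print it. The following sketch shows one possible way to translate a code into a short description, using the constants defined in "HttpURLConnection" for the four codes from the table and the first digit of the code for everything else. The class name StatusCodes and the wording of the descriptions are assumptions made for this example.

```java
import java.net.HttpURLConnection;

class StatusCodes {

    // Describes the codes listed in the table, falls back to the general category otherwise.
    static String describe(int code) {
        switch (code) {
            case HttpURLConnection.HTTP_OK:          return "OK";                // 200
            case HttpURLConnection.HTTP_MOVED_PERM:  return "Moved permanently"; // 301
            case HttpURLConnection.HTTP_BAD_REQUEST: return "Bad request";       // 400
            case HttpURLConnection.HTTP_NOT_FOUND:   return "Not found";         // 404
            default:                                 return category(code);
        }
    }

    // The first digit of a status code defines its class.
    static String category(int code) {
        switch (code / 100) {
            case 1:  return "Informational";
            case 2:  return "Success";
            case 3:  return "Redirection";
            case 4:  return "Client error";
            case 5:  return "Server error";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(404)); // prints: Not found
        System.out.println(describe(503)); // prints: Server error
    }
}
```

The value passed to describe would typically be the result of getResponseCode() from the example above.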
The Internet media type, also called MIME type or content type, defines the type of a web resource. It is a two-part identifier for file formats on the Internet, consisting of a type and a subtype. For an HTML page the content type is "text/html".
The following code reads the content type (MIME type) of a web resource.
    package de.vogella.web.html;

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ReadMimeType {
        public static void main(String[] args) throws IOException {
            String urltext = "http://www.vogella.de";
            URL url = new URL(urltext);
            String contentType = ((HttpURLConnection) url.openConnection()).getContentType();
            System.out.println(contentType);
        }
    }
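The value returned by getContentType() often carries parameters in addition to the two-part MIME type, for example "text/html; charset=UTF-8". The following sketch shows one simple way to separate the MIME type from the charset parameter. The class ContentTypes is a hypothetical helper for this illustration, it does not cover every detail of the header syntax, and it assumes a non-null input (getContentType() may return null if the header is missing).

```java
import java.util.Locale;

class ContentTypes {

    // Returns the MIME type part, e.g. "text/html" for "text/html; charset=UTF-8".
    static String mimeType(String contentType) {
        int semicolon = contentType.indexOf(';');
        String type = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the value of the charset parameter, or the fallback if none is present.
    static String charset(String contentType, String fallback) {
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // The value may be quoted, e.g. charset="UTF-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        String header = "text/html; charset=UTF-8";
        System.out.println(mimeType(header));              // prints: text/html
        System.out.println(charset(header, "ISO-8859-1")); // prints: UTF-8
    }
}
```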
Several websites offer services via HTTP GET calls. For example, you can send a GET request to "http://tinyurl.com" or "http://tr.im" and receive a short version of the URL you pass as parameter.
The following demonstrates how to call the GET services of "http://tinyurl.com" and "http://tr.im" from Java.
Create the Java project "de.vogella.web.get" and create the following classes, which call the GET service and return the result.
    package de.vogella.web.get;

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class TinyURL {
        private static final String tinyUrl = "http://tinyurl.com/api-create.php?url=";

        public String shorter(String url) throws IOException {
            String tinyUrlLookup = tinyUrl + url;
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new URL(tinyUrlLookup).openStream()))) {
                // The first line of the response is returned as the short URL
                return reader.readLine();
            }
        }
    }
    package de.vogella.web.get;

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class Trim {
        private static final String trimUrl = "http://api.tr.im/v1/trim_simple?url=";

        public String shorter(String url) throws IOException {
            String trimUrlLookup = trimUrl + url;
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new URL(trimUrlLookup).openStream()))) {
                // The first line of the response is returned as the short URL
                return reader.readLine();
            }
        }
    }
A small test class calls both services:
    package de.vogella.web.get;

    import java.io.IOException;

    public class Test {
        public static void main(String[] args) throws IOException {
            String s = "http://www.vogella.de";
            TinyURL tiny = new TinyURL();
            System.out.println(tiny.shorter(s));
            Trim trim = new Trim();
            System.out.println(trim.shorter(s));
        }
    }
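The classes above append the URL to the query string as it is. This works for a simple URL such as "http://www.vogella.de", but a URL that itself contains characters like "?" or "&" would be misread by the service. A minimal sketch of how the parameter could be percent-encoded with "java.net.URLEncoder" before the GET call follows. The class GetUrlBuilder and its method are hypothetical names, and the sample query in main is made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class GetUrlBuilder {

    // Builds "<serviceUrl>?<name>=<value>" with name and value encoded for use in a query string.
    static String build(String serviceUrl, String name, String value) {
        try {
            return serviceUrl + "?" + URLEncoder.encode(name, "UTF-8")
                    + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8
            throw new IllegalStateException("UTF-8 is not supported", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical long URL with its own query parameters
        String longUrl = "http://www.vogella.de?a=1&b=2";
        System.out.println(build("http://tinyurl.com/api-create.php", "url", longUrl));
        // prints: http://tinyurl.com/api-create.php?url=http%3A%2F%2Fwww.vogella.de%3Fa%3D1%26b%3D2
    }
}
```

The resulting String can be passed to new URL(...) and read in the same way as in the TinyURL class.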
You can define a proxy at startup via a start parameter.
    java -Dhttp.proxyHost=proxy -Dhttp.proxyPort=8080 JavaProgram
In your code you can set a proxy via System.setProperty. For example, if your proxy is called "proxy" and runs on port "8080", the following code sets the proxy.
    System.setProperty("http.proxySet", "true");
    System.setProperty("http.proxyHost", "proxy");
    System.setProperty("http.proxyPort", "8080");
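System properties apply to the whole Java program. If only a single connection should go through the proxy, the "java.net.Proxy" class can be passed to openConnection instead. The following is a minimal sketch of this alternative, reusing the host "proxy" and the port 8080 from the text above; the class ProxyExample and its methods are hypothetical names.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

class ProxyExample {

    // Describes an HTTP proxy; createUnresolved avoids a DNS lookup at this point.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    // Creates the connection object for the URL; no request is sent yet.
    static HttpURLConnection open(URL url, Proxy proxy) throws IOException {
        return (HttpURLConnection) url.openConnection(proxy);
    }

    public static void main(String[] args) throws IOException {
        Proxy proxy = httpProxy("proxy", 8080);
        HttpURLConnection connection = open(new URL("http://www.vogella.de"), proxy);
        System.out.println("Prepared connection to " + connection.getURL() + " via " + proxy.type() + " proxy");
        // Calling connection.getResponseCode() would now send the request through the proxy
    }
}
```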
Before posting questions, please see the vogella FAQ. If you have questions or find an error in this article, please use the www.vogella.de Google Group. I have also created a short list on how to ask good questions, which might help you.