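How do I get a page's HTML source code from a link in Android?

I am working on an application that needs to fetch the source of a web page from a link and then parse the HTML in that page.

Could you give me some examples, or a starting point for writing an app like this?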

You can use HttpClient to perform an HTTP GET and retrieve the HTML response, like this:

    HttpClient client = new DefaultHttpClient();
    HttpGet request = new HttpGet(url);
    HttpResponse response = client.execute(request);

    String html = "";
    InputStream in = response.getEntity().getContent();
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    StringBuilder str = new StringBuilder();
    String line = null;
    while ((line = reader.readLine()) != null) {
        str.append(line);
    }
    in.close();
    html = str.toString();
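
One thing to be aware of in the loop above: readLine() strips the line terminator, so the resulting string is the whole page joined into a single line. If you want to keep the line breaks, something like the following might work. It is only a sketch in plain Java; the class name StreamText and the method name readAll are made up for illustration, and passing the charset explicitly is my assumption about what you want.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper, not part of any Android or Apache API.
final class StreamText {
    // Reads the whole stream as text and ends every line with '\n'.
    // The reader (and the underlying stream) is closed when done.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }
}
```

With the snippet above you would call it as StreamText.readAll(response.getEntity().getContent(), StandardCharsets.UTF_8), assuming the page is UTF-8.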

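I would suggest jsoup.

According to their website:

Fetch the Wikipedia homepage, parse it to a DOM, and select the headlines from the In the News section into a list of Elements (online sample):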

    Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
    Elements newsHeadlines = doc.select("#mp-itn b a");

Getting started:

  1. Download the jsoup core library jar
  2. Read the cookbook introduction
  3. Enjoy!

Have fun, Paul

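This question is a bit old, but I figured I should post my answer now that DefaultHttpClient, HttpGet, etc. are deprecated. Given a URL, this function should fetch and return the HTML.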

    public static String getHtml(String url) throws IOException {
        // Build and set timeout values for the request.
        URLConnection connection = (new URL(url)).openConnection();
        connection.setConnectTimeout(5000);
        connection.setReadTimeout(5000);
        connection.connect();

        // Read and store the result line by line then return the entire string.
        InputStream in = connection.getInputStream();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        StringBuilder html = new StringBuilder();
        for (String line; (line = reader.readLine()) != null; ) {
            html.append(line);
        }
        in.close();

        return html.toString();
    }

    public class RetrieveSiteData extends AsyncTask<String, Void, String> {
        @Override
        protected String doInBackground(String... urls) {
            StringBuilder builder = new StringBuilder(100000);

            for (String url : urls) {
                DefaultHttpClient client = new DefaultHttpClient();
                HttpGet httpGet = new HttpGet(url);
                try {
                    HttpResponse execute = client.execute(httpGet);
                    InputStream content = execute.getEntity().getContent();
                    BufferedReader buffer = new BufferedReader(new InputStreamReader(content));
                    String s = "";
                    while ((s = buffer.readLine()) != null) {
                        builder.append(s);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }

            return builder.toString();
        }

        @Override
        protected void onPostExecute(String result) {
        }
    }
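
A side note on getHtml above: new InputStreamReader(in) decodes with the platform default charset, which may not match the page. One possible way around that is to read the charset from the Content-Type response header. The sketch below is plain Java; the class name ContentTypeCharset and the method fromHeader are hypothetical, and falling back to UTF-8 is my assumption, not something the server guarantees.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the UTF-8 fallback is an assumption.
final class ContentTypeCharset {
    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=ISO-8859-1", or UTF-8 when none is usable.
    static Charset fromHeader(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                // Case-insensitive check for the "charset=" parameter.
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Unknown or malformed name: fall through to the default.
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

In getHtml this would be used as new InputStreamReader(in, ContentTypeCharset.fromHeader(connection.getContentType())). It only looks at the HTTP header, so a charset declared in a meta tag inside the page is not picked up.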

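If you look here or here, you will see that you cannot do this directly with the Android API; you need an external library…

If you need an external library, you can choose one of the 2 here.

Call it like this: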

    new RetrieveFeedTask(new OnTaskFinished() {
        @Override
        public void onFeedRetrieved(String feeds) {
            // do whatever you want to do with the feeds
        }
    }).execute("http://enterurlhere.com");

RetrieveFeedTask.class

    class RetrieveFeedTask extends AsyncTask<String, Void, String> {
        String HTML_response = "";
        OnTaskFinished onOurTaskFinished;

        public RetrieveFeedTask(OnTaskFinished onTaskFinished) {
            onOurTaskFinished = onTaskFinished;
        }

        @Override
        protected void onPreExecute() {
            super.onPreExecute();
        }

        @Override
        protected String doInBackground(String... urls) {
            try {
                URL url = new URL(urls[0]); // enter your url here which to download
                URLConnection conn = url.openConnection();

                // open the stream and put it into BufferedReader
                BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream()));
                String inputLine;
                while ((inputLine = br.readLine()) != null) {
                    // System.out.println(inputLine);
                    HTML_response += inputLine;
                }
                br.close();
                System.out.println("Done");
            } catch (MalformedURLException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
            return HTML_response;
        }

        @Override
        protected void onPostExecute(String feed) {
            onOurTaskFinished.onFeedRetrieved(feed);
        }
    }

OnTaskFinished.java

    public interface OnTaskFinished {
        public void onFeedRetrieved(String feeds);
    }
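
Since AsyncTask itself has been deprecated in newer Android versions, the same callback pattern can probably be built on java.util.concurrent instead. The following is a rough sketch in plain Java, not a drop-in replacement: BackgroundFetch and its nested Callback interface are names I made up (Callback has the same shape as OnTaskFinished above, redeclared so the sketch compiles on its own), and the loader argument stands in for whatever code actually downloads the page.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical class; runs a loader off the calling thread and reports back.
final class BackgroundFetch {
    // Same shape as OnTaskFinished, redeclared to keep the sketch standalone.
    interface Callback {
        void onFeedRetrieved(String feeds);
    }

    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Runs loader on the worker thread, then passes its result to callback.
    // The callback is invoked on the worker thread, not the caller's thread.
    void fetch(Callable<String> loader, Callback callback) {
        executor.execute(() -> {
            String result;
            try {
                result = loader.call();
            } catch (Exception e) {
                result = ""; // assumption: report an empty string on failure
            }
            callback.onFeedRetrieved(result);
        });
    }

    // Stops accepting new work; already submitted tasks still finish.
    void shutdown() {
        executor.shutdown();
    }
}
```

Note that the callback here runs on the background thread. On Android you would still have to hand the result over to the main thread before touching any views, which is what onPostExecute did for you in the AsyncTask version.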