Scraping data over HTTP in .NET is fairly simple. This post walks through an HTTP POST request.
I. Code
using System.IO;
using System.Net;
using System.Text;

public static string HttpPost(string formUrl, string formData)
{
    try
    {
        // Encode the body in the charset the target site expects. UTF-8 is used here
        // rather than Encoding.Default (the system's current code page); change it if needed.
        byte[] postData = Encoding.UTF8.GetBytes(formData);

        // Configure the request.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(formUrl);
        request.Method = "POST";
        request.KeepAlive = false;
        request.AllowAutoRedirect = true;
        request.ContentType = "application/x-www-form-urlencoded";
        request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)";
        request.ContentLength = postData.Length;

        // Send the request body; the using block closes the stream.
        using (Stream outputStream = request.GetRequestStream())
        {
            outputStream.Write(postData, 0, postData.Length);
        }

        // Read the response. The encoding must match the page: "GB2312" or "UTF-8".
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream responseStream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(responseStream, Encoding.GetEncoding("GB2312")))
        {
            return reader.ReadToEnd();
        }
    }
    catch
    {
        // Any failure (network error, bad URL, unknown encoding) is reported as "error".
        return "error";
    }
}
II. Notes on the method above:
1) The two parameters of HttpPost(string formUrl, string formData) are the request URL and the body data of the POST request.
2) Check which encoding the response is in, and change the encoding passed to the StreamReader, Encoding.GetEncoding("GB2312"), to GB2312 or UTF-8 accordingly.
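The two notes above can be sketched as a pair of small helpers. This is only an illustration: PostHelpers, BuildFormData and ResolveEncoding are hypothetical names that do not appear in the original code. The first helper assumes the server accepts UTF-8 percent-encoded form fields, which matches the UTF-8 body encoding used in HttpPost. The second assumes the charset name comes from somewhere like HttpWebResponse.CharacterSet. On .NET Core and later, GB2312 is only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class PostHelpers
{
    // Hypothetical helper: joins fields into "k1=v1&k2=v2",
    // percent-encoding every key and value (as UTF-8).
    public static string BuildFormData(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value ?? string.Empty)));
    }

    // Hypothetical helper: maps a charset name to an Encoding and
    // returns the fallback when the name is empty or not recognised.
    public static Encoding ResolveEncoding(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset))
        {
            return fallback;
        }
        try
        {
            return Encoding.GetEncoding(charset.Trim());
        }
        catch (ArgumentException)
        {
            return fallback;
        }
    }
}
```

A call might then look like HttpPost(url, PostHelpers.BuildFormData(fields)), where url and fields are whatever the target site requires. Inside HttpPost, the hard-coded "GB2312" could be replaced by PostHelpers.ResolveEncoding(response.CharacterSet, Encoding.UTF8) if the server reports its charset reliably, which not every site does.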
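III. Parsing the returned data
1) Usually you locate the data you want by the id of a p or table element.
2) If the p you extract has many p or table elements nested inside it, you can append the corresponding closing tag, </p> or </table>.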
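For example, suppose the data you fetched is:
<p id="test">test <p id="test2">test2</p> <p id="test3"> </p>
and the only part you want is:
<p id="test">test <p id="test2">test2</p>
In that case you need to append the missing closing tag </p> yourself; otherwise it will break the layout of the page where you display the data.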
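The cut-and-close step described above could look roughly like the following. It is a minimal sketch, not a real HTML parser: HtmlFragment, SliceBetweenIds and CloseUnclosedTags are hypothetical names, ids are assumed to be written with double quotes, and tags are simply counted with regular expressions, so unusual markup (a ">" inside an attribute value, "<p />" forms) will confuse it.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class HtmlFragment
{
    // Builds a regex for an opening tag such as <p ... id="test" ...>.
    private static Regex OpeningTagWithId(string tagName, string id)
    {
        string pattern = "<" + Regex.Escape(tagName) + @"\s[^>]*?\bid\s*=\s*"""
                         + Regex.Escape(id) + @"""[^>]*>";
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }

    // Returns the text from the opening tag with startId up to (not including)
    // the opening tag with stopId, or null when either tag is missing.
    public static string SliceBetweenIds(string html, string tagName, string startId, string stopId)
    {
        Match start = OpeningTagWithId(tagName, startId).Match(html);
        if (!start.Success)
        {
            return null;
        }
        Match stop = OpeningTagWithId(tagName, stopId).Match(html, start.Index + start.Length);
        if (!stop.Success)
        {
            return null;
        }
        return html.Substring(start.Index, stop.Index - start.Index).Trim();
    }

    // Appends one </tagName> for every opening tag that has no matching close.
    public static string CloseUnclosedTags(string fragment, string tagName)
    {
        string name = Regex.Escape(tagName);
        int opened = Regex.Matches(fragment, "<" + name + @"(\s[^>]*)?>", RegexOptions.IgnoreCase).Count;
        int closed = Regex.Matches(fragment, "</" + name + @"\s*>", RegexOptions.IgnoreCase).Count;

        var result = new StringBuilder(fragment);
        for (int i = closed; i < opened; i++)
        {
            result.Append("</").Append(tagName).Append(">");
        }
        return result.ToString();
    }
}
```

With the example data, SliceBetweenIds(html, "p", "test", "test3") keeps everything before the test3 element, and CloseUnclosedTags(fragment, "p") then appends the single missing </p>. For anything beyond simple pages, a dedicated HTML parsing library is likely a safer choice than counting tags.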