How to get the content of a dynamic web page (50)

  • Thread starter: jinguang

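How can I get the content of a dynamic web page? IdHTTP1.Get only returns the page source, which is not enough. Please suggest a method that first opens the page in a webbrowser and then reads the content from it.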
 
From: jinguang, Time: 2009-5-11 5:16:00, ID: 3957742
The thread starter is certainly hard-working.
 
The content you mean is not the source? Do you mean the text displayed on the page?
 
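Yes, for example the data shown inside a table, such as metal price quotes. This data is updated dynamically and differs each time the page is opened in a webbrowser. How can a program obtain it?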
 
Just refresh... fetch the page again... -_-
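 
For the approach the original question asks about (open the page in a webbrowser first, then read what it displays), here is a minimal sketch rather than a tested solution. It assumes a form holding a TWebBrowser named WebBrowser1, a TMemo named Memo1 and a TTimer named Timer1; those component names and QuoteUrl are hypothetical. The event handler signature shown is the Delphi 7 one and may differ in other versions. Data that page scripts load after the DocumentComplete event may still be missing when the handler runs.

```pascal
// Requires SysUtils, SHDocVw and MSHTML in the unit's uses clause.
const
  QuoteUrl = 'http://example.com/quotes'; // hypothetical address

// Load the page again on every timer tick.
procedure TForm1.Timer1Timer(Sender: TObject);
begin
  WebBrowser1.Navigate(QuoteUrl);
end;

// Runs when the browser reports that a document has finished loading.
procedure TForm1.WebBrowser1DocumentComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
var
  Doc: IHTMLDocument2;
begin
  // Document is an IDispatch; ask it for the HTML document interface.
  if Supports(WebBrowser1.Document, IHTMLDocument2, Doc) and
     (Doc.body <> nil) then
    Memo1.Lines.Text := Doc.body.innerText; // displayed text, not the markup
end;
```

Reading Doc.body.innerHTML instead of innerText would return the markup as the browser currently holds it, which may be easier to parse when the values sit in table cells.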
 
procedure TGetThread.Execute;
var
  i, k: integer;
  q1, q2, q3, q4: integer;
  tmpHtml: WideString;
  PicUrl: string;
  tmpStream: TStringStream; // TMemoryStream cannot be read out as a String
  CPUUsage: Double;
begin
  SearchKeyWordListTH:=THashedStringList.Create; // keywords are kept in priority order
  //SearchKeyWordList.Sorted:=true;
  //SearchKeyWordList.Duplicates:=dupIgnore; // takes effect only when Sorted=true
  SearchKeyWordListTH.Clear;
  if (SearchKeyWordsTH<>'') then
    ExtractStrings([';'], [' '], PChar(SearchKeyWordsTH), SearchKeyWordListTH);
  //
  FindLinkListTH:=THashedStringList.Create;
  FindLinkListTH.Sorted:=true;
  FindLinkListTH.Duplicates:=dupIgnore; // takes effect only when Sorted=true
  FindPicListTH:=THashedStringList.Create;
  FindPicListTH.Sorted:=true;
  FindPicListTH.Duplicates:=dupIgnore; // takes effect only when Sorted=true
  SucceedDownloadPicListTH:=THashedStringList.Create;
  SucceedDownloadPicListTH.Sorted:=true;
  SucceedDownloadPicListTH.Duplicates:=dupIgnore; // takes effect only when Sorted=true
  FailedDownloadPicListTH:=THashedStringList.Create;
  FailedDownloadPicListTH.Sorted:=true;
  FailedDownloadPicListTH.Duplicates:=dupIgnore; // takes effect only when Sorted=true
  //
  tmpHtml:='';
  tmpStream:=TStringStream.Create('');
  try
    while not Terminated do
    begin
      Sleep(SleepTime); Application.ProcessMessages;
      CPUUsage:=-1;
      if not GetCPUUsage(CPUUsage) then continue;
      if (CPUUsage<>-1) and (CPUUsage>=MaxCPUUsage) then continue;
      // fetch the next link to process
      WebPageUrl:='';
      Synchronize(GetProcessLink);
      if WebPageUrl='' then continue;
      StatusStr:='Prepared to deal with WebPage:'+WebPageUrl; // about to process the web page
      Synchronize(ShowStatusInfo);
      //Sleep(SleepTime); Application.ProcessMessages;
      if Terminated then break;
      IDP:=TIdHttp.Create(nil); // unless recreated each time, Get stops returning pages after a long run
      IDP.Request.UserAgent:=RequestUserAgent;
      IDP.Request.CacheControl:='no-cache';
      IDP.HandleRedirects:=true;
      //IDP.Request.AcceptCharSet AcceptEncoding AcceptLanguage // accept anything; ignore garbled characters
      //IDP.Request.Clear; // has no effect
      //IDP.Response.Clear;
      try
        // read the web page
        tmpHtml:='';
        tmpStream.Size:=0; // empty the stream; WriteString('') would leave the previous page in it
        try
          IDP.Get(WebPageUrl, tmpStream);
          tmpStream.Position:=0; // required
          tmpHtml:=tmpStream.ReadString(tmpStream.Size);
          //tmpHtml:=IDP.Get(WebPageUrl); // Utf8ToAnsi or AnsiToUtf8 has no effect
          tmpHtml:=LowerCase(tmpHtml); // convert to lowercase for the later page filtering
          //if trim(frmMain.memTest.Text)='' then
          //  frmMain.memTest.Text:=tmpHtml;
          if Terminated then break;
          if IsAvaiWebPage(tmpHtml, SearchKeyWordListTH) then
          begin
            Synchronize(UpdateSucceedVisitLinkList);
          end
          else
          begin
            Synchronize(UpdateFailedVisitLinkList);
            StatusStr:='WebPage do not meet the conditions:'+WebPageUrl; // page does not meet the conditions
            Synchronize(ShowStatusInfo);
            continue;
          end;
        except
          Synchronize(UpdateFailedVisitLinkList);
          StatusStr:='Failure to get the WebPage:'+WebPageUrl; // reading the page failed
          Synchronize(ShowStatusInfo);
          continue;
        end;
        //Sleep(SleepTime); Application.ProcessMessages;
        if Terminated then break;
        // parse out all links in the web page
        if (MaxProcessLinkCountTH=0) or (CurProcessLinkCountTH<MaxProcessLinkCountTH) then
        begin
          FindLinkListTH.Clear;
          GetWebPageUrl(utLinkUrl, tmpHtml, WebPageUrl, FindLinkListTH);
          q1:=FindLinkListTH.Count;
          if Terminated then break;
          Synchronize(DelReElemOfFindLinkList);
          q2:=FindLinkListTH.Count;
          if Terminated then break;
          Synchronize(UpdateProcessLinkList);
          if Terminated then break;
          StatusStr:='WebPage:'+WebPageUrl+' Found '+IntToStr(q1)+' Links,Effective '+IntToStr(q2)+' Links. '; // found ? links, ? of them valid
          Synchronize(ShowStatusInfo);
          //Sleep(SleepTime); Application.ProcessMessages;
        end;
        if Terminated then break;
        // parse out all pictures in the web page
        FindPicListTH.Clear;
        GetWebPageUrl(utPicUrl, tmpHtml, WebPageUrl, FindPicListTH);
        q1:=FindPicListTH.Count;
        if Terminated then break;
        Synchronize(DelReElemOfFindPicList);
        q2:=FindPicListTH.Count;
        if Terminated then break;
        Synchronize(GetSaveFileNamePart);
        k:=0;
        SucceedDownloadPicListTH.Clear;
        FailedDownloadPicListTH.Clear;
        if not SaveFile(FindPicListTH, SavePathTH, SaveFileNamePrefix, MinFileSizeTH,
          SucceedDownloadPicListTH, FailedDownloadPicListTH, k) then ;
        q2:=q2-k;
        q3:=SucceedDownloadPicListTH.Count;
        q4:=FailedDownloadPicListTH.Count-k;
        if Terminated then break;
        Synchronize(UpdateSucceedDownloadPicList);
        if Terminated then break;
        Synchronize(UpdateFailedDownloadPicList);
        if Terminated then break;
        StatusStr:='WebPage processing completed:'+WebPageUrl // page processing completed
          +' Found '+IntToStr(q1)+' Pictures,Effective '+IntToStr(q2)+' Pictures,' // found ? pictures, ? of them valid
          +' Download success '+IntToStr(q3)+' Pictures,' // ? downloaded successfully
          +' Download failure '+IntToStr(q4)+' Pictures.'; // ? failed to download
        Synchronize(ShowStatusInfo);
        //Sleep(SleepTime); Application.ProcessMessages;
      finally
        FreeAndNil(IDP);
      end;
    end; //while
  finally
    tmpStream.Free;
    FreeAndNil(SearchKeyWordListTH);
    FreeAndNil(FindPicListTH);
    FreeAndNil(SucceedDownloadPicListTH);
    FreeAndNil(FailedDownloadPicListTH);
    FreeAndNil(FindLinkListTH);
  end;
  Synchronize(NotifyWorkThreadTerminated);
end;
 
