About HTML parsing (200 points)

newsweep · 2004-04-30

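I have just been given a task: parse HTML pages and extract the content I need from them.
For example, from http://gongqiu.agri.gov.cn/gongqiu/gengduo?type=2 I need to pull the detail page behind each "wanted to buy" entry (the details appear when you click an entry).
I am currently using IHTMLDocument2, but is there a way to parse the HTML without loading it in a browser (that is too slow)? Or should I use a different approach altogether?
Also, if a site requires login (the site above shows more listings once you are logged in), how do I handle the login from code?
If anyone has done this before, please help; source code would be ideal.
newsweep@tom.com, thanks!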

satanmonkey · 2004-04-30

I recommend pulling the page down with idhttp (skip the images) and then matching with regular expressions.

newsweep · 2004-05-11

I have not used idhttp yet. I will give it a try first; could you post an example?

newsweep · 2004-05-18

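I am now using IHTMLDocument2 and have it mostly solved.
The remaining problem is extracting the links from markup such as
<table>
<tr>
<td><a href='sohu.com'>sohu</a></td>
<td><a href='sina.com.cn'>sina</a></td>
</tr>
</table>
I have already retrieved the table as an IHTMLTable2 (the link names, their number and so on come from a dynamic query).
I just do not know how to get at the links under each TD. Could anyone look into this?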

xuhao1 · 2004-05-18

s := IdHTTP.Get('http://www.163.com'); // s receives the page source
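
Putting satanmonkey's suggestion and the one-liner above together, a minimal sketch might look like the following. It assumes a recent Delphi (XE2 or later) with Indy installed; System.RegularExpressions did not ship with the Delphi versions of 2004, where a third-party regex library would have been needed. FetchPage and ExtractHrefs are hypothetical names chosen for this example, and the pattern only covers quoted href values. For https URLs, TIdHTTP additionally needs an SSL IOHandler, which is not shown.

```pascal
program FetchLinks;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.RegularExpressions, IdHTTP;

// Downloads the page source only. Images and other resources that the
// page refers to are never requested, which is why this is fast.
function FetchPage(const Url: string): string;
var
  Http: TIdHTTP;
begin
  Http := TIdHTTP.Create(nil);
  try
    Http.HandleRedirects := True;
    Result := Http.Get(Url);
  finally
    Http.Free;
  end;
end;

// Appends the value of every href="..." or href='...' attribute to Links.
procedure ExtractHrefs(const Html: string; Links: TStrings);
var
  M: TMatch;
begin
  for M in TRegEx.Matches(Html, 'href\s*=\s*["'']([^"'']*)["'']', [roIgnoreCase]) do
    Links.Add(M.Groups[1].Value);
end;

var
  Links: TStringList;
  S: string;
begin
  Links := TStringList.Create;
  try
    // Sample markup taken from this thread; to work on a live page,
    // pass FetchPage('http://...') as the first argument instead.
    ExtractHrefs('<td><a href=''sohu.com''>sohu</a></td>' +
                 '<td><a href=''sina.com.cn''>sina</a></td>', Links);
    for S in Links do
      Writeln(S);
  finally
    Links.Free;
  end;
end.
```

A regular expression is fine for pulling a known pattern out of a page you control the shape of; for nested or irregular markup, walking the mshtml object model is likely to be more robust.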

wzboy1984 · 2004-05-18

Use the string functions!

wqhatnet · 2004-05-18

uses
  Windows, SysUtils, Classes, Forms, Dialogs, WinInet;

// Downloads aUrl and fills Lines so that each link ends up on its own line.
function DownloadHtml(const aUrl: string; Lines: TStrings): Boolean;
var
  hSession, hService: HINTERNET;
  Buffer: array[0..1023] of AnsiChar;
  dwBytesRead: DWORD;
  Chunk, Html: AnsiString;
  S: string;
begin
  Result := False;
  Html := '';
  hSession := InternetOpen('MyApp', INTERNET_OPEN_TYPE_PRECONFIG, nil, nil, 0);
  if not Assigned(hSession) then
    Exit;
  try
    hService := InternetOpenUrl(hSession, PChar(aUrl), nil, 0, 0, 0);
    if not Assigned(hService) then
      Exit;
    try
      // Read the whole response first, 1 KB at a time.
      while InternetReadFile(hService, @Buffer, SizeOf(Buffer), dwBytesRead)
        and (dwBytesRead > 0) do
      begin
        Application.ProcessMessages; // keep the UI responsive
        SetString(Chunk, PAnsiChar(@Buffer[0]), dwBytesRead);
        Html := Html + Chunk;
      end;
      Result := True;
    finally
      InternetCloseHandle(hService);
    end;
  finally
    InternetCloseHandle(hSession);
  end;

  // Drop the existing line breaks, then insert a break around each marker
  // so that every URL starts on a new line.
  S := string(Html);
  S := StringReplace(S, #13#10, '', [rfReplaceAll]);
  S := StringReplace(S, 'http', #13#10 + 'http', [rfReplaceAll]);
  S := StringReplace(S, 'href="', 'href="' + #13#10, [rfReplaceAll]);
  S := StringReplace(S, 'src="', 'src="' + #13#10, [rfReplaceAll]);
  S := StringReplace(S, '.swf', '.swf' + #13#10, [rfReplaceAll]);
  Lines.Text := S;
  // Lines.SaveToFile('c:/eee.txt'); // optional: dump the result for inspection
end;

This routine downloads the page first and then splits out every link.
With Lines being the list you passed to DownloadHtml, you can then use
  for i := 0 to Lines.Count - 1 do
  begin
    if Pos('http', Lines[i]) <> 0 then
      ShowMessage(Lines[i]); // this is the address you want
  end;

hardware007 · 2004-05-18

String operations.

newsweep · 2004-05-18

As it turns out, mshtml provides a large number of objects, and I have solved it myself.
Thanks, everyone!
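
newsweep did not post the final code, so the following is only one plausible way to do it with the mshtml objects, not necessarily the one used. It assumes the table is already in hand as an interface (an IHTMLTable2 reference can be cast with "as IHTMLElement") and asks it for all of its <a> descendants. CollectTableLinks is a hypothetical name chosen for this example.

```pascal
uses
  System.Classes, System.Variants, MSHTML;

// Adds one "text=href" entry to Links for every <a> element found
// anywhere under Table, regardless of which TD it sits in.
procedure CollectTableLinks(const Table: IHTMLElement; Links: TStrings);
var
  Anchors: IHTMLElementCollection;
  Anchor: IHTMLElement;
  I: Integer;
begin
  Anchors := (Table as IHTMLElement2).getElementsByTagName('a');
  for I := 0 to Anchors.length - 1 do
  begin
    Anchor := Anchors.item(I, 0) as IHTMLElement;
    // Flag 2 asks for the attribute value as written in the markup,
    // rather than the URL resolved against the page address.
    Links.Add(Anchor.innerText + '=' +
      VarToStr(Anchor.getAttribute('href', 2)));
  end;
end;
```

Because the collection is queried from the table rather than from the document, the number of links and their names do not need to be known in advance, which matches the dynamic case described earlier in the thread.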
