How do I read the contents of an .htm file into a TXT file? (200 points)


halei

I have 20 .htm files and need to import their contents into a database. How should I go about it?
 
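Try using a WebBrowser control. Open the .htm file, then read out its source.
Process and modify the source as needed, then save it as a .txt file.

There have been many earlier threads on "reading a page's source with WebBrowser"; try searching for them.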
 
Contents of the htm file:
Date Time Fee Total
20020811 200004 0.5 0.5
20020811 200005 0.5 0.5
20020811 200005 0.5 0.5
20020812 200005 0.5 0.5
20020813 200007 0.5 0.5
20020814 200005 0.5 0.5
20020815 200008 0.5 0.5
20020816 200000 0.5 0.5
........ ...... ... ...


I want to read the data in this content into a TXT file. Please help!

 
Please post the actual HTML document. This should be easy to handle with the DHTML DOM.
 

WebBrowser1.Navigate('a.htm');

Then, in WebBrowser1's NavigateComplete2 event handler, write:
WebBrowser1.ExecWB(OLECMDID_SELECTALL, 0);
WebBrowser1.ExecWB(OLECMDID_COPY, 0);
RichEdit1.PasteFromClipboard;
RichEdit1.Lines.SaveToFile('a.txt');
 
zw84611:

I did as you said, but at run time a message pops up: "Trying to revoke a drop target that has not been registered"


procedure TForm1.Button1Click(Sender: TObject);
begin
  webbrowser1.Navigate('d:/200/421225-5.htm');
end;

procedure TForm1.WebBrowser1BeforeNavigate2(Sender: TObject;
  const pDisp: IDispatch; var URL, Flags, TargetFrameName, PostData,
  Headers: OleVariant; var Cancel: WordBool);
begin
  webbrowser1.ExecWB(OLECMDID_SELECTALL,0);
  webbrowser1.ExecWB(OLECMDID_COPY,0);
  richedit1.PasteFromClipboard;
  RichEdit1.Lines.SaveToFile('d:/200/a.txt');
end;

end.
 
uses ActiveX;
Did you add this?
 
Call CoInitialize beforehand
and CoUninitialize afterwards.
 
unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  StdCtrls, ComCtrls, OleCtrls, SHDocVw, ActiveX;

type
  TForm1 = class(TForm)
    RichEdit1: TRichEdit;
    WebBrowser1: TWebBrowser;
    Button1: TButton;
    procedure FormCreate(Sender: TObject);
    procedure Button1Click(Sender: TObject);
    procedure WebBrowser1NavigateComplete2(Sender: TObject;
      const pDisp: IDispatch; var URL: OleVariant);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.DFM}

procedure TForm1.FormCreate(Sender: TObject);
begin
  ///RichEdit1.Lines.SaveToFile('a.txt');
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  WebBrowser1.Navigate('c:/序 言.htm');
end;

procedure TForm1.WebBrowser1NavigateComplete2(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
begin
  // Paste as plain text, then copy the whole page via the clipboard
  RichEdit1.PlainText := true;
  WebBrowser1.ExecWB(OLECMDID_SELECTALL, 0);
  WebBrowser1.ExecWB(OLECMDID_COPY, 0);
  RichEdit1.PasteFromClipboard;
  RichEdit1.Lines.SaveToFile('c:/ab.txt');
end;

// Note the section below
initialization
  OleInitialize(nil);

finalization
  OleUninitialize;

end.
 
Use the DHTML DOM to read each data item.
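As a rough illustration of the DOM approach, the sketch below walks a table row by row and emits one tab-separated line per row. It assumes the data sits in the first <table> element of the page, which the thread never confirms since the actual HTML was not posted. The unit name HtmlTableExport and the procedure name TableToLines are hypothetical, and the document passed in must already be fully loaded.

```pascal
unit HtmlTableExport;  // hypothetical unit name

interface

uses
  Classes, MSHTML;

// Appends one tab-separated line per table row to Output.
// Assumption: the data lives in the first <table> of the document.
procedure TableToLines(const Doc: IHTMLDocument2; Output: TStrings);

implementation

uses
  SysUtils;

procedure TableToLines(const Doc: IHTMLDocument2; Output: TStrings);
var
  Tables, Rows, Cells: IHTMLElementCollection;
  Table: IHTMLTable;
  Row: IHTMLTableRow;
  Cell: IHTMLElement;
  R, C: Integer;
  Line: string;
begin
  // Collect every <table> element in the document
  Tables := Doc.all.tags('TABLE') as IHTMLElementCollection;
  if (Tables = nil) or (Tables.length = 0) then
    Exit;

  Table := Tables.item(0, 0) as IHTMLTable;
  Rows := Table.rows;
  for R := 0 to Rows.length - 1 do
  begin
    Row := Rows.item(R, 0) as IHTMLTableRow;
    Cells := Row.cells;
    Line := '';
    for C := 0 to Cells.length - 1 do
    begin
      Cell := Cells.item(C, 0) as IHTMLElement;
      if C > 0 then
        Line := Line + #9;  // tab between fields
      Line := Line + Trim(Cell.innerText);
    end;
    Output.Add(Line);
  end;
end;

end.
```

Reading cell by cell keeps each field separate, so the result does not depend on how the browser lays the text out when it is copied or flattened.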
 
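I have already added
initialization
OleInitialize(nil);

finalization
OleUninitialize;

but when execution reaches webbrowser1.ExecWB(OLECMDID_COPY,0);
the message "Trying to revoke a drop target that has not been registered" still pops up.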
 
Send me your web page: zw84611@sina.com
 
uses
  mshtml;
{WebBrowser1: TWebBrowser;}

WebBrowser1.Navigate(WebPage);  // WebPage: path or URL of the page to load
Memo1.Lines.Add(IHtmlDocument2(WebBrowser1.Document).Body.OuterText);
 
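It was my mistake. It works now.

However, the content written to the TXT file is badly formatted. Can it be laid out the same way the page is displayed? I need to import it into a database.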
 
Otherwise, parse the file yourself in code and write it out again in the layout you need.
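A minimal sketch of that "parse it yourself" idea, assuming each data row in the saved text holds four fields (date, time, fee, total) separated by spaces or tabs, as in the sample posted earlier. The file names a.txt and a.csv and the function name RowToCsv are placeholders, not anything from the thread.

```pascal
program RowsToCsv;  // hypothetical program name

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Splits one text row on spaces/tabs and joins the fields with commas.
// Returns an empty string unless the row has exactly FieldCount fields.
function RowToCsv(const Row: string; FieldCount: Integer): string;
var
  Fields: TStringList;
  I: Integer;
begin
  Result := '';
  Fields := TStringList.Create;
  try
    // ExtractStrings skips empty pieces, so runs of spaces are harmless
    ExtractStrings([' ', #9], [], PChar(Row), Fields);
    if Fields.Count <> FieldCount then
      Exit;
    for I := 0 to Fields.Count - 1 do
    begin
      if I > 0 then
        Result := Result + ',';
      Result := Result + Fields[I];
    end;
  finally
    Fields.Free;
  end;
end;

var
  Src, Dst: TStringList;
  I: Integer;
  Line: string;
begin
  Src := TStringList.Create;
  Dst := TStringList.Create;
  try
    Src.LoadFromFile('a.txt');  // placeholder input: text saved from the page
    for I := 0 to Src.Count - 1 do
    begin
      Line := RowToCsv(Src[I], 4);  // 4 fields: date, time, fee, total
      if Line <> '' then
        Dst.Add(Line);
    end;
    Dst.SaveToFile('a.csv');    // placeholder output, one row per line
  finally
    Dst.Free;
    Src.Free;
  end;
end.
```

Rows that do not split into exactly four fields are dropped, which filters out blank lines and stray text. The header row also has four fields, so it passes through and would need to be skipped before a database import.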
 
Actually, 影 子's method is better.
 
I tried 影子's method:
unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls, OleCtrls, SHDocVw, ComCtrls, ActiveX, mshtml;

type
  TForm1 = class(TForm)
    WebBrowser1: TWebBrowser;
    Button1: TButton;
    RichEdit1: TRichEdit;
    Memo1: TMemo;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.Button1Click(Sender: TObject);
begin
  webbrowser1.Navigate('d:/200/a.htm');
  Memo1.Lines.Add(IHtmlDocument2(WebBrowser1.Document).Body.OuterText);
end;

end.

At run time I get the error "Access violation at address 0045e05b in module 'project1.exe'. Read of address 0000000"
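A likely cause, though the thread does not confirm it: Navigate only starts loading the page and returns immediately, so WebBrowser1.Document is still nil on the next line, and dereferencing it reads address 0. One possible fix is to move the read into the OnDocumentComplete event, sketched below. The handler must be assigned to WebBrowser1.OnDocumentComplete in the Object Inspector, the output path is a placeholder, and the exact handler signature varies between Delphi versions (newer ones declare URL as const).

```pascal
unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Classes, Controls, Forms, StdCtrls,
  OleCtrls, SHDocVw, MSHTML;

type
  TForm1 = class(TForm)
    WebBrowser1: TWebBrowser;
    Button1: TButton;
    Memo1: TMemo;
    procedure Button1Click(Sender: TObject);
    // Assign this to WebBrowser1.OnDocumentComplete in the Object Inspector
    procedure WebBrowser1DocumentComplete(Sender: TObject;
      const pDisp: IDispatch; var URL: OleVariant);
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.Button1Click(Sender: TObject);
begin
  // Navigate only starts the load; Document is not ready at this point
  WebBrowser1.Navigate('d:/200/a.htm');
end;

procedure TForm1.WebBrowser1DocumentComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
var
  Doc: IHTMLDocument2;
begin
  // Supports returns False when Document is nil or is not an HTML document
  if Supports(WebBrowser1.Document, IHTMLDocument2, Doc) and
     Assigned(Doc.body) then
  begin
    Memo1.Lines.Text := Doc.body.outerText;
    Memo1.Lines.SaveToFile('d:/200/a.txt');  // placeholder output path
  end;
end;

end.
```

Checking the result of Supports and Doc.body before use means a failed or non-HTML load leaves the memo untouched instead of raising an access violation.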
 