How to extract company information from websites on the web (200 points)

  • Thread starter: kequan

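After finding websites through Google, I want to extract the company name, phone number, contact person, address, and related details from each site and save them to a database. How can this be done?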
 
Use the IHTML interface to get the page's InnerText, then search that string, extract the information you want, and store it in the database.
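
A minimal sketch of that pipeline, written in Python rather than against the IHTML interface the reply refers to. The standard `html.parser` module stands in for InnerText here (it only approximates what a browser would return), and the sample page, the `Contact`/`Tel` labels, the helper names, and the `company` table are all assumptions made up for illustration.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the page text, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def field_after(text, label):
    """Plain string search: return the rest of the line that follows `label`."""
    pos = text.find(label)
    if pos < 0:
        return None
    end = text.find("\n", pos)
    if end < 0:
        end = len(text)
    return text[pos + len(label):end].strip(" :：")


def save_company(conn, name, contact, phone):
    """Store one record; the table layout is hypothetical."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS company (name TEXT, contact TEXT, phone TEXT)"
    )
    conn.execute(
        "INSERT INTO company (name, contact, phone) VALUES (?, ?, ?)",
        (name, contact, phone),
    )
    conn.commit()


# Hypothetical page used only to exercise the helpers above.
SAMPLE_HTML = (
    "<html><head><style>p { color: red }</style></head><body>\n"
    "<h1>Example Trading Co.</h1>\n"
    "<p>Contact: Ms. Li</p>\n"
    "<p>Tel: 010-12345678</p>\n"
    "<script>var x = 1;</script>\n"
    "</body></html>"
)

if __name__ == "__main__":
    text = visible_text(SAMPLE_HTML)
    conn = sqlite3.connect(":memory:")
    save_company(
        conn,
        text.splitlines()[0],  # assumes the first text line is the company name
        field_after(text, "Contact"),
        field_after(text, "Tel"),
    )
    print(conn.execute("SELECT name, contact, phone FROM company").fetchall())
```

`field_after` is deliberately naive: it takes the first occurrence of the label anywhere in the text, so it will be fooled by pages where the label also appears in unrelated content.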

The key is your search algorithm: a web page contains a lot of content, and some of it is bound to be useless.
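
One simple way to cut down the noise, sketched in Python under stated assumptions: keep only lines that start with a known label followed by a colon, and ignore everything else. The label lists and the `pick_fields` name are hypothetical; real sites will need a longer list and probably fuzzier matching.

```python
# Hypothetical label lists; extend them for the sites you actually crawl.
LABELS = {
    "phone": ("Tel", "Phone", "电话"),
    "contact": ("Contact", "联系人"),
    "address": ("Address", "Addr", "地址"),
}


def pick_fields(text):
    """Return {field: value} for lines shaped like 'Label: value'.

    Lines without a colon right after the label (for example a
    'Contact us' navigation link) are treated as noise and skipped.
    """
    record = {}
    for raw in text.splitlines():
        line = raw.strip()
        lowered = line.lower()
        for field, labels in LABELS.items():
            if field in record:
                continue  # keep the first hit for each field
            for label in labels:
                if not lowered.startswith(label.lower()):
                    continue
                rest = line[len(label):].lstrip()
                # Accept both the ASCII and the full-width colon.
                if rest[:1] in (":", "："):
                    value = rest[1:].strip()
                    if value:
                        record[field] = value
                        break
    return record
```

Requiring the colon is a design choice that trades recall for precision: it drops menu entries and slogans, but it also misses pages that put the label and the value in separate table cells.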
 
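Use regular expressions. The precondition is that the pages have a fixed format; as soon as a site is redesigned the expressions stop working and have to be written again.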
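
A small Python sketch of both halves of that point. `ROW_RE` is tied to one assumed page layout (label and value in adjacent `<td>` cells) and breaks when the layout changes, while `PHONE_RE` matches the value itself and survives a redesign. Both patterns and the sample markup are assumptions for illustration; the phone pattern only targets landline-style numbers with an optional area code starting with 0.

```python
import re

# Layout-independent: optional area code such as 010 or (021), then 7-8 digits.
PHONE_RE = re.compile(r"(?<!\d)(?:\(?0\d{2,3}\)?[- ]?)?\d{7,8}(?!\d)")

# Layout-dependent: a label cell immediately followed by a value cell.
ROW_RE = re.compile(
    r"<td[^>]*>\s*(?P<label>[^<:：]+?)\s*[:：]?\s*</td>"
    r"\s*<td[^>]*>\s*(?P<value>[^<]*?)\s*</td>",
    re.IGNORECASE,
)


def find_phones(text):
    """Return every phone-like substring found in the text."""
    return [m.group(0) for m in PHONE_RE.finditer(text)]


def parse_rows(html):
    """Return {label: value} for each two-cell table row in the markup."""
    return {m.group("label"): m.group("value") for m in ROW_RE.finditer(html)}


# Hypothetical markup before and after a site redesign.
OLD_LAYOUT = (
    "<table><tr><td>Tel</td><td>010-12345678</td></tr>"
    "<tr><td>Contact:</td><td> Ms. Li </td></tr></table>"
)
NEW_LAYOUT = '<div class="tel">Tel: 010-12345678</div>'

if __name__ == "__main__":
    print(parse_rows(OLD_LAYOUT))   # the template still fits this layout
    print(parse_rows(NEW_LAYOUT))   # the template no longer fits
    print(find_phones(NEW_LAYOUT))  # the value pattern still finds the number
```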
 
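What is a regular expression? I see a lot of people mention it.
For this problem you should first download the page source, then analyze it and pull out the content you need. But you have too many pages, all structured differently, so the analysis algorithm will not be easy to write and will be quite hard to implement, heh [:)]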
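
For the download step, a hedged Python sketch using only the standard library. Pages in this kind of search are often not UTF-8, so the sketch tries the charset from the HTTP header, then a `<meta>` declaration, then falls back to UTF-8 and GB18030. The fallback order and the function names are assumptions, not something the thread prescribes.

```python
import re
import urllib.request

# Finds charset=... inside a <meta> tag near the top of the raw bytes.
META_CHARSET_RE = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_\-]+)""", re.IGNORECASE
)


def decode_html(raw, header_charset=None):
    """Decode raw page bytes, trying the most specific hint first."""
    candidates = []
    if header_charset:
        candidates.append(header_charset)
    match = META_CHARSET_RE.search(raw[:4096])
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates += ["utf-8", "gb18030"]  # assumed fallbacks
    for name in candidates:
        try:
            return raw.decode(name)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown codec name or wrong guess, try the next one
    return raw.decode("utf-8", errors="replace")


def download(url, timeout=10):
    """Fetch one page and return its source as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        raw = resp.read()
        charset = resp.headers.get_content_charset()
    return decode_html(raw, charset)
```

Calling `download` once per result URL gives the source text that the analysis step then has to work through.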
 
How do I use the IHTML interface to get the InnerText of the page source, for every page the search returns? Could you list some code in detail?
 
DIGoogleReader
DIGoogleReader is an advanced plugin for DIHtmlParser to parse Google web search result pages. It contains the TDIGoogleReader component class which can parse Google web search results pages and extract individual results. For each result, it fires the OnResult event. The detailed result properties can then be accessed by applications.

DIGoogleReader is fully Unicode enabled and returns results in all languages.

What DIGoogleReader is not
Initially, DIGoogleReader was (and still is) intended as a learning example of how to write advanced plugins for DIHtmlParser. Very soon, however, people found it extremely useful to analyze Google searches. Unfortunately, Google does not like this a great deal, so I encourage you to read Google's license terms before you put DIGoogleReader to practical use, especially commercial.

DIGoogleReader was tested to work fine with many Google web search result pages at the time of writing, which is demonstrated by the example result pages located right next to the demo project. However, there is no guarantee that the parsing algorithm works with all Google result pages, especially since Google may change its page layout without further notice at any time.

In the event that Google does one day introduce a new page layout which breaks the existing DIGoogleReader algorithm, I reserve the right not to adjust DIGoogleReader to those changes right away, maybe even not at all. Remember: DIGoogleReader is first and foremost a demonstration of how to solve complex tasks with DIHtmlParser easily.

Example Project
The screenshot shows the compiled demo project when running. It reads, extracts, and displays search results from a Google search results page previously saved to disk.

The demo's source code is included, as well as a precompiled binary.

Requirements
DIGoogleReader includes full sources for the plugin and demo application. To compile, DIHtmlParser is required for the low level HTML reading and parsing.

DIHtmlParser is available as a separate package on this site, so make sure to download it before you recompile the demo application or write your own.
 
http://www.yunqa.de/delphi/doku.php/products/googlereader/index
 
