回复人: attacker2000(沙漠军刀) ( ) 信誉:100 2002-3-3 19:21:51 得分:0
用SQL语句不行么?
Top
回复人: lzw8077(皮卡丘) ( ) 信誉:100 2002-3-3 19:26:34 得分:0
不同数据库,太麻烦
Top
回复人: windindance(风舞轻扬) ( ) 信誉:100 2002-3-3 19:26:48 得分:0
不用数据库?
很难。
Top
回复人: yalongsoft(解放区来的同志) ( ) 信誉:被封杀 2002-3-3 19:28:16 得分:0
数据库也是用算法实现的啊!低层肯定是!
Top
回复人: liuziran(倒立旋风转的青蛙) ( ) 信誉:100 2002-3-3 19:28:58 得分:0
Hash
Top
回复人: yalongsoft(解放区来的同志) ( ) 信誉:被封杀 2002-3-3 19:29:31 得分:0
难道我们只能用现成的语句,现成的函数,现成的SQL!
Top
回复人: ss(捧着诗集的程序员) ( ) 信誉:被封杀 2002-3-3 19:31:39 得分:0
liuziran(做轮椅的青蛙)同志:有代码吗?本人另送300分!
Top
回复人: ss(捧着诗集的程序员) ( ) 信誉:被封杀 2002-3-3 19:33:12 得分:0
搜索引擎的算法?
Top
回复人: delphi_555(常常想起VB) ( ) 信誉:100 2002-3-3 19:36:02 得分:0
关注!强烈关注!
Top
回复人: freecgi(大树) ( ) 信誉:100 2002-3-3 19:42:01 得分:0
我也想知道,很少记录时用record
Top
回复人: tolimit(求学) ( ) 信誉:100 2002-3-3 19:54:59 得分:0
有没有高手能给出稍稍具体点的,能解决很多人的苦恼啊
Top
回复人: BasicUser(临火) ( ) 信誉:100 2002-3-3 19:56:22 得分:0
关注
Top
回复人: liuziran(倒立旋风转的青蛙) ( ) 信誉:100 2002-3-3 19:56:59 得分:0
to ss(捧着诗集的程序员) :
参看
http://www.vclxx.org/DELPHI/D32FREE/BKHASH.ZIP
Top
回复人: ss(捧着诗集的程序员) ( ) 信誉:被封杀 2002-3-3 23:48:43 得分:0
推
Top
回复人: hamzsy(十二真空间) ( ) 信誉:100 2002-3-4 1:42:59 得分:0
Ado.Locate()
Top
回复人: ss(捧着诗集的程序员) ( ) 信誉:被封杀 2002-3-4 20:48:21 得分:0
liuziran(做轮椅的青蛙)同志:那个代码没有例子,小弟E文差,不会用!
Top
回复人: delphi_555(常常想起VB) ( ) 信誉:100 2002-3-4 20:59:27 得分:0
upupupup!强烈关注!
Top
回复人: gs571(阿胜) ( ) 信誉:100 2002-3-4 21:38:10 得分:0
用存储过程吧!!
Top
回复人: ss(捧着诗集的程序员) ( ) 信誉:被封杀 2002-3-4 21:51:38 得分:0
gs571(阿胜):怎么做?
Top
回复人: yalongsoft(解放区来的同志) ( ) 信誉:被封杀 2002-3-4 22:20:12 得分:0
upup
Top
回复人: ss(捧着诗集的程序员) ( ) 信誉:被封杀 2002-3-4 23:03:51 得分:0
upup
Top
回复人: ss(捧着诗集的程序员) ( ) 信誉:被封杀 2002-3-4 23:05:37 得分:0
upup
Top
回复人: a12345(唯微) ( ) 信誉:100 2002-3-5 3:46:13 得分:0
用记录文件吧:
先定义一个记录类型:
type MyRecord=Record
name:string[60];
type:string[10];
...
end;
然后定义文件:MyFile:File of MyRecord;
然后对取得的文件记录的name比较,只比较第一个字符,符合就记录,再比较下一个,对最大符合的记录显示出来
大概就只能这样了
Top
回复人: robyman(Rob man) ( ) 信誉:100 2002-3-5 7:58:37 得分:0
不用数据库,这个算法有点难,通过字符串查找,如果量不多.......
Top
回复人: ss(捧着诗集的程序员) ( ) 信誉:被封杀 2002-3-5 12:48:37 得分:0
upup!
Top
回复人: amiao(海燕) ( ) 信誉:100 2002-3-5 13:21:11 得分:0
同意a12345(唯微) ,如果还想要迅速查出来的话,还要对书名进行排序
然后在用二分法等查找,不过100万条记录挺要命的,建议还是用数据库
把,呵呵!
Top
回复人: ChipHead(满脑袋芯片和程序的人) ( ) 信誉:100 2002-3-5 13:36:15 得分:0
查找相似的名称,我没有什么好办法。
查找完全相同的名称,的确可以采用liuziran(坐轮椅的青蛙) 的建议,用Hash实现。
具体说来就是对每一个名称,用Hash算法得到它的Hash值,排序后保存起来。然后用同样的算法对要查找的名称求Hash值,再根据二分法查找,就能够迅速找到对应的书了。这个算法的效率应该还是不错的,只是不知道Hash算法跟字符串比较相比,哪个更快了。
BTW:这么大型的数据,不用数据库似乎野心太大了。因为这样就等于你自己去实现数据库啊!
Top
回复人: djqx(杜鹃泣血) ( ) 信誉:100 2002-3-5 13:39:30 得分:0
用搜索引擎,类似sohu,sina,netease的那种
Top
回复人: ss(捧着诗集的程序员) ( ) 信誉:被封杀 2002-3-5 23:02:59 得分:0
UPUP!高手救命啊!
Top
回复人: szchengyu(cy) ( ) 信誉:97 2002-3-6 0:45:13 得分:0
对于数据库来讲,100多万条记录已经算不小的了,
居然还想不用数据库来处理。
即使用结构文件来处理,一样要建索引,光是维护就要了命,
我觉得这是不现实的,
还是用数据库吧。
Top
回复人: ss(捧着诗集的程序员) ( ) 信誉:被封杀 2002-3-6 12:48:04 得分:0
UPUP!高手救命啊!
Top
回复人: ss(捧着诗集的程序员) ( ) 信誉:被封杀 2002-3-6 12:57:02 得分:0
怎么在一个Dll中用数据库?
Top
回复人: victorchen_2000(微力) ( ) 信誉:100 2002-3-6 15:20:00 得分:0
www.torry.net 上有个 dbase 数据库实现的 delphi 源码。
拿来用就是。
Top
回复人: taoy(苑) ( ) 信誉:100 2002-3-6 15:49:37 得分:0
比如调用ODBC的API来访问数据库
Top
回复人: nylp(混子刘) ( ) 信誉:100 2002-3-6 15:50:40 得分:0
靠,这不是故意刁难大家嘛!不用数据库,那你的数据在哪里存放啊?
杀牛硬是不让用宰牛刀,岂不是...
Top
回复人: borz(习波) ( ) 信誉:100 2002-3-6 16:09:57 得分:0
这个算法确实有点难!!!
Top
回复人: xuejinlong(垃圾) ( ) 信誉:100 2002-3-6 16:52:53 得分:0
呵呵,问问题的人脑袋肯定有问题,有现成的数据库你不用,你非要自己写,为什么?哈哈……有你写算法的时间N个程序都写完了。
Top
回复人: sneerlover(冷笑情人) ( ) 信誉:100 2002-3-6 16:59:36 得分:0
太多的数据库公司费了成千上万人的心血做了几十年的查询和算法优化
得到了如今的oracle ,db2,sql server sYbase
用不着你去努力了
你也没有那能力一个人努力就能做的比这些产品好
Top
回复人: keyz(keyz) ( ) 信誉:100 2002-3-6 20:36:21 得分:0
可以使用这个方式试试:
给每个汉字建立一个文件,比如“数”“据”“库”,分别建立三个文件。事先把含有这个字的书名存在这个文件中。英文需要分词,比如“dephi”“oracle”等。另外建立一个引导文件,统计每个字文件分别有多少条记录。
查找时,把指定字符拆开,从引导文件中找到最少记录条数的字文件。然后打开字文件,逐条对照(或者使用别的优化算法)。
这样,一般情况我觉得范围可以缩小到百分之一。缺点是会建立使用很大的空间。
其实这种非精确查找的情况,使用传统数据库查找,效果一样不是很好,因为没有办法使用索引。
Top
回复人: M_D_NMAE(M_D_NMAE) ( ) 信誉:100 2002-3-6 21:48:22 得分:0
如果数据近似静态,在排序次数有限的情况下:
1。对原始数据排序生成排序后的文件
2。对数据按第一个字符生成索引
3。检索时先检索索引,再检索数据(在每个索引对应偏移相差不大时效果好)
4。每隔一定时间重新排序,更新索引
Top
回复人: Nizvoo(瓦匠泥★程序抗日不归路) ( ) 信誉:98 2002-3-7 9:17:42 得分:0
Good luck!!!!!
金山词霸
Top
回复人: dolphi(李江) ( ) 信誉:97 2002-3-7 10:17:30 得分:0
keyz(keyz) 确实提出了一条比较实际的思路。
我看到前面有很多人都说非用数据库不可,我觉得这是一种误解,也是一种偏见。其实数据库的很多算法都是非常明确的,基本上我们在教科书里都能找到。事实上,在实际应用中,我们会发现很多数据库程序速度都很慢。我的意思不是说这些数据库不好,一方面是因为编程人员没有充分发挥数据库的效率(最简单的例子就是很多人不建索引),另一方面是高级的数据库具有很多例如RollBack、数据关联检查、有效性检查等额外开销。以本命题来说,我个人认为使用文件做数据库,只要算法设计得当,能比任何数据库都快,尤其对于基于ODBC的。起码快100倍。
那么,为什么舍弃方便的数据库不用,非要用文件自己处理呢?我的看法是:本命题数据结构非常简单,只有一个表,没有任何数据关联。因此,用文件并不会比用数据库带来更多的困惑。
具体怎么实现呢?让我们来分析一下:
1M书名,假设平均每个书名长度为20bytes(是个汉字)。那么以索引形式存在的数据文件大小为 20M。索引文件的大小为 4M(1M * sizeof(int))。通过字符串比较,对索引进行排序。
如果是绝对比较查询,用log(2)1M就能够获得检索数据。
对于模糊查询,需要建立一个词汇表。例如以下书名
0、《windows编程详解》
1、《c++ for windows 编程宝典》
这里面,英文单词有window,c++,for。中文有:编、程、详、解、宝、典
分别对他们建立索引,就会有
0 编 0 1
1 程 0 1
2 详 1
3 解 1
4 宝 1
5 典 1
6 windows 0 1
7 c++ 1
8 for 1
按照惯例,估计100万书名里面会出现常用汉字GB2232里面的全部,也就是大概建立2千多个汉字索引,假设每个出现的平均概率是1%,就会产生
2000*1M*1%*sizeof(int) = 20兆 空间。
英文书籍估计也差不多。对于单词索引,如果由数千个文件组成,实际操作的效率会很低(频繁打开关闭文件开销太大),所以应该找一个二级索引。对单词进行排序,形成一个新的索引,原来的数千个索引合并成一个文件。
现在谈谈怎么进行模糊检索。给出书名《windows 程序员宝典》。名字里含有6个单词。通过检索可以得到:
0、windows 出现在 0,1中
1、程 出现在 0,1中
2、序 找不到
3、员 找不到
4、宝 出现在 1中
5、典 出现在 1中
在实际操作中,需要维护一个临时表,这个表就是书名的索引,是一个int的数组。
在搜索到windows时,windows指向的0,1位置分别加 1 。最后结果是;
0位置等于 2;
1位置等于 4;
0的模糊命中率是2/6 = 33%,1的命中率是4/6 = 66%。因此第2本书是最接近的解。对于100万本书,我们可以通过设定命中率范围来获得相应数量的检索结果。
事实上,如果你仔细分析一下MSDN的搜索机制,你就会明白在某些特定条件下,自己制作搜索引擎的意义了。为什么Microsoft自己开发了这么多数据库产品,MSDN却用不上呢?
还有一点,Index文件不要怕大。MSDN98的索引就超过了90兆。
Top
回复人: dolphi(李江) ( ) 信誉:97 2002-3-7 10:31:28 得分:0
强调一下,如果你的图书有分类的话。例如:文艺类、科教类....那么检索速度能成倍增加。
Top
回复人: AndyTse(温柔一刀) ( ) 信誉:100 2002-3-7 10:47:14 得分:0
oracle实现:select * from name like '%delphi6%'
Top
回复人: rutherking(rutherking) ( ) 信誉:100 2002-3-7 11:22:59 得分:0
登陆的数据要先排序,然后做成多级索引,这样查找的时候就会多了。
索引会降低查找所遍历的次数。
Top
回复人: mili_0816(上天下地盖世无敌神勇霹雳英俊侠) ( ) 信誉:100 2002-3-7 12:16:31 得分:0
去 Borland Down 个 InterBAse 的源码,研究一下查询算法!
Top
回复人: delphi_555(常常想起VB) ( ) 信誉:100 2002-3-7 12:37:35 得分:0
{
======================================================================
HashList.pas
------------
Copyright (c) 2000 Barry Kelly
barry_j_kelly@hotmial.com
Do whatever you like with this code, but don't redistribute modified
versions without clearly marking them so.
No warranties, express or implied. Use at your own risk.
======================================================================
A simple string / pointer associative array. Good performance.
======================================================================
Usage:
Creation:
---------
THashList.Create(<hash size>, <compare func>, <hash func>
<hash size> should be the expected size of the filled hash list. Hash
size cannot be changed after the fact. However, hash buckets are
implemented as binary trees so performance should not degrade by
too much for small overflows.
<compare func> should be the comparison function for strings. If nil
is passed, then StrCompare (case sensitive, ordinal value) is used.
This function (declared in this unit) simply calls CompareStr.
<hash func> should be the hash function for a string. If nil is
passed, then StrHash (case sensitive, ordinal value) is used.
This function (declared in this unit) uses a table of 32-bit numbers,
and xors them based on the offset of each character in the string.
It increments the accumulator each step, too. This means that:
* Permutations of a string have different hash values, but not wildly
different.
* No limit on amount of string hashed => long strings may degrade
performance.
For <compare func> and <hash func> the functions TextCompare and
TextHash are also declared. These are case insensitive versions. They
simply adjust to lower case and call the case sensitive versions.
You can, obviously, replace these functions without modifying the
source.
Addition
--------
procedure Add(const s: string; const p);
s is the string, p is interpreted as a pointer. It should, therefore,
be 4 bytes long to avoid garbage.
Property Access
---------------
property Data[const s: string]: Pointer;
You can also add implicitly by using the Data property:
myHashList.Data['MyString'] := Pointer($ABCDEF00);
This property type is default, so above could be
myHashList['MyString'] := Pointer($ABCDEF00);
Note that this is explicitly a pointer. Will return nil
if s isn't in the hash. Will only implicitly add on Set not Get.
Deletion
--------
function Remove(const s: string): Pointer;
procedure RemoveData(const p);
Remove returns the data reference, so it can be freed.
Deletion from hash lists is a strange business. The bucket must
be adjusted so that it is still a valid binary search tree. Therefore,
it is fairly slow. Also, random deletion followed by random insertion
destroys the randomness of the tree, affecting subsequent performance.
Random insertion with random deletion, then random insertion, will
mean the tree will have ~88% the performance of a tree using no
deletion.
However, if your hash list is big enough, you don't need to worry about
this.
Misc
----
function Has(const s: string): Boolean;
Returns whether hash contains string s.
----
function Find(const s: string; var p): Boolean;
Returns true if found, p set to value of data corresponding to s.
P is not set if not found.
----
function FindData(const p; var s: string): Boolean;
'Opposite' of Find: searches for key given a data value; Returns
true if found, s not set if not found. Only first key found is
returned: there may be other keys that have this data.
The first key found is not in any particular order, and is found
using the Iterate method.
----
procedure Iterate(AUserData: Pointer; AIterateFunc: TIterateFunc);
AIterateFunc = function(AUserData: Pointer; const AStr: string;
var APtr: Pointer): Boolean;
AIterateFunc is called for each item in the hash in no particular
order, and will terminate the iteration if the user function
ever returns false. The value of APtr can be adjusted, but *not*
AStr since that would involve destroying the iteration order.
Iterate_FreeObjects is a predefined function that will typecast
every Data to TObject and call the Free method. This is useful
to destroy associated objects in a hash before freeing the hash.
AUserData isn't used by this iterator.
Iterate_Dispose will call Dispose on each data object. AUserData
isn't used.
Iterate_FreeMem will call FreeMem on each data object. AUserData
isn't used.
IterateMethod is similar, but works with a method pointer (closure/
event) rather than a function pointer.
----
property Count: Integer;
This contains the number of items in the hash list.
}
Top
回复人: delphi_555(常常想起VB) ( ) 信誉:100 2002-3-7 12:41:38 得分:0
unit HashList;
interface
uses SysUtils, Classes;
type
EHashList = class(Exception);
type
TCompareFunc = function(const l, r: string): Integer;
THashFunc = function(const s: string): Integer;
{ iterate func returns false to terminate iteration }
TIterateFunc = function(AUserData: Pointer; const AStr: string;
var APtr: Pointer): Boolean;
TIterateMethod = function(AUserData: Pointer; const AStr: string;
var APtr: Pointer): Boolean of object;
PPHashNode = ^PHashNode;
PHashNode = ^THashNode;
THashNode = record
Str: string;
Ptr: Pointer;
Left: PHashNode;
Right: PHashNode;
end;
TNodeIterateFunc = procedure(AUserData: Pointer; ANode: PPHashNode);
PHashArray = ^THashArray;
THashArray = array[0..MaxInt div SizeOf(PHashNode) - 1] of PHashNode;
{
======================================================================
THashList
======================================================================
}
THashList = class
public
constructor Create(AHashSize: Integer; ACompareFunc: TCompareFunc;
AHashFunc: THashFunc);
destructor Destroy; override;
private
FHashFunc: THashFunc;
FCompareFunc: TCompareFunc;
FHashSize: Integer;
FCount: Integer;
FList: PHashArray;
FLeftDelete: Boolean;
{ private methods }
procedure SetHashSize(AHashSize: Integer);
protected
{
helper methods
}
{ FindNode returns a pointer to a pointer to the node with s,
or, if s isn't in the hash, a pointer to the location where the
node will have to be added to be consistent with the structure }
function FindNode(const s: string): PPHashNode;
function IterateNode(ANode: PHashNode; AUserData: Pointer;
AIterateFunc: TIterateFunc): Boolean;
function IterateMethodNode(ANode: PHashNode; AUserData: Pointer;
AIterateMethod: TIterateMethod): Boolean;
// !!! NB: this function iterates NODES NOT DATA !!!
procedure NodeIterate(ANode: PPHashNode; AUserData: Pointer;
AIterateFunc: TNodeIterateFunc);
procedure DeleteNode(var q: PHashNode);
procedure DeleteNodes(var q: PHashNode);
{ !!! NB: AllocNode and FreeNode don't inc / dec the count,
to remove burden from overridden implementations;
Therefore, EVERY time AllocNode / FreeNode is called,
FCount MUST be incremented / decremented to keep Count valid. }
function AllocNode: PHashNode; virtual;
procedure FreeNode(ANode: PHashNode); virtual;
{ property access }
function GetData(const s: string): Pointer;
procedure SetData(const s: string; p: Pointer);
public
{ public methods }
procedure Add(const s: string; const p{: Pointer});
procedure RemoveData(const p{: Pointer});
function Remove(const s: string): Pointer;
procedure Iterate(AUserData: Pointer; AIterateFunc: TIterateFunc);
procedure IterateMethod(AUserData: Pointer; AIterateMethod: TIterateMethod);
function Has(const s: string): Boolean;
function Find(const s: string; var p{: Pointer}): Boolean;
function FindData(const p{: Pointer}; var s: string): Boolean;
procedure Clear;
{ properties }
property Count: Integer read FCount;
property Data[const s: string]: Pointer read GetData write SetData; default;
end;
{ str=case sensitive, text=case insensitive }
function StrHash(const s: string): Integer;
function TextHash(const s: string): Integer;
function StrCompare(const l, r: string): Integer;
function TextCompare(const l, r: string): Integer;
{ iterators }
function Iterate_FreeObjects(AUserData: Pointer; const AStr: string;
var AData: Pointer): Boolean;
function Iterate_Dispose(AUserData: Pointer; const AStr: string;
var AData: Pointer): Boolean;
function Iterate_FreeMem(AUserData: Pointer; const AStr: string;
var AData: Pointer): Boolean;
implementation
function Iterate_FreeObjects(AUserData: Pointer; const AStr: string;
var AData: Pointer): Boolean;
begin
TObject(AData).Free;
AData := nil;
Result := True;
end;
function Iterate_Dispose(AUserData: Pointer; const AStr: string;
var AData: Pointer): Boolean;
begin
Dispose(AData);
AData := nil;
Result := True;
end;
function Iterate_FreeMem(AUserData: Pointer; const AStr: string;
var AData: Pointer): Boolean;
begin
FreeMem(AData);
AData := nil;
Result := True;
end;
Top
回复人: delphi_555(常常想起VB) ( ) 信誉:100 2002-3-7 12:42:45 得分:0
const
Hash_Table: array[Char] of Integer =
(
$4CBF5B63, $2A009AF6, $31A262D3, $65BCFC21, $5FC274A9, $1C483154, $7980CAA8, $694F5B4F,
$5422088F, $7998ACD2, $17B02C1F, $2A2D1A9D, $598AFD15, $06EA8B70, $7602FD34, $6E4A880B,
$35FAD83C, $0B496B2E, $652B53EA, $4C7A1199, $4C45C001, $08720A0C, $2FD0E641, $63DA4547,
$693C7A67, $5490460A, $13470A37, $0F63D115, $7D726D6D, $531D1D28, $53E2B5CC, $23978303,
$09A39F14, $2ADCAD66, $42F07F02, $644C4911, $23BAB55A, $76FC34C4, $2C4A9BD9, $009D313F,
$1A76F640, $11501142, $418EE24E, $02F7698D, $5DB247C5, $33B1C0E0, $38F4C865, $43483FFB,
$71472FEB, $4E7DE19E, $75C3641A, $094B2289, $5096D4B1, $1232317C, $30676B71, $2CF79F37,
$48AEFC17, $4A2B8E7A, $34293467, $230F6405, $6F100C1D, $4683F698, $6882B4FC, $03CC3EF3,
$7B130AC3, $331087D6, $7C158332, $39AE1E01, $67EF9E09, $597F8034, $2740D509, $26690F2F,
$65620BEF, $6C963DB2, $1C57807F, $2BF3407D, $3CF13A76, $1A8F3E50, $5B75FB94, $27B2FFEB,
$71D4AF9B, $4498200E, $3FF85C4B, $0DCCBB79, $017A9162, $596FA0EC, $3D9058A1, $72910127,
$4AADA5C7, $71239EEA, $62FB4696, $700A7EF5, $415B52CD, $67EF1808, $58581C2C, $75AC02E3,
$34F99E74, $56382A46, $2F1D6F63, $301E7AF1, $0F8D2EBA, $63AE13A4, $79AF7638, $572EF51F,
$134F49A0, $13873222, $409606AE, $4FDC9F6D, $3CF3D526, $47DF03C0, $0B5296C5, $3086C7DB,
$108F574C, $5A34267E, $52D63C7A, $553ADC69, $371CF612, $4706585C, $5397ADD0, $52226B17,
$72A47777, $0A94775A, $554940C6, $321121E5, $0F00417E, $6CBA8178, $1E2EEB5D, $0F32CED3,
$15435A24, $19EF94B6, $68144392, $33D95FE1, $27BF676A, $0763EF14, $4CE27F68, $116AE30F,
$0CAAAF50, $403EEE92, $40D674DE, $7B6F865D, $0D6617D6, $59FD1130, $505699F4, $34BF97CB,
$706326FB, $6DEDF4EE, $776904AA, $7CD18559, $73AA02C1, $15D257CC, $08C96B01, $6B27DD07,
$4DFF7127, $099A17CA, $3A9F22F7, $06DF4CD5, $5CAAD82D, $1C4232E8, $0ED3228D, $285CA2C3,
$43DC3DD3, $75D2C726, $6D05FFC2, $531ACCD1, $67B24819, $087D1284, $6425F098, $5598D8FF,
$43E03CFF, $5ED97302, $29A4CB0F, $1D67F54D, $47F40285, $014566A0, $0C4E0525, $4D596FBB,
$7E3C1EAB, $50618B5E, $30BCB4DB, $43CCB649, $634DB771, $52AF9F3C, $0D719031, $5F1D56F7,
$742A92D7, $1350803A, $3C88ED27, $6E30FFC5, $190716DE, $4FE22C58, $5910C1BC, $5B257EB3,
$32B04984, $537DC196, $3DFEA3F3, $3E1EC1C1, $3091D0CA, $7CD57DF4, $1CC5C9C9, $2634D6EF,
$355BF2B0, $0D72BF72, $018D093F, $0681EC3D, $70499535, $00140410, $7B04D854, $55504FAB,
$71063E5B, $442AE9CE, $3BDD4D0B, $0B686F39, $5C341421, $5C7A2EAC, $2BDC1D61, $517ED8E7,
$0691DC88, $7ED3B0AA, $7E929F56, $49C23AB5, $1CC0FD8E, $72F66DC8, $05B3C8ED, $038962A3,
$1DAB7D34, $0E8C8406, $150A3023, $47213EB1, $0D8A017A, $4C493164, $6E0E0AF8, $07BADCDF,
$6789D05F, $1C26D3E2, $491B2F6E, $69796B2D, $7412CFE5, $2AC4E980, $32471385, $69A0379B,
$49AD860B, $7DE6103E, $0FD6CD3B, $56E0B029, $5E8918D1, $640E061C, $48551290, $67C862D7,
$30A14E38, $553FA91A, $1E483987, $5D4EFDA5, $2A848C3E, $02DAF738, $7788381C, $3F844E93
);
function StrHash(const s: string): Integer;
var
i: Integer;
p: PChar;
begin
// comp.compilers
//hash = (hash ^ current_character) + ((hash<<26)+(hash>>6));
// Result := (Result xor Ord(p^)) + ((Result shl 26) + (Result shr 6));
Result := 0;
p := PChar(s);
i := Length(s);
if i > 0 then
repeat
Result := (Result xor Ord(p^)) + ((Result shl 26) + (Result shr 6));
Inc(p);
Dec(i);
until i = 0;
{ |Result| }
Result := Result and $7FFFFFFF;
// orig
// Result := 0;
// p := PChar(s);
//
// i := Length(s);
// if i > 0 then
// repeat
// Result := Result xor Hash_Table[p^];
// Inc(Result);
// Inc(p);
// Dec(i);
// until i = 0;
end;
function TextHash(const s: string): Integer;
begin
Result := StrHash(LowerCase(s));
end;
function StrCompare(const l, r: string): Integer;
begin
Result := CompareStr(l, r);
end;
function TextCompare(const l, r: string): Integer;
begin
Result := CompareText(l, r);
end;
{
======================================================================
THashList
======================================================================
}
constructor THashList.Create(AHashSize: Integer; ACompareFunc: TCompareFunc;
AHashFunc: THashFunc);
begin
SetHashSize(AHashSize);
if not Assigned(AHashFunc) then
FHashFunc := StrHash
else
FHashFunc := AHashFunc;
if not Assigned(ACompareFunc) then
FCompareFunc := StrCompare
else
FCompareFunc := ACompareFunc;
end;
destructor THashList.Destroy;
begin
Clear;
SetHashSize(0);
inherited Destroy;
end;
{
private methods
}
procedure THashList.SetHashSize(AHashSize: Integer);
begin
if FHashSize <> AHashSize then
begin
ReallocMem(FList, AHashSize * SizeOf(FList^[0]));
FillChar(FList^, AHashSize * SizeOf(FList^[0]), 0);
FHashSize := AHashSize;
end;
end;
{
helper methods
}
function THashList.FindNode(const s: string): PPHashNode;
var
i, r: Integer;
ppn: PPHashNode;
begin
{ we start at the node offset by s in the hash list }
i := FHashFunc(s) mod FHashSize;
ppn := @FList^
;
if ppn^ <> nil then
while True do
begin
r := FCompareFunc(s, ppn^^.Str);
{ left, then right, then match }
if r < 0 then
ppn := @ppn^^.Left
else if r > 0 then
ppn := @ppn^^.Right
else
Break;
{ check for empty position after drilling left or right }
if ppn^ = nil then
Break;
end;
Result := ppn;
end;
function THashList.IterateNode(ANode: PHashNode; AUserData: Pointer;
AIterateFunc: TIterateFunc): Boolean;
begin
if ANode <> nil then
begin
Result := AIterateFunc(AUserData, ANode^.Str, ANode^.Ptr);
if not Result then
Exit;
Result := IterateNode(ANode^.Left, AUserData, AIterateFunc);
if not Result then
Exit;
Result := IterateNode(ANode^.Right, AUserData, AIterateFunc);
if not Result then
Exit;
end else
Result := True;
end;
function THashList.IterateMethodNode(ANode: PHashNode; AUserData: Pointer;
AIterateMethod: TIterateMethod): Boolean;
begin
if ANode <> nil then
begin
Result := AIterateMethod(AUserData, ANode^.Str, ANode^.Ptr);
if not Result then
Exit;
Result := IterateMethodNode(ANode^.Left, AUserData, AIterateMethod);
if not Result then
Exit;
Result := IterateMethodNode(ANode^.Right, AUserData, AIterateMethod);
if not Result then
Exit;
end else
Result := True;
end;
procedure THashList.NodeIterate(ANode: PPHashNode; AUserData: Pointer;
AIterateFunc: TNodeIterateFunc);
begin
if ANode^ <> nil then
begin
AIterateFunc(AUserData, ANode);
NodeIterate(@ANode^.Left, AUserData, AIterateFunc);
NodeIterate(@ANode^.Right, AUserData, AIterateFunc);
end;
end;
Top
回复人: fangyifengshouqing(苦与甜到底哪个多一点) ( ) 信誉:100 2002-3-7 14:26:04 得分:0
OK SHANGKS!
Top
回复人: delphi_555(常常想起VB) ( ) 信誉:100 2002-3-7 20:01:54 得分:0
procedure THashList.DeleteNode(var q: PHashNode);
var
t, r, s: PHashNode;
begin
{ we must delete node q without destroying binary tree }
{ Knuth 6.2.2 D (pg 432 Vol 3 2nd ed) }
{ alternating between left / right delete to preserve decent
performance over multiple insertion / deletion }
FLeftDelete := not FLeftDelete;
{ t will be the node we delete }
t := q;
if FLeftDelete then
begin
if t^.Right = nil then
q := t^.Left
else
begin
r := t^.Right;
if r^.Left = nil then
begin
r^.Left := t^.Left;
q := r;
end else
begin
s := r^.Left;
if s^.Left <> nil then
repeat
r := s;
s := r^.Left;
until s^.Left = nil;
{ now, s = symmetric successor of q }
s^.Left := t^.Left;
r^.Left := s^.Right;
s^.Right := t^.Right;
q := s;
end;
end;
end else
begin
if t^.Left = nil then
q := t^.Right
else
begin
r := t^.Left;
if r^.Right = nil then
begin
r^.Right := t^.Right;
q := r;
end else
begin
s := r^.Right;
if s^.Right <> nil then
repeat
r := s;
s := r^.Right;
until s^.Right = nil;
{ now, s = symmetric predecessor of q }
s^.Right := t^.Right;
r^.Right := s^.Left;
s^.Left := t^.Left;
q := s;
end;
end;
end;
{ we decrement before because the tree is already adjusted
=> any exception in FreeNode MUST be ignored.
It's unlikely that FreeNode would raise an exception anyway. }
Dec(FCount);
FreeNode(t);
end;
procedure THashList.DeleteNodes(var q: PHashNode);
begin
{ ? use tail recursion? - Normal recursion is easier to understand;
We're not in a tearing hurry here... }
if q^.Left <> nil then
DeleteNodes(q^.Left);
if q^.Right <> nil then
DeleteNodes(q^.Right);
FreeNode(q);
q := nil;
end;
function THashList.AllocNode: PHashNode;
begin
New(Result);
Result^.Left := nil;
Result^.Right := nil;
end;
procedure THashList.FreeNode(ANode: PHashNode);
begin
Dispose(ANode);
end;
{
property access
}
function THashList.GetData(const s: string): Pointer;
var
ppn: PPHashNode;
begin
ppn := FindNode(s);
if ppn^ <> nil then
Result := ppn^^.Ptr
else
Result := nil;
end;
procedure THashList.SetData(const s: string; p: Pointer);
var
ppn: PPHashNode;
begin
ppn := FindNode(s);
if ppn^ <> nil then
ppn^^.Ptr := p
else
begin
{ add }
ppn^ := AllocNode;
{ we increment after in case of exception }
Inc(FCount);
ppn^^.Str := s;
ppn^^.Ptr := p;
end;
end;
{
public methods
}
procedure THashList.Add(const s: string; const p{: Pointer});
var
ppn: PPHashNode;
begin
ppn := FindNode(s);
{ if reordered from SetData because ppn^ = nil is more common for Add }
if ppn^ = nil then
begin
{ add }
ppn^ := AllocNode;
{ we increment after in case of exception }
Inc(FCount);
ppn^^.Str := s;
ppn^^.Ptr := Pointer(p);
end else
raise EHashList.CreateFmt('Duplicate hash list entry: %s', );
end;
type
PListNode = ^TListNode;
TListNode = record
Next: PListNode;
NodeLoc: PPHashNode;
end;
PDataParam = ^TDataParam;
TDataParam = record
Head: PListNode;
Data: Pointer;
end;
procedure NodeIterate_BuildDataList(AUserData: Pointer; ANode: PPHashNode);
var
dp: PDataParam absolute AUserData;
t: PListNode;
begin
if dp.Data = ANode^^.Ptr then
begin
New(t);
t^.Next := dp.Head;
t^.NodeLoc := ANode;
dp.Head := t;
end;
end;
procedure THashList.RemoveData(const p{: Pointer});
var
dp: TDataParam;
i: Integer;
n, t: PListNode;
begin
dp.Data := Pointer(p);
dp.Head := nil;
for i := 0 to FHashSize - 1 do
NodeIterate(@FList^, @dp, NodeIterate_BuildDataList);
n := dp.Head;
while n <> nil do
begin
DeleteNode(n^.NodeLoc^);
t := n;
n := n^.Next;
Dispose(t);
end;
end;
function THashList.Remove(const s: string): Pointer;
var
ppn: PPHashNode;
begin
ppn := FindNode(s);
if ppn^ <> nil then
begin
Result := ppn^^.Ptr;
DeleteNode(ppn^);
end
else
raise EHashList.CreateFmt('Tried to remove invalid node: %s', );
end;
procedure THashList.IterateMethod(AUserData: Pointer;
AIterateMethod: TIterateMethod);
var
i: Integer;
begin
for i := 0 to FHashSize - 1 do
if not IterateMethodNode(FList^, AUserData, AIterateMethod) then
Break;
end;
procedure THashList.Iterate(AUserData: Pointer; AIterateFunc: TIterateFunc);
var
i: Integer;
begin
for i := 0 to FHashSize - 1 do
if not IterateNode(FList^, AUserData, AIterateFunc) then
Break;
end;
function THashList.Has(const s: string): Boolean;
var
ppn: PPHashNode;
begin
ppn := FindNode(s);
Result := ppn^ <> nil;
end;
function THashList.Find(const s: string; var p{: Pointer}): Boolean;
var
ppn: PPHashNode;
begin
ppn := FindNode(s);
Result := ppn^ <> nil;
if Result then
Pointer(p) := ppn^^.Ptr;
end;
type
PFindDataResult = ^TFindDataResult;
TFindDataResult = record
Found: Boolean;
ValueToFind: Pointer;
Key: string;
end;
function Iterate_FindData(AUserData: Pointer; const AStr: string;
var APtr: Pointer): Boolean;
var
pfdr: PFindDataResult absolute AUserData;
begin
pfdr^.Found := (APtr = pfdr^.ValueToFind);
Result := not pfdr^.Found;
if pfdr^.Found then
pfdr^.Key := AStr;
end;
function THashList.FindData(const p{: Pointer}; var s: string): Boolean;
var
pfdr: PFindDataResult;
begin
New(pfdr);
try
pfdr^.Found := False;
pfdr^.ValueToFind := Pointer(p);
Iterate(pfdr, Iterate_FindData);
Result := pfdr^.Found;
if Result then
s := pfdr^.Key;
finally
Dispose(pfdr);
end;
end;
procedure THashList.Clear;
var
i: Integer;
ppn: PPHashNode;
begin
for i := 0 to FHashSize - 1 do
begin
ppn := @FList^;
if ppn^ <> nil then
DeleteNodes(ppn^);
end;
FCount := 0;
end;
end.
Top
回复人: delphi_555(常常想起VB) ( ) 信誉:100 2002-3-7 20:03:54 得分:0
不好意思,CSDN的帖子不能太长,我要分几段贴这个UNIT!
希望对SS有帮助!
Top
回复人: delphi_555(常常想起VB) ( ) 信誉:100 2002-3-7 21:50:28 得分:0
{$define debug}
unit HashTable;
interface
uses Windows, SysUtils;
const
LeafSize = 256;
BucketSize = 8;
type
TLinkedItem = class
private
Value: DWORD;
Data: DWORD;
Next: TLinkedItem;
constructor Create(FValue,FData: DWORD; FNext: TLinkedItem);
public
destructor Destroy; override;
end;
THashTable = class; // forward
TTraverseProc = procedure (UserData,UserProc: Pointer;
Value,Data: DWORD; var Done: Boolean) of object;
TTreeItem = class
private
Owner: THashTable;
Level: integer;
Filled: integer;
Items: array[0..LeafSize-1] of TObject;
constructor Create(AOwner: THashTable);
function ROR(Value: DWORD): DWORD;
function RORN(Value: DWORD; Level: integer): DWORD;
procedure AddDown(Value,Data,Hash: DWORD);
procedure Delete(Value,Hash: DWORD);
function Find(Value,Hash: DWORD; var Data: DWORD): Boolean;
function Traverse(UserData,UserProc: Pointer; TraverseProc: TTraverseProc): Boolean;
public
destructor Destroy; override;
end;
THashTable = class
private
Root: TTreeItem;
protected
function HashValue(Value: DWORD): DWORD; virtual; abstract;
procedure DestroyItem(var Value,Data: DWORD); virtual; abstract;
function CompareValue(Value1,Value2: DWORD): Boolean; virtual; abstract;
procedure AddDown(Value,Data,Hash: DWORD);
procedure Delete(Value,Hash: DWORD);
function Find(Value,Hash: DWORD; var Data: DWORD): Boolean;
procedure Traverse(UserData,UserProc: Pointer; TraverseProc: TTraverseProc);
public
constructor Create; virtual;
destructor Destroy; override;
end;
TStrHashTraverseProc = procedure (UserData: Pointer; const Value: string;
Data: TObject; var Done: Boolean);
TStrHashTraverseMeth = procedure (UserData: Pointer; const Value: string;
Data: TObject; var Done: Boolean) of object;
TStringHashTable = class(THashTable)
private
FCaseSensitive: Boolean;
FAutoFreeObjects: Boolean;
protected
function HashValue(Value: DWORD): DWORD; override;
procedure DestroyItem(var Value,Data: DWORD); override;
function CompareValue(Value1,Value2: DWORD): Boolean; override;
function HashStr(const S: string): DWORD;
procedure TraverseProc(UserData,UserProc: Pointer;
Value,Data: DWORD; var Done: Boolean);
procedure TraverseMeth(UserData,UserProc: Pointer;
Value,Data: DWORD; var Done: Boolean);
public
constructor Create; override;
procedure Add(const S: string; Data: TObject);
procedure Delete(const S: string);
function Find(const S: string; var Data: TObject): Boolean;
procedure Traverse(UserData: Pointer; UserProc: TStrHashTraverseProc); overload;
procedure Traverse(UserData: Pointer; UserProc: TStrHashTraverseMeth); overload;
property CaseSensitive: Boolean read FCaseSensitive write FCaseSensitive default False;
property AutoFreeObjects: Boolean read FAutoFreeObjects write FAutoFreeObjects default False;
end;
function CalcStrCRC32(const S: string): DWORD;
{$ifdef debug}
type
TLenStat = array[1..BucketSize] of integer;
procedure Stat(ht: THashTable; var MaxLevel, PeakCnt, FillCnt, EmptyCnt: integer;
var LenStat: TLenStat);
{$endif}
implementation
function _NewStrInHeap(const S: string): PString;
begin
if S = '' then Result := NullStr else
begin
New(Result);
Result^ := S;
end;
end;
procedure _DisposeStrInHeap(P: PString);
begin
if (P <> nil) and (P^ <> '') then Dispose(P);
end;
{$ifdef debug}
procedure Stat(ht: THashTable; var MaxLevel, PeakCnt, FillCnt, EmptyCnt: integer;
var LenStat: TLenStat);
procedure TreeStat(ht: TTreeItem);
var j,i: integer;
LinkedItem: TLinkedItem;
begin
Inc(PeakCnt);
if ht.Level+1 > MaxLevel then
MaxLevel:=ht.Level+1;
for j:=0 to LeafSize-1 do
if ht.Items[j] <> nil then begin
Inc(FillCnt);
if ht.Items[j] is TTreeItem then begin
TreeStat(TTreeItem(ht.Items[j]));
end else begin
i:=0;
LinkedItem:=TLinkedItem(ht.Items[j]);
while LinkedItem <> nil do begin
Inc(i);
LinkedItem:=LinkedItem.Next;
end;
LenStat:=LenStat+1;
end;
end else
Inc(EmptyCnt);
end;
begin
if ht.Root <> nil then
TreeStat(ht.Root);
end;
{$endif}
{ TTreeItem }
procedure TTreeItem.AddDown(Value, Data, Hash: DWORD);
var i,j: integer;
TreeItem: TTreeItem;
LinkedItem: TLinkedItem;
begin
i:=Hash and $FF;
if Items = nil then begin
Items:=TLinkedItem.Create(Value,Data,nil);
Inc(Filled);
end else if Items is TTreeItem then begin
TTreeItem(Items).AddDown(Value,Data,ROR(Hash));
end else begin
j:=0;
LinkedItem:=TLinkedItem(Items);
while LinkedItem <> nil do begin
if Owner.CompareValue(LinkedItem.Value,Value) then begin
// found
LinkedItem.Data:=Data;
Exit;
end;
LinkedItem:=LinkedItem.Next;
Inc(j)
end;
if j >= BucketSize then begin
// full
TreeItem:=TTreeItem.Create(Owner);
TreeItem.Level:=Level+1;
LinkedItem:=TLinkedItem(Items);
while LinkedItem <> nil do begin
TreeItem.AddDown(LinkedItem.Value, LinkedItem.Data,
RORN(Owner.HashValue(LinkedItem.Value), Level+1));
LinkedItem:=LinkedItem.Next;
end;
TreeItem.AddDown(Value,Data,ROR(Hash));
TLinkedItem(Items).Free;
Items:=TreeItem;
end else
Items:=TLinkedItem.Create(Value,Data,TLinkedItem(Items));
end;
end;
Top
回复人: jiangchun_xn(一盒烟) ( ) 信誉:100 2002-3-8 10:29:17 得分:0
我有一个软件单词猎手,能够较快的处理这种字符串库,支持*?搜索,速度挺快的,而却作成了 OPENLOOK .如果需要,我们可以联系。。
Top
回复人: delphi_555(常常想起VB) ( ) 信誉:100 2002-3-8 12:33:20 得分:0
jiangchun_xn(一盒烟)同志:能公开吗?
Top
回复人: delphi_555(常常想起VB) ( ) 信誉:100 2002-3-8 18:21:29 得分:0
jiangchun_xn(一盒烟)同志:能公开吗?怎么联系?
Top
回复人: anamnesis(化神奇为腐朽) ( ) 信誉:100 2002-3-8 22:35:37 得分:0
看看这个
希望对你有点用
正在做相似的东西
其实核心还是要先做分词
下面的是和水木的几个网友讨论的结果
其分词表是有限的
其实还有种不要分词表的分词法
query效率比较低
匹配度也比较差
搜索引擎目标:其基本目的是对数据库内的论文进行检索,论文有中英两种文字,主要为图像处理
数字信号处理,图形学。提供 标题,作者(包括缩写),杂志,时间,摘要,全文,主题词检索
概要设计:
搜索引擎部分:
首先对用户输入作分词
使其粒度比库内文章的某些检索条件的粒度小
不做分词有很多问题
比如用户query “图像处理系统”
则含有“图像系统”或“处理系统”或“图像处理”的就无法找到
会使搜索引擎基本没有使用价值
分词算法设计:
1 英文无分词表的通用分词法
因为英文虚词数目有限,约200个左右
而且每个单词是个独立语义单位
只有词组情况下才和其它词共同构成语义单位
而论文中这种情况很少出现
由于基本各词以空格分开
则去除虚词,可得分完的词序列
2 中英文带分词表的分词法
分词表的词汇应该按层次分组,先匹配尽可能完整的词,如“太平间”,
如果匹配不上,再用它的子词汇,这样就避免了“华人”在“中华人
民共和国”里面。但是“纸里面包不住火”中含“里面”和“面包”就
无法区分,应该予以容忍,或建立一成语词典、歇后语词典,把这这个
词包含进去。但是方案越完整,代价就越高。由于是内部使用建议平衡一下。
特别是做全文检索时
分词表的来源: 专业词库,论文本身的信息
然后向database query
Top
回复人: ztgbasil(野狼) ( ) 信誉:100 2002-3-9 14:48:19 得分:0
可以参阅日本有一家在中国大陆做的语料库来完成,在题目中这个相似如果要做好的话需要分析其中的含义也是否相近,非的用语料库不可