请教字符串算法,很有意思的问题 (300分)

A

Adnil

Unregistered / Unconfirmed
GUEST, unregistred user!
copy_paste兄,你的理解能力比我的表达能力已经好多了,呵呵,正如你所想象的。
 
C

copy_paste

Unregistered / Unconfirmed
GUEST, unregistred user!
这个变形只有空格、回车、TAB,还有没其它字符为可能的变形字符?
 

太平公主

Unregistered / Unconfirmed
GUEST, unregistred user!
有个问题想请教:有A,B两个表,这两个表有两个字段是有关联的
现在想从A表删除满足下列条件的记录,该如何做。从A里删除
A1=B1,A2=B2(A1,A2是A的两个字段,B1,B2是B的两个字段)的记录
请SQL高手指点!
 
A

Adnil

Unregistered / Unconfirmed
GUEST, unregistred user!
考虑最坏的情况,有可能将单引号变成双引号,或者将双引号变成单引号。
暂时就不能先考虑了,空格、回车、TAB考虑进去就ok了

太平公主,你帮我出分啊?
delete a from b where a.A1=b.B1 and a.A2=b.B2
 
D

doll_paul

Unregistered / Unconfirmed
GUEST, unregistred user!
楼主,看来大家都没有真明白你的问题哦~

建议你看一下程序员这本书,关于GOOGLE技术的文章,也许有帮助!
 
A

Adnil

Unregistered / Unconfirmed
GUEST, unregistred user!
GOOGLE技术。。。杀鸡用牛刀了
 
D

doll_paul

Unregistered / Unconfirmed
GUEST, unregistred user!
呵,意思是一样的!
 
L

LiChaoHui

Unregistered / Unconfirmed
GUEST, unregistred user!
问题很简单,但并不是总能容易的写出算法,
经过长时间的思考和测试我写出了初步的算法(算法很复杂,138行),
使用了状态的处理方法,对要进行查找的文本和关键字均不进行予处理
var
im: Integer;
len: Integer;
s1: String;
s2: String;
begin
im := 1;
s1 := '<font face="Arial" ><small> ' + #13#10 +
'2001 .4. 1</small></font>';
s2 := '<font face="Arial"><small>2001.4.1</small></font>';
len := LcScan(s1, s2, im);
Memo1.Lines.Clear;
Memo1.Lines.Add('Search Text:');
Memo1.Lines.Add(s1);
Memo1.Lines.Add('');
Memo1.Lines.Add('KeyWord:');
Memo1.Lines.Add(s2);
Memo1.Lines.Add('');
Memo1.Lines.Add('Search Result:');
Memo1.Lines.Add('StartPos: ' + IntToStr(im));
Memo1.Lines.Add('MatchLen: ' + IntToStr(len));
Memo1.Lines.Add('FoundTxt: ' + Copy(s1, im, len))

我的运行结果:
sample1:
Search Text:
ad<fa fa = "fd me" df >1234 fasdf fdas

KeyWord:
d<fa fa="fd me" df >1234

Search Result:
StartPos: 2
MatchLen: 36
FoundTxt: d<fa fa = "fd me" df >1234

sample2:
Search Text:
<font face="Arial" ><small>
2001 . 4 . 1 </small></font>

KeyWord:
<font face="Arial"><small>2001.4.1</small></font>

Search Result:
StartPos: 1
MatchLen: 64
FoundTxt: <font face="Arial" ><small>
2001 . 4 . 1 </small></font>

sample3:
Search Text:
<font face="Arial" ><small>
2001 .4. 1</small></font>

KeyWord:
<font face="Arial"><small>2001.4.1</small></font>

Search Result:
StartPos: 1
MatchLen: 80
FoundTxt: <font face="Arial" ><small>
2001 .4. 1</small></font>

算法还不是很完善,希望各位能提供一些特殊的测试数据,以测试我的算法
 
S

shenloqi

Unregistered / Unconfirmed
GUEST, unregistred user!
exe地址:http://loqi.myetang.com/findtext.exe
代码如下:
unit Unit1;

interface

uses
Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs, StdCtrls;

type
TForm1 = class(TForm)
Memo1: TMemo;
Button1: TButton;
Edit1: TEdit;
CheckBox1: TCheckBox;
CheckBox2: TCheckBox;
procedure Button1Click(Sender: TObject);
private
{ Private declarations }
function FindInText(const SubStr, FindStr: string;
const IgnoreCase: Boolean = True;
const IgnoreCRLFSPACE: Boolean = False): Integer;
public
{ Public declarations }
end;

const
IgnoreChar = [#9, #10, #13, #32];

var
Form1: TForm1;

implementation

{$R *.DFM}

procedure TForm1.Button1Click(Sender: TObject);
var
i: Integer;
begin
i := FindInText(Edit1.Text,
Memo1.Lines.Text,
CheckBox1.Checked,
CheckBox2.Checked);
ShowMessage(IntToStr(i) + #13#10 + Copy(Memo1.Lines.Text, i, MaxInt));
end;

function CheckLeadChar(var P: PChar
const Add: Boolean): Boolean;
var
ByteType: TMbcsByteType;
begin
ByteType := StrByteType(P, 0);
Result := ByteType = mbLeadByte;
if Result and Add then
begin
Inc(P);
end;
end;

function IsSingle(const P: PChar): Boolean;
var
ByteType: TMbcsByteType;
begin
ByteType := StrByteType(P, 0);
Result := ByteType = mbSingleByte;
end;

function MyPos(const SubStr, s: string
StartPos: Cardinal = 0;
IgnoreCase: Boolean = False): Integer;
begin
if StartPos = 0 then
begin
if IgnoreCase then
Result := Pos(AnsiUpperCase(SubStr), AnsiUpperCase(s))
else
Result := Pos(SubStr, s)
end
else
begin
if IgnoreCase then
Result := Pos(
AnsiUpperCase(SubStr), AnsiUpperCase(Copy(s, StartPos, MaxInt)))
else
Result := Pos(SubStr, Copy(s, StartPos, MaxInt));
if Result > 0 then Result := (Result + Integer(StartPos)) - 1;
end;
end;

function GetFirstStr(const Str: string): string;
var
P: PChar;
i: Integer;
begin
Result := '';
P := PChar(Str);
i := Integer(P);
if P <> nil then
while P^ <> #0 do
begin
if not (P^ in IgnoreChar {[#32, #13, #10, #9]}) {SPACE,CR,LF,TAB} then
begin
CheckLeadChar(P, True);
Result := Copy(Str, 1, Integer(P) - i + 1);
Break;
end;
Inc(P);
end;
end;

function TForm1.FindInText(const SubStr, FindStr: string;
const IgnoreCase: Boolean;
const IgnoreCRLFSPACE: Boolean): Integer;
function IsSameText(SubStr, FindStr: string;
const StartPos: Integer;
const IgnoreCase: Boolean): Boolean;
var
P: PChar;
PFS: PChar;
//FS:string;
begin
Result := False;
if IgnoreCase then
begin
SubStr := AnsiUpperCase(SubStr);
FindStr := AnsiUpperCase(FindStr);
end;
P := PChar(SubStr);
if P <> nil then //其实不用判断了
begin
//FS := Copy(FindStr, StartPos, MaxInt);
if StartPos > Length(FindStr) then Exit
//越界
PFS := PChar(FindStr);
Inc(PFS, StartPos - 1);
while P^ <> #0 do
begin
if PFS^ <> #0 then
begin
if IsSingle(P) then
begin
if P^ <> PFS^ then
begin
repeat
if PFS^ in IgnoreChar {[#32, #13, #10, #9]} then
begin
Inc(PFS);
end
else
Exit;
if PFS^ = #0 then Exit;
until PFS^ = P^;
end;
Inc(P);
Inc(PFS);
end
else
begin
Assert(CheckLeadChar(P, False));
if (P^ <> PFS^) or ((P + 1)^ <> (PFS + 1)^) then
begin
repeat
if IsSingle(PFS) and (PFS^ in IgnoreChar) then
begin
Inc(PFS);
end
else
Exit;
if PFS^ = #0 then Exit;
until (PFS^ = P^) and ((P + 1)^ = (PFS + 1)^);
end;
Inc(P, 2);
Inc(PFS, 2);
end;
end
else
Exit;
end;
Result := True;
end;
end;
var
iPos: Integer;
FirstStr: string;
begin
Result := 0;
iPos := 0;
if (SubStr = '') or (FindStr = '') then Exit;
if not IgnoreCRLFSPACE then
begin
Result := MyPos(SubStr, FindStr, iPos, IgnoreCase);
Exit;
end
else
begin
FirstStr := GetFirstStr(SubStr);
repeat
iPos := MyPos(FirstStr, FindStr, iPos, IgnoreCase);
if iPos <> 0 then
begin
if IsSameText(SubStr, FindStr, iPos, IgnoreCase) then
begin
Result := iPos;
Exit;
end
else
Inc(iPos, Length(FirstStr));
end;
until iPos = 0;
Result := 0;
end;
end;

end.
 
S

shenloqi

Unregistered / Unconfirmed
GUEST, unregistred user!
对于LiChaoHui的测试用例完全通过:)
ScanStr函数没用到,可删除
刚刚发出是没有贴上代码,现在加上了:)
不好意思,IgnoreCase参数没有处理好,下午修改一下,已经好了。
现在还没有Unicode支持,我的代码改起来比较容易的,下午要工作,我就不改了
 
C

copy_paste

Unregistered / Unconfirmed
GUEST, unregistred user!
const
MAX_CHAR = 256
//这是查度变形字符串最大的长度
SizeInt = SizeOf(Integer);
//下面两个是过滤的字符,现有Tab, 回车,和空格,你可以加入 双引号,单引号。
//这样它就不会比较双、单引号的区别了。
DistortionByte = [9, 10, 13, 32]

DistortionChar = [#9, #10, #13, #32];

type
PByteArr = ^TByteArr;
TByteArr = array [0..MaxInt - 1] of Byte;
PCharArr = ^TCharArr;
TCharArr = array [0..MaxInt - 1] of Char;

PDistortion = ^TDistortion;
TDistortion = record
Position: Integer;
Length: Integer;
end;

const
SizeDist = SizeOf(TDistortion);

// Stream保存的内容是TDistortion格式的数据。
// TextStr是你要查找的源字符串。
// SubStr是变形的种子
// IgnoreCase是否忽略大小写。

procedure FindPos(Stream: TMemoryStream
const TextStr: string;
SubStr: string
IgnoreCase: Boolean = False);

function _Trim(const S: string): string;
var
Src, Dst: PChar;
SrcLen, DstLen: Integer;
begin
SrcLen := Length(S);
SetLength(Result, SrcLen);
if SrcLen = 0 then Exit;
DstLen := 0;
Src := PChar(S);
Dst := PChar(Result);

while SrcLen > 0 do
begin
if not (Src^ in DistortionChar) then
begin
Dst^ := Src^;
Inc(Dst);
Inc(DstLen);
end;
Inc(Src);
Dec(SrcLen);
end;
SetLength(Result, DstLen);
end;

var
Text, Sub: PByte;
Distortion: TDistortion;
Buffer: array [0..MAX_CHAR - 1] of Integer;
I, J, CurrPos, SubLen, TextLen: Integer;
begin
Stream.Clear;
SubStr := _Trim(SubStr);
SubLen := Length(SubStr);
TextLen := Length(TextStr);
if SubLen > TextLen then
Exit;

Sub := @SubStr[1];
Text := @TextStr[1];

if IgnoreCase then
begin
GetMem(Sub, SubLen);
Move(SubStr[1], Sub^, SubLen);
Sub := PByte(StrUpper(PChar(Sub)));
end;

for I := 0 to MAX_CHAR - 1 do
Buffer := SubLen;
for I := 0 to SubLen - 2 do
Buffer[PByteArr(Sub)^] := SubLen - I - 1;

CurrPos := SubLen - 1;
try
while CurrPos < TextLen do
begin
I := CurrPos;
J := SubLen - 1;
while (J >= 0) do
begin
if PByteArr(Text)^ in DistortionByte then
begin
Dec(I);
Continue;
end;
if (PByteArr(Text)^ = PByteArr(Sub)^[J]) or
(IgnoreCase and (UpCase(PCharArr(Text)^) = PCharArr(Sub)^[J])) then
begin
Dec(J);
Dec(I);
end else
break;
end;

if -1 = J then
begin
Distortion.Position := I + 2;
Distortion.Length := CurrPos - I;
Stream.WriteBuffer(Distortion, SizeDist);
end;
if IgnoreCase then
Inc(CurrPos, Buffer[Byte(UpCase(PCharArr(Text)^[CurrPos]))])
else
Inc(CurrPos, Buffer[PByteArr(Text)^[CurrPos]]);
end;
finally
if IgnoreCase then
FreeMem(Sub);
end;
end;

procedure TForm1.Button1Click(Sender: TObject);
const
S1 = '<font face="Arial"><small>2001.4.1</small></font>';
S2 = '<font face="Arial" ><small> ' + #13#10 +
'2001.4.1</small></font>';
S3 = '<font face="Arial" ><small> ' + #13#10 +
'2001. 4. 1</small></font>';
Source = 'other msg' + S1 + 'other msg' + S2 + 'asdfdsf' + S3;
var
Stream: TMemoryStream;
Count: Integer;
P: PDistortion;
begin
Stream := TMemoryStream.Create;
try
FindPos(Stream, Source, S1, True);

Count := Stream.Size div SizeDist
//找到多少个S1的变形
Caption := Format('Found: %d', [Count]);
P := Stream.Memory;
while Count > 0 do
begin
Memo1.Lines.Add(Format('Pos: %d, Len: %d', [P^.Position, P^.Length]));
Memo1.Lines.Add(Copy(Source, P^.Position, P^.Length));
Inc(P);
Dec(Count);
end;
finally
Stream.Free;
end;
end;

算法是BM(Boyer-Moore),你可以到Google找到关于它的C语言的描述,我是将它转成D语法。
 
C

copy_paste

Unregistered / Unconfirmed
GUEST, unregistred user!
这么多人都写了,呵呵,一会试试效率。
 
A

Adnil

Unregistered / Unconfirmed
GUEST, unregistred user!
shenloqi的有些错误
" <td width=50%><font color="#215dc6"><b><img border="0" src="/images/25.gif" width="32" height="32">角色权限管理</b></font></td> "
中查找"角色权限管理</b>"没能找出来

copy_paste兄的我正在测试!
 
A

Adnil

Unregistered / Unconfirmed
GUEST, unregistred user!
copy_paste兄也有些错误

<font face="Arial"><small>大多份 2001.4.1</small>


</font>

中查找"<font face="Arial"><small>大多份 2001.4.1</small></font>"

另外,有一点不好意思我忘了提了,最终给出的pos最好是unicode支持的。
也就是说,xxxPos("字体<p> 字体 <p>", "<p>字体<p>")得出的结果应该是3,而不是ansi
中的5。
 
C

copy_paste

Unregistered / Unconfirmed
GUEST, unregistred user!
const
S1 = '<font face="Arial"><small>大多份2001.4.1</small></font>';
S2 = '<font face="Arial" ><small> ' + #13#10 +
'大多份2001.4.1</small></font>';
S3 = '<font face="Arial" ><small> ' + #13#10 +
'大多份 2001. 4. 1</small>' + #13#10#13#10 + '</font>';
Source = 'other msg' + S1 + 'other msg' + S2 + 'asdfdsf' + S3;

我这边好像没有错啊,我看看你的const定义。

unicode的不知改的麻烦不。。。
 
A

Adnil

Unregistered / Unconfirmed
GUEST, unregistred user!
我把你的代码改了一下,用了一个memo放全文,一个Edit放关键字串,出来这样的结果。
 
C

copy_paste

Unregistered / Unconfirmed
GUEST, unregistred user!
你直接LoadFromFile(fileName),好像拷贝过来会有问题,我也说不清。

unicode不容易改,你先用成功再说吧。
 
A

Adnil

Unregistered / Unconfirmed
GUEST, unregistred user!
unicode暂时不考虑了,我通过Length(WideString(Copy(sKeyword, 1, iFindPos)))来取
就是了
 
C

copy_paste

Unregistered / Unconfirmed
GUEST, unregistred user!
有什么问题没,我是懒得调试。
[:D][:D][:D][:D][:D][:D]
 
L

LiChaoHui

Unregistered / Unconfirmed
GUEST, unregistred user!
我的代码经过部分修正,通过了上面的两个测试用例
sample1:
Search Text:
<td width=50%><font color="#215dc6"><b><img border="0" src="/images/25.gif" width="32" height="32">角色权限管理</b></font></td>


KeyWord:
角色权限管理</b>

Search Result:
StartPos: 110
MatchLen: 16
FoundTxt: 角色权限管理</b>


sample2:
Search Text:

<font face="Arial"><small>大多份 2001.4.1</small>


</font>


KeyWord:
<font face="Arial"><small>大多份 2001.4.1</small></font>

Search Result:
StartPos: 3
MatchLen: 62
FoundTxt: <font face="Arial"><small>大多份 2001.4.1</small>


</font>

 
顶部