高分寻求支持中文的快速字符串操作函数(加CSDN中的200分) (300分)

ysai · 2003-02-22

寻求下列函数,速度第一,当然不能出错,要支持中文
判断字符串A在字符串B中出现的次数
f('aabbccaaabbbcccaaaabbbbcccc','aa')=4
取得字符串A中字符串B之前的字符串
f('abc','b')='a'
取得字符串A中字符串B之后的字符串
f('abc','b')='c'
取得字符串A中字符串B和字符串C之间的字符串(不包含字符串B和字符串C)
f('abc','a','c')='c'
获取字符串A中最后一个字符串B后面的字符串值,并且返回这个字符串间隔符B的起始位置
f('abcabc','b',s)=5 s='c'
获取字符串A中最后一个字符串B前面的字符串值,并且返回这个字符串间隔符B的起始位置
f('abcabc','b',s)=5 s='abca'
快速替换函数,类似StringReplace
FastReplace

读取字符串A中字符串B与字符串C之间的的字符到列表(TStrings)中
f('aacabcacc','a','c',aList)=3 aList='a','b','c'
批替换函数:
根据分界符找('a','c')到要替换的字符串并替换成相应的值,利用上面的函数.
aList中的Names值在源字符串中的排列是按顺序的(上面的函数取出的)
aList.Names[0]:='a';
aList.Names[1]:='b';
aList.Names[2]:='c';
aList.Strings[0]:='x';
aList.Strings[1]:='y';
aList.Strings[2]:='z';
f('aacabcacc','a','c',aList)='axcaycazc'

哪位收藏有的话请发给我一份,谢谢
ysai_cn@hotmail.com
CSDN.NET中的200分在
http://expert.csdn.net/Expert/topic/1455/1455425.xml?temp=.7183191

Milpas · 2003-02-22

自己写吧，写好了，贴出来，我给300分

Pc 狂迷 · 2003-02-22

LiChaoHui · 2003-02-22

难道Delphi自己的不够快吗？
用基础函数可以完成的

悲酥清风 · 2003-02-22

CSDN的分什么时候成了流通货币了[

]
开个玩笑

vivee · 2003-02-22

关注！！

doll_paul · 2003-02-22

beta · 2003-02-23

呵呵，好久没见到悲酥清风上来了啊：）

//哪位收藏有的话请发给我一份,谢谢
呵呵，我还是直接贴出来吧，其他富翁恐怕也想要一份：）

我用了 FastPos 作为基础改的，加几个外壳就可以了嘛：
（注意这个 FastPos 也是我改过的，修正了个 bug，如果你有以前版本的请更新为这个）

// 快速字符串查找函数
// This TYPE declaration will become apparent later.
//The first thing to note here is that I’m passing the SourceLength and FindL
//ength. As neither Source nor Find will alter at any point during FastReplace
//, there’s no need to call the LENGTH subroutine each time!
function FastPos(
const aSourceString, aFindString : String;
const aSourceLen, aFindLen, StartPos : Integer
) : Integer;
begin
// Next, we determine how many bytes we need to
// scan to find the "start" of aFindString.
// Remove by SunLujiang
{
SourceLen := aSourceLen;
SourceLen := SourceLen - aFindLen;
if (StartPos-1) > SourceLen then begin
Result := 0;
Exit;
end;
SourceLen := SourceLen - StartPos;
SourceLen := SourceLen +2;
}
// Remove end

// The ASM starts here.
asm
// Delphi uses ESI, EDI, and EBX a lot,
// so we must preserve them.
push ESI
push EDI
push EBX

// Add by SunLujiang
Mov ECX, aSourceLen
Mov EAX, aFindLen
// Add by beta begin
//make sure that aFindLen > 0, or it may cause a exception
Cmp EAX, 0
Jle @Result0
// Add by beta end
Sub ECX, EAX
JL @Result0
Mov EAX, StartPos
Dec EAX
Sub ECX, EAX
JL @Result0
Inc ECX
// Add end

// Get the address of sourceString[1]
// and Add (StartPos-1).
// We do this for the purpose of finding
// the NEXT occurrence, rather than
// always the first!
mov EDI, aSourceString
add EDI, StartPos
Dec EDI
// Get the address of aFindString.
mov ESI, aFindString
// Note how many bytes we need to
// look through in aSourceString
// to find aFindString.

// Remove by SunLujiang
// mov ECX, SourceLen
// Remove end

// Get the first char of aFindString;
// note how it is done outside of the
// main loop, as it never changes!
Mov Al, [ESI]
// Now the FindFirstCharacter loop!
@ScaSB:
// Get the value of the current
// character in aSourceString.
// This is equal to ah := EDI^, that
// is what the [] are around [EDI].
Mov Ah, [EDI]
// Compare this character with aDestString[1].
cmp Ah,Al
// If they're not equal we don't
// compare the strings.
jne @NextChar
// If they're equal, obviously we do!
@CompareStrings:
// Put the length of aFindLen in EBX.
mov EBX, aFindLen
// We DEC EBX to point to the end of
// the string
that is, we don't want to
// add 1 if aFindString is 1 in length!
dec EBX

// add by ShengQuanhu
// If EBX is zero, then we've successfully
// compared each character
i.e. it's A MATCH!
// It will be happened when aFindLen=1
Jz @EndOfMatch
//add end

//Here’s another optimization tip. People at this point usually PUSH ESI and
//so on and then POP ESI and so forth at the end–instead, I opted not to chan
//ge ESI and so on at all. This saves lots of pushing and popping!
@CompareNext:
// Get aFindString character +
// aFindStringLength (the last char).
mov Al, [ESI+EBX]
// Get aSourceString character (current
// position + aFindStringLength).
mov Ah, [EDI+EBX]
// Compare them.
cmp Al, Ah
Jz @Matches
// If they don't match, we put the first char
// of aFindString into Al again to continue
// looking for the first character.
Mov Al, [ESI]
Jmp @NextChar
@Matches:
// If they match, we DEC EBX (point to
// previous character to compare).
Dec EBX
// If EBX <> 0 ("J"ump "N"ot "Z"ero), we
// continue comparing strings.
Jnz @CompareNext

//add by Shengquanhu
@EndOfMatch:
//add end

// If EBX is zero, then we've successfully
// compared each character
i.e. it's A MATCH!
// Move the address of the *current*
// character in EDI.
// Note, we haven't altered EDI since
// the first char was found.
mov EAX, EDI
// This is an address, so subtract the
// address of aSourceString[1] to get
// an actual character position.
sub EAX, aSourceString
// Inc EAX to make it 1-based,
// rather than 0-based.
inc EAX
// Put it into result.
mov Result, EAX
// Finish this routine!
jmp @TheEnd
@NextChar:
//This is where I jump to when I want to continue searching for the first char
//acter of aFindString in aSearchString:
// Point EDI (aFindString[X]) to
// the next character.
Mov Ah, [EDI]//先把第一个字符移到Ah中，后面判断是否中文
Inc EDI
// Dec ECX tells us that we've checked
// another character, and that we're
// fast running out of string to check!
dec ECX
// If EBX <> 0, then continue scanning
// for the first character.

//add by shengquanhu
//if ah is chinese char,jump again
jz @Result0

cmp ah, $80
jb @ScaSB
Inc EDI
Dec ECX
//add by shengquanhu end

jnz @ScaSB

//add by shengquanhu
@Result0:
//add by shengquanhu end

// If EBX = 0, then move 0 into RESULT.
mov Result,0
// Restore EBX, EDI, ESI for Delphi
// to work correctly.
// Note that they're POPped in the
// opposite order they were PUSHed.
@TheEnd:
pop EBX
pop EDI
pop ESI

end;
end;

function MyPos(substr, str: string): Integer;
begin
Result := FastPos(str, substr, Length(str), Length(substr), 1);
end;

// 字符串反向查找，思路是将源字符串和目标字符串都反序，然后查找
function FastPosUp(
const aSourceString, aFindString : String;
const aSourceLen, aFindLen, StartPos : Integer
) : Integer;
var
i, tstart: Integer;
tstr, tsub: string;
begin
SetLength(tstr, aSourceLen);
SetLength(tsub, aFindLen);
for i := 1 to aSourceLen do //reverse the source string
tstr := aSourceString[aSourceLen + 1 - i];
for i := 1 to aFindLen do //reverse the find string
tsub := aFindString[aFindLen + 1 - i];
tstart := aSourceLen + 1 - StartPos;
tstart := FastPos(tstr, tsub, aSourceLen, aFindLen, tstart);
if tstart > 0 then
Result := aSourceLen + 1 - tstart - aFindLen + 1
else
Result := 0;
end;

function MyBackPos(substr, str: string): Integer;
begin
Result := FastPosUp(str, substr, Length(str), Length(substr), Length(str));
end;

// 字符串替换
//replace all the "ss" in "text" to "ds"
procedure StringReplaceAll(var text: string
const ss, ds: string);
var
p: Integer;
begin
p := MyPos(ss, text);
while p > 0 do
begin
Delete(text, p, Length(ss));
Insert(ds, text, p);
p := MyPos(ss, text);
end;
end;

// 出现频率统计
function MyCount(substr, str: string): Integer;
var
Start: Integer;
begin
Result := 0;
Start := 1;
repeat
Start := FastPos(str, substr, Length(str), Length(substr), Start);
if Start > 0 then
begin
Inc(Result);
Inc(Start, Length(substr));
end
else
Break;
until Start = 0;
end;

// 将 pFlag 和 nFlag 分别当做左右括号，查找括号内的内容的起始位置和长度，
// 为后面的去字符串中间部分作准备
//return the range of the matching pFlag and nFlag in aText, start with aStart
procedure GetRange(aText: string
aStart: Integer
pFlag, nFlag: string;
var sStart, sLength: Integer
LevelSearch: Boolean = False);
var
pCount, tStart, tEnd, sLevel: Integer;
ts: string;
begin
sStart := -1;
sLength := 0;
if (Length(aText) > 2) and (aStart < Length(aText) - 1) then
begin
sStart := FastPos(aText, pFlag, Length(aText), Length(pFlag), aStart + 1);
Dec(sStart)
//convert to index
if sStart > -1 then //found the pFlag
begin
tStart := sStart + Length(pFlag);
tEnd := FastPos(aText, nFlag, Length(aText), Length(nFlag), tStart + 1);
Dec(tEnd)
//convert to index
if tEnd > -1 then //found the nFlag
begin
tEnd := tEnd + Length(nFlag);
if LevelSearch then
begin //level search
ts := Copy(aText, tStart + 1, tEnd - tStart + 1);
sLevel := MyCount(pFlag, ts);
while sLevel > 0 do //not in the same level
begin
tStart := tEnd + Length(nFlag);
tEnd := FastPos(aText, nFlag, Length(aText), Length(nFlag), tStart + 1);
Dec(tEnd)
//convert to index
if tEnd > -1 then
begin
Dec(sLevel);
tEnd := tEnd + Length(nFlag);
ts := Copy(aText, tStart + 1, tEnd - tStart + 1);
pCount := MyCount(pFlag, ts);
if pCount > 0 then //more pFlags, so, get into further level
sLevel := sLevel + pCount;
end
else
Exit;
end;
end;
sLength := tEnd - sStart;
end;
end;
end;
end;

// 去字符串 Src 中在 Prefix 之后且在 Postfix 之前的内容(不含)
function StrBetween(Src, Prefix, Postfix: string): string;
var
Start, Len: Integer;
begin
GetRange(Src, 1, Prefix, Postfix, Start, Len, True);
Result := Copy(Src, Start + Length(Prefix), Len - Length(Prefix)
- Length(Postfix));
end;

// 返回 Src 中在字符串 Flag 后面的部分
function StrAfter(Src, Flag: string): string;
var
p: Integer;
begin
p := MyPos(Flag, Src);
Result := Copy(Src, p + Length(Flag), MaxInt);
end;

// 返回 Src 中在字符串 Flag 前面的部分
function StrBefore(Src, Flag: string): string;
var
p: Integer;
begin
p := MyPos(Flag, Src);
Result := Copy(Src, 1, p - Length(Flag));
end;

//获取字符串A中最后一个字符串B后面的字符串值,并且返回这个字符串间隔符B的起始位置
//f('abcabc','b',s)=5 s='c'
function StrAfterEx(Src, Flag: string
var Res: string): Integer;
begin
Result := MyBackPos(Flag, Src);
Res := Copy(Src, Result + Length(Flag), MaxInt);
end;

//获取字符串A中最后一个字符串B前面的字符串值,并且返回这个字符串间隔符B的起始位置
//f('abcabc','b',s)=5 s='abca'
function StrBeforeEx(Src, Flag: string
var Res: string): Integer;
begin
Result := MyBackPos(Flag, Src);
Res := Copy(Src, 1, Result - Length(Flag));
end;

呼，累啊，写的差不多了吧，我直接在这里写的，没有调试哦，自己试一试吧：）

beta · 2003-02-23

由于 FastPos 支持中文，而且很快，所以上面的函数都不慢。
（其中部分代码是大一的时候写的，不敢保证效率[

]）

ysai · 2003-02-23

谢谢!
粗看了一下,除了FastPos之外,与我现在用的相差不大,不知道效率能提高多少,现在没有
DELPHI,不能测试,星期一回公司再试试

beta · 2003-02-23

//不知道效率能提高多少
如果要处理的字符串（文件）比较大（单位 M），则可以提高不少效率。

另外，替换部分还可以提速，去看一下 Another_eYes 的心得，思想是先
把要替换的位置记录下来，最后统一替换。

上面写那些都没有特别考虑效率问题。
对于对效率要求极高的个别函数，可以提出来仔细研究一下。

saoren · 2003-02-23

汇编的头大大。。。

那后两个函数写起来有点麻烦。。。

读取字符串A中字符串B与字符串C之间的的字符到列表(TStrings)中
f('aacabcacc','a','c',aList)=3 aList='a','b','c'
如果是：
F('AddddAfffffC', 'A', 'C')应该是什么了？
是 ddddAfffff还是fffff了？
还有你说的是字符还是字符串到列表中？

最后一个想速度就难度大了。。。

ysai · 2003-02-23

我对汇编一点都不懂

读取字符串A中字符串B与字符串C之间的的字符到列表(TStrings)中
f('aacabcacc','a','c',aList)=3 aList='a','b','c'
f('AddddAfffffC', 'A', 'C',aList)=1 aList='ddddAfffff'
是添加字符串到TStrings列表中

最后一个难度最大,要频繁地用到前面的函数,是建立在前面列出函数的基础上的,
也是我最后想实现的功能,所以前面的函数的效率很重要!

您就是copy_paste吧,谢谢在CSDN上的答复

saoren · 2003-02-23

前两天心血来潮，想换MAIL。。。。现在欲哭无泪啊。。。
[

]

ysai · 2003-02-23

呵,我用的hotmail的帐号能收到确认信,不过早在N月前已经拒收delphibbs的邮件了,
不然我那可怜的2M空间可经不起折腾

我看像您这样的高手,怎么也得弄个版主当当呀,这么晚了还在为富翁们尽力,我的偶像!

saoren · 2003-02-23

function GetBewteenString(const TextStr, FirstSubStr, LastSubStr: string

IgnoreCase: Boolean
List: TStrings): Integer;
var
Text: PByte;
TextLen: Integer;
FirstBuffer: array [0..MAX_CHAR - 1] of Integer;
LastBuffer: array [0..MAX_CHAR - 1] of Integer;

function FindMatchIndex(const Sub: PByte

Buffer: array of Integer
SubLen, CurrPos: Integer): Integer;
var
I, J: Integer;
begin
Result := -1;
while CurrPos < TextLen do
begin
I := CurrPos;
J := SubLen - 1;
while (J >= 0) and
((PByteArr(Text)^ = PByteArr(Sub)^[J]) or
(IgnoreCase and (UpCase(PCharArr(Text)^) = PCharArr(Sub)^[J]))) do
begin
Dec(J);
Dec(I);
end;
if -1 = J then
begin
Result := CurrPos - SubLen + 2;
break;
end else
begin
if IgnoreCase then
Inc(CurrPos, Buffer[Byte(UpCase(PCharArr(Text)^[CurrPos]))])
else
Inc(CurrPos, Buffer[PByteArr(Text)^[CurrPos]]);
end;
end;
end;

var
FirstSub, LastSub: PByte;
I, CurrPos, LastPos, FirstPos, FirstLen, LastLen: Integer;
begin
Result := 0;
List.Clear

FirstLen := Length(FirstSubStr);
LastLen := Length(LastSubStr);
if (FirstLen = 0) or (LastLen = 0) then Exit;
TextLen := Length(TextStr);

FirstSub := @FirstSubStr[1];
LastSub := @LastSubStr[1];
Text := @TextStr[1];

if IgnoreCase then
begin
GetMem(FirstSub, FirstLen);
Move(FirstSubStr[1], FirstSub^, FirstLen);
FirstSub := PByte(StrUpper(PChar(FirstSub)));

GetMem(LastSub, LastLen);
Move(LastSubStr[1], LastSub^, LastLen);
LastSub := PByte(StrUpper(PChar(LastSub)));
end;

for I := 0 to MAX_CHAR - 1 do
begin
FirstBuffer := FirstLen;
LastBuffer := LastLen;
end;

for I := 0 to FirstLen - 2 do
FirstBuffer[PByteArr(FirstSub)^] := FirstLen - I - 1;
for I := 0 to LastLen - 2 do
LastBuffer[PByteArr(LastSub)^] := LastLen - I - 1;

List.BeginUpdate;
CurrPos := FirstLen - 1;

try
while CurrPos < TextLen do
begin
FirstPos := FindMatchIndex(FirstSub, FirstBuffer, FirstLen, CurrPos);
if FirstPos <> -1 then
begin
LastPos := FindMatchIndex(LastSub, LastBuffer, LastLen, CurrPos + FirstLen);
if LastPos <> -1 then
begin
Inc(Result);
List.Add(Copy(TextStr, FirstPos + FirstLen, LastPos - FirstPos - FirstLen));
CurrPos := LastPos + LastLen;
end else
begin
if IgnoreCase then
Inc(CurrPos, FirstBuffer[Byte(UpCase(PCharArr(Text)^[CurrPos]))])
else
Inc(CurrPos, FirstBuffer[PByteArr(Text)^[CurrPos]]);
end;
end else
break
//如果没有找到FirstSub，那退出.
end

finally
List.EndUpdate;
if IgnoreCase then
begin
FreeMem(FirstSub);
FreeMem(LastSub);
end;
end;
end

Index := GetBewteenString('AddddfAdddC AbbbbCCddddAC', 'A', 'C', False, Memo1.Lines);
Caption := IntToStr(Index);

好像是行了，[][]

saoren · 2003-02-23

最后一个我不敢想了。。。。[

]

俺只是对这些有点兴趣，也许过段时间就没了。。。[

][

]

beta · 2003-02-23

呵呵，你倒数第二个函数的功能我看了好久才看懂：）
aList.Names[0]:='a';
aList.Names[1]:='b';
aList.Names[2]:='c';
aList.Strings[0]:='x';
aList.Strings[1]:='y';
aList.Strings[2]:='z';
关键是你 x y z 的确定。如果有简单的对应关系（设计时可以确定），则完全
可以将后面两个过程合并，以提高速度。

如果可以合并，我倒是可以试一试，否则的话，我没有把握(性能方面)：）

小雨哥 · 2003-02-23

beta · 2003-02-23

呵呵，小雨哥，你还说我昼伏夜出：）

高分寻求支持中文的快速字符串操作函数(加CSDN中的200分) (300分)

ysai

Unregistered / Unconfirmed

Milpas

Unregistered / Unconfirmed

Pc 狂迷

Unregistered / Unconfirmed

LiChaoHui

Unregistered / Unconfirmed

悲酥清风

Unregistered / Unconfirmed

vivee

Unregistered / Unconfirmed

doll_paul

Unregistered / Unconfirmed

beta

Unregistered / Unconfirmed

beta

Unregistered / Unconfirmed

ysai

Unregistered / Unconfirmed

beta

Unregistered / Unconfirmed

saoren

Unregistered / Unconfirmed

ysai

Unregistered / Unconfirmed

saoren

Unregistered / Unconfirmed

ysai

Unregistered / Unconfirmed

saoren

Unregistered / Unconfirmed

saoren

Unregistered / Unconfirmed

beta

Unregistered / Unconfirmed

小雨哥

Unregistered / Unconfirmed

beta

Unregistered / Unconfirmed

Similar threads