下面这篇文章可能就是你要找的...
How much looks John like Jon? (Updated)
Question: You know the problem: A simple type mismatch and the search in
the database goes fail, because the strings did not match...
Answer: This function compares the single characters, counts the identical characters an calculates a procentual value, how similar the both strings are.
To bring in a little unsharp factor, the function checks if any
identical characters is in the "near" of the actual compare position. This is calculated in a formula depending on the length of the strings (diff).
Some results:
'John' and 'John' = 100%
'John' and 'Jon' = 75%
'Jim' and 'James' = 40%
"Luke Skywalker" and 'Darth Vader' = 0%
Extension (24. May 2000):
- If parameter tolerant is true, the rules to compare single
characters are relaxed. Look for function CompChar. This is
not a phonetic compare, just relaxed rules on similarty
chracters. The selection is straight german, so feel free to
replace with your specific language characters.
- Instead of a static array of boolean, I use the class TBits
(unit classes in D4) to get unlimited string length. Maybe
this class is not available in earlier Delphi versions.
- function Max is from unit Math which is not included in D3
Standard. Use instead this:
function Max (i1, i2: Integer): Integer;
begin
if i1 < i2 then Max:= i2 else Max:= i1;
end (*Max*);
function StrSimilar (s1, s2: string
tolerant: Boolean): Integer;
var hit: Integer
// Number of identical chars
p1, p2: Integer
// Position count
l1, l2, l: Integer
// Length of strings
diff: Integer
// unsharp factor
hstr: string
// help var for swapping strings
// Array shows if position is already tested
test: Classes.TBits;
function CompChar (ch1, ch2: Char): Boolean;
// german "umlauts" and similar charactes
begin
if tolerant then begin
ch1:= UpCase (ch1)
// compare case insensitive
ch2:= UpCase (ch2);
case ch1 of
'?, '?, 'E': Result:= ch2 in ['?, 'E', '?];
'B', 'P': Result:= ch2 in ['B', 'P'];
'C', 'Z': Result:= ch2 in ['C', 'Z'];
'D', 'T': Result:= ch2 in ['D', 'T'];
'F', 'V': Result:= ch2 in ['F', 'V'];
'G', 'K': Result:= ch2 in ['G', 'K'];
'S', '?: Result:= ch2 in ['S', '?];
'I', 'J',
'Y', '?, '?: Result:= ch2 in ['I', 'J', 'Y', '?, '?];
else Result:= ch1 = ch2;
end;
end else begin
Result:= ch1 = ch2;
end;
end;
begin
l1:= Length (s1);
l2:= Length (s2);
if (l1 <= 0) or (l2 <= 0) then begin Result:= 0
Exit
end;
// Test Length and swap, if s1 is smaller
if l1 < l2 then begin
hstr:= s2
s2:= s1
s1:= hstr;
l:= l2
l2:= l1
l1:= l;
end;
p1:= 1
p2:= 1
hit:= 0;
// calc the unsharp factor depending on
// the length of the strings
diff:= Max (l1, l2) div 3 + ABS (l1 - l2);
// init the test array
test:= Classes.TBits.Create;
// Calc size of TBits. Must be two bigger, because we're 0-Based
// counting from 1, and we need one more then stringlength
test.Size:= l1 + 2;
// loop through the string
repeat
// position tested?
if not test.Bits[p1] then begin
// found a matching character?
if CompChar (s1[p1], s2[p2]) and
(ABS(p1-p2) <= diff) then begin
test.Bits[p1]:= True;
Inc (hit)
// increment the hit count
// next positions
Inc (p1)
Inc (p2);
if p1 > l1 then p1:= 1;
end else begin
// Set test array
test.Bits[p1]:= False;
Inc (p1);
// Loop back to next test position
if p1 > l1 then begin
while (p1 > 1) and not (test[p1]) do Dec (p1);
Inc (p2)
end;
end;
end else begin
Inc (p1);
// Loop back to next test position
if p1 > l1 then begin
repeat Dec (p1)
until (p1 = 1) or test.Bits[p1];
Inc (p2);
end;
end;
until p2 > l2;
test.Free
// Release Booleanlist
// calc procentual value
Result:= 100 * hit DIV l1;
Uploader: Peter Hellinger