HexToInt的实现&速度大比拼 (300分)

唉,达到极致了
 
晕,tseug你也太过分了吧,不过你这个程序不公平哦,前面的都有非法字符的判断,你
这个没有。把判断加上吧,还有,大家不同的机器可以都测一下,看看在不同的机器上
各个程序的执行效率如何。
你的程序的pascal版本如下:
procedure initDoubleWood_Table;
var
i,j: Integer;
begin
FillMemory(@Convert_Table, Sizeof(Convert_Table), $FF);
for i := 0 to 7 do
begin
for j := ord('0') to ord('9') do
begin
Convert_Table[i*256+j] := (j-ord('0')) shl (i*4);
end;
for j := ord('A') to ord('F') do
begin
Convert_Table[i*256+j] := (j-ord('A')+10) shl (i*4);
end;
for j := ord('a') to ord('f') do
begin
Convert_Table[i*256+j] := (j-ord('a')+10) shl (i*4);
end;
end;
end;

function HexToInt_DoubleWood2(const S: string): DWORD;
var
i : Integer;
P : Pchar;
begin
Result := 0;
if Pointer(s) = nil then exit;
P := @Convert_Table;
for i := PInteger(Integer(s) - 4)^ downto 1 do
begin
Result := Result or PDword(P + Ord(S)*4)^;
P := P+1024;
end;
end;

循环10000000次的测试结果:

HexToInt_tseug2 (281ms).
HexToInt_DoubleWood2 (329ms).

HexToInt_tseug2 (266ms).
HexToInt_DoubleWood2 (312ms).

HexToInt_tseug2 (297ms).
HexToInt_DoubleWood2 (312ms).
 
呵呵,我一看你的“更过分”就知道你要用 大 表了:)
我去检查一下我的 beta3
 
呵呵,大叔,我那是一个“笔误”,已修正。

 
没有错误检查的,一律不算数:)

 
呵呵,刚发现,DoubleWood2 的结果才不对啊:)

 
关于机器差异的问题,主要表现在 LTD 搜索上。在 2 级缓存比较大的机器上,可以装入更
多的 LTD 条目,对使用 Pascal 这样的高级语言有利。对于汇编来讲,它的寻址很明确,
大的 2 级缓存获得的利益相对较小,但如果汇编也使用预定义表或在一个程序段内套用表
结构的话,估计在大 2 级缓存下也会有收益。
 
这就怪了,结果正确我才放上去的阿,是不是你测试前没有初始化?
嗯,把漏的定义部分补上:
var
Convert_Table: array [0..256*8-1] of DWORD;
发现pascal版本DoubleWood2 加错误判断的话似乎没太大影响,看来编译器的优化相当
好阿:)
修改过的DoubleWood2:
function HexToInt_DoubleWood2(const S: string): DWORD;
var
i : Integer;
P : Pchar;
begin
Result := 0;
if Pointer(s) = nil then exit;
P := @Convert_Table;
for i := PInteger(Integer(s) - 4)^ downto 1 do
begin
if PDword(P + Ord(S)*4)^=$FFFFFFFF then
begin
Result := 0;
exit;
end;
Result := Result or PDword(P + Ord(S)*4)^;
P := P+1024;
end;
end;

修改后的测试结果:
HexToInt_tseug0 (594ms). (2882343476)
HexToInt_tseug1 (641ms). (2882343476)
HexToInt_tseug2 (281ms). (2882343476)
HexToInt_DoubleWood (469ms). (2882343476)
HexToInt_DoubleWood2 (328ms). (2882343476)
HexToInt_beta1 (500ms). (2882343476)
HexToInt_beta2 (578ms). (2882343476)
HexToInt_beta3 (500ms). (1094922547)
HexToInt_tseug0 (609ms). (2882343476)
HexToInt_tseug1 (750ms). (2882343476)
HexToInt_tseug2 (266ms). (2882343476)
HexToInt_DoubleWood (484ms). (2882343476)
HexToInt_DoubleWood2 (328ms). (2882343476)
HexToInt_beta1 (484ms). (2882343476)
HexToInt_beta2 (563ms). (2882343476)
HexToInt_beta3 (500ms). (1094922547)
HexToInt_tseug0 (563ms). (2882343476)
HexToInt_tseug1 (687ms). (2882343476)
HexToInt_tseug2 (266ms). (2882343476)
HexToInt_DoubleWood (469ms). (2882343476)
HexToInt_DoubleWood2 (344ms). (2882343476)
HexToInt_beta1 (484ms). (2882343476)
HexToInt_beta2 (562ms). (2882343476)
HexToInt_beta3 (500ms). (1094922547)
HexToInt_tseug0 (578ms). (2882343476)
HexToInt_tseug1 (719ms). (2882343476)
HexToInt_tseug2 (281ms). (2882343476)
HexToInt_DoubleWood (469ms). (2882343476)
HexToInt_DoubleWood2 (328ms). (2882343476)
HexToInt_beta1 (500ms). (2882343476)
HexToInt_beta2 (578ms). (2882343476)
HexToInt_beta3 (500ms). (1094922547)
我把计算结果显示出来了,结果明明是beta3不对嘛……
 
beta3 我已经修改过了(修改了上面的帖子)[:D]
哦,你的 DoubleWood2 我当然出示化了,不过表定义错了,以修正:)

我的测试结果:

HexToInt_tseug0 (160ms): 2882343476.
HexToInt_tseug1 (381ms): 2882343476.
HexToInt_tseug2 (140ms): 2882343476.
HexToInt_DoubleWood1(160ms): 2882343476.
HexToInt_DoubleWood2(171ms): 2882343476.
HexToInt_beta1 (150ms): 2882343476.
HexToInt_beta2 (170ms): 2882343476.
HexToInt_beta3 (140ms): 2882343476.

HexToInt_tseug0 (161ms): 2882343476.
HexToInt_tseug1 (380ms): 2882343476.
HexToInt_tseug2 (150ms): 2882343476.
HexToInt_DoubleWood1(161ms): 2882343476.
HexToInt_DoubleWood2(180ms): 2882343476.
HexToInt_beta1 (160ms): 2882343476.
HexToInt_beta2 (160ms): 2882343476.
HexToInt_beta3 (151ms): 2882343476.

和你的大不一样吧:)

 
我又想到一个更快的方法,不过不知道有没有机会实现了,
马上要回学校了,这个机器也要物归原主了:(

 
确实如此,特别是HexToInt_tseug2和HexToInt_DoubleWood2两个使用大表的程序,在你
的机器上并没有表现出大表的优势,而在我的机器上优势明显,和HexToInt_tseug0相比
有1倍左右的提高,而在你的机器则除了HexToInt_tseug1外,其他程序基本相当,差距不
大。看来还是和CPU的关系很大阿。有没有人能提供其他CPU上的测试结果呢?
tseug的CPU是什么呢?能把结果贴出来吗?
beta你可以把思路描述一下,我看我有没有兴趣来实现:)
 
有这个必要吗??
为什么不走捷径???
var
ss:string;
i:integer;
begin
ss:='0F0A';
i:=StrToInt('$'+ss);
难道这样实现不行???
 
哈哈,已经搞定了:)

var
BetaTable4: array [0..Ord('f') * 256 + 255] of Word;

procedure InitBetaTable4;
const
ValidChars = ['0'..'9', 'A'..'F', 'a'..'f'];
var
i, j: Integer;
begin
FillChar(BetaTable4[0], SizeOf(BetaTable4), $FF);
for i := 0 to Ord('f') do
begin
if Chr(i) in ValidChars then
for j := 0 to Ord('f') do
begin
if Chr(j) in ValidChars then
BetaTable4[j * 256 + i] := StrToInt('$' + Chr(i) + Chr(j));
end;
end;
end;

function HexToInt_beta4(const S: string): DWORD;
const
SmallTbl: array [0..255] of Byte = (
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 16, 16, 16, 16, 16, 16,
16, 10, 11, 12, 13, 14, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 10, 11, 12, 13, 14, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16);
asm
push ebx
push ecx
push edx
push esi
push edi

mov edi, esp
push 0

mov esi, eax //字符串地址
mov ecx, [eax - 4] //读取字符串长度
test esi, esi //判断是否为空指针
jz @Err
test ecx, ecx //判断字符串是否为空
jle @Err

lea eax, BetaTable4
xor edx, edx
xor ebx, ebx

test ecx, 1
jz @LeftBytes

//FirstByte:
lea eax, SmallTbl
mov bl, [esi]
mov bl, [eax][ebx]
test ebx, 16
jnz @Err
dec edi
mov [edi], bl
inc esi
lea eax, BetaTable4

dec ecx
jz @Ext

@LeftBytes:
mov bx, [esi]
cmp bl, 'f'
ja @Err
mov dx, [eax][ebx * 2]
test dh, dh
jnz @Err

dec edi
mov [edi], dl

inc esi
inc esi
dec ecx
dec ecx
jnz @LeftBytes

@Ext:
pop eax

pop edi
pop esi
pop edx
pop ecx
pop ebx
ret

@Err:
mov [esp], 0
pop eax

pop edi
pop esi
pop edx
pop ecx
pop ebx
end;


测试结果如下:

HexToInt_tseug0 (170ms): 2882343476.
HexToInt_tseug2 (140ms): 2882343476.
HexToInt_DoubleWood1(150ms): 2882343476.
HexToInt_DoubleWood2(151ms): 2882343476.
HexToInt_beta4 (120ms): 2882343476.

HexToInt_tseug0 (171ms): 2882343476.
HexToInt_tseug2 (130ms): 2882343476.
HexToInt_DoubleWood1(160ms): 2882343476.
HexToInt_DoubleWood2(150ms): 2882343476.
HexToInt_beta4 (100ms): 2882343476.


注:我前面的 beta2, beta3 处理奇数长度字符串时有问题,没时间改了,
反正这个 beta4 很好用,而且不出错,就行了:)

 
请问: Base64解码用 查表法怎么做啊?

下面的这个方法,好像效率不高

Const Base64Table = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

Function Base64Decode(Value : String) : String;
Var AIn : Array[1..4] Of Byte;
AOut : Array[1..3] Of Byte;
AWork : Array[1..3] Of Byte;
I : Integer;
C : Integer;
Begin
Result := '';
I := 1;
While I < Length(Value) Do
Begin
C := 3;
FillChar(AWork, SizeOf(AWork), #0);
FillChar(AOut, SizeOf(AWork), #0);
AIn[1] := Byte(Pos(Value,Base64Table)-1);
AIn[2] := Byte(Pos(Value[I+1],Base64Table)-1);
AIn[3] := Byte(Pos(Value[I+2],Base64Table)-1);
AIn[4] := Byte(Pos(Value[I+3],Base64Table)-1);
If Value[I+3]='=' Then
Begin
C := 2;
AIn[4] := 0;
If Value[I+2]='=' Then
Begin
C := 1;
AIn[3] := 0;
End;
End;
AWork[2] := Byte(AIn[1] Shl 2);
AWork[3] := Byte(AIn[2] Shr 4);
AOut[1] := Byte(AWork[2] Or AWork[3]);
AWork[2] := Byte(AIn[2] Shl 4);
AWork[3] := Byte(AIn[3] Shr 2);
AOut[2] := Byte(AWork[2] Or AWork[3]);
AWork[2] := Byte(AIn[3] Shl 6);
AOut[3] := Byte(AWork[2] Or AIn[4]);
Result := Result + Char(AOut[1]);
If C > 1 Then
Result := Result + Char(AOut[2]);
If C > 2 Then
Result := Result + Char(AOut[3]);
Inc(I, 4);
End;
End;
 
很遗憾……
在我的机器上仍然没有明显的进步……
最过分的是——表又变大了……
HexToInt_tseug0 (562ms). (2882343476)
HexToInt_tseug2 (282ms). (2882343476)
HexToInt_DoubleWood (484ms). (2882343476)
HexToInt_DoubleWood2 (328ms). (2882343476)
HexToInt_beta1 (500ms). (2882343476)
HexToInt_beta4 (485ms). (2882343476)

HexToInt_tseug0 (594ms). (2882343476)
HexToInt_tseug2 (266ms). (2882343476)
HexToInt_DoubleWood (484ms). (2882343476)
HexToInt_DoubleWood2 (344ms). (2882343476)
HexToInt_beta1 (484ms). (2882343476)
HexToInt_beta4 (484ms). (2882343476)

HexToInt_tseug0 (547ms). (2882343476)
HexToInt_tseug1 (750ms). (2882343476)
HexToInt_tseug2 (281ms). (2882343476)
HexToInt_DoubleWood (485ms). (2882343476)
HexToInt_DoubleWood2 (328ms). (2882343476)
HexToInt_beta1 (484ms). (2882343476)
HexToInt_beta4 (484ms). (2882343476)

HexToInt_tseug0 (640ms). (2882343476)
HexToInt_tseug1 (766ms). (2882343476)
HexToInt_tseug2 (266ms). (2882343476)
HexToInt_DoubleWood (469ms). (2882343476)
HexToInt_DoubleWood2 (312ms). (2882343476)
HexToInt_beta1 (484ms). (2882343476)
HexToInt_beta4 (484ms). (2882343476)
 
最新 beta5 抢鲜出炉,表更小(由字表改为字节表),速度更快(寻址不需要乘法了)!

var
BetaTable5: array [0..Ord('f') * 256 + Ord('f')] of Byte;

procedure InitBetaTable5;
const
ValidChars = ['0'..'9', 'A'..'F', 'a'..'f'];
var
i, j: Integer;
begin
FillChar(BetaTable5[0], SizeOf(BetaTable5), $FF);
for i := 0 to Ord('f') do
begin
if Chr(i) in ValidChars then
for j := 0 to Ord('f') do
begin
if Chr(j) in ValidChars then
BetaTable5[j * 256 + i] := StrToInt('$' + Chr(i) + Chr(j));
end;
end;
end;

function HexToInt_beta5(const S: string): DWORD;
const
SmallTbl: array [0..255] of Byte = (
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 16, 16, 16, 16, 16, 16,
16, 10, 11, 12, 13, 14, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 10, 11, 12, 13, 14, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16);
asm
push ebx
push ecx
push edx
push esi
push edi

mov edi, esp
push 0

mov esi, eax //字符串地址
mov ecx, [eax - 4] //读取字符串长度
test esi, esi //判断是否为空指针
jz @Err
test ecx, ecx //判断字符串是否为空
jle @Err

lea eax, BetaTable5
xor edx, edx
xor ebx, ebx

test ecx, 1
jz @LeftBytes

//FirstByte:
lea eax, SmallTbl
mov bl, [esi]
mov bl, [eax][ebx]
test ebx, 16
jnz @Err
dec edi
mov [edi], bl
inc esi
lea eax, BetaTable5

dec ecx
jz @Ext

@LeftBytes:
mov bx, [esi]
cmp bl, 'f'
ja @Err
cmp bh, 'f'
ja @Err
mov dl, [eax][ebx]

dec edi
mov [edi], dl

inc esi
inc esi
dec ecx
dec ecx
jnz @LeftBytes

@Ext:
pop eax

pop edi
pop esi
pop edx
pop ecx
pop ebx
ret

@Err:
mov [esp], 0
pop eax

pop edi
pop esi
pop edx
pop ecx
pop ebx
end;


测试结果如下:

HexToInt_tseug0 (170ms): 2882343476.
HexToInt_tseug1 (380ms): 2882343476.
HexToInt_tseug2 (111ms): 2882343476.
HexToInt_DoubleWood1(150ms): 2882343476.
HexToInt_DoubleWood2(170ms): 2882343476.
HexToInt_beta4 (110ms): 2882343476.
HexToInt_beta5 (100ms): 2882343476.

HexToInt_tseug0 (181ms): 2882343476.
HexToInt_tseug1 (380ms): 2882343476.
HexToInt_tseug2 (110ms): 2882343476.
HexToInt_DoubleWood1(150ms): 2882343476.
HexToInt_DoubleWood2(161ms): 2882343476.
HexToInt_beta4 (130ms): 2882343476.
HexToInt_beta5 (100ms): 2882343476.

HexToInt_tseug0 (170ms): 2882343476.
HexToInt_tseug1 (381ms): 2882343476.
HexToInt_tseug2 (110ms): 2882343476.
HexToInt_DoubleWood1(160ms): 2882343476.
HexToInt_DoubleWood2(151ms): 2882343476.
HexToInt_beta4 (110ms): 2882343476.
HexToInt_beta5 (100ms): 2882343476.


呵呵,DoubleWood 兄,反正我这里的结果就是和你的不一样:)我估计应该是 CPU 的
问题。只有看其他富翁的测试结果如何了。

 
从理论上来说,我的算法应该比你们快:)看你们的代码,都是对于源字符串的
每一个字符查一次表;而我是每次取两个字符出来查一次表,就是说我对内存的
访问次数比你们的少一半,而其他地方的开销几乎相同,所以。。。

本来想用 MMX 寄存器一次把 8 个字符全部读入寄存器,然后就不用访问内存了
这样来优化的,结果发现为了给它移位,效率被抵消了,遂放弃该方法。

 
to lich:你那个做法当然效率低了,我现在在外地,学校里面有一份我优化过
的 Base64 的代码。如果你要的急就只好请其他富翁帮忙找了,基本思想是把
整个字符串看成是 01 序列(每字符可拆解成 8 个),然后每次取出其中 6 个,
然后再把这 6 个看成一个整数,根据此整数查表即可。要是你要的不急,我下
周回学校后再发给你:)


 
hehe,诸位写了这么多,我也就不献丑了
 

Similar threads

I
回复
0
查看
618
import
I
I
回复
0
查看
681
import
I
I
回复
0
查看
662
import
I
I
回复
0
查看
653
import
I
I
回复
0
查看
777
import
I
顶部