How do I tell whether a character read from a text file is an English character, a Chinese character, or something else? (100 points)

Lion_sj · 2001-11-12

Urgent!!!!

tseug · 2001-11-12

The IsDBCSLeadByte function determines whether a character is a lead byte, that is, the first byte of a character in a double-byte character set (DBCS).

BOOL IsDBCSLeadByte(
  BYTE TestChar // character to test
);

Parameters

TestChar

Specifies the character to be tested.

Return Values

If the character is a lead byte, the return value is nonzero.
If the character is not a lead byte, the return value is zero. To get extended error information, call GetLastError.

Remarks

Lead bytes are unique to double-byte character sets. A lead byte introduces a double-byte character. Lead bytes occupy a specific range of byte values. The IsDBCSLeadByte function uses the ANSI code page to check lead-byte ranges.
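
To make the description above concrete, here is a minimal sketch of walking a string with IsDBCSLeadByte. It assumes the text is held in an AnsiString encoded in the system ANSI code page (for example GBK on a Chinese Windows). CountChars is a hypothetical helper name, not part of the Windows API. On a system whose ANSI code page is not a DBCS one, the API reports no lead bytes at all, so every byte above 127 ends up in the "other" bucket.

```pascal
program ClassifyDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper: counts single-byte ASCII characters, double-byte
// characters, and everything else in an ANSI (DBCS) string.
procedure CountChars(const S: AnsiString; out Ascii, DoubleByte, Other: Integer);
var
  i: Integer;
begin
  Ascii := 0;
  DoubleByte := 0;
  Other := 0;
  i := 1;
  while i <= Length(S) do
  begin
    if IsDBCSLeadByte(Byte(S[i])) and (i < Length(S)) then
    begin
      Inc(DoubleByte);
      Inc(i, 2); // skip the trail byte together with the lead byte
    end
    else
    begin
      if Byte(S[i]) < 128 then
        Inc(Ascii)
      else
        Inc(Other); // non-ASCII byte that does not start a double-byte character
      Inc(i);
    end;
  end;
end;

var
  A, D, O: Integer;
begin
  CountChars('abc', A, D, O);
  WriteLn('ASCII: ', A, '  double-byte: ', D, '  other: ', O);
end.
```

Note that a double-byte character is not necessarily a Chinese character: full-width punctuation and symbols are double-byte too, so this only separates single-byte from double-byte text.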

Lion_sj · 2001-11-12

Could you explain that a bit more clearly, please?

Thanks!!!

bluerain · 2001-11-12

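Telling whether a character is English is the easy part, so I won't go into it here.
What follows is how to tell whether it is Chinese. Anything else is simply "not English and not Chinese", I suppose.
Here is an article for you:

The demo program mainly uses the IsDBCSLeadByte API to test whether a byte belongs to the lead-byte set of a double-byte character set (Chinese characters, for example; in the GB 2312-80 encoding the first byte lies in the range 0xA1-0xFE).

(The IsDBCSLeadByte function determines whether a character is a lead byte, that is, the first byte of a character in a double-byte character set (DBCS).)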

procedure TForm1.Button1Click(Sender: TObject);
var
  CutLengthOfLine: Integer; // maximum number of bytes per output line
  i, j: Integer;
  sLine: AnsiString;        // remaining part of the source line (ANSI/DBCS bytes)
  sCuted: AnsiString;       // chunk cut from the start of sLine
  iCutLength: Integer;      // number of bytes actually emitted for this chunk
  bIsDBCS: Boolean;         // True while the byte just scanned is a lead byte
begin
  if Edit1.Text = '' then
    Exit;
  CutLengthOfLine := StrToInt(Edit1.Text);
  if CutLengthOfLine < 2 then
  begin
    ShowMessage('CutLengthOfLine must be at least 2!');
    Exit;
  end;
  Memo2.Lines.Clear;
  for i := 0 to Memo1.Lines.Count - 1 do
  begin
    sLine := AnsiString(Memo1.Lines[i]);
    if Length(sLine) = 0 then
      Memo2.Lines.Add('')
    else
      repeat // process the line chunk by chunk
        sCuted := Copy(sLine, 1, CutLengthOfLine); // take up to CutLengthOfLine bytes
        iCutLength := Length(sCuted); // the last chunk may be shorter
        bIsDBCS := False; // assume the chunk does not end on half a character
        // Scan from the first byte instead of looking only at the tail.
        // The original author explains why:
        // 1. Why not test just the last byte? Because the trail byte of a
        //    Chinese character may also fall inside the lead-byte range.
        // 2. Why not test just the last two bytes? Because the trail byte of
        //    one character plus the lead byte of the next may happen to look
        //    like another Chinese character.
        for j := 1 to iCutLength do
        begin
          if bIsDBCS then
            bIsDBCS := False // previous byte was a lead byte, so this is its trail byte
          else if Windows.IsDBCSLeadByte(Byte(sCuted[j])) then
            bIsDBCS := True; // this byte starts a double-byte character
        end;
        // bIsDBCS is True here only when the chunk ends on a lead byte;
        // drop that byte so the character is not split into garbage.
        if bIsDBCS and (iCutLength > 1) then
          Dec(iCutLength);
        Memo2.Lines.Add(string(Copy(sLine, 1, iCutLength)));
        // keep the rest of the line for the next pass
        sLine := Copy(sLine, iCutLength + 1, Length(sLine) - iCutLength);
      until Length(sLine) <= 0;
  end;
  Memo2.SetFocus;
  Memo2.SelStart := 0;
  Memo2.SelLength := 0;
end;

飘摇客 · 2001-11-12

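We can decide by converting each character to its numeric value with the Ord() function.
Values 33-126 are the printable characters you can type on a keyboard. Values above 127 are not ASCII; in a GB2312/GBK text they belong to double-byte characters, which this code counts as Chinese (two bytes per character).
procedure TForm1.Button1Click(Sender: TObject);
var
  s: AnsiString;
  i, e, c: Integer;
begin
  s := AnsiString(Memo1.Text);
  e := 0; // number of printable ASCII characters
  c := 0; // number of bytes above 127
  for i := 1 to Length(s) do
  begin
    if (Ord(s[i]) >= 33) and (Ord(s[i]) <= 126) then
      Inc(e)
    else if Ord(s[i]) >= 128 then
      Inc(c);
  end;
  Label1.Caption := 'English characters: ' + IntToStr(e);
  // each double-byte character contributes two bytes to c
  Label2.Caption := 'Chinese characters: ' + IntToStr(c div 2);
end;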

Lion_sj · 2001-11-12

To 飘摇客:

What about other characters, such as ， ； （ ? How do I identify those?

chur · 2001-11-12

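Read one character, then read the second one, and check whether the two put together equal what you get by reading the first two characters in one go.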

sword_liu · 2001-11-12

Use the ASCII codes.

飘摇客 · 2002-01-09

Just pick out the special characters you mentioned; there aren't that many of them, after all.
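
One way to pick them out, sketched under the assumption that the file is GB2312 (EUC-CN) encoded: in that encoding, lead bytes 0xA1-0xA9 cover the symbol and punctuation rows, and lead bytes 0xB0-0xF7 cover the Chinese characters. TCharKind and KindAt are made-up names for illustration, and the extended ranges of GBK are not handled here.

```pascal
program Gb2312Kinds;
{$APPTYPE CONSOLE}

type
  TCharKind = (ckAscii, ckSymbol, ckHanzi, ckOther);

// Hypothetical helper: classifies the character starting at S[Index] and
// returns in Size how many bytes it occupies. Assumes GB2312 (EUC-CN) bytes.
function KindAt(const S: AnsiString; Index: Integer; out Size: Integer): TCharKind;
var
  Lead, Trail: Byte;
begin
  Size := 1;
  Lead := Byte(S[Index]);
  if Lead < $80 then
  begin
    Result := ckAscii; // letters, digits, ASCII punctuation, control codes
    Exit;
  end;
  Result := ckOther;
  if Index < Length(S) then
  begin
    Trail := Byte(S[Index + 1]);
    if (Trail >= $A1) and (Trail <= $FE) then
    begin
      if (Lead >= $A1) and (Lead <= $A9) then
      begin
        Result := ckSymbol; // full-width punctuation and other symbols
        Size := 2;
      end
      else if (Lead >= $B0) and (Lead <= $F7) then
      begin
        Result := ckHanzi; // Chinese characters
        Size := 2;
      end;
    end;
  end;
end;

const
  KindNames: array[TCharKind] of string = ('ASCII', 'symbol', 'hanzi', 'other');
var
  S: AnsiString;
  i, Size: Integer;
  Kind: TCharKind;
begin
  // Build the test bytes explicitly so the source file encoding does not matter.
  SetLength(S, 5);
  S[1] := 'A';
  S[2] := AnsiChar($A3); S[3] := AnsiChar($AC); // full-width comma
  S[4] := AnsiChar($D6); S[5] := AnsiChar($D0); // the character "中"
  i := 1;
  while i <= Length(S) do
  begin
    Kind := KindAt(S, i, Size);
    WriteLn(i, ': ', KindNames[Kind]);
    Inc(i, Size);
  end;
end.
```

For the five test bytes this should report ASCII at position 1, symbol at position 2, and hanzi at position 4. Because the check is a plain range test, it does not depend on the system code page the way IsDBCSLeadByte does.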

Lion_sj · 2002-02-09

Answers from several people were accepted.
