I read this article some time ago and thought it was good; if you are interested, it is worth a look.
PChars: no strings attached
In the public newsgroups on the Borland server, I often see that there is still great confusion about the PChar type on the one hand, and the string or AnsiString type on the other. In this article I would like to discuss the similarities and the differences of both types, as well as some things you should or shouldn't do with them.
PChar
PChars were inspired by strings, as used in the C language. Most Windows API functions have a C interface, and accept C style strings. To be able to use APIs, Borland had to introduce a type that mimicked them, in the ancestor of Delphi, Turbo Pascal.
In C, there is no real string type, like there is in Delphi. Strings are just arrays of characters, and the end of the text is marked by a character with ASCII code zero. This allows them to be very long (unlike Turbo Pascal's string type, which was limited to 255 characters and a length byte - this is Delphi's ShortString type now), but a bit awkward to use. The beginning of the array is simply marked by a pointer to a char, which has become a PChar in Delphi. To traverse the string in C, one can use the pointer as if it were an array (this is true for all pointers in C), and use s[20] to indicate the 21st character (counting starts at 0). But C pointer arithmetic not only allows incrementing and decrementing the pointer, it also allows calculating the sum of a pointer and a number, or the difference between two pointers. In C, *(s + 20) is equivalent to s[20] (* is C's dereference operator, like Delphi's ^). Borland allowed almost the same syntax for the PChar type.
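As a minimal sketch of that syntax in Delphi (not part of the original article): the helper names CharAt and CountChars are hypothetical, and a small local array is assumed as backing storage. It shows array-style indexing, pointer plus number, and the difference between two pointers.

```pascal
program PCharSyntaxSketch;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: reads the character at a zero-based index
// using pointer arithmetic, the Delphi counterpart of C's *(s + i).
function CharAt(P: PChar; Index: Integer): Char;
begin
  Result := (P + Index)^;
end;

// Hypothetical helper: walks the pointer to the terminating #0 and
// returns the distance travelled, which is what StrLen computes.
function CountChars(P: PChar): Integer;
var
  Start: PChar;
begin
  Start := P;
  while P^ <> #0 do
    Inc(P);
  Result := P - Start; // difference between two pointers, in characters
end;

var
  Buffer: array[0..15] of Char; // backing storage for the text
  S: PChar;
begin
  StrCopy(Buffer, 'Delphi');
  S := Buffer;
  Writeln(S[2]);          // array-style indexing: l
  Writeln(CharAt(S, 2));  // pointer arithmetic:   l
  Writeln(CountChars(S)); // 6
end.
```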
A PChar is just a pointer, like in C. And also like in C, you can use it as if it were an array (i.e. the pointer points to the first character in the array). But it isn't! A PChar has no automatic storage, like the convenient Delphi string. If you copy text to a PChar-"string", you must always make sure that the PChar actually points to a valid array, and that the array is large enough to hold the text.
// WARNING: BAD EXAMPLE
var
S: PChar;
begin
S[0] := 'D';
S[1] := '6';
The code above did not allocate storage for the string, so it tries to store the characters starting at some random location in memory. This can cause problems, or even a program crash. It is your responsibility to ensure that the array exists. The easiest way is to use a local array:
var
S: PChar;
A: array[0..100] of Char;
begin
S := A;
S[0] := 'D'; // this is equivalent to A[0] := 'D';
S[1] := '6'; // you could also write: (S + 1)^ := '6';
The above code stores the characters in the array. But if you try to display the string at S, it will probably display lots of nonsense. That is because the string didn't end in a #0 character. OK, you could simply add another line:
S[2] := #0; // or: (S + 2)^ := #0;
and you would get a display of the text "D6". But storing characters one by one is really inconvenient. To display a text via a PChar is much simpler: you simply set the PChar to an already existing array with a text in it. Luckily, string constants like 'Delphi' are also such arrays, and can be used with PChars:
var
S: PChar;
begin
S := 'Delphi';
You should however be aware that this assignment only changes the value of the pointer S. No text is moved or copied around. The text is simply stored somewhere in the program (and has a #0 delimiter), and S is pointed to its start address. If you do:
// WARNING: BAD EXAMPLE
var
S: PChar;
A: array[0..100] of Char;
begin
S := A;
S := 'Delphi';
this does not copy the text 'Delphi' to the array A. The first line after begin points S to the array A, but immediately after that, the next line only changes S to the address of the literal string. If you want to copy text to the array, you must do that using for instance StrCopy or StrLCopy:
var
S: PChar;
A: array[0..100] of Char;
begin
S := A;
StrCopy(S, 'Delphi');
or
StrLCopy(S, 'Delphi', 100);
In this case it is obvious that 'Delphi' will fit in the array, so the use of StrLCopy seems a bit overdone, but on other occasions, where you don't know the size of the string, you should use StrLCopy to avoid overrunning the array bounds.
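To illustrate the difference, here is a small sketch (not from the original article) with a deliberately tiny buffer. The constant BufferSize and the function CopyThroughSmallBuffer are hypothetical names. StrLCopy copies at most MaxLen characters and then appends the #0, so MaxLen is the buffer size minus one.

```pascal
program BoundedCopySketch;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

const
  BufferSize = 8; // characters in the buffer, including the terminating #0

// Hypothetical helper: copies Source into a small local array with
// StrLCopy and returns whatever fitted as a Delphi string.
function CopyThroughSmallBuffer(Source: PChar): string;
var
  A: array[0..BufferSize - 1] of Char;
  S: PChar;
begin
  S := A;
  // At most BufferSize - 1 characters are copied, then #0 is appended,
  // so the array bounds are never overrun.
  StrLCopy(S, Source, BufferSize - 1);
  Result := S; // assigning a PChar to a string copies the text
end;

begin
  Writeln('[', CopyThroughSmallBuffer('Delphi'), ']');        // [Delphi]
  Writeln('[', CopyThroughSmallBuffer('Object Pascal'), ']'); // [Object ]
end.
```

With StrCopy instead of StrLCopy, the second call would write past the end of the array.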
An array like A is useful as a text buffer for small strings of a known size, but often you'll have strings of a size which is unknown when the program is compiled. In that case you'll have to use dynamic allocation of a text buffer. You can for instance use StrAlloc or StrNew to create a buffer, or GetMem, but then you'll have to remember to free the memory again, using StrDispose or FreeMem. You can also use a Delphi string as a buffer, but before I describe how to do that, I want to discuss that type first.
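A minimal sketch of such dynamic allocation, assuming StrAlloc and StrDispose from SysUtils; the function name CopyThroughHeapBuffer is hypothetical and not from the original article. The try..finally block is one way to make sure the buffer is always freed.

```pascal
program HeapBufferSketch;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: allocates a buffer whose size is only known at
// run time, copies the text into it, and frees the buffer again.
function CopyThroughHeapBuffer(const Source: string): string;
var
  Buffer: PChar;
begin
  // Reserve room for every character plus the terminating #0.
  Buffer := StrAlloc(Length(Source) + 1);
  try
    StrCopy(Buffer, PChar(Source));
    Result := Buffer; // the text is copied into a new Delphi string
  finally
    StrDispose(Buffer); // memory from StrAlloc is released with StrDispose
  end;
end;

begin
  Writeln(CopyThroughHeapBuffer('Delphi')); // Delphi
end.
```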
String
Allow me to confuse you: a string or, more precisely, an AnsiString is in fact a PChar. Just like a PChar, it is a pointer to an array of characters, terminated by a #0 character. But there is one big difference. You normally don't have to think about how they work. They can be used almost like any other variable. The compiler takes care that the appropriate code to allocate, copy and free the text is called. So instead of calling routines like StrCopy, the compiler will take care of that for you.
But there is more. Although the text is sure to be always terminated by a #0, just to make AnsiStrings compatible with C-style strings, the compiler doesn't need it. In front of the text in memory, at a negative offset, the length of the string is stored, as an Integer. So to know the length of the string, the compiler only has to read that Integer, and not count characters until it finds a #0. That means that you can store #0 characters in the middle of the string without confusing the compiler. But some output routines, which rely on the #0 and not on the length, might be confused.
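The following sketch (not from the original article) shows the effect of a #0 in the middle of a string. The function name LengthSeenByCRoutines is hypothetical. Length reads the stored length, while StrLen counts up to the first #0.

```pascal
program EmbeddedZeroSketch;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: the length a C-style routine would see, i.e.
// the number of characters before the first #0.
function LengthSeenByCRoutines(const S: string): Integer;
begin
  Result := StrLen(PChar(S));
end;

var
  S: string;
begin
  S := 'AB' + #0 + 'CD';
  Writeln(Length(S));                // 5, read from the length field
  Writeln(LengthSeenByCRoutines(S)); // 2, counting stops at the first #0
end.
```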
Normally, each time you'd assign one string to another variable, the compiler would have to allocate memory and copy the entire string to it. Because AnsiStrings can be quite long (theoretically, up to 2GB), this could be slow. To avoid the copying, Delphi uses a concept called "copy on demand" (more commonly known as copy-on-write). Each string has another field of information stored in front of it: the reference count. This is the number of variables that actually reference that particular string in memory. Only when it drops to 0 is the string text no longer in use, and the memory can be freed.
The compiler takes care that the reference count is always correct (but you can confuse the compiler by casting - more on that later). If a string variable is declared in a var section, or as a field of a class or record, it will start its life as nil, the internal representation of the empty string (''). As soon as string text is created and assigned to one of these variables, the reference count of the string will be 1. Each additional assignment of that particular string to a new variable will increment the reference count. If a string variable leaves its scope (when the function or class in which it was declared ends), or is pointed to a new string, the reference count of the text is decremented.
A simple example:
function PlayWithStrings: string;
var
S1, S2: string;
begin
S1 := IntToStr(123456);
Now S1 points to the text '123456' and has a reference count of 1.
S2 := S1;
No text is copied yet, S2 is simply set to the same address as S1, but the reference count of the text '123456' is 2 now.
S2 := 'The number is ' + S2;
Now a new, larger buffer is allocated, the text 'The number is ' is copied to it, and the text from '123456' concatenated. But, since S2 doesn't point to the text '123456' anymore, the reference count of that text is decremented to 1 again.
Result := S2;
Result will be set to point to the same address as S2, and the reference count of the text 'The number is 123456' is incremented to 2.
end;
Now S1 and S2 leave their scope. The reference count for '123456' will be decremented to 0, and the text buffer will be freed. The reference count for 'The number is 123456' will also be decremented, but only to 1, since the function result still points to it. So although the function has ended, the string is still around.
Complicated? Yes, it is complicated, and can get even more complicated with var, const and out parameters. But fortunately, you normally don't have to worry about this. It only becomes important if you access strings in assembler, or through a typecast to a PChar. And using strings with a typecast to PChar is not uncommon.
The most important things to remember about strings are:
that text is only copied to a new string buffer if it is modified
that the reference count and the length are not connected to a string variable, but to a specific text buffer, to which more than one string variable can point
that the reference count is always correct unless you fool the compiler by casting to a different type
that assignments to a variable decrement the reference count of the text buffer it previously pointed to
that if the reference count becomes 0, the string buffer is freed.
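The sharing described above can be made visible by comparing the addresses stored in two string variables. This is only a sketch, not part of the original article: the helper names are hypothetical, and casting a string to Pointer is used here purely to inspect the address, which is an implementation detail rather than documented behaviour to rely on.

```pascal
program CopyOnWriteSketch;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: True if both variables point to the same text buffer.
function SharesBuffer(const A, B: string): Boolean;
begin
  Result := Pointer(A) = Pointer(B);
end;

// After a plain assignment both variables reference one buffer.
function SharedAfterAssignment: Boolean;
var
  S1, S2: string;
begin
  S1 := IntToStr(123456); // created at run time, so it lives on the heap
  S2 := S1;               // no text copied, only the reference count changes
  Result := SharesBuffer(S1, S2);
end;

// Modifying one of the variables gives it a buffer of its own.
function SharedAfterModification: Boolean;
var
  S1, S2: string;
begin
  S1 := IntToStr(123456);
  S2 := S1;
  S2 := 'The number is ' + S2; // new buffer; S1 keeps the old one
  Result := SharesBuffer(S1, S2);
end;

begin
  Writeln(SharedAfterAssignment);   // TRUE
  Writeln(SharedAfterModification); // FALSE
end.
```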
Using strings and PChars together
PChars and character arrays are awkward to use. Most of the time, you must allocate memory, and not forget to free it. If you want to add text, you must first calculate the size of the resulting text, reallocate the text buffer if it is too small, and use StrCat or StrLCat to finally add the text. You must use StrComp or StrLComp to compare strings, etc. etc.
Strings, on the other hand, are much simpler to use. Most things are done automatically. But many Windows (or Linux) API functions require PChars, and not strings. Fortunately, since strings are also pointers to zero-terminated text, you can use them as a PChar by simply casting them:
var
S: string;
begin
S := ExtractFilePath(ParamStr(0)) + 'MyDoc.doc';
ShellExecute(0, 'open', PChar(S), nil, nil, SW_SHOW);
end;
Don't forget that an AnsiString variable is a pointer to text, and not a text buffer itself. If the text is modified, it will often be copied to a new location, and the address in the variable is adjusted accordingly. That means that you should not use a PChar to point to the string and then modify the string. It is best to avoid doing something like:
// WARNING: BAD EXAMPLE
var
S: string;
P: PChar;
begin
S := ParamStr(0); // say, this returns 'C:/Test.exe'
P := PChar(S);
S := 'Something else';
If S is changed to 'Something else', P will not be changed with it, and still point to 'C:/Test.exe'. Since P is not a string reference to that text, and there is no other string variable pointing to it, its reference count will become 0, and the text will be discarded. That means, that P now points to invalid memory.
It is wise not to confuse the compiler by mixing PChar and string variables, unless you know what you are doing. The compiler does not recognize a PChar as a string, so it will not change the reference count of the string memory if you point a PChar to it. It is often better not to use a PChar variable like this at all. Simply use the string as much as possible, and only cast at the last moment. Functions accepting a PChar parameter should copy the text they receive to their own buffer.
Normally, string buffers are only as large as necessary to contain the text assigned to them. But using SetLength you can set the string buffer to any size you need. This makes string buffers useful as text buffers to receive text. Windows API functions that return a text in a character array can be used like this:
function WindowsDirectory: string;
begin
SetLength(Result, MAX_PATH);
GetWindowsDirectory(PChar(Result), Length(Result));
SetLength(Result, StrLen(PChar(Result)));
end;
Alternatively, since assigning a PChar to a string creates a new string with a copy of the text, you can replace the last line of the function with this functionally equivalent code:
Result := PChar(Result);
Either way, the last line of the function sets the length of the string back to the length of the C-style string that was stored in the buffer. If you need the result as a PChar anyway, to be processed by further API routines, you may perhaps be tempted to do this instead:
// WARNING: BAD EXAMPLE
function WindowsDirectoryAsPChar: PChar;
var
Buffer: array[0..MAX_PATH] of Char;
begin
GetWindowsDirectory(Buffer, MAX_PATH);
Result := Buffer;
end;
This will however fail. Because Buffer is a local variable, the entire buffer is in local memory (the processor stack). As soon as the function ends, the local memory is reused for other routines, so the text to which the result now points is turned into complete gibberish. Local buffers should never be used to return text.
But even if you had used a dynamic allocation with StrAlloc or a similar routine, the user would have to free the string. It generally is not a good idea to return PChars like that. Better follow the example of GetWindowsDirectory, and let the user of the function provide a buffer and its length. You then simply fill the buffer (using StrLCopy) up to the given length.
There is an alternative to the function WindowsDirectory, that could use a local buffer. This relies on the fact that you can assign a PChar to a string directly. To make the text a Delphi string (with length and reference count fields), a Delphi string buffer will be allocated to the required length, and the text copied to that. So even if the local buffer is discarded, the text in the string buffer is still there:
function WindowsDirectory: string;
var
Buffer: array[0..MAX_PATH] of Char;
begin
GetWindowsDirectory(Buffer, MAX_PATH);
Result := Buffer; // StrLen(Buffer) characters copied!
end;
But how would you write a function, for instance in a DLL, that must pass back data as a PChar, yourself? I think you should take the example of GetWindowsDirectory again. Here is a simple DLL function, returning a version string that is stored in our DLL:
// Having a separate function to get the length is clearer than
// asking GetDLLVersion to provide that length if parameters are nil.
function GetDLLVersionLength: Integer;
begin
Result := Length(DLLVersion + IntToStr(VersionNum));
end;
// Returns number of characters copied, excluding zero byte
function GetDLLVersion(Buffer: PChar;
MaxLen: Integer): Integer;
begin
if (Buffer <> nil) and (MaxLen > 1) then
begin
StrLCopy(Buffer, PChar(DLLVersion + IntToStr(VersionNum)), MaxLen - 1);
Result := StrLen(Buffer);
end
else
Result := 0;
end;
As you can see, the string is simply copied to the provided buffer with StrLCopy. Because the user must provide the buffer, you will avoid any memory management problems. If you provided it, the user would have to know how to free it. FreeMem doesn't work across a DLL boundary. But even if it did, a user of the DLL that used C or Visual Basic would not know how to free the buffer in that language, since memory management is different in each language. Letting the user provide the buffer makes him or her independent of your implementation.
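From the caller's side, using such a pair of functions could look like the sketch below (not part of the original article). It is only an illustration under stated assumptions: the two functions are compiled into the same program instead of being imported from a DLL, the values given to DLLVersion and VersionNum are invented, and FetchVersion is a hypothetical name. The caller asks for the length, provides a string as the buffer, and trims it to the number of characters actually copied.

```pascal
program CallerBufferSketch;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

const
  DLLVersion = 'MyDLL version '; // assumed value, for illustration only
  VersionNum = 3;                // assumed value, for illustration only

// Stand-ins for the two DLL functions from the article.
function GetDLLVersionLength: Integer;
begin
  Result := Length(DLLVersion + IntToStr(VersionNum));
end;

function GetDLLVersion(Buffer: PChar; MaxLen: Integer): Integer;
begin
  if (Buffer <> nil) and (MaxLen > 1) then
  begin
    StrLCopy(Buffer, PChar(DLLVersion + IntToStr(VersionNum)), MaxLen - 1);
    Result := StrLen(Buffer);
  end
  else
    Result := 0;
end;

// Hypothetical caller: provides its own buffer (a Delphi string) and
// so stays in charge of the memory.
function FetchVersion: string;
var
  Len: Integer;
begin
  Len := GetDLLVersionLength;
  SetLength(Result, Len);
  if Len > 0 then
    // A string of length Len has room for Len characters plus the #0.
    SetLength(Result, GetDLLVersion(PChar(Result), Len + 1));
end;

begin
  Writeln(FetchVersion); // MyDLL version 3
end.
```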
Conclusions
Although AnsiStrings and PChars are both string types, they are quite different. Strings are easier to use, whereas for PChars you must do almost everything yourself. You can use them together, and cast a string as PChar, and assign a PChar to a string, but because strings change their address when they are changed, you should not hold on very long to the address you obtain by casting a string to a PChar. Assigning a PChar to a string is less hazardous.
As the previous text demonstrated, allocating text in a function and then returning a PChar to the new buffer is usually not a good idea. It is even worse if it is done across a DLL boundary, since the user may not even be able to free the memory - the DLL and the user probably use a different memory manager, and each has a different heap. It is also not a very good idea to use a local buffer to return text.
If you must use PChars, because a function requires them, you should use strings as much as possible, and only cast to PChar when you use the string as a parameter. Using strings is much easier, and less error prone, than using the C-style string functions.
Disclaimer
I hope I have lifted a bit of the fog regarding PChars. I have not told everything there is to be known, and perhaps even twisted the exact truth a bit (for instance, not every Delphi string is reference counted - string literals always have a reference count of -1), but those internal details are not important for the big picture, and have no bearing on the safe use and interaction of strings and PChars.
Rudy Velthuis