Set of Char in XE8 gives compiler warnings - how to re-code?

I have code like this:

{code}
const
  {Characters not allowed in file names.}
  FileNameForbiddenChars: set of Char = ['<', '>', '|', '"', '\', '/', ':', '*', '?','&']; // note: Some of these, while technically legal, cause issues elsewhere


function IsValidFileName(Filename:String):Boolean;
{
  Returns "True" if file name is legal/valid in Windows.

  Note: This is for FILE NAMES not full file paths, so will return false
        for a path like "C:\MyFile.txt"
}
var
  I: integer;
begin
  Result:=Filename<>'';
  for I:=1 to Length(Filename) do
      Result:=Result and not (Filename[I] in FileNameForbiddenChars);

end;
{code}


This gives the compiler warning:

+[dcc32 Warning] Unit1.pas(312): W1050 WideChar reduced to byte char in set expressions.  Consider using 'CharInSet' function in 'SysUtils' unit.+

What would you suggest I change in order to eliminate this compiler warning, while still keeping my constant defined outside of the function as a set?

Note: The above is just a sample of code for a general issue.

Thanks!

Carl.

Edited by: Carl Olsen on Jun 30, 2015 11:39 AM
Carl
6/30/2015 6:40:23 PM
📁 embarcadero.delphi.general

Carl wrote:

> I have code like this:
<snip>
> This gives the compiler warning:
> 
> +[dcc32 Warning] Unit1.pas(312): W1050 WideChar reduced to byte char
> in set expressions.  Consider using 'CharInSet' function in 'SysUtils'
> unit.+

As it should be.  In Delphi 2009, Char was changed from AnsiChar to WideChar.
A "Set of Char" would thus produce a Set of up to 65536 elements, not 256
elements like before.  However, a Set cannot hold more than 256 elements,
so the compiler has no choice but to truncate your Char values to 8bit instead,
thus the warning.  Which is fine as long as your filenames only contain ASCII
characters.  But if they contain non-ASCII characters, the truncated values
can give you wrong results in your comparisons.

> What would you suggest I change in order to eliminate this compiler
> warning, while still keeping my constant defined outside of the
> function as a set?

You could do what the compiler warning says - use the SysUtils.CharInSet() 
function:

{code}
const
  {Characters not allowed in file names.}
  FileNameForbiddenChars: TSysCharSet = ['<', '>', '|', '"', '\', '/', ':', '*', '?', '&']; // note: Some of these, while technically legal, cause issues elsewhere

function IsValidFileName(Filename:String):Boolean;
{
Returns "True" if file name is legal/valid in Windows.

Note: This is for FILE NAMES not full file paths, so will return false
for a path like "C:\MyFile.txt"
}
var
  I: integer;
begin
  Result := Filename <> '';
  for I := 1 to Length(Filename) do
    Result := Result and not CharInSet(Filename[I], FileNameForbiddenChars);
end;
{code}

However, that is not really any better, as each Char will still be truncated
to 8bit.  CharInSet() merely hides the warning from you.

If you support Unicode characters, you can't use a Set anymore.  Re-write
the code, e.g.:

{code}
const
  {Characters not allowed in file names.}
  FileNameForbiddenChars: string = '<>|"\/:*?&'; // note: Some of these, while technically legal, cause issues elsewhere

function IsValidFileName(Filename:String):Boolean;
{
Returns "True" if file name is legal/valid in Windows.
Note: This is for FILE NAMES not full file paths, so will return false
for a path like "C:\MyFile.txt"
}
var
  I: integer;
begin
  Result := Filename <> '';
  for I := 1 to Length(Filename) do
    Result := Result and (Pos(Filename[I], FileNameForbiddenChars) = 0);
end;
{code}

However, there is a small performance hit when passing a single Char to Pos(),
because the Char has to be converted to a temporary string on every call.
You might not notice it for short filenames.  But, if you want to avoid
that hit, you can do this instead:

{code}
const
  {Characters not allowed in file names.}
  FileNameForbiddenChars: string = '<>|"\/:*?&'; // note: Some of these, while technically legal, cause issues elsewhere

function IsValidFileName(Filename:String):Boolean;
{
Returns "True" if file name is legal/valid in Windows.
Note: This is for FILE NAMES not full file paths, so will return false
for a path like "C:\MyFile.txt"
}
var
  I: integer;
  Tmp: String;
begin
  Result := False;
  if Filename <> '' then
  begin
    SetLength(Tmp, 1);
    for I := 1 to Length(Filename) do
    begin
      Tmp[1] := Filename[I];
      if Pos(Tmp, FileNameForbiddenChars) <> 0 then
        Exit;
    end;
    Result := True;
  end;
end;
{code}

Or this:

{code}
const
  {Characters not allowed in file names.}
  FileNameForbiddenChars: string = '<>|"\/:*?&'; // note: Some of these, while technically legal, cause issues elsewhere

function IsValidFileName(Filename:String):Boolean;
{
Returns "True" if file name is legal/valid in Windows.
Note: This is for FILE NAMES not full file paths, so will return false
for a path like "C:\MyFile.txt"
}
var
  I, J: integer;
  C: Char;
begin
  Result := False;
  if Filename <> '' then
  begin
    for I := 1 to Length(Filename) do
    begin
      C := Filename[I];
      for J := 1 to Length(FileNameForbiddenChars) do begin
        if FileNameForbiddenChars[J] = C then Exit;
      end;
    end;
    Result := True;
  end;
end;
{code}
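
One more option, offered only as a sketch and not something from the code above: a case statement compares the full 16-bit Char against each literal, so it should compile without W1050 and nothing gets truncated.  The helper name IsForbiddenFileNameChar is hypothetical.  The trade-off is that the forbidden list now lives inside the helper instead of in a separate constant.

{code}
{Hypothetical helper: returns True if C is one of the forbidden characters.}
function IsForbiddenFileNameChar(C: Char): Boolean;
begin
  case C of
    '<', '>', '|', '"', '\', '/', ':', '*', '?', '&':
      Result := True;
  else
    Result := False;
  end;
end;

function IsValidFileName(const Filename: string): Boolean;
var
  I: Integer;
begin
  Result := False;
  if Filename = '' then
    Exit;
  for I := 1 to Length(Filename) do
    if IsForbiddenFileNameChar(Filename[I]) then
      Exit; // Result is still False here
  Result := True;
end;
{code}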

-- 
Remy Lebeau (TeamB)
Remy
6/30/2015 8:36:08 PM
Wow, Remy - you are a fountain of awesomeness!

Thank you once again!

> Remy Lebeau (TeamB)
Carl
6/30/2015 8:51:01 PM
Remy,

Thanks for your earlier reply on this.  I have things working nicely on my side thanks to your input.  One thing has been bugging me a bit, though, that I don't understand in your statement below:

>As it should be. In Delphi 2009, Char was changed from AnsiChar to WideChar. 
>A "Set of Char" would thus produce a Set of up to 65536 elements, not 256 
>elements like before. However, a Set cannot hold more than 256 elements, 
>so the compiler has no choice but to truncate your Char values to 8bit instead, 
>thus the warning. 

I would think that a set would still be 256 elements regardless of whether Char, AnsiChar, WideChar, etc., and that the penalty of a set of WideChar vs AnsiChar is that you would have a set of 256 Words instead of a set of 256 bytes, but in both cases, a set would be limited to 256 elements.

Why is it that a set of WideChar would become 65535 elements all of a sudden?  I would think that each individual element would be able to store a value of 65535, since it's two bytes, but I would not expect the set to become an array[0..65535] of WideChar.
Carl
7/6/2015 8:32:18 PM
Carl wrote:

> I would think that a set would still be 256 elements regardless of
> whether Char, AnsiChar, WideChar, etc., and that the penalty of
> a set of WideChar vs AnsiChar is that you would have a set of
> 256 Words instead of a set of 256 bytes

Only if you explicitly restrict the Set to 256 elements, but you are not 
doing that.  You are declaring an unbound "Set of Char", which creates a 
Set that encompasses the entire range of Low(Char)..High(Char) (and you are 
then populating the Set with only a few select characters, but space for 
the other characters still has to be allocated).  When Char was an alias 
for AnsiChar in Delphi 2007 and earlier, High(Char) was 255, and all was 
fine as Low(Char)..High(Char) was 256 elements.  But now that Char is an 
alias for WideChar in Delphi 2009+, High(Char) is 65535 now, so Low(Char)..High(Char) 
is 65536 elements, which is too many.

> Why is it that a set of WideChar would become 65535 elements all of
> a sudden? I would think that each individual element would be able to
> store a value of 65535, since it's two bytes, but I would not expect
> the set to become an array[0..65535] of WideChar.

Because High(WideChar) is 65535, thus Low(Char)..High(Char) (0..65535) is 
65536 elements.

It seems you have a fundamental misunderstanding of how a Set really works.
A Set is not an array of values, like you are thinking.  It is actually
an array of bits, where each allowed value within the Set's range is represented
by a single bit; the Set does not hold the actual values.  So, 8 values in
the Set can be stored in a single Byte in memory, up to 32 bytes total for
a 256-element Set.  Set operations like Include()/Exclude(), "value in Set",
"Set +/- [values]", etc are very fast and efficient, because they are simply
querying/twiddling individual bits, not whole values.

A Set is physically restricted to 256 bits max, and thus can only represent 
256 values, regardless of the byte size of the element type.  So, when Char 
is WideChar, a "Set of Char" cannot represent all of the values that a WideChar 
can contain, thus the compiler warning when it has to truncate WideChar values 
in Set expressions.

If Sets larger than 256 elements were allowed, a "Set of Char" for WideChar 
values would take up 8K of memory (65536 bits) max.  With your thinking, 
an array of 65536 2-byte whole values would take up 128K of memory, and be 
extremely slow in comparison.
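
The numbers above can be checked with a small console program.  This is only a sketch, and it assumes a Unicode Delphi (2009 or later) where Char is WideChar; the program name and the type name TAnsiCharSet are made up for the example.

{code}
program SetSizes;
{$APPTYPE CONSOLE}

type
  TAnsiCharSet = set of AnsiChar; // 256 possible members, one bit each

begin
  // 256 bits div 8 = 32 bytes
  Writeln(SizeOf(TAnsiCharSet));

  // Number of distinct Char values: 65536 when Char is WideChar
  Writeln(Ord(High(Char)) + 1);

  // A hypothetical bit-per-value Set covering every WideChar: 8192 bytes (8K)
  Writeln((Ord(High(Char)) + 1) div 8);

  // An array holding one 2-byte value per element: 131072 bytes (128K)
  Writeln((Ord(High(Char)) + 1) * SizeOf(Char));
end.
{code}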

-- 
Remy Lebeau (TeamB)
Remy
7/6/2015 9:36:49 PM
>It seems you have a fundamental misunderstanding of how a Set really works.

Wow - you are absolutely right!  I am glad we had this conversation, because I had it all wrong in my head.  Thanks!!
Carl
7/6/2015 10:22:29 PM