Issue with TIdTextEncoding.Default

Hi,

Since Indy 10.5.5, I explicitly use TIdTextEncoding.Default as the default 
string encoding for the sockets, in order to transfer ANSI-based 
messages between the peers. Nevertheless, some characters are still not 
transferred properly, while there was no problem with the 10.2.3 
version.

For instance, consider the string "ABC’DEF". Using 
TIdTextEncoding.Default, when the client sends this string, the server 
receives "ABC?DEF". AFAIK, the character "’" is a legal ANSI character 
which should be handled by TIdTextEncoding.Default.

If I modify IdGlobal.Indy8BitEncoding() to use codepage 1252 
instead of codepage 28591, the string is received correctly. 
Nevertheless, judging by the comment at the top of the function, using 
codepage 28591 seems to be a deliberate choice.

What is the exact purpose of Indy8BitEncoding()?
What should I do in order to make TIdTextEncoding.Default work?

All seems to work fine when I use TIdTextEncoding.UTF8.

I'm running D2007 and Vista SP2 French.

Regards,

A.R.
Adrien
4/17/2009 8:12:52 PM
embarcadero.delphi.winsock

"Adrien Reboisson" <adrien-reboissonATastaseDOTcom@example.org> wrote in
message news:105689@forums.codegear.com...

> For instance, consider the string "ABC’DEF". Using
> TIdTextEncoding.Default, when the client sends this string,
> the server receives "ABC?DEF".

That means the character does not exist in the OS's default codepage.  The 
OS, not Indy, is doing the actual conversions between Ansi and Unicode.

> AFAIK, the character "’" is a legal ANSI character

No, it is not.  You are thinking of the ASCII $27 character, however you are 
actually using the Unicode $2019 character instead (which is evident from the 
fact that when you posted the character here, your newsreader encoded the 
character using UTF-8 as $E2 $80 $99).  Since $2019 is not an ASCII/Latin-1 
character, its encoding in Ansi is dependent on the particular codepage that 
is used (on my machine, that character is encoded as $92).
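The byte values mentioned above can be checked with a small console program. This is only a sketch, assuming a Windows machine with codepages 1252 and 28591 installed; the program name and the EncodeToHex helper are made up for illustration, and the conversion goes through the Win32 WideCharToMultiByte function rather than through Indy. Codepage 1252 should give $92 and UTF-8 should give $E2 $80 $99, while codepage 28591 has no slot for this character, so the OS has to substitute something else.

```delphi
program EncodeQuoteDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper: encodes W with the given Windows codepage
// and returns the resulting bytes as a hex string.
function EncodeToHex(const W: WideString; CodePage: UINT): string;
var
  Buf: array[0..15] of AnsiChar;
  Len, I: Integer;
begin
  Result := '';
  // Returns the number of bytes written, or 0 on failure.
  Len := WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W),
    @Buf[0], SizeOf(Buf), nil, nil);
  for I := 0 to Len - 1 do
    Result := Result + '$' + IntToHex(Ord(Buf[I]), 2) + ' ';
end;

var
  Quote: WideString;
begin
  Quote := WideChar($2019);  // right single quotation mark
  Writeln('cp1252 : ', EncodeToHex(Quote, 1252));
  Writeln('cp28591: ', EncodeToHex(Quote, 28591));
  Writeln('UTF-8  : ', EncodeToHex(Quote, CP_UTF8));
end.
```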

> which should be handled by TIdTextEncoding.Default.

Your machine is not using a default codepage that has that character, hence 
it is being changed to "?".

> If I modify IdGlobal.Indy8BitEncoding() to use the codepage 1252
> instead of the 28591 one, the string is received correctly.

We can't use codepage 1252 for Indy8BitEncoding().  We originally did, and 
had to change it to fix other problems it caused.  Read the comments in the 
source code:

// We need a class that converts UTF-16 codeunits in the $00-$FF range
// to/from their numeric values as-is.  Was previously using "Windows-1252"
// (codepage 1252) for that, which does so for most codeunits, however
// codeunits $80-$9F in Windows-1252 map to different codepoints in Unicode,
// which we don't want.  "ISO 8859-1" (codepage 28591), on the other hand,
// treats codeunits $00-$FF as-is, and seems to be just as widely supported
// as codepage 1252 on most systems, so we'll use that for now...

There is also a TODO in that same code section to eventually replace 
codepage 28591 with a custom TEncoding descendant class to handle 8bit data 
manually.
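The difference between the two codepages described in that source comment can be seen by decoding a few bytes in the $80-$9F range both ways. The following is a sketch only, assuming a Windows machine with both codepages installed; the program name and the DecodeByte helper are hypothetical, and it calls the Win32 MultiByteToWideChar function directly instead of going through Indy. Under codepage 1252 byte $91 should come back as $2018, whereas codepage 28591 should return it unchanged as $0091.

```delphi
program DecodeHighBytesDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper: decodes one byte with the given codepage
// and returns the resulting UTF-16 code unit.
function DecodeByte(B: Byte; CodePage: UINT): Word;
var
  Src: AnsiChar;
  Dst: WideChar;
begin
  Src := AnsiChar(B);
  Dst := #0;
  MultiByteToWideChar(CodePage, 0, @Src, 1, @Dst, 1);
  Result := Ord(Dst);
end;

var
  B: Byte;
begin
  // $91..$94 are the curly quotes in Windows-1252.
  for B := $91 to $94 do
    Writeln('$', IntToHex(B, 2),
      '  cp1252 -> $', IntToHex(DecodeByte(B, 1252), 4),
      '  cp28591 -> $', IntToHex(DecodeByte(B, 28591), 4));
end.
```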

> Nevertheless, reading the comment on the top of the function, using the
> codepage 28591 seems to be a deliberate choice.

Yes, it is.

> What is the exact purpose of Indy8BitEncoding() ?

To allow Indy to support 8-bit character data.  Most email systems require 
character data to be restricted to 7-bit (which is why encoding algorithms 
like base64 and quoted-printable exist), however some newer systems support 
8-bit characters (which is necessary for transmitting UTF-8 and binary 
data natively without further encoding).  D2009's SysUtils.TEncoding class, 
and .NET's System.Text.Encoding class, do not natively support 8-bit 
character data, and must use a suitable 8-bit character set, such as ISO 
8859-1, to handle it.  By introducing Indy8BitEncoding(), all of Indy's 
string-to-bytes and bytes-to-string handling can support 8-bit data without 
having to process it differently than other kinds of string data.

> What should I do in order to make TIdTextEncoding.Default work ?

Stop trying to send characters that are not supported by your OS's default 
codepage to begin with.

> All seems to work fine when I use TIdTextEncoding.UTF8.

As it should be, especially since you are sending Unicode characters, and 
UTF-8 is a Unicode encoding.

-- 
Remy Lebeau (TeamB)
Remy
4/19/2009 7:48:16 AM
Hi,

> No, it is not.  You are thinking of the ASCII $27 character, however you are 
> actually using the Unicode $2019 character instead (which is evident by the 
> fact that when you posted the character here, your newsreader encoded the 
> character using UTF-8 as $E2 $80 $99).  Since $2019 is not an ASCII/Latin-1 
> character, its encoding in Ansi is dependant on the particular codepage that 
> is used (on my machine, that character is encoded as $92).

Indeed, this character comes from a file name created by a user of my 
software. He's using Windows XP French, and I don't really understand 
how he could have inserted such a non-ANSI character in the filename.

Nevertheless, I could reproduce the problem with this simple snippet of 
code (Indy 10.5.5, Delphi 2007, Vista French):

  // Client code
  IdTCPClient.IOHandler.DefStringEncoding := TIdTextEncoding.Default;
  try
    // Chr(145) is byte $91, a left single quote in Windows-1252
    IdTCPClient.IOHandler.WriteLn(Chr(145));
  finally
    IdTCPClient.Disconnect;
  end;

  // Server code
var
  S: string;
begin
  AContext.Connection.IOHandler.DefStringEncoding := TIdTextEncoding.Default;
  S := AContext.Connection.IOHandler.ReadLn();
  MessageBox(Handle, PChar(S), nil, 0);
end;

MessageBox() displays a '?' instead of the appropriate Windows-1252 
character.

AFAIK, Chr(145) is not a Unicode character. It's a character which 
could be used in any ANSI application, and which was transferred 
properly using Indy 10.2.3. If we must use TIdTextEncoding.UTF8 to 
exchange ANSI messages without alteration under Delphi 2007, that 
should be clearly stated somewhere in the code or in the 
documentation. Or maybe I'm just missing another codepage subtlety :-)

Best regards,

A.R.
Adrien
4/19/2009 12:19:28 PM
"Adrien Reboisson" <adrien-reboissonATastaseDOTcom@example.org> wrote in 
message news:106339@forums.codegear.com...

> Indeed this character comes from a file name created by an user of my
> software. He's using Windows XP French, and I don't really understand
> how he could have inserted a such non ANSI character in the filename.

XP is a Unicode-based OS.  The user may have only typed a single character 
on the keyboard, but the encoding of that character depends on whether the 
Edit control was created as Ansi or Unicode, and how the keyboard itself is 
configured to handle the user's language.  The apostrophe key on the user's 
keyboard may be mapping the character to Unicode character $2019, or it may 
be mapping it to Ansi character $92 and then the app maps it back to Unicode 
character $2019 before Indy gets it.

> Nevertheless, I could reproduce the problem with this simple snippet
> of code (Indy 10.5.5, Delphi 2007, Vista French) :

That is not an Indy bug, either.  You are passing a Char to a String 
parameter.  In D2009, the RTL's Chr() function converts non-ASCII Ansi 
values to their Unicode equivalents, thus WriteLn() is seeing a fully 
Unicode string value, which is then encoded back to Ansi using 
TIdTextEncoding.Default during transmission.  So right off, you are not 
guaranteed to transmit Chr(145) to the receiving end.  Also, if the 
receiving machine does not use the same default Ansi codepage as the sending 
machine, you are not guaranteed to receive Chr(145), either.  That is why 
you should never rely on TIdTextEncoding.Default when transmitting data 
between machines.  Use a standardized encoding instead, such as UTF-8.
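A minimal sketch of that suggestion, reusing the DefStringEncoding property and TIdTextEncoding.UTF8 already mentioned in this thread. The unit name and the two helper routines (SendUtf8Line, ReadUtf8Line) are made up for illustration, and the uses clause assumes the Indy 10.5.x unit layout; other Indy releases may name the encoding classes differently. The point is only that both ends set the same explicit encoding instead of relying on each machine's default codepage.

```delphi
unit Utf8LineTransport;

interface

uses
  IdGlobal, IdIOHandler, IdContext, IdTCPClient;

// Hypothetical helpers, not part of Indy.
procedure SendUtf8Line(AClient: TIdTCPClient; const ALine: string);
function ReadUtf8Line(AContext: TIdContext): string;

implementation

procedure SendUtf8Line(AClient: TIdTCPClient; const ALine: string);
begin
  // The client must already be connected so that IOHandler is assigned.
  AClient.IOHandler.DefStringEncoding := TIdTextEncoding.UTF8;
  AClient.IOHandler.WriteLn(ALine);
end;

function ReadUtf8Line(AContext: TIdContext): string;
begin
  // Must match the encoding chosen by the sending side.
  AContext.Connection.IOHandler.DefStringEncoding := TIdTextEncoding.UTF8;
  Result := AContext.Connection.IOHandler.ReadLn;
end;

end.
```

SendUtf8Line would be called from the client after Connect, and ReadUtf8Line from the server's OnExecute handler.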

> MessageBox() displays a '?' instead of the appropriate Windows-1252
> character.

That is because the character you tried to send could not be represented in 
Windows-1252.  What likely happened is Chr(145) returned a Unicode value 
that Windows-1252 does not support, so WriteLn() ended up receiving a string 
of '?' to begin with, and that is what got transmitted.  Use a packet 
sniffer, such as Wireshark, to verify that, though.

> AFAIK, Chr(145) is not an unicode character.

Actually, it is.  Character $91 in Windows-1252 maps to character $2018 in 
Unicode:

    cp1252 to Unicode table
    http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT

Anything above $7F is outside the ASCII range of characters, and is 
thus codepage-specific and may or may not (usually not) map to the same 
numeric value in Unicode.  Chr() in D2009 converts codepage-specific values 
to their Unicode equivalents, based on the compiler's codepage.  Marco Cantu 
touches on this issue in his Unicode article:

    White Paper: Delphi and Unicode
    http://edn.embarcadero.com/article/38980

> It's a character which could be used in any ANSI application, and which
> was transfered properly using Indy 10.2.3.

Indy 10.2.3 was not Unicode-aware.  Everything was still AnsiString-based. 
That is no longer the case.  You have to watch out for Ansi-Unicode 
differences now.

-- 
Remy Lebeau (TeamB)
Remy
4/20/2009 5:49:15 PM