Hi,

I'm using the latest snapshot of Indy 10 (Rev 3539) with Delphi 2007 under a French version of Windows Vista. I just discovered that some characters aren't transferred correctly, probably since the Unicode update (AFAIK this worked fine with Indy 10.2.3). They are replaced by '?' on the other end of the connection. For instance, with this code:

procedure TForm2.Button1Click(Sender: TObject);
begin
  IdTCPClient1.Connect;
  IdTCPClient1.IOHandler.WriteLn('£');
end;

procedure TForm2.IdTCPServer1Execute(AContext: TIdContext);
var
  S: string;
begin
  S := AContext.Connection.IOHandler.ReadLn;
  MessageBox(Handle, PChar(S), nil, 0);
end;

MessageBox displays the string '?' instead of '£'. I suppose there is something wrong with the encoding handling on Delphi versions that use ANSI as the internal string type. Anyway, that's a blocking issue (at least for me), since data exchanged between peers can be altered silently.

Any hint appreciated!

Best regards,
A.R.
"Adrien Reboisson" <adrien-reboissonATastaseDOTcom@example.org> wrote in message news:95600@forums.codegear.com...

> IdTCPClient1.IOHandler.WriteLn('£');

You are not specifying any Encoding for the string, so Indy does not know how to convert it correctly to/from Unicode internally. Indy 10.5.5 uses 7-bit ASCII by default when no encoding is specified. That character cannot be represented in ASCII, so it gets converted to '?' prior to sending. You need to specify an Encoding, either in the call to WriteLn() itself, ie:

{code:delphi}
// TIdTextEncoding.Default uses the OS default codepage instead of ASCII
IdTCPClient1.IOHandler.WriteLn('£', TIdTextEncoding.Default);
{code}

Or via the TIdIOHandler.DefStringEncoding property instead, ie:

{code:delphi}
IdTCPClient1.IOHandler.DefStringEncoding := TIdTextEncoding.Default;
IdTCPClient1.IOHandler.WriteLn('£');
{code}

> S := AContext.Connection.IOHandler.ReadLn;

Same issue. You have to specify an Encoding so Indy knows how to interpret the bytes it receives and can convert them to String properly, ie:

{code:delphi}
S := AContext.Connection.IOHandler.ReadLn(TIdTextEncoding.Default);
{code}

Or:

{code:delphi}
AContext.Connection.IOHandler.DefStringEncoding := TIdTextEncoding.Default;
S := AContext.Connection.IOHandler.ReadLn;
{code}

You can use whatever Encoding you want. However, if the client and server are running on different machines that use different default codepages at the OS level, then you will have to use an Encoding that is portable, such as UTF-8, ie:

{code:delphi}
IdTCPClient1.IOHandler.WriteLn('£', TIdTextEncoding.UTF8);
// or:
// IdTCPClient1.IOHandler.DefStringEncoding := TIdTextEncoding.UTF8;
// IdTCPClient1.IOHandler.WriteLn('£');
{code}

{code:delphi}
S := AContext.Connection.IOHandler.ReadLn(TIdTextEncoding.UTF8);
// or:
// AContext.Connection.IOHandler.DefStringEncoding := TIdTextEncoding.UTF8;
// S := AContext.Connection.IOHandler.ReadLn;
{code}

--
Remy Lebeau (TeamB)
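For readers who want to see the substitution in isolation, here is a minimal sketch that does not use Indy at all. ToAscii7 is a hypothetical helper written only for illustration (it is not Indy's implementation); it simply mimics the effect described above, where any character outside the 7-bit ASCII range ends up as '?'.

{code:delphi}
program Ascii7Sketch;
{$APPTYPE CONSOLE}

// Hypothetical helper, NOT part of Indy: replaces every character
// above #127 with '?', which is the effect described in the post above.
function ToAscii7(const S: WideString): AnsiString;
var
  I: Integer;
begin
  SetLength(Result, Length(S));
  for I := 1 to Length(S) do
    if Ord(S[I]) <= 127 then
      Result[I] := AnsiChar(Ord(S[I]))
    else
      Result[I] := '?';
end;

var
  W: WideString;
begin
  // The pound sign is written as U+00A3 so the result does not
  // depend on the encoding of the source file.
  W := 'Price: ';
  W := W + WideChar($00A3);
  W := W + '5';
  WriteLn(ToAscii7(W)); // prints: Price: ?5
end.
{code}

The point is that the loss happens on the sending side, before any byte reaches the socket, so the receiver has no way to recover the original character.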
Hi!

> You are not specifying any Encoding for the string, so Indy won't know how to
> convert it correctly to/from Unicode internally.

I'm using Delphi 2007, so no strings should be converted from/to Unicode. Or does Indy use Unicode internally instead of letting the string type "float" automatically from ANSI to Unicode?

> Indy 10.5.5 uses 7-bit ASCII by default when no encoding is specified,
> however that character cannot be represented in ASCII,
> thus it gets converted to '?' prior to sending.

I don't think using a 7-bit ASCII encoding by default is a really good idea (at least for us Europeans!). By doing this you're breaking all the code that was written for older Indy versions by altering common European characters such as à, é, à, ê, etc. I understand the technical reasons behind this decision, but IMHO compatibility should be kept for non-Unicode users who just expect the same, consistent I/O behavior across Indy versions. Why not use TIdTextEncoding.Default as the default text encoder?

> You can use whatever Encoding you want. However, if the client and server
> are running on different machines that use different default codepages at the
> OS level, then you will have to use an Encoding that is portable, such as
> UTF-8.

UTF-8 seems good to me, but I can't change the encodings without asking all my users to upgrade their clients and their servers, since non-UTF-8 text can't be read by a UTF-8 IOHandler, and vice versa. That's not possible now, since the next major (and breaking) release of my product is too far away. With Indy 10.2.3 (and all the other snapshots I have used for five years!), I was able to exchange ANSI text between various platforms and operating systems without issues. When you say that I should use a "portable" encoding, does it mean that the old way of sending strings wasn't "safe"?

If I want to revert to the "old" Indy behavior (until migrating to D2009 or using the UTF-8 encoder), is TIdTextEncoding.Default the encoding I'm looking for?

Thanks for your help!

Adrien
"Adrien Reboisson" <adrien-reboissonATastaseDOTcom@lala.com> wrote in message news:96026@forums.codegear.com...

> I'm using Delphi 2007, so no strings should be converted
> from/to Unicode.

Actually, they are. Indy 10 is Unicode-enabled now. On D2007 and earlier versions, AnsiStrings get internally converted to/from WideString to ensure proper handling of encodings, such as UTF-8, that require Unicode handling.

> I don't think using a 7-bit ASCII encoding by default is a really
> good idea (at least for us, the Europeans !).

Most internet protocols are based on 7-bit ASCII, and require special handling for non-ASCII characters. That is why Indy uses ASCII as its default encoding when you do not specify your own encoding. All of Indy's string-based reading/writing operations allow an encoding to be specified, though. For backwards compatibility with legacy code, setting the TIdIOHandler.DefStringEncoding property would be the simplest change to make to your code. That encoding will then be used as the default by all the IOHandler's String-based methods.

> By doing this you're broking all the code that was written for older Indy
> versions by altering common European characters such as à,é,à,ê, etc.

Most internet protocols do not support those characters to begin with. Such characters usually have to be encoded as UTF-8 or another negotiated encoding, which means converting them to/from proper Unicode in between. If you are writing your own protocol, you have to take internationalization into account, so you are better off explicitly specifying encodings as needed.

> Why not using TIdTextEncoding.Default as default text encoder ?

See above.

> UTF-8 seems good for me but I can't change the encodings without
> asking all my users to upgrade their clients and their servers, since non
> UTF-8 text can't be read by an UTF-8 IOHandler, and vice-versa.
> That's not possible now, since the next major (and breaking) release of
> my product is too far from now.

If you are migrating an existing project to a new version of Indy, then you have to release new versions of your client/server software anyway. Otherwise, you will have to stick with the older version you were using before.

> With Indy 10.2.3 (and all the other snapshots I used since five years
> !), I was able to exchange ANSI text between various platforms and
> operating system without issues.

10.2.3 did not support Unicode generically yet. It had some limited support for UTF-8 only.

> When you say that I should use a "portable" encoding, does it mean that
> this old way to send strings wasn't "safe" ?

Not really, no. Not when internationalization was concerned, anyway. Older versions of Indy essentially treated AnsiString values as raw byte buffers without any regard to how the character values were encoded by the RTL/OS. The raw memory of the AnsiString contents was transmitted and received as-is. If the two parties were not using the same language, the AnsiString values might not be interpreted correctly from one machine to the next.

In D2009, CodeGear changed AnsiString to be codepage-aware (which has some big impacts on legacy code if you are not careful). Indy 10.5.x targets D2009 primarily, since we needed a new Unicode-enabled version to support all of the RTL changes that D2009 introduces. Along the way, we have been trying to back-port Indy 10's new functionality to work in D2007 and older versions as much as we can. But there have been some bumps along the way because of the differences between Unicode and Ansi handling. The issue you ran into is one of them.

> If I want to revert to the "old" Indy behavior (until migrating to D2009
> or using the UTF8 encoder), is TIdTextEncoding.Default the encoding
> I'm searching for ?

Yes.

--
Remy Lebeau (TeamB)
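To illustrate why sending raw AnsiString bytes as-is is fragile, the following sketch decodes one and the same byte under two Windows ANSI codepages. It uses only the Windows API, not Indy; DecodeByte is a hypothetical helper name, and the sketch assumes that codepages 1252 and 1253 are both installed on the machine running it.

{code:delphi}
program RawByteSketch;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper: decodes a single byte under the given Windows
// ANSI codepage and returns the Unicode code point (0 if the call fails).
function DecodeByte(B: Byte; CodePage: UINT): Word;
var
  Src: AnsiChar;
  Dst: WideChar;
begin
  Src := AnsiChar(B);
  Dst := #0;
  if MultiByteToWideChar(CodePage, 0, @Src, 1, @Dst, 1) = 1 then
    Result := Ord(Dst)
  else
    Result := 0;
end;

begin
  // The byte on the wire is identical in both cases; only the
  // receiver's assumption about the codepage differs.
  WriteLn('cp1252: U+', IntToHex(DecodeByte($E9, 1252), 4));
  WriteLn('cp1253: U+', IntToHex(DecodeByte($E9, 1253), 4));
end.
{code}

Under codepage 1252 the byte $E9 is U+00E9 (é), while under codepage 1253 it is U+03B9 (Greek small iota), so a French sender and a Greek receiver would disagree about the same byte.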
Hi,

> For backwards compatibility with legacy code, setting the
> TIdIOHandler.DefStringEncoding property would be the simplest change to make
> to your code. That encoding will then be used as the default by all the
> IOHandler's String-based methods.

OK, thanks.

> In D2009, CodeGear changed AnsiString to be codepage-aware now (which has
> some big impacts on legacy code if you are not careful).

Do you have some links or tutorials where I can get more information about potential issues related to this change?

> But there have been some bumps along the way because of
> the differences between Unicode and Ansi handling. The issue you ran into is
> one of them.

Are there other issues I should be aware of when using Indy 10.5.5 with D2007? (I'm only using custom descendants of TIdTCPClient and TIdTCPServer.)

Thanks for your help!

Adrien
"Adrien Reboisson" <adrien-reboissonATastaseDOTcom@example.org> wrote in message news:96084@forums.codegear.com...

> Do you have some links or tutorials where I can get more
> information about potential issues related to this change ?

Look at the various D2009 Unicode documents that are available on CDN.

--
Remy Lebeau (TeamB)
Hi,

One last question :-)

> You can use whatever Encoding you want. However, if the client and server
> are running on different machines that use different default codepages at the
> OS level, then you will have to use an Encoding that is portable, such as
> UTF-8, ie:

Suppose that:

- Both peers have been compiled with Delphi 2007 and Indy 10.5.5,
- They don't share the same codepage at the OS level. That's quite unusual in my case (clients and servers should all be using a French build of Windows), but I'm trying to understand what happens in "corner cases" (e.g. a Turkish Windows user using my software and talking to a French server).

If the client sends a character which cannot be translated to the server codepage, is there any difference if I use TIdTextEncoding.UTF8 or TIdTextEncoding.Default? Does TIdTextEncoding.UTF8 detect the codepage used by the peer to send the string, then set the appropriate codepage when decoding the text? As you said that AnsiStrings on D2007 aren't "codepage-aware", that seems to be impossible.

So as a result, running a client and a server built with D2007 on two different ANSI codepages (for instance, a French client [1252] connected to a Greek [1253] server) is not safe, and characters not supported by one peer are lost when decoded on the other end of the connection -- even if I use TIdTextEncoding.UTF8 or TIdTextEncoding.Default. Is that true? So in this case, why do you say that TIdTextEncoding.UTF8 is "portable"?

Sorry if this post is meaningless, but I'm very new to Windows ANSI codepages :-)

Thanks.

Adrien
"Adrien Reboisson" <adrien-reboissonATastaseDOTcom@example.org> wrote in message news:96519@forums.codegear.com...

> Suppose that :
>
> - Both peers have been compiled with Delphi 2007 and Indy 10.5.5,
> - They don't share the same codepage at the OS level. That quite
> unusual in my case (client and servers should all be using a French
> build of Windows), but I'm trying to understand what happens in
> "corner cases" (i.e. Turkish Windows user using my software and
> speaking to a French server).

They would not be able to handle each other's string data correctly, unless both machines have French or both have Turkish installed, and you updated your code to explicitly use one or the other. This is why most Internet protocols only use ASCII or UTF-8 for maximum interoperability, and provide mechanisms for describing a given text block's actual character set in other situations.

> If the client send a character which cannot be translated to the server
> codepage, is there any difference if I use TIdTextEncoding.UTF8 or
> TIdTextEncoding.Default ?

Yes. At least, as far as the transmission of the character goes, anyway. If you send the character using TIdTextEncoding.Default, then it will be sent using the default Ansi codepage that is local to the sending machine. The receiver would have to understand that particular Ansi codepage in order to interpret the character correctly.

The Euro symbol is a good example of why relying on Ansi data can be troublesome when exchanging data between machines. In the Windows-1252 charset (codepage 1252), the Euro symbol has a numeric value of $80. In the Euro-enabled variant of the IBM850 charset (codepage 858), it has a numeric value of $D5 instead. Many other Ansi codepages don't even have a Euro symbol defined. If the Euro symbol were sent and received using TIdTextEncoding.Default, the meaning of the numeric value $80 is relative to whatever Ansi codepage each machine is using. If those codepages do not match, then the receiver would not interpret the received character as a Euro symbol.

On the other hand, if you send the character using TIdTextEncoding.UTF8, the character is first translated to its standardized Unicode numeric value ($20AC) before being encoded to UTF-8. If the receiver then receives the character using TIdTextEncoding.UTF8, it would receive character $20AC correctly. In the case of D2007 and earlier, that value would then get converted to its proper Ansi equivalent for that machine, either the correct Ansi numeric value for the Euro symbol, or the '?' character if the Euro symbol is not known to the default Ansi codepage.

> Does TIdTextEncoding.UTF8 detects the codepage used by the peer to
> send the string

TIdTextEncoding.UTF8 deals with standardized Unicode, not Ansi. All of TIdTextEncoding's parameters deal with Unicode strings only, even in D2007 and earlier. That means the RTL, not Indy, automatically converts string values from Ansi to Unicode before passing the Unicode data to TIdTextEncoding, and likewise automatically converts string values from Unicode to Ansi when assigning TIdTextEncoding's return values to AnsiString variables.

> So as a result, running a client and a server on two different ANSI
> codepages build with D2007 (for instance, a French client [1252]
> connected to a Greek [1253] server) is not safe

Not really, no. When language differences are an issue, you pretty much have to use Unicode and its defined encodings (UTF-7, UTF-8, UTF-16). That is what Unicode is all about - one common standard for all languages.

> characters not supported by one peer are lost when decoded on the
> other end of the connection -- even if I use TIdTextEncoding.UTF8
> or TIdTextEncoding.Default. Is that true ?

Potentially, since Indy is mostly AnsiString-based in D2007 and earlier. At least if you use UTF-8, you might have a lesser chance of data loss, as characters in one language may have suitable equivalents in the other language, and converting to Unicode in between allows for that conversion to be attempted.

> So in this case, why do you say that TIdTextEncoding.UTF8 is "portable" ?

Because it is - at the transmission layer, anyway. What you send is what you get. If you stick with Ansi encodings, you lose the guarantee that the bytes will be interpreted correctly, unless both parties are using the same language.

> Sorry if this post is meaningless, but I'm very new at Windows ANSI
> codepages :-)

Most people don't pay attention to codepages until they have to deal with Unicode and internationalization.

--
Remy Lebeau (TeamB)
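The Euro example can be checked with a small console sketch. It deliberately avoids Indy and uses only the RTL (UTF8Encode, UTF8Decode) and the Windows API (WideCharToMultiByte); HexBytes and ToCodePage are hypothetical helper names introduced here for illustration, and codepage 1252 is assumed to be installed.

{code:delphi}
program EuroSketch;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper: formats Len bytes starting at P as hexadecimal.
function HexBytes(P: PAnsiChar; Len: Integer): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to Len - 1 do
    Result := Result + IntToHex(Ord(P[I]), 2) + ' ';
end;

// Hypothetical helper: encodes a Unicode string to the given ANSI
// codepage via the Windows API (first call measures, second converts).
function ToCodePage(const W: WideString; CodePage: UINT): AnsiString;
var
  Len: Integer;
begin
  Len := WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W),
    nil, 0, nil, nil);
  SetLength(Result, Len);
  if Len > 0 then
    WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W),
      PAnsiChar(Result), Len, nil, nil);
end;

var
  Euro: WideString;
  Ansi: AnsiString;
  Utf8: UTF8String;
begin
  Euro := WideChar($20AC); // Euro sign as a Unicode code point

  // Codepage-specific byte: only meaningful to a cp1252 receiver.
  Ansi := ToCodePage(Euro, 1252);
  WriteLn('cp1252: ', HexBytes(PAnsiChar(Ansi), Length(Ansi)));

  // UTF-8 bytes: the same on every machine.
  Utf8 := UTF8Encode(Euro);
  WriteLn('UTF-8:  ', HexBytes(PAnsiChar(Utf8), Length(Utf8)));

  if UTF8Decode(Utf8) = Euro then
    WriteLn('UTF-8 round trip preserved U+20AC');
end.
{code}

The codepage 1252 line shows the single byte 80, while the UTF-8 line shows E2 82 AC. Decoding the UTF-8 bytes gives back U+20AC regardless of the local codepage; whether that code point can then be stored in an AnsiString on the receiving machine is the separate, later step discussed in the post above.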
Okay, thank you for your answer. It's a lot clearer now.

If you come to Paris one day, drop me a mail, you deserve a loooot of free beers :-)

Adrien
Hello Remy,

I know this is not the right place for my question. I need to develop software that controls IP cameras for my university work, but I am new to Delphi. I am looking for someone with experience who can help me, or even give me some tips. Do you have experience with controlling IP cameras? Do you know where I should start?

Miguel