TIdHTTPServer, dealing with special UTF8

I'm using TIdHTTPServer to receive requests from a cloud service that "forwards"
mobile texting/SMS information.
Most of the time, the texting can be handled with Unicode.

It works fine for international languages, as long as I set

AResponseInfo.Charset := 'utf-8';

when I reply with international characters.

When I receive special UTF8 characters like a "happy face" emoji or emoticon, 
they are represented with 4 bytes of utf8.  The raw form parameters show that they are %encoded.

var body: string;
body := ARequestInfo.Params.Values['Body'];

This converts each UTF-8 byte into a 2-byte element where one byte is 0 and the other byte is the UTF-8 byte.

I'm assuming that this is the intended way to handle special UTF-8?

If I respond with AResponseInfo.ContentText := body and AResponseInfo.Charset := ''; // blank
then it works.  The cloud service receives the 4 bytes of UTF-8 representing a "happy face" emoticon.

But, I'm forced to respond with charset = blank.  Therefore I can't mix in Unicode international chars with the "happy face".

Are there any helper functions to convert Unicode into utf8 but stored as 2 bytes each where 1 byte is 0?
It's storing individual utf8 bytes but as a single 2 byte element in Unicode.

Hope I explained this well.

Joe
Joe
5/6/2015 7:18:21 PM
embarcadero.delphi.winsock


Joe wrote:

> The raw form parameters show that they are %encoded.

As they should be.

> var body: string;
> body := ARequestInfo.Params.Values['Body'];
> this converts each utf8 byte into a 2 byte element where one byte
> is 0 and the other byte is the utf8 byte.

No, it actually decodes the %encoding to raw bytes and then decodes those 
to UTF-16 using whatever charset is specified in ARequestInfo.Charset.  If 
the Charset is blank, the bytes are converted as-is to 16bit values, which 
is roughly equivalent to ISO-8859-1, which handles most (but not all) Unicode 
codepoints <= $00FF.
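
To make the as-is widening concrete, here is a minimal sketch using only the RTL (no Indy). It assumes a Unicode Delphi (2009 or later) with 1-based string indexing; WidenedBytesToString is a hypothetical helper name, not an Indy or RTL function. It narrows each 16-bit element back to one byte and decodes the result as UTF-8, which recovers the real character from the form Joe observed.

```delphi
program RecoverUtf8;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: treats each 16-bit element of S as one raw byte
// and decodes the resulting byte sequence as UTF-8.
function WidenedBytesToString(const S: string): string;
var
  Bytes: TBytes;
  I: Integer;
begin
  if S = '' then
    Exit('');
  SetLength(Bytes, Length(S));
  for I := 1 to Length(S) do
  begin
    if Ord(S[I]) > $FF then
      raise EConvertError.Create('Element does not fit in one byte');
    Bytes[I - 1] := Byte(Ord(S[I]));
  end;
  Result := TEncoding.UTF8.GetString(Bytes);
end;

var
  Mangled, Fixed: string;
begin
  // U+1F600 is F0 9F 98 80 in UTF-8; this is how those four bytes look
  // after being widened as-is to 16-bit values.
  Mangled := Char($F0) + Char($9F) + Char($98) + Char($80);
  Fixed := WidenedBytesToString(Mangled);
  Writeln(Length(Fixed));               // 2 (a UTF-16 surrogate pair)
  Writeln(IntToHex(Ord(Fixed[1]), 4));  // D83D
  Writeln(IntToHex(Ord(Fixed[2]), 4));  // DE00
end.
```

This only repairs a string after the fact. Decoding the request with the right charset in the first place avoids the need for it.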

If the cloud service is actually sending UTF-8, but is not sending a "Content-Type" 
header that specifies UTF-8, then TIdHTTPServer will not correctly decode 
non-ASCII characters > $7F.  This is a known limitation of TIdHTTPServer.  
In that situation, you would have to either:

1. ignore ARequestInfo.Params and decode ARequestInfo.UnparsedParams or ARequestInfo.QueryParams 
manually (look at the source code for TIdHTTPRequestInfo.DecodeAndSetParams()), 
depending on where the parameters are coming from.

2. manually set ARequestInfo.Charset to 'utf-8' and then call ARequestInfo.DecodeAndSetParams() 
to re-populate ARequestInfo.Params using UTF-8 decoding.  However, DecodeAndSetParams() 
is protected, so you would have to use an accessor class to reach it.
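
Option 2 could look roughly like the fragment below. This is a sketch under assumptions, not tested against any particular Indy release: it assumes Indy 10, where DecodeAndSetParams() takes the raw parameter string as its only argument, and it assumes all parameters of interest are in ARequestInfo.UnparsedParams. TMainForm and HTTPServerCommandGet are hypothetical names for your own form and OnCommandGet handler.

```delphi
uses
  IdContext, IdCustomHTTPServer;

type
  // Accessor class: declaring a descendant in this unit makes the
  // protected members of TIdHTTPRequestInfo reachable from this unit.
  TIdHTTPRequestInfoAccess = class(TIdHTTPRequestInfo);

procedure TMainForm.HTTPServerCommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
var
  Body: string;
begin
  // Only override when the client did not declare a charset itself.
  if ARequestInfo.CharSet = '' then
  begin
    ARequestInfo.CharSet := 'utf-8';
    // Re-populate Params from the raw, still %encoded parameter string.
    TIdHTTPRequestInfoAccess(ARequestInfo).DecodeAndSetParams(
      ARequestInfo.UnparsedParams);
  end;

  // Body is now ordinary UTF-16; an emoji arrives as a surrogate pair.
  Body := ARequestInfo.Params.Values['Body'];
end;
```

The typecast never creates an instance of the accessor class; it only changes what the compiler lets you call on the existing object.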

> I'm assuming that this was the intended way to handle special utf8?

There is no such thing as "special UTF-8".  It is ordinary UTF-8, being url-encoded 
during transmission.  Nothing more.

> If I respond with AResponseInfo.ContentText := body and
> AResponseInfo.Charset := ''; // blank
> then it works.

That depends on what you are setting AResponseInfo.ContentType to.

If you do not set AResponseInfo.ContentType at all, TIdHTTPServer will default 
it to 'text/html', and default AResponseInfo.Charset to 'ISO-8859-1'.

If you set AResponseInfo.ContentType to a 'text/...' media type, a default 
value will be assigned to AResponseInfo.Charset if no explicit charset is 
specified in the ContentType value. The default charset is usually 'ISO-8859-1', 
except for a few select XML-related types where 'us-ascii' is used instead.

If you set AResponseInfo.ContentType to a non-'text/...' media type, the 
AResponseInfo.Charset is not implicitly assigned a default value.

If AResponseInfo.Charset is blank when TIdHTTPServer is encoding the ContentText, 
it will get encoded as 8bit binary, where characters for Unicode codepoints 
<= $00FF will be encoded as-is as 8bit octets <= $FF, and higher codepoints 
will be encoded as $3F ('?'). This encoding is roughly equivalent to ISO-8859-1, 
just with some extra octets that ISO-8859-1 does not natively support.
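
The rule described above can be sketched with plain RTL code. EncodeAs8Bit is a hypothetical stand-in written for illustration, not Indy's actual implementation, and the example assumes a Unicode Delphi with 1-based string indexing. It also shows why the blank-charset workaround appeared to work: the widened UTF-8 bytes are all <= $FF, so they pass through unchanged, while a genuine codepoint above $FF is lost.

```delphi
program EightBitSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical stand-in for the 8-bit encoding described above.
function EncodeAs8Bit(const S: string): TBytes;
var
  I: Integer;
begin
  SetLength(Result, Length(S));
  for I := 1 to Length(S) do
    if Ord(S[I]) <= $FF then
      Result[I - 1] := Byte(Ord(S[I]))  // passed through unchanged
    else
      Result[I - 1] := $3F;             // replaced with '?'
end;

var
  B: TBytes;
  I: Integer;
begin
  // Four widened UTF-8 bytes of U+1F600, followed by U+0142,
  // a real codepoint above $FF.
  B := EncodeAs8Bit(Char($F0) + Char($9F) + Char($98) + Char($80) +
    Char($0142));
  for I := 0 to High(B) do
    Write(IntToHex(B[I], 2), ' ');
  Writeln;  // prints: F0 9F 98 80 3F
end.
```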

> The cloud service receives the 4 bytes of utf8 representing a "happy face"
> emoticon.

TIdHTTPServer will not send back a UTF-8 encoded response unless you explicitly 
set the AResponseInfo.Charset to 'utf-8'.
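
A reply that mixes international characters and an emoji might then be built like this fragment. It is a sketch assuming Indy 10 and a correctly decoded Body string; SendReply is a hypothetical helper name, and 'text/plain' is an assumed media type, so use whatever the cloud service actually expects.

```delphi
uses
  IdCustomHTTPServer;

// Hypothetical helper, called from the OnCommandGet handler.
procedure SendReply(AResponseInfo: TIdHTTPResponseInfo; const Body: string);
begin
  // Set ContentType first: assigning a 'text/...' type without a charset
  // can fill in a default CharSet.
  AResponseInfo.ContentType := 'text/plain';
  AResponseInfo.CharSet := 'utf-8';
  // #$0142 is a Polish letter (U+0142); #$D83D#$DE00 is the UTF-16
  // surrogate pair for U+1F600. Both are sent as UTF-8 on the wire.
  AResponseInfo.ContentText := 'Echo ' + #$0142 + ' ' + #$D83D#$DE00 +
    ': ' + Body;
end;
```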

> But, I'm forced to respond with charset = blank.

No, you are not.  Use proper charsets.

> Therefore I can't mix in Unicode international chars with the "happy face".

Yes, you can.

> Are there any helper functions to convert Unicode into utf8 but stored
> as 2 bytes each where 1 byte is 0? It's storing individual utf8 bytes but
> as a single 2 byte element in Unicode.

That is not a valid encoding format, nor is it what the cloud service is 
going to expect.

-- 
Remy Lebeau (TeamB)
Remy
5/6/2015 9:09:07 PM
> 2. manually set ARequestInfo.Charset to 'utf-8' and then call ARequestInfo.DecodeAndSetParams()
> to re-populate ARequestInfo.Params using UTF-8 decoding. However, DecodeAndSetParams()
> is protected, so you would have to use an accessor class to reach it.

Yup.  The cloud service was not setting charset to utf8 on requests.
So I did what you said above.  
It worked.

(no idea why responding with no charset and each utf8 byte within a 2 byte unicode made it work … although it felt not proper)

Thanks
Joe
Joe
5/6/2015 9:43:40 PM