utf8::upgrade, utf8::encode and utf8::is_utf8 on EBCDIC platform

Hi,

 These are the test cases I'm running on an EBCDIC platform:

my $b = chr(0x0FF);
my $p = utf8::upgrade($b);   # returns the octet count of the upgraded string
print "\n$p";

utf8::upgrade returns the number of octets necessary
to represent the string as UTF-X.

On EBCDIC the output is 1, whereas on an ASCII platform it is 2.
Is the return value I'm getting on EBCDIC correct?


my $c = chr(0x0FF);
print "before $c\n";
print "\n";
utf8::encode($c);            # converts $c in place to its UTF-X octets
print "after $c\n";
print length($c);            # number of octets after encoding

On ASCII, the string is a single octet before the call and
two octets after encode, so the length is 2.

On EBCDIC it is a single octet both before and after encode, and
the length is 1.

Is this correct on EBCDIC, or is it a bug in the EBCDIC
code?

utf8::is_utf8 tests whether STRING is in UTF-8, so is 0x0FF
UTF-8 on EBCDIC?




		
akyaseen
9/1/2005 11:16:43 AM

Hello.
I think it is correct.

On EBCDIC platforms, perl uses UTF-EBCDIC instead of UTF-8,
but it nevertheless calls the encoding "utf8".
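
As a rough sketch of how the two test cases from the question could be run with the platform reported alongside the results: the ord('A') check is the usual idiom for telling the platforms apart (65 on ASCII, 193 on EBCDIC), and the variable names $is_ebcdic, $str, $bytes and $octets are only illustrative.

```perl
use strict;
use warnings;

# ord('A') is 65 on ASCII platforms and 193 on EBCDIC platforms.
my $is_ebcdic = ord('A') == 193;

# Test case 1: upgrade the string and keep the returned octet count.
my $str    = chr(0xFF);
my $octets = utf8::upgrade($str);

# Test case 2: encode a fresh copy in place and measure its length.
my $bytes = chr(0xFF);
utf8::encode($bytes);

printf "platform: %s\n", $is_ebcdic ? 'EBCDIC' : 'ASCII';
printf "utf8::upgrade returned %d\n", $octets;
printf "length after utf8::encode is %d\n", length $bytes;
```

On an ASCII platform both numbers should be 2, as the question reports; on EBCDIC the question reports 1 for both.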

In general, chr(0xFF) (equal to "\xFF") in EBCDIC encodings
corresponds to U+009F, which is a single-octet control character;
thus the single-octet sequence "\xFF" is well-formed in UTF-EBCDIC too.

If you want to convert an integer to a character according to
Unicode scalar values, you can use pack('U'), but not chr().
For example, pack('U', 0xFF) should correspond to U+00FF
(y with diaeresis) everywhere (both on ASCII and on EBCDIC).
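
A minimal sketch of the difference, assuming nothing beyond core perl ($native and $unicode are illustrative names): chr() takes a native code point, while pack('U') takes a Unicode one, so the two only agree where the native character set coincides with Unicode for that value.

```perl
use strict;
use warnings;

my $native  = chr(0xFF);         # native code point 0xFF
my $unicode = pack('U', 0xFF);   # Unicode code point U+00FF

# ord() reports the native code point of each character; the utf8 flag
# shows whether perl stores the string in its internal UTF-X form.
printf "chr(0xFF):       ord=0x%02X utf8 flag=%d\n",
    ord($native), utf8::is_utf8($native) ? 1 : 0;
printf "pack('U', 0xFF): ord=0x%02X utf8 flag=%d\n",
    ord($unicode), utf8::is_utf8($unicode) ? 1 : 0;
```

On an ASCII platform the two ord values should match; on EBCDIC I would expect them to differ, since the native code point for y with diaeresis is not 0xFF there.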

Regards,
SADAHIRO Tomoyuki




bqw10602
9/1/2005 1:07:50 PM