Unicode strings in structured data

I have a packed array of packed records, and the record holds regular numbers of known sizes like integers and singles and then some strings. If there were no strings, this would be easy to store and retrieve or memory map to a file, and individual records could be updated without accessing the entire file. However, the strings (which are Unicode) ruin this.

In my existing application, which dates way back in time, the equivalent strings of ASCII (byte) characters are just stored as fixed-length arrays of AnsiChar, but I must switch to Unicode to support Asian text in this new version. Any particular tricks for this?

What if I use PChar or PWideChar in a union with a packed array of byte of sufficient size? Would this work?

Thanks,

Jens.
Jens
6/2/2015 3:47:28 PM
embarcadero.delphi.nativeapi

Jens Munk wrote:

> I have a packed array of packed records, and the record holds regular
> numbers of known sizes like integers and singles and then some
> strings. If there were no strings, this would be easy to store and
> retrieve or memory map to a file, and individual records could be
> updated without accessing the entire file. However, the strings
> (which are unicode) ruins this.
> 
> In my existing application, which dates way back in time, the
> equivalent strings of ascii(byte) characters are just stored as
> arrays of fixed lengths of AnsiChar, but I must switch to unicode to
> support Asian text in this new version. Any particular tricks for
> this?

Just use arrays of WideChar for storage. That way you can define a
fixed-size array as part of the records and gain direct indexed record
access in the disk file, just as in your old application. The alternative
is to go to a real database engine for storage.
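
A minimal sketch of that layout, assuming a hypothetical TItemRec record and a hypothetical items.dat file (neither name comes from the thread). Because the text field is a fixed-size array of WideChar, the record has a constant size and a typed file can seek straight to any record:

```pascal
program FixedRecordDemo;
{$APPTYPE CONSOLE}

const
  NameCapacity = 32; // hypothetical capacity in UTF-16 code units, incl. #0

type
  // Hypothetical layout; only the fixed-size WideChar array matters here.
  TItemRec = packed record
    Id: Integer;
    Weight: Single;
    Name: array[0..NameCapacity - 1] of WideChar;
  end;

var
  F: file of TItemRec;
  Rec: TItemRec;
begin
  // 4 (Integer) + 4 (Single) + 32 * 2 (WideChar) bytes per record
  Assert(SizeOf(TItemRec) = 72);

  AssignFile(F, 'items.dat');
  Rewrite(F);
  try
    // Write two zero-initialised records.
    FillChar(Rec, SizeOf(Rec), 0);
    Rec.Id := 1;
    Write(F, Rec);
    Rec.Id := 2;
    Write(F, Rec);

    // Update only the second record (index 1) in place.
    Seek(F, 1);
    Read(F, Rec);
    Rec.Weight := 2.5;
    Seek(F, 1);
    Write(F, Rec);
  finally
    CloseFile(F);
  end;
end.
```

The price is the same as with the old AnsiChar arrays: every record reserves the full capacity, now at two bytes per code unit.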

> 
> What if I use PChar or PWideChar in a union with a packed array of
> byte of sufficient size? Would this work?

No, probably not the way you are thinking. But you can take the address
of the first element of such an array and cast it to PWideChar, *if*
you make sure your array is always zero-terminated. For read access
that would work directly. For write access you first need to make sure
that your UnicodeString does not contain more characters than can fit
into the array (including the terminating #0000) and then copy the
characters using System.SysUtils.StrLCopy, which has an overloaded
version for PWideChar.
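
A rough sketch of that read/write pattern, assuming a hypothetical TNameBuf type and helper names that are made up for illustration. The write helper rejects strings that would not fit together with their terminator; the read helper relies on the #0 being present:

```pascal
program WideBufferDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  NameCapacity = 16; // hypothetical: 15 characters plus the terminating #0

type
  TNameBuf = array[0..NameCapacity - 1] of WideChar;

// Copies S into Buf; returns False (and leaves Buf alone) if S is too long.
function TryStoreName(var Buf: TNameBuf; const S: string): Boolean;
begin
  Result := Length(S) <= NameCapacity - 1;
  if Result then
    // StrLCopy copies at most MaxLen characters and appends the #0 itself.
    StrLCopy(PWideChar(@Buf[0]), PWideChar(S), NameCapacity - 1);
end;

// Reads the buffer back as a string, stopping at the first #0.
function LoadName(const Buf: TNameBuf): string;
begin
  Result := PWideChar(@Buf[0]);
end;

var
  Buf: TNameBuf;
begin
  FillChar(Buf, SizeOf(Buf), 0);
  // #$6771#$4EAC are two CJK characters, written as codes to keep the
  // source file plain ASCII.
  if TryStoreName(Buf, #$6771#$4EAC) then
    Writeln('Stored ', Length(LoadName(Buf)), ' UTF-16 code units');
end.
```

Length counts UTF-16 code units, so a character outside the BMP uses two slots of the array.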

Do you need to deal with surrogate pairs, i.e. code points outside the
Basic Multilingual Plane, which take two 16-bit code units in UTF-16?

The alternative would be to store the actual length of the string
contained in the array in the record as well (a bit like the way the old
ShortString type is implemented) and then copy the content of the array
to a String (UnicodeString) using SetString. This way you do not need a
#0000 terminator.
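
A small sketch of the length-prefixed variant, assuming a hypothetical TCountedName field type and made-up helper names. The length is stored in code units next to the array, and SetString rebuilds the string from exactly that many units:

```pascal
program CountedFieldDemo;
{$APPTYPE CONSOLE}

const
  NameCapacity = 16; // hypothetical capacity in UTF-16 code units

type
  // Length-prefixed field, loosely modelled on how ShortString works.
  TCountedName = packed record
    Len: Word; // number of code units in use
    Chars: array[0..NameCapacity - 1] of WideChar;
  end;

// Stores S in Field; returns False (and leaves Field alone) if S is too long.
function TryStoreCounted(var Field: TCountedName; const S: string): Boolean;
begin
  Result := Length(S) <= NameCapacity;
  if not Result then
    Exit;
  FillChar(Field, SizeOf(Field), 0);
  Field.Len := Length(S);
  if Field.Len > 0 then
    Move(PWideChar(S)^, Field.Chars[0], Field.Len * SizeOf(WideChar));
end;

// Rebuilds the string; clamps Len in case the stored value is corrupt.
function LoadCounted(const Field: TCountedName): string;
var
  N: Integer;
begin
  N := Field.Len;
  if N > NameCapacity then
    N := NameCapacity;
  SetString(Result, PWideChar(@Field.Chars[0]), N);
end;

var
  Field: TCountedName;
begin
  if TryStoreCounted(Field, 'abc') then
    Writeln(LoadCounted(Field));
end.
```

Without a terminator the whole array is usable for text, at the cost of two extra bytes per field for the length.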




-- 
Peter Below (TeamB)
Peter
6/2/2015 5:04:05 PM
On Tue, 2 Jun 2015 08:47:28 -0700, Jens Munk <> wrote:

>I have a packed array of packed records, and the record holds 
>regular numbers of known sizes like integers and singles and 
>then some strings. If there were no strings, this would be 
>easy to store and retrieve or memory map to a file, and individual 
>records could be updated without accessing the entire file. However, 
>the strings (which are unicode) ruins this.
>
>In my existing application, which dates way back in time, the 
>equivalent strings of ascii(byte) characters are just stored as 
>arrays of fixed lengths of AnsiChar, but I must switch to unicode 
>to support Asian text in this new version. Any particular tricks 
>for this?
>
>What if I use PChar or PWideChar in a union with a packed array of 
>byte of sufficient size? Would this work?
>
AFAIK the change to Unicode in Delphi was done by redefining the
string type from AnsiString to UnicodeString, i.e. the character codes
are 2 bytes instead of 1 byte in size.
So all of your (fixed length?) strings in the record will double in
size and this makes the new files of this record type incompatible
with old files.
I do not know what will happen for a packed record where some fields
are fixed length strings, but I suspect that the zero bytes will still
persist.

This is not the full story, I am sure, but you will have to wait for
someone with more insight (like Remy) to fill in the voids and
probably correct what I wrote too... ;)

---
Bo Berglund
Sweden & Texas
Newsreader: Forte Free Agent 1.92/32.572
Bo
6/2/2015 5:07:06 PM
Bo wrote:

> AFAIK the change to Unicode in Delphi was done by redefining the
> string type from AnsiString to UnicodeString, i.e. the character codes
> are 2 bytes instead of 1 byte in size.

Also redefining the (P)Char type from (P)AnsiChar to (P)WideChar.

> So all of your (fixed length?) strings in the record will double in
> size

Only if the record was using the generic Char type and not AnsiChar directly.

> this makes the new files of this record type incompatible with old files.

For backwards compatibility with existing data, you would have to define 
the original record as using AnsiChar explicitly, and then define a separate 
record that uses WideChar/UnicodeString instead, converting between the two 
records when loading/saving data.
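
A minimal sketch of such a conversion, assuming hypothetical TLegacyRec and TUnicodeRec layouts, a Windows desktop compiler where AnsiString is available, and legacy text stored in the system default ANSI code page. The new field is given one extra slot for the terminating #0 so that a completely full legacy field is not truncated:

```pascal
program UpgradeRecordDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  LegacyLen = 16;              // hypothetical size of the old AnsiChar field
  NewCapacity = LegacyLen + 1; // one extra slot for the terminating #0

type
  // Old on-disk layout, declared with AnsiChar explicitly.
  TLegacyRec = packed record
    Id: Integer;
    Name: array[0..LegacyLen - 1] of AnsiChar;
  end;

  // New on-disk layout using UTF-16 code units.
  TUnicodeRec = packed record
    Id: Integer;
    Name: array[0..NewCapacity - 1] of WideChar;
  end;

function UpgradeRec(const Old: TLegacyRec): TUnicodeRec;
var
  N: Integer;
  A: AnsiString;
begin
  FillChar(Result, SizeOf(Result), 0);
  Result.Id := Old.Id;
  // The legacy field may be completely full, so do not rely on a #0.
  N := 0;
  while (N < LegacyLen) and (Old.Name[N] <> #0) do
    Inc(N);
  SetString(A, PAnsiChar(@Old.Name[0]), N);
  // string(A) converts from the ANSI code page to UTF-16.
  StrPLCopy(PWideChar(@Result.Name[0]), string(A), NewCapacity - 1);
end;

var
  Old: TLegacyRec;
  Converted: TUnicodeRec;
begin
  FillChar(Old, SizeOf(Old), 0);
  Old.Id := 7;
  Old.Name[0] := 'A';
  Old.Name[1] := 'b';
  Converted := UpgradeRec(Old);
  Writeln(Converted.Id, ' ', string(PWideChar(@Converted.Name[0])));
end.
```

If the old files were written on machines with a different code page, the conversion would need that code page passed explicitly rather than the system default.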

-- 
Remy Lebeau (TeamB)
Remy
6/2/2015 5:46:17 PM
Jens Munk wrote:

> I have a packed array of packed records, and the record holds regular
> numbers of known sizes like integers and singles and then some
> strings. If there were no strings, this would be easy to store and
> retrieve or memory map to a file, and individual records could be
> updated without accessing the entire file. However, the strings
> (which are unicode) ruins this.
> 
> In my existing application, which dates way back in time, the
> equivalent strings of ascii(byte) characters are just stored as
> arrays of fixed lengths of AnsiChar, but I must switch to unicode to
> support Asian text in this new version. Any particular tricks for
> this?
> 
> What if I use PChar or PWideChar in a union with a packed array of
> byte of sufficient size? Would this work?

They did a very poor job of dealing with this.  Byte-wide strings (and
arrays of them!) have been an excellent tool for handling binary data
for a very long time.  They could have added Unicode without such
turmoil.  Unfortunately, we got what we got.

Also, beware there are zealots who just conclude we are stupid for
using a tool so well suited for the task because of the name "string"
in the datatype.

That said, for the variables that need binary compatibility you should
declare them explicitly as AnsiString.  That will take care of 99% of the
problems.

Unfortunately, failing (refusing!) to recognize this use leads to
problems you'll have to guard against.  Not the least of which is that
even functions with Ansi in their name (e.g. AnsiRightStr() and friends)
are NOT ANSI by default; they have coerced the data types into
text strings.  Maddening.

So, declare them as AnsiStrings, then follow them through the compile
process watching for conversion warnings.  There are versions of the
Ansi*() functions available in a separate unit (System.AnsiStrings) that
you can add to your uses clause and that actually support AnsiString, but
don't assume that a function having Ansi in its name has any relevance.

Having to deal with these changes at the level required is fairly
stupid.  Worse, failing (again, it's actually a refusal) to recognize
and extend such a great tool for dealing with binary data is beyond
stupid.

Now they point to some magic beans in the form of array functionality
in XE7/8 as if it's a help.

They don't "get it", and you can count on that to continue.  Stand by
for the zealots to start their lecture on how stupid it is to use
AnsiString for binary data...

in 3...2...1...

Dan
Dan
6/2/2015 6:07:07 PM
Dan


>Also, beware there are zealots who just conclude we are stupid for
>using a tool so well suited for the task because of the name "string"
>in the datatype.

I don't know the maximum upvotes allowed but +MaxInt should work.

>They don't "get it", and you can count on that to continue. Standby
>for the zealots to start their lecture on how stupid it is to use
>ansistring for binary data...
>
>in 3...2...1...

Bowl of popcorn at the ready :)

Roy Lambert
Roy
6/3/2015 7:05:34 AM
Dan Barclay wrote:

> Jens Munk wrote:
> 
> > I have a packed array of packed records, and the record holds
> > regular numbers of known sizes like integers and singles and then
> > some strings. If there were no strings, this would be easy to store
> > and retrieve or memory map to a file, and individual records could
> > be updated without accessing the entire file. However, the strings
> > (which are unicode) ruins this.
> > 
> > In my existing application, which dates way back in time, the
> > equivalent strings of ascii(byte) characters are just stored as
> > arrays of fixed lengths of AnsiChar, but I must switch to unicode to
> > support Asian text in this new version. Any particular tricks for
> > this?
> > 
> > What if I use PChar or PWideChar in a union with a packed array of
> > byte of sufficient size? Would this work?
> 
> They did a very poor job of dealing with this.  Byte wide strings (and
> arrays of them!) have been an excellent tool for handling binary data
> for a very long time.  They could have added unicode without such
> turmoil.  Unfortunataly, we got what we got.
> 
> Also, beware there are zealots who just conclude we are stupid for
> using a tool so well suited for the task because of the name "string"
> in the datatype.

Yes, I am such a "zealot". Strings are meant to contain text, nothing
else. Using them for something else has already shown to be a bad
choice, when strings got a different internal format. They can still
contain text, and even much better now. But that made them unsuitable
for binary data.

You can conclude that they should not have changed the internal format,
but hey, if nothing ever changed, we would still have short strings.

So, yes, there are a few obstinate unteachables who keep on using the
old Turbo Vision style hacks, just because they never grew up and
realized that hacks should only be used if nothing else works. And
these people blame Embarcadero for changing the data type, instead of
looking into themselves and thinking: was my solution future proof?
Obviously not.

I don't think I ever called anyone an idiot. But it was indeed pretty
foolish (and stubborn) to keep on using that hack without thinking of
removing it and doing the right thing.

Call me a zealot. I never had any problems converting from short to
long strings and I never had any problems converting to UnicodeString,
because I did not use such hacks. I think that is a pretty practical
approach. The last time I used them *was* for Turbo Vision (color
schemes were passed as literals like #01#04#07#00), some 20 or more
years ago.

So don't blame Embarcadero for changing strings. Blame yourself for not
thinking about the consequences of using a hack, in all those many
years you had the opportunity to change things.

-- 
Rudy Velthuis        http://www.rvelthuis.de

Murphy's Fourth Law: If there is a possibility of several things
going wrong, the one that will cause the most damage will be the
one to go wrong.
Rudy
6/3/2015 7:41:34 AM
Dan Barclay wrote:

> They did a very poor job of dealing with this.  Byte wide strings (and
> arrays of them!) have been an excellent tool for handling binary data
> for a very long time.  They could have added unicode without such
> turmoil.  Unfortunataly, we got what we got.
> 
> Also, beware there are zealots who just conclude we are stupid for
> using a tool so well suited for the task because of the name "string"
> in the datatype.

Yes, I am such a "zealot". Strings are meant to contain text, nothing
else. Using them for something else has already shown to be a bad
choice, when strings got a different internal format. They can still
contain text, and even much better now. But that made them unsuitable
for binary data.

You can conclude that they should not have changed the internal format,
but hey, if nothing ever changed, we would still have short strings.

So, yes, there are a few obstinate unteachables who keep on using the
old Turbo Vision style hacks, just because they never grew up and
realized that hacks should only be used if nothing else works. And
these people blame Embarcadero for changing the data type, instead of
looking into themselves and thinking: was my solution future proof?
Obviously not.

I don't think I ever called anyone an idiot. But it was indeed pretty
foolish (and stubborn) to keep on using that hack without thinking of
removing it and doing the right thing.

Call me a zealot. I never had any problems converting from short to
long strings and I never had any problems converting to UnicodeString,
because I did not use such hacks. I think that is a pretty practical
approach. The last time I used them *was* for Turbo Vision (color
schemes were passed as literals like #01#04#07#00), some 20 or more
years ago.

So don't blame Embarcadero for changing strings. They are not the ones
who did a poor job. They actually did an excellent job.

Blame yourself for not thinking about the consequences of using a hack,
in all those many years you had the opportunity to change things.

-- 
Rudy Velthuis        http://www.rvelthuis.de

Murphy's Fourth Law: If there is a possibility of several things
going wrong, the one that will cause the most damage will be the
one to go wrong.
Rudy
6/3/2015 7:44:53 AM
I am so glad I'm not the only one with hiccoughs


Roy Lambert
Roy
6/3/2015 1:57:45 PM
Rudy

> Strings are meant to contain text, nothing
>else. Using them for something else has already shown to be a bad
>choice, when strings got a different internal format. They can still
>contain text, and even much better now. But that made them unsuitable
>for binary data.

A string is still just a load of bytes strung together. The principal change from my viewpoint is that Delphi decided that rather than one byte meaning one character, it's now two. I also seem to remember people stored Unicode text in strings even in the old days. Should they be smacked for being naughty?

>You can conclude that they should not have changed the internal format,
>but hey, if nothing ever changed, we would still have short strings.

Change = good idea
Change for the sake of change = bad idea
Change implemented badly = worst idea

>So don't blame Embarcadero for changing strings.

They made their life easier at the expense of making others more difficult. I feel that merits some blame.

>Blame yourself for not
>thinking about the consequences of using a hack, in all those many
>years you had the opportunity to change things.

You keep using the pejorative term hack. How about thinking of it in terms of making sensible use of the facilities available and not expecting someone to take them away for no good reason?


>Murphy's Fourth Law: If there is a possibility of several things
>going wrong, the one that will cause the most damage will be the
>one to go wrong.

I do like this one. :)

Roy
Roy
6/3/2015 2:17:14 PM
On Wed, 3 Jun 2015 06:57:45 -0700, Roy Lambert <roy@lybster.me.uk>
wrote:

>am so glad I'm not the only one with hiccoughs  
>
>
>Roy Lambert

If the posting does not succeed, do not retry it!

I have found that this forum/news server is very often acting up by
being extremely slow to the extent that my newsreader gives up and
tells me the post failed.
But it did not; instead it is the acknowledgement message that apparently
got too delayed for my newsreader.
Hitting send again just duplicates the post (as we have seen).

Instead when this happens I first change forum/newsgroup and refresh
until I get a response, then head back to the original ng and refresh
again. Mostly my failed post is actually there!

One should not have to jump through such hoops in a discussion
forum/newsgroup hosted by a software development tool company!


---
Bo Berglund
Sweden & Texas
Newsreader: Forte Free Agent 1.92/32.572
Bo
6/3/2015 3:02:25 PM
Roy Lambert wrote:

> Rudy
> 
> > Strings are meant to contain text, nothing
> > else. Using them for something else has already shown to be a bad
> > choice, when strings got a different internal format. They can still
> > contain text, and even much better now. But that made them
> > unsuitable for binary data.
> 
> A string is still just a load of bytes strung together.

Yeah, and a keyboard is just a number of atoms strung together. 

You could use a keyboard as a plate and eat your food from it, but most
people are well advised to only use it for the purpose of entering text
into a computer. The same can be said about strings.

Strings are vehicles to contain text. If you do it right, the internals,
and any changes to these internals, should hardly matter. They are NOT
vehicles to contain binary data, even if AnsiStrings have been abused
for that purpose. I mean, you could just as well use an array of
longints to contain bytes, but that would be just as silly and just as
wrong.

But hey, just continue to use strings as byte containers. You'll see
where that will get you. Fact is that those who do that should not
complain that THEIR hack doesn't work anymore. It is not Embarcadero's
job to forego improvements (and Unicode is a vast improvement over Ansi
with its codepages) in order to sustain such hacks.

A hack is a hack is a hack. So stop complaining, start doing things
properly and you won't have any problems if such things are changed.

-- 
Rudy Velthuis        http://www.rvelthuis.de

"If people are good only because they fear punishment, and hope for
 reward, then we are a sorry lot indeed." -- Albert Einstein
Rudy
6/3/2015 3:30:43 PM
Bo Berglund wrote:

> If the posting does not succeed, do not retry it!

Sometimes, the newsreader doesn't get a clue that the posting did
succeed, so it tries until it gets such a clue.

-- 
Rudy Velthuis        http://www.rvelthuis.de

"The Bible was a consolation to a fellow alone in the old cell.
 The lovely thin paper with a bit of matress stuffing in it, if
 you could get a match, was as good a smoke as I ever tasted."
 -- Brendan Behan.
Rudy
6/3/2015 3:32:40 PM
Roy Lambert wrote:

> Change = good idea
> Change for the sake of change = bad idea

Bullshit.

Unicode was not just introduced for the sake of change. It was one of
the most frequent requests from users having to write international
software. It was an excellent thing that they changed the default from
Ansi to Unicode.

-- 
Rudy Velthuis        http://www.rvelthuis.de

"It is practically imposible to teach good programming to
 students that have had a prior exposure to BASIC: as potential
 programmers they are mentally mutilated beyond hope of
 regeneration." -- Edsger Dijkstra
Rudy
6/3/2015 3:37:03 PM
On Wed, 3 Jun 2015 08:32:40 -0700, Rudy Velthuis (TeamB)
<newsgroups@rvelthuis.de> wrote:

>Bo Berglund wrote:
>
>> If the posting does not succeed, do not retry it!
>
>Sometimes, the newsreader doesn't get a clue that the posting did
>succeed, so it tries until it gets such a clue.

Oh, I see. So the newsreader continues posting all by itself then?
Mine does not; it shows an error message when it fails (what determines
that I really do not know, but I suspect a return message that went
astray).

So I have control over the amount of posting. I just recently discovered
that postings which seem to fail actually succeeded, unbeknownst to
the newsreader...

Your mileage may vary of course.

---
Bo Berglund
Sweden & Texas
Newsreader: Forte Free Agent 1.92/32.572
Bo
6/3/2015 3:45:58 PM
Bo Berglund wrote:

> > Sometimes, the newsreader doesn't get a clue that the posting did
> > succeed, so it tries until it gets such a clue.
> 
> Oh, I see. So the newsreader continues posting all by itself then?

It tries to post until it gets a clue it succeeded. Then it removes the
post from its outbox. If it already got through, but the newsreader
doesn't know this, it keeps on trying.

-- 
Rudy Velthuis        http://www.rvelthuis.de

"The compulsion to do good is an innate American trait. Only
 North Americans seem to believe that they always should, may,
 and actually can choose somebody with whom to share their
 blessings. Ultimately this attitude leads to bombing people
 into the acceptance of gifts."
 -- Ivan Illich
Rudy
6/3/2015 3:52:16 PM
Rudy Velthuis (TeamB) wrote:

> Roy Lambert wrote:
> 
> > Change = good idea
> > Change for the sake of change = bad idea
> 
> Bullshit.

Implementation: correct assessment

 
> Unicode was not just introduced for the sake of change. It was one of
> the most favourite requests by users having to write international
> software. It was an excellent thing that they changed the default from
> Ansi to Unicode.

I agree.

The problem was the way in which they did it, and the failure to
maintain this great tool for binary data handling in the process.

Dan
Dan
6/3/2015 6:35:52 PM
Bo


I think on this occasion you're right, but all too often the attempted post hasn't turned up after several days. I've just been conditioned to expect that posting failed :(

Roy Lambert
Roy
6/4/2015 7:12:09 AM
Rudy


Why did you decide to cut my list before

Change implemented badly = worst idea

Roy Lambert
Roy
6/4/2015 7:12:10 AM
Rudy

>Yeah, and a keyboard is just a number of atoms strung together.
>
>You could use a keyboard as a plate and eat your food from it, but most
>people are well advised to only use if for the purpose of entering text
>into a computer. The same can be said about strings.

The same argument can be used about anything that is not being used for what it was expressly designed for. However, to refute your argument totally:

http://www.theregister.co.uk/2015/05/20/kfc_germany_bakes_bluetooth_keyboard_into_meal_trays/

>String are vehicles to contain text. If you do it right, the internals,
>and any changes to these internals, should hardly matter. They are NOT
>vehicles to contain binary data, even if AnsiStrings have been abused
>for that purpose. I mean, you could just as well use an array of
>longints to contain bytes, but that would be just as silly and just as
>wrong.

You mean even sillier than using multiple integers to represent one character?

>But hey, just continue to use strings as byte containers. You'll see
>where that will get you. Fact is that those who do that should not
>complain that THEIR hack doesn't work anymore. It is not Embarcadero's
>job to forego improvements (and Unicode is a vast improvement over Ansi
>with its codepages) in order to sustain such hacks.
>
>A hack is a hack is a hack. So stop complaining, start doing things
>properly and you won't have any problems if such things are changed.

At what point does good and sensible practice become a hack?


Roy Lambert
Roy
6/4/2015 7:27:09 AM
On Thu, 4 Jun 2015 00:12:09 -0700, Roy Lambert <roy@lybster.me.uk>
wrote:

>Bo
>
>
>I think on this occasion you're right, but all to often the attempted 
>post hasn't turned up after several days. I'm just been conditioned 
>to expect that posting failed :(
>

Yeah,
and I was accustomed to discussion forums/newsservers "just working"
until about a year ago.
At that time this one broke badly and has not recovered since (for
example everything in the past history is lost in all forums/ngs).

But a little more than a year ago another forum I use heavily also
broke, this is the Microchip forum for embedded controller design (the
PIC line of controllers).
Here it is:  http://www.microchip.com/forums/default.aspx
But contrary to the Embarcadero way Microchip has at least spent some
effort returning it to operational status and *without* losing past
history!

I do not get why EBT disregards their user base this way.....

---
Bo Berglund
Sweden & Texas
Newsreader: Forte Free Agent 1.92/32.572
Bo
6/4/2015 8:54:38 AM
Bo


>I do not get why EBT disregards their user base this way.....

They either believe, or have proven to themselves, that it is more profitable to do it this way.

Roy
Roy
6/4/2015 11:37:57 AM
Dan Barclay wrote:

> > Unicode was not just introduced for the sake of change. It was one
> > of the most favourite requests by users having to write
> > international software. It was an excellent thing that they changed
> > the default from Ansi to Unicode.
> 
> I agree.
> 
> The problem was the way in which they did it

They did it in a way that those who used strings for text would not
have any problems at all.


-- 
Rudy Velthuis        http://www.rvelthuis.de

"My opinions might have changed, but not the fact that I am
 right."
Rudy
6/6/2015 7:26:39 AM
Roy Lambert wrote:

> Why did you decide to cut my list before
> 
> Change implemented badly = worst idea
> 

Because it was irrelevant. The changes were not implemented badly at
all. They were done in the best possible way, so that those who use
strings for the purpose they are meant for would not have any problems
at all.


-- 
Rudy Velthuis        http://www.rvelthuis.de

"Religion is excellent stuff for keeping common people quiet."
 -- Napoleon
Rudy
6/6/2015 7:27:57 AM
Roy Lambert wrote:

> The same argument can be used about anything that is not being used
> for what it was expressly designed for. 

Indeed.

Strings were not meant to be carrying binary data. Doing so is foolish,
as time has told. What else can I say?

-- 
Rudy Velthuis        http://www.rvelthuis.de

"The only function of economic forecasting is to make astrology
 look respectable." -- John Kenneth Galbraith
Rudy
6/6/2015 7:29:41 AM
Rudy


I'm so glad it was irrelevant and not you trying to either ignore something or twist to your viewpoint.

Roy Lambert
0
Roy
6/6/2015 1:01:58 PM
Rudy


>They did it in a way that those who used strings for text would not
>have any problems at all.

I do believe your sig makes my point for me, well that coupled with quite a few posts in these ngs

>Rudy Velthuis http://www.rvelthuis.de
>
>"My opinions might have changed, but not the fact that I am
> right."
0
Roy
6/6/2015 1:01:59 PM
Rudy

>> The same argument can be used about anything that is not being used
>> for what it was expressly designed for.
>
>Indeed.
>
>Strings were not meant to be carrying binary data. Doing so is foolish,
>as time has told. What else can I say?

1. please give the timepoint at which your assertion became truth.

2. Aren't you going to carry on the analogies? I did so enjoy the eating of a keyboard one <G>

Roy Lambert
0
Roy
6/6/2015 1:06:58 PM
Roy Lambert wrote:

> > They did it in a way that those who used strings for text would not
> > have any problems at all.
> 
> I do believe your sig makes my point for me, 

<sigh>

No, it doesn't. I did not change my opinion. That has always been my
opinion, but indeed, the fact I am right has not changed.

-- 
Rudy Velthuis        http://www.rvelthuis.de

"You must ask your neighbor if you shall live in peace."
 -- John Clark
0
Rudy
6/6/2015 2:46:18 PM
Rudy


Whilst I was out walking the dog I thought "shall I let Rudy have the last word as he so loves" - then I thought "nah"

Was it an exasperated sigh, an exhausted one or something else?

Your belief, unless I misunderstand,  is that Embarcadero altered strings in the best possible fashion, causing as little disruption as possible to anyone.

Mine is that there were alternative strategies such as introducing a UnicodeString type, or a compiler switch, which would have been less disruptive of existing code, left a useful artefact in place, allowed future unicode development but have been more expensive for Embarcadero than the one they took.

I have yet to see any reasoned refutation of my viewpoint - simply chanting "your wrong and I'm right" just doesn't seem to work.

Roy Lambert

"Absolute certainty is a sign of a rigid and inflexible mind"
unattributed
0
Roy
6/6/2015 4:56:14 PM
Roy Lambert wrote:

> Your belief, unless I misunderstand,  is that Embarcadero altered
> strings in the best possible fashion, causing as little disruption as
> possible to anyone.

That is not a belief, that is an established fact (see later in text).
They caused as little disruption as possible for those who use and used
strings for what strings are meant: to contain text.

It has turned out (and I predicted this well before the actual switch)
that the only people who had considerable problems with the switch were
those who prematurely ansified their programs (i.e. changed every
occurrence of the generic "string" by a specified "AnsiString") and
those who abused (Ansi)strings to contain binary data. And perhaps some
ASM routines.

So those who simply used strings to contain text and did not do
anything special (like ansifying or storing binary data) had hardly any
problems. Usually a full rebuild and a few corrections were enough.

That does show that using strings as carrier of binary data was a hack,
that may have made sense in the early days. People have had 15 years to
remove their hacks, and whatever the motives were of those who didn't
doesn't matter. Fact is that the hack caused them big problems when the
switch was made. <shrug>

So some may call me a zealot for telling people it was wrong to keep on
using strings for binary data (and I have said that for many years
already, well before the Unicode switch was made), but ISTM that was
not zealotry, it was pure practicism. It is *never* a good idea to
leave a hack in place longer than absolutely necessary. A hack is a
hack is a hack, after all.

-- 
Rudy Velthuis        http://www.rvelthuis.de

"There is a tragic clash between truth and the world. Pure
 undistorted truth burns up the world."
 -- Nikolay Berdyayev
0
Rudy
6/6/2015 9:12:58 PM
Roy Lambert wrote:

> Mine is that there were alternative strategies such as introducing a
> UnicodeString type, or a compiler switch

No. A switch was never a viable option (just like an ARC/no ARC switch
is no viable option). This has been discussed ad nauseam, so try to
find those discussions, I won't repeat them anymore.

So the facts contradict your belief. This is not a matter of belief or
opinion, it is a matter of fact.

You can, of course, keep on believing what you want, but ISTM that
ignoring the facts won't do you any good.

-- 
Rudy Velthuis        http://www.rvelthuis.de

"To understand a man you should walk a mile in his shoes. If what
 he says still bothers you that's ok because you'll be a mile away
 from him and you'll have his shoes." -- Unknown
0
Rudy
6/6/2015 9:16:25 PM
Roy Lambert wrote:

> I have yet to see any reasoned refutation of my viewpoint

Actually, your point of view is totally irrelevant to me. 

I will just correct your saying that the Unicode switch was implemented
badly. The switch was done very well and in the only viable way.

I have explained how and why. Your views about a switch, etc. don't
make sense. Just ponder it a little harder and longer, and perhaps
you'll see why. Good luck with that.

And I will correct any "accusations" of zealotry. Time has shown that
storing binary data in strings was and is a bad idea. If you did it,
you knew (but perhaps forgot) that it might one day break terribly,
especially if you did it consistently, like some here. If, by chance,
you did not know this, then well, too bad. If you use a hack, you
should be aware of the dangers and you should keep the use of the hack
to a minimum. Forget those principles and one day you may be in big
trouble.

-- 
Rudy Velthuis        http://www.rvelthuis.de

"If I could find a way to get [Saddam Hussein] out of there, even
 putting a contract out on him, ... ahh ... if the CIA still did
 that sort of thing, . . . ahh . . . assuming it ever did . . . .
 . . . I would be for it." -- Richard Nixon
0
Rudy
6/6/2015 9:26:06 PM
Rudy

>> Your belief, unless I misunderstand, is that Embarcadero altered
>> strings in the best possible fashion, causing as little disruption as
>> possible to anyone.
>
>That is not a belief, that is an established fact (see later in text).
>They caused as little disruption as possible for those who use and used
>strings for what strings are meant: to contain text.

I saw later in the text, unfortunately I did not see any facts. Again I see belief or, if you prefer the word, opinion.

>It has turned out (and I predicted this well before the actual switch)
>that the only people who had considerable problems with the switch were
>those who prematurely ansified their programs (i.e. changed every
>occurrence of the generic "string" by a specified "AnsiString") and
>those who abused (Ansi)strings to contain binary data. And perhaps some
>ASM routines.

Congratulations, I'm not sure where your evidence comes from though. I must have been reading the wrong newsgroups - I don't monitor all of the Embarcadero ones.

>So those who simply used strings to contain text and did not do
>anything special (like ansifying or storing binary data) had hardly any
>problems. Usually a full rebuild and a few corrections were enough.

I neither doubt that nor dispute it. After all, if you don't do much there's not a lot to go wrong.

>That does show that using strings as carrier of binary data was a hack,
>that may have made sense in the early days.

That assertion is rubbish. What is shown is that trying to use UnicodeString in the same way as the old string is difficult at best and at times foolhardy.

>People have had 15 years to
>remove their hacks, and whatever the motives were of those who didn't
>doesn't matter. Fact is that the hack caused them big problems when the
>switch was made. <shrug>

The timescale you quote rather surprises me.

>So some may call me a zealot for telling people it was wrong to keep on
>using strings for binary data (and I have said that for many years
>already, well before the Unicode switch was made), but ISTM that was
>not zealotry, it was pure practicism. It is *never* a good idea to
>leave a hack in place longer than absolutely necessary. A hack is a
>hack is a hack, after all.

I wouldn't call you a zealot, I think you have strong opinions, are willing to stand up for them, but are not willing to listen to others' views when they contradict your own.

Your persistent reference to the use of strings to carry other than printable characters (or as you call it binary data) as a hack is a fine example of this. At one point it was good programming practice (probably because it was the only vehicle available). At that point it wasn't a hack.
...
Roy Lambert
0
Roy
6/7/2015 7:30:43 AM
Rudy

>So the facts contradict your belief. This is not a matter of belief or
>opinion, it is a matter of fact.

Facts are things that can be scientifically proved or disproved, anything else is at best an hypothesis or more probably an opinion or belief.

Roy Lambert
0
Roy
6/7/2015 7:31:17 AM
Rudy


>> I have yet to see any reasoned refutation of my viewpoint
>
>Actually, your point of view is totally irrelevant to me.

I know <g>

>I will just correct your saying that the Unicode switch was implemented
>badly. The switch was done very well and in the only viable way.

As you may have guessed I disagree.

>I have explained how and why. Your views about a switch, etc. don't
>make sense. Just ponder it a little harder and longer, and perhaps
>you'll see why. Good luck with that.

Must have missed that.

>And I will correct any "accusations" of zealotry. Time has shown that
>storing binary data in strings was and is a bad idea. If you did it,
>you knew (but perhaps forgot) that it might one day break terribly,
>especially if you did it consistently, like some here. If, by chance,
>you did not know this, then well, too bad. If you use a hack, you
>should be aware of the dangers and you should keep the use of the hack
>to a minimum. Forget those principles and one day you may be in big
>trouble.

Time has certainly shown that if someone comes along and changes the fundamental definition of an object it screws up what's gone before.

Roy Lambert
0
Roy
6/7/2015 7:35:43 AM
On Sun, 7 Jun 2015 00:35:43 -0700, Roy Lambert <roy@lybster.me.uk>
wrote:

>Time has certainly shown that if someone comes along and changes 
>the fundamental definition of an object it screws up what's gone 
>before.
Interesting discussion/flaming...

There is another similar case concerning Indy evolution.
When Indy went from 9 to 10 I had to hack my Delphi installation
by using environment variables to re-point the search paths etc
to go to the Indy version pertinent to the application in question.
We had many applications built with Indy9 which broke in Indy10.
The reason is that the Indy team removed/renamed/changed a number of
public methods and properties in a way that caused a LOT of work to
modify.

And since Delphi switched from 9 to 10 at some time when we upgraded
we could not continue development in the new Delphi version because of
this. Until I realized I could "hack" my way into the Delphi
installation directory and move away all of the Indy source and dcu
files to a directory not in the common Delphi search system. Then I
could add the env vars to connect to the correct version and got a
possibility to continue.

Of course new applications were using Indy10....
And Indy is not nearly as much in use "everywhere" as string is.

Side note concerning string:
My group was developing industrial automation applications in my
workplace (I am now retired) and we started using Delphi back in 1995.
In this we had to interface to machine tools for control purposes and
these used ASCII character control over RS232 and later TCP/IP (where
Indy came into use). Basically all machine tool makers used different
control syntax and transmission packet formats.

Needless to say we composed the control sentences in strings, even
those that had packet systems with embedded control characters and
other binary data. The reason: Ease of programming (RAD!) using all
the different string manipulation functions in Delphi.

I do not consider that a hack!


---
Bo Berglund
Sweden & Texas
Newsreader: Forte Free Agent 1.92/32.572
0
Bo
6/7/2015 8:24:23 AM
Bo


>I do not consider that a hack!

I'm certain Rudy will <g>

Roy

ps

From your posts it doesn't seem as if you're retired
0
Roy
6/7/2015 2:27:43 PM
Roy Lambert wrote:

> I saw later in the text, unfortunately I did not see any facts.

Then you don't. I don't really mind if you see it or not.

-- 
Rudy Velthuis        http://www.rvelthuis.de

"Why do we kill people who are killing people to show that
 killing people is wrong?"
 -- Holly Near
0
Rudy
6/7/2015 2:32:36 PM
Roy Lambert wrote:

> Rudy
> 
> > So the facts contradict your belief. This is not a matter of belief
> > or opinion, it is a matter of fact.
> 
> Facts are things that can be scientifically proved or disproved

No, really?

Does it actually matter what I write? You don't "believe" me anyway, so
why should I bother?

This has been discussed ad nauseam already, not just by me. Go and read
it.

-- 
Rudy Velthuis        http://www.rvelthuis.de

"Democracy is the process by which people choose the man who'll
 get the blame." -- Bertrand Russell
0
Rudy
6/7/2015 2:37:38 PM
Roy Lambert wrote:

> Rudy
> 
> 
> >> I have yet to see any reasoned refutation of my viewpoint
> > 
> > Actually, your point of view is totally irrelevant to me.
> 
> I know <g>

Just like whatever I write will not change your mind anyway, so I won't
bother.

-- 
Rudy Velthuis        http://www.rvelthuis.de

"'Everything you say is boring and incomprehensible', she said,
 'but that alone doesn't make it true.'" -- Franz Kafka
0
Rudy
6/7/2015 2:38:38 PM
Bo Berglund wrote:

> Needless to say we composed the control sentences in strings, even
> those that had packet systems with embedded control characters and
> other binary data. The reason: Ease of programming (RAD!) using all
> the different string manipulation functions in Delphi.
> 
> I do not consider that a hack!

Even if it was actually a hack. I bet it broke or will break badly one
day.


-- 
Rudy Velthuis        http://www.rvelthuis.de

"It is not easy to find happiness in ourselves; it is not
 possible to find it elsewhere."
 -- Agnes Repplier
0
Rudy
6/7/2015 2:39:55 PM
Roy Lambert wrote:

> Rudy
> 
> > So the facts contradict your belief. This is not a matter of belief
> > or opinion, it is a matter of fact.
> 
> Facts are things that can be scientifically proved

Bullshit.

Not every fact can be proven. It is a fact I am thinking of a glass of
cool water now, but it can't be proven. It is a fact I picked up a
screwdriver a few seconds ago, but it can't be proven. Etc. etc.

But it is a matter of fact, and it can be proven, that a switch was not
a viable option. It has been proven many times already and I will not
repeat it, especially since it won't change your fixed notions anyway.

-- 
Rudy Velthuis        http://www.rvelthuis.de

"In this war - as in others - I am less interested in honoring
 the dead than in preventing the dead." -- Butler Shaffer
0
Rudy
6/7/2015 2:46:00 PM
On Sun, 7 Jun 2015 07:27:43 -0700, Roy Lambert <roy@lybster.me.uk>
wrote:

>From your posts it doesn't seem as if you're retired

Well, for many years I had a day job in Sweden, which I have retired
from.
But I also have a share in a small business in Austin Texas into which
I provide some development support.
Used to be basically electronics development but now I have been
thrown into maintenance of a software suite developed by someone else
who has quit. Hence the many posts about things I have encountered
when:
1) Migrating the 3 applications to the Unicode enabled RAD studio from
BDS2006.

2) Adding new functionality to one of the applications, which was
written in C++ (a language I have never programmed in myself).

One has to have something to do even when no longer working, for
instance work....


---
Bo Berglund
Sweden & Texas
Newsreader: Forte Free Agent 1.92/32.572
0
Bo
6/7/2015 2:49:08 PM
Rudy


>But it is a matter of fact, and it can be proven, that a switch was not
>a viable option. It has been proven many times already and I will not
>repeat it, especially since it won't change your fixed notions anyway.

It can only be proven if it was attempted. It was not. You have a strange definition of proof.

Roy Lambert
0
Roy
6/7/2015 2:52:42 PM
Roy Lambert wrote:

> Rudy
> 
> 
> > But it is a matter of fact, and it can be proven, that a switch was
> > not a viable option. It has been proven many times already and I
> > will not repeat it, especially since it won't change your fixed
> > notions anyway.
> 
> It can only be proven if it was attempted.

Bullshit again. No attempt is needed to tell it is not a good idea to
jump off the Empire State without something like a parachute or so.

It has been shown, many many times, why a switch is a bad idea.

-- 
Rudy Velthuis        http://www.rvelthuis.de

"It was, of course, a lie what you read about my religious
 convictions, a lie which is being systematically repeated. I do
 not believe in a personal god and I have never denied this but
 have expressed it clearly. If something is in me which can be
 called religious, then it is the unbounded admiration for the
 structure of the world so far as our science can reveal it."
 -- Albert Einstein
0
Rudy
6/7/2015 2:57:07 PM
Rudy


>Does it actually matter what I write?

Since all you are doing is repeating the mantra probably not.

>You don't "believe" me anyway, so
>why should I bother?

All you have so far offered me to believe is "it's a hack" "you are wrong"

Sorry - not buying

Roy Lambert
0
Roy
6/7/2015 2:57:43 PM
Rudy


>Just like whatever I write will not change your mind anyway,


Well, not when your contribution consists of "you're wrong" "it's a hack"

>so I won't
>bother.

OK

Roy Lambert
0
Roy
6/7/2015 3:02:43 PM
Rudy


>Not every fact can be proven. It is a fact I am thinking of a glass of
>cool water now, but it can't be proven.

I will agree with the second part of the statement, and since that is so maybe the first part should be considered null ie unknown

>It is a fact I picked up a
>screwdriver a few seconds ago, but it can't be proven. Etc. etc.

Possibly, possibly not. Was there a witness? Did you record it on video?

>But it is a matter of fact, and it can be proven, that a switch was not
>a viable option. It has been proven many times already and I will not
>repeat it, especially since it won't change your fixed notions anyway.

It has not been proven it has been stated. Simply continuing to state something does not make it true.

Roy Lambert
0
Roy
6/7/2015 3:22:42 PM
Rudy


>Bullshit again. No attempt is needed to tell it is not a good idea to
>jump off the Empire State without something like a parachute or so.

It is if you want to commit suicide <vbg>

>It has been shown, many many times, why a switch is a bad idea.

Interesting so we've moved from "not viable" to "bad idea". Some progress may be being made.

Roy Lambert
0
Roy
6/7/2015 3:23:14 PM
Bo


>One has to have something to do even when no longer working, for
>instance work....

Yup. My body is breaking down, I'm trying to keep my brain functioning.

Roy
0
Roy
6/7/2015 3:23:45 PM
This thread now contains 54 messages, not all of which are really
addressing the issue...

---
Bo Berglund
Sweden & Texas
Newsreader: Forte Free Agent 1.92/32.572
0
Bo
6/7/2015 4:37:54 PM
> Yes, I am such a "zealot". Strings are meant to contain text, nothing
> else. 

Text can be encoded in many ways. Unicode is just one encoding. Unicode is not the same thing as text.
Can you distinguish between the two concepts?
 
> we would still have short strings.

What is the problem with short strings? If we had "String[n]" with two bytes per character, a whole "short string" could
hold 65535 characters. That would be very nice. Of course, we should have "AnsiString[n]" as well.

This is the current situation: String = two bytes per character, String[n] = one byte per character. It's a complete joke. For a beginner, this is a serious ambiguity.
They may well conclude that the language is poorly designed.
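
The size difference described here can be checked directly. The following is only a minimal sketch, assuming a desktop Delphi 2009 or later compiler (short strings are not available on every Delphi target); `TShort10` is a name made up for this illustration.

```delphi
program StringSizes;
{$APPTYPE CONSOLE}

type
  // Hypothetical type name. A short string is one length byte plus N AnsiChars.
  TShort10 = string[10];

begin
  // In Unicode-enabled Delphi, Char is an alias for WideChar (UTF-16 code unit).
  Writeln('SizeOf(Char)     = ', SizeOf(Char));      // 2
  Writeln('SizeOf(AnsiChar) = ', SizeOf(AnsiChar));  // 1
  // string[n] stayed byte-based: 1 length byte + 10 one-byte characters.
  Writeln('SizeOf(TShort10) = ', SizeOf(TShort10));  // 11
end.
```

So the generic `string` moved to two-byte code units while `string[n]` kept its old one-byte layout, which is the asymmetry being complained about.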

> Blame yourself for not thinking about the consequences of using a hack,
> in all those many years you had the opportunity to change things.

Hack?  If one day the world replaces Unicode with another encoding method,
Delphi will find it hard to adopt the new encoding, even with a hack.
So, please do not confuse encoding with text. You should take a second to think about the opinions of others.
0
wenjie
6/9/2015 2:09:22 AM
wenjie zhou wrote:

> > Yes, I am such a "zealot". Strings are meant to contain text,
> > nothing else. 
> 
> Text can be encoded in many ways. Unicode is just one encoding.

Sure. Not sure how that matters, though. Strings contain text, not
binary data. The fact that strings contain an encoding is exactly why
they should not be used for binary.

-- 
Rudy Velthuis        http://www.rvelthuis.de

"Roses are #FF0000
 Violets are #0000FF
 All my base are belong to you!"
 -- Geek Valentine T-shirt at ThinkGeek
0
Rudy
6/9/2015 6:48:29 AM
wenjie zhou wrote:

> Text can be encoded in many ways. Unicode is just one encoding.
> Unicode is not the same thing as text.  Can you distinguish
> between the two concepts?

Finally you wrote something correctly. You wrote text and not binary
data. Yes strings are meant to contain text data and _not_ binary.
Embarcadero did a great job to support it. 

 
> What is the problem with short strings? If we had "String[n]" with
> two bytes per character, a whole "short string" could hold
> 65535 characters. That would be very nice. Of course, we should have
> "AnsiString[n]" as well.

I agree they should deprecate short strings. Unfortunately short
strings are widely used and they made a choice to leave them in the
language. I can't blame them. Fortunately short strings are not widely
used and can be replaced without a headache (except when used in packed
records).


> This is the current situation: String = two bytes per character,
> String[n] = one byte per character.  It's a complete joke. For a
> beginner, this is a serious ambiguity.  They may well conclude that
> the language is poorly designed.


Embarcadero has addressed this in the mobile compiler. You should know
that most of the forum users didn't celebrate that decision. So now you
are advocating removing ANSI strings from the language completely? No
it's not a joke; you should read some of the threads in the
non-technical forum on this topic regarding the mobile compiler.

> Hack?  If one day the world replaces Unicode with another
> encoding method, Delphi will find it hard to adopt the new
> encoding, even with a hack.  So, please do not confuse encoding with
> text. You should take a second to think about the opinions of others.

Please don't forget that there were changes. In my case, at first I had to
store all text data, regardless of the code page, in (ansi)strings (due
to the lack of Unicode support for Informix databases). Of course, with the
old ansi string this was not easy, as you were in charge of taking care of
code page conversions. There was no built-in support to easily
convert the text to Unicode and back. The next step was to move the
text to Unicode using widestrings. Now with the new (Unicode)string
type life is much easier. Of course these shiny new features
required some code rewrites, mostly for the dot-matrix printer support, as
the data you send must now be treated as binary data.
0
Lajos
6/10/2015 5:58:49 AM
Rudy Velthuis (TeamB) wrote:

> Roy Lambert wrote:
> 
> > I have yet to see any reasoned refutation of my viewpoint
> 
> Actually, your point of view is totally irrelevant to me. 
> 
> I will just correct your saying that the Unicode switch was
> implemented badly. The switch was done very well and in the only
> viable way.
> 
> I have explained how and why. Your views about a switch, etc. don't
> make sense. Just ponder it a little harder and longer, and perhaps
> you'll see why. Good luck with that.
> 
> And I will correct any "accusations" of zealotry. Time has shown that
> storing binary data in strings was and is a bad idea. If you did it,
> you knew (but perhaps forgot) that it might one day break terribly,
> especially if you did it consistently, like some here. If, by chance,
> you did not know this, then well, too bad. If you use a hack, you
> should be aware of the dangers and you should keep the use of the hack
> to a minimum. Forget those principles and one day you may be in big
> trouble.

There seems to be confusion here, maybe I can clear some of it up.

Actually, what some of us did was to translate our binary into *text*
that worked well even in narrow (one byte) Ansistring *text* variables.
I can't speak for others, but I used a simple substitution table.  Yup,
they were *text* and every *text* character was important.  The text
wasn't "English" or "German", but it was one byte ansi text.  A
straight up translation.

Where EMB went wrong was, in fact, in their implementation.  Had they
implemented the unicode "automagic" type conversions correctly there
would have been no corruption of the text.  In fact, EMB went out of
their way to put conversions where none were needed (ref AnsiLeftStr()
and friends).

Ansistring to Unicode, back to Ansistring, should be clean.  As it
happens, there is occasionally corruption of the text.  The text itself
is corrupted.  As a result, our translation *back* to binary is also
corrupted, since you can't fix corruption once it has happened. This
happens rarely, but it happens and that breaks things.
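
One way to see whether a given code page survives the bytes to UTF-16 to bytes round trip is to probe all 256 byte values. The sketch below is an illustration only and is not taken from anyone's application: `RoundTripMismatches` is a hypothetical helper, and it uses `TEncoding` as a stand-in for the implicit AnsiString/UnicodeString conversions, on the assumption (not verified here) that both go through the same operating system conversion. No particular result is claimed, because the outcome depends on the code page and the platform.

```delphi
program RoundTripProbe;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: converts the byte values 0..255 to a UnicodeString and
// back using the given code page, and counts the positions that changed.
// Returns -1 if the code page is unavailable or the conversion raises an error.
// Extra bytes produced beyond position 255 are ignored.
function RoundTripMismatches(CodePage: Integer): Integer;
var
  Enc: TEncoding;
  Src, Back: TBytes;
  U: string;
  I: Integer;
begin
  Result := -1;
  SetLength(Src, 256);
  for I := 0 to 255 do
    Src[I] := I;
  try
    Enc := TEncoding.GetEncoding(CodePage);  // the caller owns this instance
    try
      U := Enc.GetString(Src);   // bytes -> UTF-16
      Back := Enc.GetBytes(U);   // UTF-16 -> bytes
    finally
      Enc.Free;
    end;
  except
    Exit;  // Result is still -1
  end;
  Result := 0;
  for I := 0 to 255 do
    if (I >= Length(Back)) or (Back[I] <> Src[I]) then
      Inc(Result);
end;

begin
  Writeln('System code page ', DefaultSystemCodePage, ': ',
    RoundTripMismatches(DefaultSystemCodePage), ' mismatching bytes');
  Writeln('Code page 932: ', RoundTripMismatches(932), ' mismatching bytes');
end.
```

A result of 0 means every byte value came back unchanged for that code page; anything else means text carrying arbitrary byte values cannot be trusted to survive the conversion.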

Now, does that make you feel better that we don't store nasty binary
data in your precious text strings?  I apologize if my earlier
description was confusing to you.

Funny how the end result is almost identical.  Almost?  Hmmm...

I'm curious... you're a TeamB guy huh?  Is your "bullshit" approach to
things representative of that moniker?  It really doesn't seem as
helpful as the approach other team members seem to use.  You might want
to use something more like their style.  They cut the bullshit and
treat others with respect.  I have no illusions that you will do that,
but I thought I'd offer the suggestion.

For whatever that is worth.

Dan
0
Dan
6/11/2015 1:05:44 AM
> Finally you wrote something correctly. You wrote text and not binary
> data. Yes strings are meant to contain text data and _not_ binary.
> Embarcadero did a great job to support it. 

First, what is text, and what is binary data?
Before UTF8 existed, Ansi was text and UTF8 would have been binary data, wouldn't it?  What is binary data today may be text tomorrow.
In the old days, we could use AnsiString to store UTF8-encoded strings, ANSI-encoded strings, even GB2312 and so on.
Perhaps my application needs a custom text encoding. You may call that binary, but I think it is text data.
Embarcadero did a bad job by preventing this diversity of expression.
I know that Americans mostly pursue freedom. Why add so many restrictions? Where is the freedom?

>I agree they should deprecate short strings. Unfortunately short
>strings are widely used and they made a choice to leave them in the
>language. I can't blame them. Fortunately short strings are not widely
>used and can be replaced without a headache (except when used in packed
>records).

I can't understand why you think they should deprecate short strings. 

Short string is very useful for net application. It's perfect. I don't know what can replace short string in records. 
Do not tell me it is array of bytes.

> So now you are advocating removing ANSI strings from the language completely?

No, I do not advocate removing ANSI strings from the language completely.
I want to have AnsiString[n] and UnicodeString[n] at the same time, with String[n] = UnicodeString[n] in XE.
UnicodeString[n] could hold 65535 (Word) characters, each character taking two bytes. That's it.

>Now with the new (Unicode)string type the life is much easier. 

We can have many methods to make life much easier. But the (Unicode) string is a bad way.
If you are interested, we can discuss this topic further.
0
wenjie
6/11/2015 3:30:12 AM
wenjie zhou wrote:

> First, what is text? and what is binary data? 

Text is a sequence of characters written in some language. Now text can
be represented using ANSI code pages or using unicode.

> Before UTF8 existed, Ansi was text and UTF8 would have been binary
> data, wouldn't it?  What is binary data today may be text tomorrow.
> In the old days, we could use AnsiString to store UTF8-encoded
> strings, ANSI-encoded strings, even GB2312 and so on.  Perhaps my
> application needs a custom text encoding. You may call that binary,
> but I think it is text data.  Embarcadero did a bad job by preventing
> this diversity of expression.  I know that Americans mostly pursue
> freedom. Why add so many restrictions? Where is the freedom?

By making ANSI strings code-page aware, Embarcadero tried to make it even
simpler to assign different ANSI strings to a UnicodeString and back.
This works perfectly when you know which code page the input data
uses. If you have to handle multiple code pages, however, it
can be a harder job than before: in that case, yes, you have to treat the
input as binary data, parse it, detect for every part which code page
it is written in, and convert it to Unicode using the TEncoding class.
This can be more


> I can't understand why you think they should deprecate short strings. 

Shortstring is a leftover from the old days (Pascal days or Delphi 1),
when strings were more binary containers than real strings. It's not
code-page aware and just brings confusion to the table.

 
> Short string is very useful for net application. It's perfect. I
> don't know what can replace short string in records.  Do not tell me
> it is array of bytes.

If the application only stores text in a single language, then yes,
shortstring is useful. However, nowadays we live in a Unicode world.
Everyone would like to use the full alphabet. How could you handle
Chinese, Russian, etc. characters in a shortstring? You can't; you
would have to add an extra field describing the code page in which the
data was written. And once you store a code page alongside the string,
you are using the shortstring as an array of bytes that must be
converted to Unicode in order to display it.
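
For comparison, a fixed-size field of UTF-16 code units keeps the record size constant without any code page field. This is only a sketch under assumptions: `TItemRec`, `SetName` and `GetName` are hypothetical names, and the 32-unit field size is arbitrary.

```delphi
program FixedUnicodeRecord;
{$APPTYPE CONSOLE}

type
  // Hypothetical record layout, for illustration only.
  TItemRec = packed record
    Id: Integer;
    Name: array[0..31] of WideChar;  // up to 31 UTF-16 code units + terminating #0
  end;

// Copies S into the fixed-size field, truncating if it does not fit.
// Caveat: truncation counts code units, so it could cut a surrogate pair in half.
procedure SetName(var R: TItemRec; const S: string);
var
  N: Integer;
begin
  FillChar(R.Name, SizeOf(R.Name), 0);
  N := Length(S);
  if N > High(R.Name) then
    N := High(R.Name);  // keep the last slot free for the terminator
  if N > 0 then
    Move(PWideChar(S)^, R.Name[0], N * SizeOf(WideChar));
end;

// Reads the field back as a normal string, up to the first #0.
function GetName(const R: TItemRec): string;
begin
  Result := PWideChar(@R.Name[0]);
end;

var
  R: TItemRec;
begin
  R.Id := 1;
  SetName(R, #$65E5#$672C#$8A9E);  // three CJK characters given as code units
  Writeln(SizeOf(TItemRec));       // 68 = 4 + 32 * 2, the same for every record
  Writeln(Length(GetName(R)));     // 3
end.
```

Because every record has the same size, record number N still starts at byte offset N * SizeOf(TItemRec) in the file, just as with the old fixed arrays of AnsiChar.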

> UnicodeString[n] could hold 65535 (Word) characters, each character
> taking two bytes. That's it.

Please don't forget that some characters can be represented only by
surrogate pairs. Thus how much memory should be allocated for
UnicodeString[n]? Also please note that a record containing
UnicodeString[n] would not be binary compatible with the old version of
the record. A true fixed length shortstring could be achieved only by
encoding the content in UTF-32.
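
The surrogate pair point can be made concrete with a few lines. A minimal sketch, assuming a Unicode-enabled Delphi where `string` is UTF-16; `CodePointCount` is a hypothetical helper written for this illustration and assumes well-formed input.

```delphi
program SurrogateCount;
{$APPTYPE CONSOLE}

// Hypothetical helper: counts Unicode code points in a well-formed UTF-16
// string by skipping low (trailing) surrogates, $DC00..$DFFF.
function CodePointCount(const S: string): Integer;
var
  C: Char;
begin
  Result := 0;
  for C in S do
    if (Ord(C) < $DC00) or (Ord(C) > $DFFF) then
      Inc(Result);
end;

var
  S: string;
begin
  // 'A' followed by U+20000, which UTF-16 encodes as the pair $D840 $DC00.
  S := 'A' + #$D840#$DC00;
  Writeln(Length(S));                 // 3 UTF-16 code units
  Writeln(Length(S) * SizeOf(Char));  // 6 bytes
  Writeln(CodePointCount(S));         // 2 characters (code points)
end.
```

So a field sized for n two-byte units does not guarantee room for n characters once text outside the Basic Multilingual Plane is involved.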

> 
> > Now with the new (Unicode)string type the life is much easier. 
> 
> We can have many methods to make life much easier. But the (Unicode)
> string is a bad way.  If you are interested, we can discuss this topic
> further.

Every string representation has its positive and negative sides. None
of the representations is perfect. While UTF-8 tends to be the most
compact, UTF-32 on the other hand is a fixed-length encoding.
0
Lajos
6/11/2015 2:33:32 PM
Lajos

>> First, what is text? and what is binary data?
>
>Text is a sequence of characters written in some language. Now text can
>be represented using ANSI code pages or using unicode.

You started out well here, but only really succeeded in making things less clear. You chose to use the word "characters", which implies an alphabet and excludes languages such as Chinese which do not use an alphabet.  There are also a number of languages (admittedly the ones I know of are dead) which do use characters and have an alphabet but aren't covered by current Unicode (e.g. Ugarit, which used cuneiform and was possibly the first alphabetically based system). Then we can add in smileys, which only by the deranged standards of today's youf can be considered a character or part of a language.


>Shortstring is a leftover from the old days (Pascal days or Delphi 1).
>When strings were more binary containers than really strings. It's not
>code-page aware and just bring confusion on the table.

So at that point it was alright to store binary data into a string?

>Everyone would like to use the full alphabet.

Please do not assume everyone shares your opinion. I may be in a minority; however, the only "full" alphabet I'm interested in is the English one.

Roy Lambert
0
Roy
6/11/2015 3:18:55 PM
Characters are represented by binary data, no matter what language you use.
We can say that, in a computer, a character is binary data. Do we know all the languages in the universe? Obviously not.
So a language we don't know yet may use any binary data to express itself. That is why I like the string type in old Delphi.
We must leave enough space for the unknown world.
Once again, I want to emphasize that Unicode is one encoding for strings, not the only one in the world. Text and Unicode are
not the same thing. Forcing them together will create huge obstacles for future expansion.

> Embarcadero by making ansi string code-page aware tried to make even
> simpler to assign different ansi strings to unicodestring and back.

If the RTL supplied some code, things could be simpler too. But a built-in type should not do such a thing.
e.g.
<code>
   UTF8String = record
   private
       FCodePage: Integer;
       FByteString: AnsiString;
   public
      class operator Implicit(Value: AnsiString): UTF8String;
      class operator Implicit(Value: WideString): UTF8String;
      class operator Implicit(Value: UTF8String): AnsiString;
      class operator Implicit(Value: UTF8String): WideString;
   end;
</code>
You see, UTF8String can do such conversions, and AnsiString and WideString do not need to care about the code page.

And I am Chinese. I do not like the full alphabet, and neither does Roy Lambert.
It is just like an automatic camera versus an SLR camera. I do not need an automatic camera. I want to produce a high-quality product.
0
wenjie
6/12/2015 4:56:58 AM
Dan Barclay wrote:

> Actually, what some of us did was to translate our binary into text
> that worked well even in narrow (one byte) Ansistring text variables.
> I can't speak for others, but I used a simple substitution table.
> Yup, they were text and every text character was important.  The text
> wasn't "English" or "German", but it was one byte ansi text.  A
> straight up translation.

No problem with that. If the end result of the "translation" was indeed
text, it should not cause a problem with encodings like UTF-16 or UTF-8.
 
> Where EMB went wrong was, in fact, in their implementation.  Had they
> implemented the unicode "automagic" type conversions correctly there
> would have been no corruption of the text.

If you encoded it as you claim, there was still no corruption, no
matter if it was encoded in UTF-16, UTF-8 or even plain ASCII. If not,
then you did not do what you claim above.

-- 
Rudy Velthuis        http://www.rvelthuis.de

"Programmers are in a race with the Universe to create bigger and
 better idiot-proof programs, while the Universe is trying to
 create bigger and better idiots. So far the Universe is winning."
 -- Rich Cook
0
Rudy
6/15/2015 8:34:51 AM
> The character is represented by binary data. No matter what language
> you used.   We can say that the character is binary data in computer.

Characters are binary data if you use assembler or some other low
level language. However, modern Unicode-enabled languages know very
well what text is. Unfortunately, Unicode doesn't make your job
easier. Even with a Unicode-enabled application it's not an easy task
to support every language. For example, finding the upper/lower case of
a Unicode code point can be ambiguous (it is language dependent).

> Do we know all the language in theuniverse? Obviously not  So some
> languages we don't know can be use any binary data to express. So
> that is why i like STRING in old DELPHI.   We must  leave enough
> space for the unknown world.   

This is already done. There is quite enough space in the unicode table
for future languages.

> If the rtl supply some code , we can be simpler too. But build in
> type should not do so such thing.  e.g.
> <code>
>    UTF8String = record
>    private
>        FCodePage: Integer;
>        FByteString: AnsiString;
>    public
>       class operator Implicit(Value: AnsiString): UTF8String ;
>       class operator Implicit(Value: WideString): UTF8String ;
>       class operator Implicit(Value: UTF8String ): AnsiString ;
>       class operator Implicit(Value: UTF8String ): WideString;
>    end; 
> </code>
> You see, UTF8String can do such convertion. And AnsiString,
> WideString do not need care about codepage.

Not really. How would you handle a concatenation of a Chinese Ansi
string with a Greek one without writing additional code?
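
As a rough sketch of what the code-page-aware AnsiString gives you here, assuming Delphi 2009 or later on a Windows system that has code pages 1253 and 936 installed: the type names TGreekString and TChineseString are made up for this example. Because each string carries its own code page, each one can be converted to Unicode correctly before the two are joined.

```delphi
program MixedCodePages;
{$APPTYPE CONSOLE}

type
  // Hypothetical type names; the numbers are Windows code page identifiers.
  TGreekString = type AnsiString(1253);  // Windows Greek
  TChineseString = type AnsiString(936); // Windows Simplified Chinese (GBK)

var
  G: TGreekString;
  C: TChineseString;
  U: UnicodeString;
begin
  U := #$03B1#$03B2#$03B3; // alpha, beta, gamma
  G := TGreekString(U);    // converted to code page 1253

  U := #$4E2D#$6587;       // two CJK characters
  C := TChineseString(U);  // converted to code page 936

  // Each operand is converted using the code page stored with it,
  // then the concatenation is done in Unicode.
  U := UnicodeString(G) + UnicodeString(C);

  Writeln(StringCodePage(G), ' ', Length(G)); // code page and byte count
  Writeln(StringCodePage(C), ' ', Length(C)); // code page and byte count
  Writeln(Length(U));                         // UTF-16 code units
end.
```

With plain byte strings the same concatenation would simply append the Chinese bytes to the Greek bytes, and nothing in the result would say which part is which.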


> And i am Chinese. I do not like the full alphabet. And Roy Lambert
> is also do not.  It just like  automatic camera and SLR camera. I do
> not need automatic camera . I want to produce high quality product.

Maybe you don't, but your program certainly does. Windows controls are
Unicode, so Delphi must communicate with the Windows API using PChar. I
know that for most Windows API functions there is still an ANSI version
(unfortunately most of them just convert the input data to Unicode and
execute the Unicode version of the function). Any code can still
receive ANSI strings as input from some device, file or database and
send them back to any device, file or database. However, nowadays
ANSI input data is less common.
0
Lajos
6/15/2015 4:21:35 PM
> The characters are binary data if you use assembler and some other low
> level languages. 
Unfortunately, Delphi includes both object-oriented Pascal and BASM, and we can use assembler in Delphi. I think this is the biggest difference between you and me.
I will not give up on assembler. I think assembler is important, and you want to discard it.

Now, just think about the current solution. Whenever one string is assigned to another, the compiled code has to compare the code pages to decide how to convert them.
There is no doubt that in 90% of cases we do not need such a comparison. This is not a rigorous scientific attitude.

Think about RTTI. Here too, in 90% of cases we do not need RTTI, but we have to include the useless information, and the .exe has become so bloated.
Think about lockable objects. In 90% of cases we do not need to lock the object, but we have to include the hidden field in every object.

This attitude is very harmful.

> This is already done. There is quite enough space in the unicode table
> for future languages.
People always believe that they have found all the solutions. But the truth is always the opposite.
For example, suppose we find an alien civilization and they use another string encoding.
They have a lot of applications and refuse to use Unicode.
How do we communicate with them? Convert the current UnicodeString to a byte array?

> Not really. How would you handle a concatenation of a Chinese Ansi
> string with a Greek one without writing additional code?

I don't know what you mean. UTF8String or UTF16String would be the same as UnicodeString (in the current solution), so I do not think this is a problem.

>  ANSI input data is less common.
Less? There is a lot of equipment still using ANSI. It does not need Unicode.
Not all devices need to handle a variety of languages, e.g. routers, hubs, single-chip microcontrollers. And in network communication we often do not need Unicode either.
Please do not think Unicode is a silver bullet.
0
wenjie
6/16/2015 4:11:07 AM
wenjie zhou wrote:

> > The characters are binary data if you use assembler and some other
> > low level languages. 
> Unfortunately, Delphi include Object-oriented Pascal and BASM.

Unfortunately? Huh?

-- 
Rudy Velthuis        http://www.rvelthuis.de

"In all affairs, it's a healthy thing now and then to hang a
 question mark on the things you have long taken for granted."
 -- Bertrand Russell
0
Rudy
6/16/2015 6:26:38 AM
On Mon, 15 Jun 2015 21:11:07 -0700, wenjie zhou <> wrote:

>For example, If we find an alien civilization, and the use another encoding string. 
>They have a lot of applications and refuse to use Unicode. 
>How to com communication with them? Convert now UnicodeString to bytes array?

They probably would not use 8 bit bytes anyway. A byte is just a
randomly selected size of a bit array to represent one unit of word
organization...

A civilization on a planet 100 light years away would with almost
certainty use something completely different if they at all store data
in any way resembling what we do. Maybe they even use a trinary
concept?

Aliens are not a good argument..

---
Bo Berglund
Sweden & Texas
Newsreader: Forte Free Agent 1.92/32.572
0
Bo
6/16/2015 8:03:52 AM
> 
> Unfortunately? Huh?
> 

 I mean Delphi has BASM and can use assembler. Unfortunately, Delphi is then one of what he called "some other low level languages".
0
wenjie
6/16/2015 8:10:15 AM
> Aliens are not a good argument..
> 

Yes, I know aliens are not a good argument. I just want to explain that Unicode is not a panacea,
and we should not offer such a panacea for expressing text.
Just think about this:

We have ByteString and WordString. They hold only a reference count and binary data; they do not include a code page.
They are the primitive types.
Furthermore, we have some record types.

[code]
UTF16_String = record
private
   FCodePage: Integer;
   FStringData: WordString;
public
   // here we can define many implicit conversions
end;

UTF8_String = record
end;

Ansi_String = record
end;

// You can even define this
UTF32_String = record
end;
[/code]

In most cases we could then use UTF16_String instead of String;

e.g.

[code]
var
   S: UTF16_String;
begin
  ShowMessage(S);  // OK, because UTF16_String has an implicit conversion to UnicodeString.
end;
[/code]
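
A compilable sketch of this record idea, under simplifying assumptions: it stores UTF-8 only, so it needs no code page field, and it keeps the data in a TBytes instead of the proposed ByteString type, which does not exist. The name TUtf8Text and its ByteCount method are invented for the example.

```delphi
program Utf8Record;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Hypothetical wrapper: raw bytes plus the knowledge that they are UTF-8.
  TUtf8Text = record
  private
    FData: TBytes;
  public
    class operator Implicit(const Value: string): TUtf8Text;
    class operator Implicit(const Value: TUtf8Text): string;
    function ByteCount: Integer;
  end;

class operator TUtf8Text.Implicit(const Value: string): TUtf8Text;
begin
  Result.FData := TEncoding.UTF8.GetBytes(Value);
end;

class operator TUtf8Text.Implicit(const Value: TUtf8Text): string;
begin
  Result := TEncoding.UTF8.GetString(Value.FData);
end;

function TUtf8Text.ByteCount: Integer;
begin
  Result := Length(FData);
end;

var
  T: TUtf8Text;
  S: string;
begin
  T := 'abc'; // string -> TUtf8Text through the implicit operator
  S := T;     // TUtf8Text -> string through the other operator
  Writeln(S, ' ', T.ByteCount); // abc 3
end.
```

The conversions work, but every operation that the built-in strings get for free (concatenation, comparison, Pos, Copy and so on) would have to be written as further operators or methods on the record.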
0
wenjie
6/16/2015 8:23:11 AM
wenjie zhou wrote:

> > 
> > Unfortunately? Huh?
> > 
> 
>  I mean Delphi has BASM and can use assembler.  Unfortunately, Delphi
> is he called "some other low level languages".

No, I meant assembler and other low level languages. Yes, assembler is
low level.
0
Lajos
6/16/2015 2:12:41 PM
wenjie zhou wrote:


> assembler is important.  And you want discard it.

No, I never wrote that assembler should be removed from the language. I
just wrote that there are low level languages without a built-in
string data type. In those languages you handle strings
as binary data (for example as an array of bytes).

> 
> Now, just think about now solution. Whenever a string assignment to
> another. The compilered code had to compare the codepage to judge how
> to convert them.  There is no doubt that 90% of the scene we do not
> need such comparing. And this is not a rigorous scientific attitude.

This really depends on your code. You wrote that you're using only
ASCII, which means that in your code it doesn't matter. I almost never
use ANSI strings, but when I need them they must be code page aware in
order to be convertible to Unicode. I know I could go back and
handle it in the old fashion and have another variable to hold the code
page for the string. However, having two variables for one string can
introduce errors.


> Think about RTTI. It also happens that, In 90% of the scene, we do
> not need RTTI, but we have to include the useless information. And
> the .exe has become so bloated.  

I agree that most of the time the enhanced RTTI bloats the exe.
Unfortunately, using the enhanced RTTI is still on my to-do list.

> Think about lockable object. In 90%
> of the scene, we do not need lock the object, but we have to include
> the hidden field in every object.
> 
> The attitude is very harmful.

I disagree. It's a good thing that some objects are ready for multiple
scenarios.
0
Lajos
6/16/2015 2:22:15 PM
> However having two variable for a string can introduce errors.
The UTF8_String, UTF16_String, etc. would be supplied by the VCL, not by yourself.
The two variables are private. How can you introduce errors?


> I agree that most of the times the enhanced RTTI bloats the exe.
> Unfortunately to use the enhanced RTTI is still on my to do list.
> 

Nobody thinks RTTI is bad. The main point is to give developers a choice.
If RTTI were stored in a separate file, the user could choose whether to link the RTTI. Wouldn't that be better?


It is the same with lockable objects.
Think about this:

TSomeObject = class *lock*
end;

We could provide a keyword *lock* and let developers decide which classes should have the hidden field, instead of every class having it.
That is not difficult, and it would not waste memory or destroy the memory compatibility with C++ objects, would it?


I have done some work to export C++ objects from a .DLL so that Delphi can use them.
0
wenjie
6/17/2015 2:05:30 AM
wenjie zhou wrote:

> > Aliens are not a good argument..
> > 
> 
> Yes, i know aliens is not a good argument. I just want to explain
> that Unicode is not a panacea.  And we should not supply such a
> panacea to express Text.  Just think about this :
> 
> We have ByteString and WordString. They have only reference count and
> binday data. Do not include code page.  They are original type.

You already have those. They are called dynamic arrays. And exactly
those are the best way to manage binary data.
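
A small sketch of that, using TBytes (a dynamic array of Byte declared in SysUtils). It shows resizing, the reference-counted sharing, and that moving between bytes and text is an explicit step with a named encoding. The variable names are arbitrary.

```delphi
program BinaryInDynArray;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  A, B, D: TBytes;
  S: string;
begin
  A := TEncoding.UTF8.GetBytes('abc'); // text -> bytes, encoding chosen explicitly
  SetLength(A, Length(A) + 1);         // grows like a string
  A[High(A)] := $00;                   // any byte value is fine, no code page involved

  B := A;       // shares the same array (reference counted, no copy-on-write)
  D := Copy(A); // independent copy
  B[0] := Ord('x');

  Writeln(A[0] = Ord('x')); // TRUE: A and B refer to the same data
  Writeln(D[0] = Ord('a')); // TRUE: D was copied before the change

  S := TEncoding.UTF8.GetString(D, 0, 3); // bytes -> text, again explicit
  Writeln(S);                             // abc
end.
```

Note the one difference from strings that tends to bite: assigning a dynamic array does not copy it, and writing through one reference changes what the other sees.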


-- 
Rudy Velthuis        http://www.rvelthuis.de

"The internet is not something you just dump something on. It's
 not a truck. It's a series of tubes!"
 -- Sen. Ted Stevens, chairman of the United States Senate
    Committee on Commerce, Science and Transportation
0
Rudy
6/17/2015 8:25:06 AM
> You already have those. They are called dynamic arrays. And exactly
> those are the best way to manage binary data.
> 

A dynamic array can be anything. It can even be used to express an integer, float, int64, int128, etc. Dynamic arrays can even express a file or a memory stream.
Should we use dynamic arrays to replace integer, float, etc.? Obviously not.

The old-style AnsiString is a wonderful thing. It only cares about a few primitive things:

 (1) Each member is one byte ==> SomeText[x] is one byte
 (2) Member count ==> SomeText[0]
 (3) Reference count
 (4) Scanning the memory byte by byte ==> Pos('abc', SomeText);
 (5) Easy concatenation ==> copy memory and change SomeText[0]

It does not care about encoding. That is a very simple principle, and very elegant and clear.

We have had this problem: we cannot easily get the character count of a UTF-8 string.
*Length* only returns the number of bytes in a UTF-8 string, not the number of characters.
If a UTF-8 string really is text, then *Length* should return the character count, not the byte count.
The problem is the same for UTF-16 (UnicodeString). You see, the semantics here are very confusing.
Instead of complicating the problem, we might as well not have a UnicodeString with a code page,
and just have a plain string of two-byte elements. That is simple and easy to understand.
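
To illustrate the three different counts, here is a minimal sketch, assuming Delphi 2009 or later. CodePointCount is a helper written for this example, not an RTL routine; it counts code points by skipping the second half of each surrogate pair.

```delphi
program LengthUnits;
{$APPTYPE CONSOLE}

// Hypothetical helper: number of Unicode code points in a UTF-16 string.
function CodePointCount(const S: UnicodeString): Integer;
var
  I: Integer;
  P: PWideChar;
begin
  Result := 0;
  P := PWideChar(S);
  for I := 0 to Length(S) - 1 do
    // Count every code unit except a low surrogate ($DC00..$DFFF),
    // which is the second half of a pair already counted.
    if (Ord(P[I]) < $DC00) or (Ord(P[I]) > $DFFF) then
      Inc(Result);
end;

var
  U: UnicodeString;
  U8: UTF8String;
begin
  // 'a', Greek alpha, one CJK character, and U+20000 as a surrogate pair
  U := 'a'#$03B1#$4E2D#$D840#$DC00;
  U8 := UTF8String(U);

  Writeln(Length(U8));        // number of bytes in the UTF-8 form
  Writeln(Length(U));         // 5 UTF-16 code units
  Writeln(CodePointCount(U)); // 4 code points
end.
```

Even the code point count is not always what a user would call the number of characters, since combining marks are separate code points.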
0
wenjie
6/18/2015 6:34:57 AM
wenjie zhou wrote:

> > You already have those. They are called dynamic arrays. And exactly
> > those are the best way to manage binary data.
> 
> Dynamic arrays is everything. It can even be used to express integer,
> float, int64, int128 and etc.  Dynamic arrays can even express a file
> or memory stream.  Should we use dynamic arrays to replace integer,
> float and etc? Obviously not.

That's idiotic and doesn't make any sense. 

A dynamic array is, however, very well suited to hold multiples of
the same kind, like your byte or word strings. They are reference
counted too, and you can dynamically change their sizes.

-- 
Rudy Velthuis        http://www.rvelthuis.de

"The artist is nothing without the gift, but the gift is nothing
 without work." -- Emile Zola (1840-1902)
0
Rudy
6/18/2015 7:52:36 AM
Yes, they are reference counted and can change their size.
But what about the string routines? Pos(), LeftStr(), Trim(), etc. A dynamic array is only a workaround. It is not text or a string.

In addition,
I would like to quote Lajos's words again:
*"Text is a sequence of characters written in some language. Now text can be represented using ANSI code pages or using unicode."*
The code page applies to the *character*. And in theory, a text (string) can contain *characters* from a variety of code pages.
Based on this analysis, every *character* should have a code page, and the type WideChar should have a code page too.

Every reason given above for why String should have a code page also proves that WideChar should have one.
0
wenjie
6/19/2015 2:08:47 AM
wenjie zhou wrote:

> Yes. They are reference counted. And can change their size.
> How about the string routines ?  Pos(),  LeftStr(), Trim() and etc.
> Dynamic array is only a workround. It is not Text or String.

No, text is text and best stored in a string. But XE8 does have some of
the functions you mention.

What should Trim remove on a word array? What is LeftStr on binary data?

-- 
Rudy Velthuis        http://www.rvelthuis.de

Cann's (or Allen's) Axiom: When all else fails, read the
instructions.
0
Rudy
6/22/2015 8:59:29 AM
wenjie zhou wrote:

> Yes. They are reference counted. And can change their size.
> How about the string routines ?  Pos(),  LeftStr(), Trim() and etc.
> Dynamic array is only a workround. It is not Text or String.

No, text is text and best stored in a string. But XE8 does have some of
the functions you mention.

What should Trim remove on a word array? What is LeftStr on binary data?

-- 
Rudy Velthuis        http://www.rvelthuis.de

Cann's (or Allen's) Axiom: When all else fails, read the
instructions.
0
Rudy
6/22/2015 9:41:11 AM
6 duplicates....

---
Bo Berglund
Sweden & Texas
Newsreader: Forte Free Agent 1.92/32.572
0
Bo
6/23/2015 8:47:10 AM
Bo Berglund wrote:

> 6 duplicates....

Nope. Yesterday, I cancelled some of them.

-- 
Rudy Velthuis        http://www.rvelthuis.de

"I'm so poor I can't even pay attention." -- Unknown
0
Rudy
6/23/2015 12:30:40 PM