Reading any file and converting the encoding to UTF8

Hi,

I want to read a file using TStream (or a similar class) in C++Builder 2009.
I want a generic approach: whether the file is ANSI-encoded or Unicode, I want to read it with the same function.
When I looked at the RAD Studio 2009 help, I found the C++ examples for TEncoding.
Here is that code (adapted for my needs):

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
	TEncoding *EncodingArray[6];
	EncodingArray[0] = TEncoding::UTF8;
	EncodingArray[1] = TEncoding::UTF7;
	EncodingArray[2] = TEncoding::Unicode;
	EncodingArray[3] = TEncoding::Default;
	EncodingArray[4] = TEncoding::BigEndianUnicode;
	EncodingArray[5] = TEncoding::ASCII;
	//TEncoding *DestEncoding = EncodingArray[0];
	TEncoding *DestEncoding = EncodingArray[ComboBox1->ItemIndex];


	String inFileName = Edit1->Text;
	String outFileName = Edit2->Text;
	String tmpStr = "";

	  // Sample to convert a file of any encoding to UTF8.
	  TEncoding *LEncoding = NULL;
	  std::auto_ptr<TFileStream> LFileStream(new TFileStream(inFileName, fmOpenRead));

	  // Read file into buffer
	  TBytes myBytes;
	  std::auto_ptr<TBytesStream> myBytesStream(new TBytesStream(myBytes));
	  myBytesStream->CopyFrom(LFileStream.get(), LFileStream->Size);

	  // Identify encoding and convert buffer to UTF8
	  int LOffset = TEncoding::GetBufferEncoding(myBytesStream->Bytes, LEncoding);
	  if (LOffset != 0)
	  {
		  myBytes = TEncoding::Convert(LEncoding, DestEncoding,
									   myBytesStream->Bytes,
									   LOffset, myBytesStream->Size-LOffset);

		  // Create output file (use the name from Edit2 instead of a hard-coded path)
		  std::auto_ptr<TFileStream> DestFileStream(new TFileStream(outFileName, fmCreate));

		  // Grab the byte order mark of the destination encoding, which
		  // may be empty (for example for ASCII)
		  TBytes LByteOrderMark;
		  LByteOrderMark = DestEncoding->GetPreamble();

		  // Write the preamble to the destination, if there is one
		  if (LByteOrderMark.Length > 0)
			  DestFileStream->Write(&LByteOrderMark[0], LByteOrderMark.Length);

		  // Write converted buffer
		  DestFileStream->Write(&myBytes[0], myBytes.Length);

	  }
	  else
	  {
			Sleep(1);// Unknown encoding, don't convert.
	  }

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

*I should say it works very well (I did not try every kind of Unicode file, but I can save in any of the available formats).*

*My question is how to use this approach while reading the data into a buffer piece by piece.*
*If the file is quite big, reading it in one go is time-consuming and the user has to wait a long time.*
*Instead, I want to read it into a buffer chunk by chunk (this is also needed for showing progress):*
........
				for (int i = 0; i < inFileStream->Size; )
				{
					iFileRead = inFileStream->Read(Buffer, BufferLength);
					tmpStr = String(Buffer);
					iFileWrite = outFileStream->Write(Buffer, iFileRead);
					i += iFileRead;
				}
........

*As seen above, I want to apply the TEncoding class to "Buffer".*

*How can I modify the RAD Studio 2009 example to do that?*

This is my draft code:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
	TEncoding *SrcEncoding = NULL, *DestEncoding = NULL;

	TFileStream *SrcFileStream = NULL, *DestFileStream = NULL;

	TMemoryStream *SrcMemoryStream = NULL, *DestMemoryStream = NULL;

	String SrcFileName = "", DestFileName = "";

	const int BufferLength = 5;

	Char Buffer[BufferLength];

	__int64	iFileLength, iFilePosition;

	long bytesRead, bytesWrite;

	try
	{
		try
		{
			SrcFileName  = LabeledEdit1->Text;

			DestFileName = LabeledEdit2->Text;

			SrcFileStream = new TFileStream(SrcFileName, fmOpenRead);

			if (SrcFileStream)
			{
				DestFileStream = new TFileStream(DestFileName, fmCreate);

				if (DestFileStream)
				{
					SrcFileStream->Seek(0, soFromBeginning);

					iFilePosition = SrcFileStream->Position;

					iFileLength = SrcFileStream->Size;

					Memo1->Lines->Clear();

					while ( iFilePosition < iFileLength )
					{
						ZeroMemory(Buffer, BufferLength * sizeof(Char));

						bytesRead = SrcFileStream->Read(Buffer, BufferLength * sizeof(Char));

						Convert2(Buffer, Buffer); // <-------- conversion needed here

						bytesWrite = DestFileStream->Write(Buffer, bytesRead);

						iFilePosition += bytesRead;

							//Memo1->Lines->Add(String(Buffer));
					}
				}
				else
				{
					//Error
					//DestFileStream = new TFileStream(DestFileName, fmCreate);
				}
			}
			else
			{
				//Error
				//SrcFileStream = new TFileStream(SrcFileName, fmOpenRead);
			}

		}
		catch (...)
		{

		}
	}
	__finally
	{
		// delete on a NULL pointer is a no-op, so no check is needed
		delete DestFileStream;
		DestFileStream = NULL;

		delete SrcFileStream;
		SrcFileStream = NULL;
	}

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aykut
5/7/2009 1:35:49 PM

<Aykut T> wrote in message news:113956@forums.codegear.com...

>   // Sample to convert a file of any encoding to UTF8.

If the file is entirely text, you can use a TStringList instead.  The
TStringList::LoadFromStream() method (which LoadFromFile() calls internally)
automatically detects the file's encoding (if you do not explicitly
specify one) and then decodes the text to UTF-16 when storing the strings
into the list.  You can then save the list to a new file as UTF-8.  For
example:

{code:cpp}
std::auto_ptr<TStringList> LStringList(new TStringList);
LStringList->LoadFromFile(inFileName);
LStringList->SaveToFile("..\\SampleUTF8.txt", TEncoding::UTF8);
{code}

-- 
Remy Lebeau (TeamB)
Remy
5/7/2009 4:52:28 PM

Hi, and thank you.

I just tried your code, and it works perfectly.
Here is my code:

{code:cpp}
	TEncoding *EncodingArray[6];
	EncodingArray[0] = TEncoding::UTF8;
	EncodingArray[1] = TEncoding::UTF7;
	EncodingArray[2] = TEncoding::Unicode;
	EncodingArray[3] = TEncoding::Default;
	EncodingArray[4] = TEncoding::BigEndianUnicode;
	EncodingArray[5] = TEncoding::ASCII;
	//TEncoding *DestEncoding = EncodingArray[0];
	TEncoding *DestEncoding = EncodingArray[ComboBox1->ItemIndex];


	String inFileName = Edit1->Text;
	String outFileName = Edit2->Text;
	String tmpStr = "";

	// Sample to convert a file of any encoding to UTF8.
	TEncoding *LEncoding = NULL;
	std::auto_ptr<TStringList> LStringList(new TStringList);
	LStringList->LoadFromFile(inFileName);
	LStringList->SaveToFile(outFileName, DestEncoding);
{code}


But, as I wondered before, how does it behave with big documents?
I found some Russian text on the web and saved it into a file.
Using this small but powerful code snippet, I read it and wrote it back out in every available format.
Then I changed the approach:
I copied the contents of this file into a Memo and added that text to the string list 150000 times, which
gave me a file of about 275 MB.

{code:cpp}
	for (int i = 0; i < 150000; i++) {
		LStringList->AddStrings(Memo1->Lines);
	}
       
	LStringList->SaveToFile(outFileName, DestEncoding);
{code}


Then I selected this file as the input and ran the program.
It works as before, except that the program freezes while reading and writing.
I know this example is an extreme case, but I wanted to show what I mean.
I want to eliminate this freezing.

What should I do?

The second issue is:

> TStringList::LoadFromStream() method (which LoadFromFile() calls internally)
> automatically detects the file's encoding (if you do not explicitly
> specify one) and then decodes the text to UTF-16 when storing the strings
> into the list.

Like TStringList::LoadFromStream(), can the other stream classes (TFileStream, TMemoryStream, ...)
automatically detect the file's encoding?

Thanks
Aykut
5/8/2009 8:28:14 AM
<Aykut T> wrote in message news:114275@forums.codegear.com...

> it works as before except, while reading and writing the program 
> freezes...

As it should be, since you are calling it in the main thread, and it is
taking a while to load and process that much data.

> i want to eliminate those freezing issues...
>
> what should i do?

The simplest way is to move the code into a worker thread.  That won't
change the fact that a lot of memory has to be used to process such large
files, though.  If memory usage is an issue for you, then you will have no
choice but to read and convert the data in smaller encoding-aligned chunks
manually.
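
To illustrate what "encoding-aligned chunks" means in practice, here is a minimal sketch for the UTF-8 case only. It is plain standard C++ (C++11), not VCL code, and the names `completeUtf8Prefix` and `splitUtf8Chunks` are made up for this illustration. The idea is that a fixed-size read can end in the middle of a multi-byte character, so only the bytes up to the last complete character are converted and the rest is carried over to the next read. It assumes well-formed UTF-8 input; other encodings (UTF-16 surrogate pairs, multi-byte ANSI code pages) need their own boundary rule.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns how many leading bytes of buf[0..len) end on a UTF-8 character
// boundary. Any bytes after that point start a character whose remaining
// bytes have not been read yet. Assumes well-formed UTF-8.
std::size_t completeUtf8Prefix(const unsigned char *buf, std::size_t len)
{
    if (len == 0)
        return 0;

    // Walk back (at most 4 bytes) to the lead byte of the last character.
    std::size_t start = len;
    for (std::size_t n = 0; n < 4 && start > 0; ++n) {
        --start;
        if ((buf[start] & 0xC0) != 0x80)   // not a continuation byte
            break;
    }

    // Work out how many bytes that last character needs in total.
    const unsigned char lead = buf[start];
    std::size_t need = 1;
    if ((lead & 0xE0) == 0xC0)      need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;

    // If the last character is cut off, stop just before it.
    return (start + need > len) ? start : len;
}

// Splits data into chunks of at most chunkSize bytes, each ending on a
// character boundary. This stands in for the read loop over a file.
std::vector<std::string> splitUtf8Chunks(const std::string &data,
                                         std::size_t chunkSize)
{
    if (chunkSize < 4)
        chunkSize = 4;   // must be able to hold the longest UTF-8 character

    std::vector<std::string> chunks;
    std::size_t pos = 0;
    while (pos < data.size()) {
        const std::size_t want = std::min(chunkSize, data.size() - pos);
        const unsigned char *p =
            reinterpret_cast<const unsigned char *>(data.data()) + pos;
        std::size_t take = completeUtf8Prefix(p, want);
        if (take == 0)
            take = want;   // truncated character at the very end: pass it through
        chunks.push_back(data.substr(pos, take));
        pos += take;
    }
    return chunks;
}
```

With a 5-byte chunk size and Cyrillic text (2 bytes per letter), each chunk comes out 4 bytes long. In a real file loop, the leftover bytes would be prepended to the next block read from the stream before that block is handed to the converter, for example TEncoding::Convert().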

> Like TStringList::LoadFromStream(), can the other streams automatically
> detects the file's encoding?

No, nor do they need to, since they operate on raw binary data, not string 
data specifically.
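
Since the raw streams do not look at the data, any detection has to be done on the bytes themselves. Below is a minimal sketch in standard C++ (C++11, not VCL); `Bom`, `BomInfo` and `detectBom` are hypothetical names used only here. It checks for the UTF-8 and UTF-16 byte order marks and returns the BOM length so the caller can skip it, much like the offset returned by TEncoding::GetBufferEncoding() is used in the first post. A buffer without a BOM is reported as `None`, and the caller then has to fall back to some default, such as the ANSI code page.

```cpp
#include <cstddef>

// Encodings this sketch can recognise from a byte order mark.
enum class Bom { None, Utf8, Utf16LE, Utf16BE };

struct BomInfo {
    Bom kind;            // which BOM was found, if any
    std::size_t length;  // number of BOM bytes to skip before the text
};

// Inspects the first bytes of a buffer for a byte order mark.
// Limitation: a UTF-32 LE BOM (FF FE 00 00) also starts with FF FE and
// would be reported as UTF-16 LE here.
BomInfo detectBom(const unsigned char *buf, std::size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return {Bom::Utf8, 3};
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return {Bom::Utf16LE, 2};
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return {Bom::Utf16BE, 2};
    return {Bom::None, 0};   // no BOM: encoding must be guessed or assumed
}
```

Only the first few bytes of the file are needed for this check, so it can run on the first chunk read from a TFileStream before the rest of the file is processed.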

-- 
Remy Lebeau (TeamB)
Remy
5/8/2009 5:21:13 PM