splitting large xml file

I have a very large (200 MB) XML file that consists of multiple records.  I
would like to split these records up and store the XML for each in a
database for quick retrieval.  I simply need to echo all of the XML between
the enclosing record tags into the database.  Ideally, I would use SAX to
parse things, but I can't figure out how to echo the data back out exactly
as I got it.  Any clues?

Thanks,
Sean



sdavis2
7/22/2004 9:41:43 PM

> Ideally, I would use SAX to parse things

Alternatively, you could look at XML::RAX.

Article on the RAX concept:
http://www.xml.com/pub/a/2000/04/26/rax/index.html

RAX allows you to specify a record separator (a tag in the XML file), and
splits the file into chunks delimited by that tag.  It is stream based, so it
only reads in as much of the file as it needs to construct the next record.
It only applies to XML files that fit that type of format though (like RSS).
At the very least you might find the code helpful.
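
To make the record-separator idea concrete, here is a minimal sketch in core
Perl.  It is not the XML::RAX API; it only imitates the concept by setting
Perl's input record separator ($/) to the closing tag, so each read returns
one record's text exactly as it appears in the file.  The tag name "record",
the helper for_each_record and the sample data are assumptions for
illustration.  It also assumes the record tag never nests, is never
self-closing, and does not appear inside comments or CDATA sections.

```perl
use strict;
use warnings;

# Call $callback once per record, passing the record's XML verbatim.
# The tag name is supplied by the caller ('record' is an assumption).
sub for_each_record {
    my ($fh, $tag, $callback) = @_;
    local $/ = "</$tag>";    # every read now stops after a closing tag
    while (defined(my $chunk = <$fh>)) {
        # Keep only the text from the opening tag onward.  This drops the
        # XML declaration, the root tag and whitespace between records,
        # and skips the trailer that follows the last record.
        next unless $chunk =~ /(<\Q$tag\E[\s>].*)\z/s;
        $callback->($1);
    }
}

# Usage: an in-memory file handle stands in for the large file.
my $xml = qq{<?xml version="1.0"?>\n<records>\n}
        . qq{<record id="1"><name>a &amp; b</name></record>\n}
        . qq{<record id="2"><name>c</name></record>\n</records>\n};
open my $fh, '<', \$xml or die "cannot open in-memory file: $!";
my @records;
for_each_record($fh, 'record', sub { push @records, $_[0] });
close $fh;
print scalar(@records), " records\n";
```

Only one record is held in memory at a time, and because nothing is parsed,
entities such as &amp; come back untouched.  A real parser is the safer
choice when the assumptions above do not hold.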

> but I can't figure out how to echo the data
> back out exactly as I got it.

I'm not sure I completely understand.  Anyway, I am out of here for today;
I hope you find an answer.

Rob


-----Original Message-----
From: Sean Davis [mailto:sdavis2@mail.nih.gov]
Sent: Thursday, July 22, 2004 5:42 PM
To: beginners@perl.org
Subject: splitting large xml file



rhanson
7/22/2004 11:06:12 PM
Rob,

Thanks for replying.  I ended up answering my own question.  I used 
XML::Twig to find the chunks I was interested in, grabbed indexing 
information from the twig, and saved the indices in a database for 
later lookup of the entire XML record and... presto, random access to 
200 MB of XML!

Sean
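
The message does not include the code, so the following is only a sketch of
one way such an index could work, not the XML::Twig solution described
above.  It assumes the "indices" are byte offsets and lengths, finds them
with plain text scanning in core Perl rather than with a parser, and uses an
array where a database table would go.  The tag name "record", the helpers
build_index and fetch_record, and the sample data are all hypothetical.

```perl
use strict;
use warnings;

# Scan the file once and return [byte offset, byte length] per record.
# Assumes a byte-oriented handle and a record tag that never nests.
sub build_index {
    my ($fh, $tag) = @_;
    my @index;
    local $/ = "</$tag>";              # read up to each closing tag
    while (1) {
        my $start = tell $fh;          # position before this chunk
        my $chunk = <$fh>;
        last unless defined $chunk;
        # Locate the opening tag; chunks without one (the trailer) are skipped.
        next unless $chunk =~ /<\Q$tag\E[\s>]/;
        push @index, [ $start + $-[0], length($chunk) - $-[0] ];
    }
    return \@index;
}

# Jump straight to one record and return its XML exactly as stored.
sub fetch_record {
    my ($fh, $entry) = @_;
    my ($offset, $length) = @$entry;
    seek $fh, $offset, 0 or die "seek failed: $!";
    my $got = read $fh, my $buf, $length;
    die "short read" unless defined $got && $got == $length;
    return $buf;
}

# Usage: an in-memory file handle stands in for the large file.
my $xml = qq{<?xml version="1.0"?>\n<records>\n}
        . qq{<record id="1"><name>a &amp; b</name></record>\n}
        . qq{<record id="2"><name>c</name></record>\n</records>\n};
open my $fh, '<', \$xml or die "cannot open in-memory file: $!";
my $index  = build_index($fh, 'record');   # in practice, rows in a database
my $second = fetch_record($fh, $index->[1]);
print "$second\n";
close $fh;
```

Storing two integers per record keeps the database small, and each lookup
costs one seek and one read regardless of the file size.  The offsets are
only valid as long as the XML file itself does not change.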


sdavis2
7/23/2004 11:55:25 AM