Create new "outfile" foreach line in "inputfile"

Hi All, 
 
I have a urls.txt file that contains a different url on each line (100
urls), and an item_numbers.txt file (100 items).  I want to create a new
output file, named w/ the corresponding item number, on each pass through
the loop over urls.txt.  Can someone please let me know where I can read
about this?  Is this something I need to work into a hash?  I have a working
script (screen scraping), but it only handles one url and one outfile.
 
Any direction would be greatly appreciated.
 
Thanks! 
 
Brian Volk
HP Products
317.298.9950 x1245
bvolk@hpproducts.com
 
 

BVolk
11/29/2004 5:38:31 PM
perl.beginners
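
On the hash question above: a minimal sketch, assuming both files list their entries in matching order and have already been read into arrays and chomped. The sample URLs and item numbers are made up for illustration, and pair_urls_with_items is a hypothetical helper name. A hash slice pairs each url with its item number in one statement.

```perl
use strict;
use warnings;

# Pair each url with the item number found on the same line of the other file.
# Both lists are assumed to be chomped already and of equal length.
sub pair_urls_with_items {
    my ($urls, $items) = @_;
    die "list lengths differ\n" unless @$urls == @$items;

    my %item_for;
    @item_for{@$urls} = @$items;    # hash slice: url => item number
    return \%item_for;
}

# Hypothetical sample data standing in for urls.txt and item_numbers.txt.
my @urls  = ('http://example.com/a', 'http://example.com/b');
my @items = ('1804', '1805');

my $item_for = pair_urls_with_items(\@urls, \@items);
for my $url (sort keys %$item_for) {
    print "$url -> $item_for->{$url}\n";
}
```

Note that a hash does not keep the file order and collapses duplicate urls into one entry, so when the two files are already line-for-line aligned, two parallel arrays are usually the simpler choice.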


Jay,

SUCCESS!  Thank you for your time and expertise.  The way you walked me
through the loop step by step was a great learning experience... I was even
able to figure out an error at the end... (my fault, I'm sure... but
nonetheless, I'm learning!)

Thanks again!  Here is the working loop.  I needed to print $lgdesc to OUT.



# @urls and @items hold the lines of urls.txt and item_numbers.txt,
# read in as in Jay's script below (where @items is called @numbers).
while (@urls) {
   my $url = shift(@urls);
   chomp $url;
   my $file = shift(@items);
   chomp $file;

   my $page = get($url);

   my $parser = HTML::TokeParser::Simple->new(\$page)
      or die "Could not parse page";

   # Advance to the 10th <table> in the source
   my ($tag, $attr);
   $tag = $parser->get_tag("table") foreach (1..10);

   # Advance to the 11th <tr> after it, then to the next <td>
   $parser->get_tag("tr") foreach (1..11);
   $parser->get_tag("td");
   my $lgdesc = $parser->get_text();

   open(OUT, ">", $file) or die "can't open $file: $!";
   print OUT $lgdesc;
   close(OUT);
}
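
One gap worth noting in the loop above: LWP::Simple's get() returns undef when a fetch fails, and the loop does not check for that. Here is a hedged sketch of the per-url step with the fetch and the extraction passed in as code references, so a failed page is skipped with a warning instead of ending up in an output file. The names process_pair, $fetch and $extract are hypothetical and not part of the original script.

```perl
use strict;
use warnings;

# Fetch one page, extract the text, and write it to $file.
# $fetch and $extract are code references supplied by the caller.
# Returns 1 on success, 0 if the page was skipped.
sub process_pair {
    my ($url, $file, $fetch, $extract) = @_;

    my $page = $fetch->($url);
    unless (defined $page) {
        warn "skipping $url: fetch failed\n";
        return 0;
    }

    my $text = $extract->($page);
    unless (defined $text && length $text) {
        warn "skipping $url: nothing extracted\n";
        return 0;
    }

    # Lexical filehandle and three-argument open, as in the loop above.
    open(my $out, ">", $file) or die "can't open $file: $!";
    print {$out} $text;
    close($out) or die "can't close $file: $!";
    return 1;
}

# In the real script, $fetch could be \&get (imported by LWP::Simple) and
# $extract a sub that wraps the HTML::TokeParser::Simple steps shown above.
```

Passing the two steps in as code references is only one way to arrange this; the same checks could just as well be written inline in the while loop.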



> -----Original Message-----
> From: Brian Volk 
> Sent: Tuesday, November 30, 2004 1:47 PM
> To: 'daggerquill'
> Subject: RE: Create new "outfile" foreach line in "inputfile"
> 
> 
> Jay,
> 
> Thank you so much for your help...  I will get started right
> away!  If I have any more questions (I usually do.. :~) ), I
> will post them to the mailing list.
> 
> Thanks again!
> 
> Brian 
> 
> > -----Original Message-----
> > From: daggerquill [mailto:daggerquill@gmail.com]
> > Sent: Tuesday, November 30, 2004 1:14 PM
> > To: Brian Volk
> > Subject: Re: Create new "outfile" foreach line in "inputfile"
> > 
> > 
> > On Tue, 30 Nov 2004 12:14:09 -0500, Brian Volk 
> > <bvolk@hpproducts.com> wrote:
> > > Great, thanks!  I'm reading Chapter 5 "Hashes" in the Llama book; I'm
> > > thinking that might be what I need to do, just not real sure how.. :~).
> > > What I want to do is read the first line of urls.txt (100 urls).  (Right
> > > now I just have a single url in the script.)  Then get the text w/
> > > $parser->get_text.  Then name the file w/ the first item number in the
> > > item_numbers.txt file.  (For now I'm just printing the text w/ a
> > > filehandle.)
> > > 
> > > -----  begin -----
> > > 
> > > #!/usr/bin/perl -w
> > > 
> > > # This program will get the large description from the KC web site.
> > > 
> > >  use strict;
> > >  use HTML::TokeParser::Simple;
> > >  use LWP::Simple;
> > > 
> > >  my $url =
> > >    "http://www.kcprofessional.com/us/product-details.asp?search=v1&searchtext=1804&x=0&y=0";
> > >  my $page = get($url)
> > >         or die "Could not load URL\n";
> > > 
> > > # Create file to store large description
> > >  open LGDESC, "> largedecs.txt"
> > >         or die "Cannot open largedecs.txt for writing: $!";
> > > 
> > >  my $parser = HTML::TokeParser::Simple->new(\$page)
> > >         or die "Could not parse page";
> > > 
> > > # This will get the 10th table in the source code
> > >  my  ($tag, $attr);
> > >  $tag = $parser->get_tag("table") foreach (1..10);
> > > 
> > > # This will get the 11th instance of <tr><td
> > >  $parser->get_tag("tr") foreach (1..11);
> > >  $parser->get_tag("td");
> > >  my $lg_desc = $parser->get_text();
> > > 
> > >  print  LGDESC "$lg_desc, \n";
> > > 
> > >  close LGDESC;
> > > 
> > > ---- end --------------
> > > 
> > > Thank you!
> > > 
> > > Brian Volk
> > > 
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: daggerquill [mailto:daggerquill@gmail.com]
> > > > Sent: Tuesday, November 30, 2004 12:04 PM
> > > > To: Brian Volk
> > > > Subject: Re: Create new "outfile" foreach line in "inputfile"
> > > >
> > > >
> > > > On Mon, 29 Nov 2004 12:38:31 -0500, Brian Volk
> > > > <bvolk@hpproducts.com> wrote:
> > > > > Hi All,
> > > > >
> > > > > I have a urls.txt file that contains a different url on each line (100
> > > > > urls).  And I have a item_numbers.txt file (100 items).  I want to create a
> > > > > new outfile.txt, named w/ the corresponding item_number, each time the
> > > > > urls.txt file passes through the loop.   Can someone please let me know
> > > > > where I can read about this...  Is this something I need to work into a
> > > > > hash?  I have a working script (screen scraping) but it is only for one url
> > > > > and one outfile.
> > > > >
> > > > > Any direction would be greatly appreciated.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Brian Volk
> > > > > HP Products
> > > > > 317.298.9950 x1245
> > > > >  <mailto:bvolk@hpproducts.com> bvolk@hpproducts.com
> > > > >
> > > > >
> > > >
> > > > Brian,
> > > >
> > > > Let us see the script you have, and we can help you work it
> > > > into a loop.
> > > >
> > > > --jay savage
> > > >
> > > 
> > 
> > 
> > Brian,
> > 
> > You're definitely on the right track.  You could use hashes, but
> > assuming both files are in the correct order (the second line of the
> > number file goes with the second url), you can just use arrays, as I've
> > done below.  This is a simple while loop that will call
> > LWP::Simple::get to read the page and save it.  You'll need to go back
> > and add anything you want to do with HTML::TokeParser, but this
> > should give you a pretty good idea of one way to loop through the
> > lists and files.
> > 
> > #!/usr/bin/perl
> > use strict;
> > use warnings;
> > use LWP::Simple;
> > 
> > my $urlfile = "urls.txt" ;
> > my $numfile = "item_numbers.txt" ;
> > 
> > open(URL, "<", $urlfile) or die "couldn't read urls: $!";
> > open(NUM, "<", $numfile) or die "couldn't read numbers: $!";
> > 
> > my @urls = <URL> ;
> > my @numbers = <NUM>;
> > 
> > close(URL);
> > close(NUM);
> > 
> > while (@urls) {
> >    my $url = shift(@urls);
> >    chomp $url;
> >    my $file = shift(@numbers);
> >    chomp $file;
> > 
> >    my $page = get($url);
> > 
> >    # insert your code to parse whatever you want here,
> >    # or write a function to call here  
> > 
> >    open(OUT, ">", $file) or die "can't open $file:$!";
> >    print OUT $page;
> >    close (OUT);
> > }
> > __END__
> > 
> > As you work with it, I'm sure you'll see some places to simplify it
> > (maybe "while (<URL>)"?), but the basic idea is to process the two
> > lists in parallel.  One thing that's important to remember in
> > situations like this is to use shift and unshift rather than pop and
> > push: while you've probably made sure that the beginnings of the files
> > are in good order, the ends may have whitespace and blank lines that
> > might cause a mismatch.
> > 
> > HTH,
> > 
> > --jay
> > 
> 
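
Jay's hint about "while (<URL>)" could be sketched roughly as follows: read the two files in lockstep instead of slurping them into arrays first. This is a sketch under stated assumptions, not a drop-in replacement. read_pairs and next_nonblank are hypothetical helper names, and the code assumes blank lines are never meaningful in either file, so it skips them wherever they occur.

```perl
use strict;
use warnings;

# Read two files in lockstep and return a list of [url, item] pairs.
# Blank lines are skipped in each file independently.
sub read_pairs {
    my ($urlfile, $numfile) = @_;
    open(my $url_fh, "<", $urlfile) or die "couldn't read $urlfile: $!";
    open(my $num_fh, "<", $numfile) or die "couldn't read $numfile: $!";

    my @pairs;
    while (defined(my $url = next_nonblank($url_fh))) {
        my $item = next_nonblank($num_fh);
        die "ran out of item numbers at $url\n" unless defined $item;
        push @pairs, [$url, $item];
    }
    close($url_fh);
    close($num_fh);
    return @pairs;
}

# Return the next line containing non-whitespace, trimmed, or undef at EOF.
sub next_nonblank {
    my ($fh) = @_;
    while (defined(my $line = <$fh>)) {
        $line =~ s/^\s+|\s+$//g;
        return $line if length $line;
    }
    return;
}

# Possible use inside the script:
#   for my $pair (read_pairs("urls.txt", "item_numbers.txt")) {
#       my ($url, $file) = @$pair;
#       ...
#   }
```

Dying when the item numbers run out makes a length mismatch visible right away, rather than leaving a url without a file name.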
BVolk
11/30/2004 8:02:34 PM