Getting a list of all .html files in a directory and its subdirectories

I need to get a list of all the files that end with '.html' in a directory and 
all of its subdirectories. I then want to search through each file and remove 
the ones from the list that contain '<%perl>' or '<%init>'. How can I do this? 
Thanks for any help.

-- 
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548

agaffney
7/30/2004 6:32:44 PM

On Fri, 30 Jul 2004, Andrew Gaffney wrote:

> I need to get a list of all the files that end with '.html' in a 
> directory and all of its subdirectories. I then want to search through 
> each file and remove the ones from the list that contain '<%perl>' or 
> '<%init>'. How can I do this? Thanks for any help.

From a Unix command line, you could do something like this:

     $ find /path/to/htdocs -type f | xargs egrep -li '<%(perl|init)>'

The above line results in a list of all the files that have either 
'<%perl>' or '<%init>' in them.

From here, you can go a step further by deleting them all. Because files 
with spaces in their name (or their path) can break this horribly, I'll 
use `sed` to wrap each line in quotes before removing them:

     $ find /path/to/htdocs -type f | \
     > xargs egrep -li '<%(perl|init)>' | \
     > sed 's/\(.*\)/"\1"/' | \
     > xargs rm -i

This should also prompt you before taking any action, in case you 
realize that you really wanted one of these files. If you want to just 
proceed blindly -- and my but you're brave if you do -- then delete the 
"-i" from the last line.
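
On the filename problem mentioned above: besides quoting with `sed`, a common alternative is to pass NUL-delimited names between `find` and `xargs`. The sketch below assumes a `find` with `-print0` and an `xargs` with `-0` (GNU and BSD versions have both). The sandbox directory and file names are made up for illustration, and it only lists matching files rather than deleting anything. It also adds `-name '*.html'`, since the original question asked about .html files only.

```shell
# Throwaway sandbox; the directory layout and file names are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/sub dir"
printf '<%%perl>\nmy $x = 1;\n</%%perl>\n' > "$demo/sub dir/has perl.html"
printf '<html><body>plain</body></html>\n' > "$demo/plain.html"

# -print0 and -0 separate names with NUL bytes, so spaces in paths survive.
matches=$(find "$demo" -type f -name '*.html' -print0 |
  xargs -0 grep -liE '<%(perl|init)>')

printf '%s\n' "$matches"
rm -rf "$demo"
```

Only the file containing `<%perl>` should be printed, with its space-containing path intact.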


Of course, you probably wanted to do this in Perl, but sometimes things 
are just as easy to do with shell tools, and this seems like a good 
example. Unless you want to do this all the time -- in which case go 
ahead & script it in Perl -- a shell one liner like this should be fine.



And of course this all breaks down if you're using Windows, in which 
case unless you're a fan of Cygwin you can just ignore all of this :)



-- 
Chris Devers
cdevers
7/30/2004 6:43:46 PM
Chris Devers wrote:
> On Fri, 30 Jul 2004, Andrew Gaffney wrote:
> 
>> I need to get a list of all the files that end with '.html' in a 
>> directory and all of its subdirectories. I then want to search through 
>> each file and remove the ones from the list that contain '<%perl>' or 
>> '<%init>'. How can I do this? Thanks for any help.
> 
> 
>  From a Unix command line, you could do something like this:
> 
>     $ find /path/to/htdocs -type f | xargs egrep -li '<%(perl|init)>'
> 
> The above line results in a list of all the files that have either 
> '<%perl>' or '<%init>' in them.
> 
>  From here, you can go a step further by deleting them all. Because files 
> with spaces in their name (or their path) can break this horribly, I'll 
> use `sed` to wrap each line in quotes before removing them:
> 
>     $ find /path/to/htdocs -type f | \
>     > xargs egrep -li '<%(perl|init)>' | \
>     > sed 's/\(.*\)/"\1"/' | \
>     > xargs rm -i
> 
> This should also prompt you before taking any action, in case you 
> realize that you really wanted one of these files. If you want to just 
> proceed blindly -- and my but you're brave if you do -- then delete the 
> "-i" from the last line.

I think you misunderstand. I don't want to delete the files that contain 
'<%perl>' or '<%init>'. I just want to make a list of all .html files in a 
directory tree and remove the ones that contain '<%perl>' or '<%init>' from my 
list.

-- 
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548

agaffney
7/30/2004 7:05:12 PM
On Fri, 30 Jul 2004, Andrew Gaffney wrote:

> I think you misunderstand. I don't want to delete the files that 
> contain '<%perl>' or '<%init>'. I just want to make a list of all 
> .html files in a directory tree and remove the ones that contain 
> '<%perl>' or '<%init>' from my list.

Then yes, I misunderstood. This version should do what you want:

     $ find /path/to/htdocs -type f | xargs egrep -liv '<%(perl|init)>'

It's exactly like the first one I sent before, but I've added "-v" to 
the egrep arguments, which inverts the meaning from "all files with this 
pattern" to "all files NOT with this pattern". In this case, that's what 
you're trying to get.

If you then want to remove / delete files, tack on the sed & rm commands 
I had in the earlier version, but it sounds like you just mean "omit 
from the list" rather than "remove from the hard drive".


-- 
Chris Devers
cdevers
7/30/2004 7:21:07 PM
Chris Devers wrote:
> On Fri, 30 Jul 2004, Andrew Gaffney wrote:
> 
>> I think you misunderstand. I don't want to delete the files that 
>> contain '<%perl>' or '<%init>'. I just want to make a list of all 
>> .html files in a directory tree and remove the ones that contain 
>> '<%perl>' or '<%init>' from my list.
> 
> 
> Then yes, I misunderstood. This version should do what you want:
> 
>     $ find /path/to/htdocs -type f | xargs egrep -liv '<%(perl|init)>'
> 
> It's exactly like the first one I sent before, but I've added "-v" to 
> the egrep arguments, which inverts the meaning from "all files with this 
> pattern" to "all files NOT with this pattern". In this case, that's what 
> you're trying to get.
> 
> If you then want to remove / delete files, tack on the sed & rm commands 
> I had in the earlier version, but it sounds like you just mean "omit 
> from the list" rather than "remove from the hard drive".

That still doesn't appear to do what I want. I believe it is showing me all 
files where *all* lines don't contain '<%perl>' or '<%init>'. Since not *all* 
lines contain either one of those, all files still show in the list.
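
The difference described here can be reproduced in a small sandbox. This is only a sketch: the two file names are invented, and it assumes a `grep` that supports `-L` (GNU and BSD grep do).

```shell
# Hypothetical sandbox with one Mason-style file and one static file.
demo=$(mktemp -d)
printf '<html>\n<%%init>\nmy $y;\n</%%init>\n</html>\n' > "$demo/mason.html"
printf '<html>\n<p>static</p>\n</html>\n' > "$demo/static.html"

# -l with -v: lists files that have at least one line NOT matching the
# pattern, which is nearly every file.
with_lv=$(cd "$demo" && grep -lvE '<%(perl|init)>' *.html)

# -L: lists files in which no line matches at all.
with_L=$(cd "$demo" && grep -LE '<%(perl|init)>' *.html)

printf 'with -lv:\n%s\n' "$with_lv"
printf 'with -L:\n%s\n' "$with_L"
rm -rf "$demo"
```

With `-lv` both files are listed, because each has some line without the pattern; with `-L` only `static.html` is listed.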

-- 
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548

agaffney
7/30/2004 10:10:53 PM
On Fri, 30 Jul 2004, Andrew Gaffney wrote:

> Chris Devers wrote:
>> On Fri, 30 Jul 2004, Andrew Gaffney wrote:
>> 
>> Then yes, I misunderstood. This version should do what you want:
>> 
>>     $ find /path/to/htdocs -type f | xargs egrep -liv '<%(perl|init)>'
>
> That still doesn't appear to do what I want. I believe it is showing 
> me all files where *all* lines don't contain '<%perl>' or '<%init>'. 
> Since not *all* lines contain either one of those, all files still 
> show in the list.

Okay, let's try again then:

   $ grep -li '<title>' *html  # print all html files with '<title>'
   20things.html
   bookmarks.html
   gas.html
   gas_form.html
   itunes.html
   noise.html
   $

   $ grep -Li '<title>' *html  # print all html files WITHOUT '<title>'
   HEADER.shtml
   $

The two sets don't intersect, which is apparently what you meant.

If you want to refine this further, try `egrep --help` or `man egrep`.
I should have tested what I sent before sending it, but ten seconds of 
skimming over the documentation on your own should have been enough to 
show you these lines from `egrep --help`:

   $ egrep --help | grep -i 'files.*match.*print'
     -L, --files-without-match only print FILE names containing no match
     -l, --files-with-matches  only print FILE names containing matches
   $

So, as with many Unix commands, shift-L inverts the usual sense of L, 
meaning that '-L' gets you the opposite of what '-l' does.

Now have we got it? :-)




-- 
Chris Devers
cdevers
7/30/2004 11:49:04 PM
In DOS:
> perl -n0 -e "push @b, $ARGV unless /<%(?:perl|init)>/; END{print \"@b\"}" file1.html file2.html file3.html

In *nix (untested):
> perl -n0 -e 'push @b, $ARGV unless /<%(?:perl|init)>/; END{print "@b"}' *.html

"Andrew Gaffney" <agaffney@skylineaero.com> wrote in message:
> That still doesn't appear to do what I want. I believe it is showing me all
> files where *all* lines don't contain '<%perl>' or '<%init>'. Since not *all*
> lines contain either one of those, all files still show in the list.


zeus
7/31/2004 12:08:07 AM
Chris Devers wrote:
> On Fri, 30 Jul 2004, Andrew Gaffney wrote:
> 
>> Chris Devers wrote:
>>
>>> On Fri, 30 Jul 2004, Andrew Gaffney wrote:
>>>
>>> Then yes, I misunderstood. This version should do what you want:
>>>
>>>     $ find /path/to/htdocs -type f | xargs egrep -liv '<%(perl|init)>'
>>
>>
>> That still doesn't appear to do what I want. I believe it is showing 
>> me all files where *all* lines don't contain '<%perl>' or '<%init>'. 
>> Since not *all* lines contain either one of those, all files still 
>> show in the list.
> 
> 
> Okay, let's try again then:
> 
>   $ grep -li '<title>' *html  # print all html files with '<title>'
>   20things.html
>   bookmarks.html
>   gas.html
>   gas_form.html
>   itunes.html
>   noise.html
>   $
> 
>   $ grep -Li '<title>' *html  # print all html files WITHOUT '<title>'
>   HEADER.shtml
>   $
> 
> The two sets don't intersect, which is apparently what you meant.
> 
> If you want to refine this further, try `egrep --help` or `man egrep`.
> I should have tested what I sent before sending it, but ten seconds of 
> skimming over the documentation on your own should have been enough to 
> show you these lines from `egrep --help`:
> 
>   $ egrep --help | grep -i 'files.*match.*print'
>     -L, --files-without-match only print FILE names containing no match
>     -l, --files-with-matches  only print FILE names containing matches
>   $
> 
> So, as with many Unix commands, shift-L inverts the usual sense of L, 
> meaning that '-L' gets you the opposite of what '-l' does.
> 
> Now have we got it? :-)

I think it is a problem with the regex. If I change it to:

grep -RLi '<%init>' * | grep '.html'

I get all files that don't have '<%init>', but it doesn't work with the 
'<%(init|perl)>'. That regex doesn't seem to match anything.

-- 
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548

agaffney
7/31/2004 1:07:54 AM
On Fri, 30 Jul 2004, Andrew Gaffney wrote:

> I think it is a problem with the regex. If I change it to:
>
> grep -RLi '<%init>' * | grep '.html'
>
> I get all files that don't have '<%init>', but it doesn't work with 
> the '<%(init|perl)>'. That regex doesn't seem to match anything.

More man page material: I was using `egrep` for the earlier examples, 
not `grep`. On my computer (a Mac), `egrep` is equivalent to `grep -E`; 
either way, this pulls in the extended regex syntax that, in this case, 
is being used to match multiple patterns (by|doing|this).

Hence, these two lines are equivalent:

   egrep    'pattern|anotherpattern'  *
   grep  -E 'pattern|anotherpattern'  *

Also, the line you ended up with --

   grep -RLi '<%init>' * | grep '.html'

-- should be equivalent to this one --

   grep -RLi '<%init>' *html

-- without needing the second grep statement.

And to weave the multiple pattern matching back in, you can do these:

   egrep -RLi  '<%(init|perl)>' *html
   grep  -RLiE '<%(init|perl)>' *html

Both of these should match files that have neither of the two patterns 
you were asking about: /<%init>/ nor /<%perl>/ .
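
Why the grouped pattern appeared to match nothing under plain `grep` can be shown with a one-line input. In POSIX basic regular expressions, which plain `grep` uses by default, unescaped `(`, `|` and `)` are literal characters. In extended syntax (`grep -E` or `egrep`) they mean grouping and alternation. The `sample` variable below is just an illustrative input.

```shell
sample='<%init>'

# Basic regex (plain grep): ( | ) are literal characters, so no match here.
if printf '%s\n' "$sample" | grep -q '<%(init|perl)>'; then
  basic=match
else
  basic=nomatch
fi

# Extended regex (grep -E): ( | ) group and alternate, so this matches.
if printf '%s\n' "$sample" | grep -qE '<%(init|perl)>'; then
  extended=match
else
  extended=nomatch
fi

printf 'basic: %s, extended: %s\n' "$basic" "$extended"
```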

Make sense?



-- 
Chris Devers
cdevers
7/31/2004 4:19:00 AM
Chris Devers wrote:
> On Fri, 30 Jul 2004, Andrew Gaffney wrote:
> 
>> I think it is a problem with the regex. If I change it to:
>>
>> grep -RLi '<%init>' * | grep '.html'
>>
>> I get all files that don't have '<%init>', but it doesn't work with 
>> the '<%(init|perl)>'. That regex doesn't seem to match anything.
> 
> 
> More man page material: I was using `egrep` for the earlier examples, 
> not `grep`. On my computer (a Mac), `egrep` is equivalent to `grep -E`; 
> either way, this pulls in the extended regex syntax that, in this case, 
> is being used to match multiple patterns (by|doing|this).
> 
> Hence, these two lines are equivalent:
> 
>   egrep    'pattern|anotherpattern'  *
>   grep  -E 'pattern|anotherpattern'  *
> 
> Also, the line you ended up with --
> 
>   grep -RLi '<%init>' * | grep '.html'
> 
> -- should be equivalent to this one --
> 
>   grep -RLi '<%init>' *html
> 
> -- without needing the second grep statement.

It isn't though. I had the '-R' flag in which means I want it to search 
subdirectories also. The '*html' gets interpreted by the shell and it ends up 
not recursing.
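
A small sandbox illustrates the globbing point, along with one possible way to keep the recursion while still filtering by name. This is a sketch under assumptions: the file names are hypothetical, and `--include` is an option of GNU grep (check `man grep` on other systems before relying on it).

```shell
# Hypothetical layout: two top-level files and one nested file.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf '<p>top</p>\n' > "$demo/index.html"
printf '<%%init>\n' > "$demo/header.html"
printf '<p>nested</p>\n' > "$demo/sub/page.html"

# The shell expands *html before grep runs, so grep only receives the
# top-level names and -R has no directory to descend into.
globbed=$(cd "$demo" && grep -RLiE '<%(init|perl)>' *html)

# Assumes a grep with --include: recurse from ".", but only examine files
# whose names match the glob.
recursive=$(cd "$demo" && grep -RLiE --include='*.html' '<%(init|perl)>' . | sort)

printf 'globbed:\n%s\n' "$globbed"
printf 'recursive:\n%s\n' "$recursive"
rm -rf "$demo"
```

The globbed form reports only `index.html`, while the recursive form also finds `./sub/page.html`.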

> And to weave the multiple pattern matching back in, you can do these:
> 
>   egrep -RLi  '<%(init|perl)>' *html
>   grep  -RLiE '<%(init|perl)>' *html

I ended up with "egrep -RLi  '<%(init|perl)>' * | egrep '.html$'" which seems to 
get me exactly what I wanted.
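
One detail about the final filter: in `'.html$'` the dot is unescaped, so it matches any character, and a name such as `HEADER.shtml` (seen earlier in the thread) would also pass. A sketch of the difference, using a made-up list of names in the `names` variable:

```shell
# Illustrative list of file names, one per line.
names='index.html
HEADER.shtml
sub/page.html'

# Unescaped dot matches any character, so "shtml" slips through.
loose=$(printf '%s\n' "$names" | grep -E '.html$')

# Escaped dot matches only a literal ".".
strict=$(printf '%s\n' "$names" | grep -E '\.html$')

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"
```

If the tree contains no such names, both filters give the same result, so the original command may well have been fine in practice.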

> Both of these should match files that have neither of the two patterns 
> you were asking about: /<%init>/ nor /<%perl>/ .
> 
> Make sense?

Yes. Thanks for the help.

-- 
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548

agaffney
7/31/2004 6:01:37 AM