Removing a Pattern using a Regular Expression

I wrote a small perl program to more quickly read all the
subjects in an email list.  One of the things the script does is
to remove the mailing list name which repeats for every message
and consists of a [, some English text and finally a ].

	I was able to write an RE that identifies that text and
causes the script to save that string in a variable called
$remove.  That part works and looks like:

    foreach my $field (@fields) {    #Assemble the new subject.
if($field =~ m/\[(.*?)\]/) 
{ #$field is the blocked field.
$remove = $field;
} #field is the blocked field.
else
{ #$field is not the blocked string.
        $newest = $newest . $field;
} #$field is not the blocked string.
    }    #Assemble the new subject.

    if ( $newest eq $previous ) {    #Skip this iteration.
        $newest = "";
        next;
    }    #Skip this iteration.
else
{ #they are different.

	This is where things don't quite work yet.  At this
point, I have $remove, which contains the bracketed list name,
such as [BLIND-HAMS] or any number of other names enclosed in
brackets.  So, the next thing I do is attempt to remove just that
part of the subject line, keeping everything else that was there.

   $subject =~ s/'$remove'//;
    print( $subject, "\n" );

	The example here is the closest I have come to anything
happening.  In the case of [BLIND-HAMS], the B is gone but the
brackets and everything else remain.

	I looked around for examples of similar code and found:

$subject =~ s/$remove\K.*?(?=\d+)//;

It looks like it should keep everything else in the $subject
string except [BLIND-HAMS] but it keeps everything including that
so there is no change.

	I actually think I am close but the line with the
brackets may be confusing the shell although single and double
quotes don't make any difference.

	I also may have damaged that last example when I modified
it to work with a string called $subject which is the whole
subject line and $remove which is the part I am trying to remove.

	The rest of the script appears to work and is designed to
list only the first message in a run of N messages with the same
subject.  So, if there are 120 messages with the subject "how
did you spend your Summer?", I read the first of those subject
lines and skip the rest until the first message that doesn't have
that title.

Any constructive ideas are appreciated.  Thank you.

Martin McCormick
martin
6/14/2018 2:21:12 AM
perl.beginners

On Wed, 13 Jun 2018 21:21:12 -0500
"Martin McCormick" <martin.m@suddenlink.net> wrote:

> [...]
>    $subject =~ s/'$remove'//;
>     print( $subject, "\n" );
> [...]

I think it's because you have 

$subject =~ s/[BLIND-HAMS]//;

and it deletes first appeared symbol from the diapason.

You can try smth like $remove =~ s/([\[\]])/\\$1/g;

-- 
Дмитрий Ананьевский <dimenty@impulse-kiev.in.ua>
dimenty
6/14/2018 8:34:34 AM
Hi,

On Wed, 13 Jun 2018 21:21:12 -0500
"Martin McCormick" <martin.m@suddenlink.net> wrote:

> I wrote a small perl program to more quickly read all the
> subjects in an email list.  One of the things the script does is
> to remove the mailing list name which repeats for every message
> and consists of a [, some English text and finally a ].
>
> 	I was able to write a RE that identifies that text and
> cause the script to save that string in a variable called
> $remove.  That part works and looks like:
>
>     foreach my $field (@fields) {    #Assemble the new subject.
> if($field =~ m/\[(.*?)\]/)
> { #$field is the blocked field.
> $remove = $field;
> } #field is the blocked field.
> else
> { #$field is not the blocked string.
>         $newest = $newest . $field;
> } #$field is not the blocked string.
>     }    #Assemble the new subject.
>
>     if ( $newest eq $previous ) {    #Skip this iteration.
>         $newest = "";
>         next;
>     }    #Skip this iteration.
> else
> { #they are different.
>

1. Your indentation is erratic.

2. See http://perl-begin.org/tutorials/bad-elements/ .


> 	This is where things don't quite work yet.  At this
> point, I have $remove which contains that bracketted list name
> such as
>
> [BLIND-HAMS] or any number of other names enclosed in brackets.
> So, the next thing I do is to attempt to remove just that part of
> the subject line, keeping everything else that was there.
>
>    $subject =~ s/'$remove'//;
>     print( $subject, "\n" );
>

1. Why did you add single quotes?

2. Perhaps use
http://perl-begin.org/tutorials/bad-elements/#re_string_interpolate


-- 
Shlomi Fish       http://www.shlomifish.org/
shlomif
6/14/2018 8:40:36 AM
Hi Dmitri,

On Thu, 14 Jun 2018 11:34:34 +0300
Дмитрий Ананьевский <dimenty@impulse-kiev.in.ua> wrote:

> I think it's because you have
>
> $subject =~ s/[BLIND-HAMS]//;
>
> and it deletes first appeared symbol from the diapason.
>

https://en.wiktionary.org/wiki/diapason - perhaps you mean "character class".

> You can try smth like $remove =~ s/([\[\]])/\\$1/g;
>

Why not use http://perldoc.perl.org/functions/quotemeta.html ?
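
A minimal sketch of what quotemeta produces for the list tag, next to the
hand-written escape quoted above.  The subject text is an invented example
and the names $by_hand, $quoted and $stripped are hypothetical; only
quotemeta, $remove and $subject come from the thread.

```perl
use strict;
use warnings;

my $remove  = '[BLIND-HAMS]';                     # tag as captured by the script
my $subject = '[BLIND-HAMS] Antenna question';    # hypothetical subject line

# Hand-written escape: only the square brackets get a backslash.
( my $by_hand = $remove ) =~ s/([\[\]])/\\$1/g;   # \[BLIND-HAMS\]

# quotemeta escapes every ASCII non-word character, including the hyphen.
my $quoted = quotemeta($remove);                  # \[BLIND\-HAMS\]

# Either string can now be interpolated as a literal pattern.
( my $stripped = $subject ) =~ s/$quoted\s*//;

print "$by_hand\n$quoted\n$stripped\n";
```

The advantage of quotemeta over the hand-written version is that it also
covers tags containing other metacharacters, such as a dot or a plus sign.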
shlomif
6/14/2018 9:07:06 AM
On Wed, 2018-06-13 at 21:21 -0500, Martin McCormick wrote:
> I wrote a small perl program to more quickly read all the
> subjects in an email list.  One of the things the script does is
> to remove the mailing list name which repeats for every message
> and consists of a [, some English text and finally a ].
> 
> 	I was able to write a RE that identifies that text and
> cause the script to save that string in a variable called
> $remove.  That part works and looks like:
> 
>     foreach my $field (@fields) {    #Assemble the new subject.
> if($field =~ m/\[(.*?)\]/) 

if you want to remove this string then why not just remove it here:

if ( $field =~ s/\[(.*?)\]// )
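
A sketch of that in-place removal applied to whole subject lines, assuming
the tag appears at most once and may be followed by whitespace.  The sample
subjects are invented, and [^\]]* is swapped in for .*? here; it matches
the same tags as long as they contain no closing bracket.

```perl
use strict;
use warnings;

# Hypothetical subject lines; only the bracketed-tag format comes from the post.
my @subjects = (
    '[BLIND-HAMS] Antenna question',
    'Re: [BLIND-HAMS] Antenna question',
    'No tag here',
);

for my $subject (@subjects) {
    # Remove the first bracketed tag and any whitespace after it.
    # s/// returns true only when something was actually replaced.
    my $had_tag = ( $subject =~ s/\[[^\]]*\]\s*// ) ? 1 : 0;
    print "$had_tag: $subject\n";
}
```

Because the loop variable aliases the array elements, @subjects itself holds
the cleaned subjects afterwards.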


> { #$field is the blocked field.
> $remove = $field;
> } #field is the blocked field.
> else
> { #$field is not the blocked string.
>         $newest = $newest . $field;
> } #$field is not the blocked string.
>     }    #Assemble the new subject.
> 
>     if ( $newest eq $previous ) {    #Skip this iteration.
>         $newest = "";
>         next;
>     }    #Skip this iteration.
> else
> { #they are different.
> 
> 	This is where things don't quite work yet.  At this
> point, I have $remove which contains that bracketted list name
> such as
> 
> [BLIND-HAMS] or any number of other names enclosed in brackets.
> So, the next thing I do is to attempt to remove just that part of
> the subject line, keeping everything else that was there.
> 
>    $subject =~ s/'$remove'//;

After string interpolation you have:

    $subject =~  s/'[BLIND-HAMS]'//;

Which is a pattern that matches three characters: the "'" character,
followed by one character from a character class, followed by another
"'" character.

The character class says to match one character that is either 'A' or
'B' or 'D' or 'E' or 'F' or 'G' or 'H' or 'I' or 'L' or 'M' or 'N' or
'S'.
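
A small sketch of that behaviour, using an invented subject line.  The
variable names $unquoted and $quoted are hypothetical; the two substitutions
are the unescaped forms discussed in the thread.

```perl
use strict;
use warnings;

my $remove = '[BLIND-HAMS]';

# Without escaping, the interpolated tag is a character class, so the
# substitution deletes only the first character that belongs to it.
my $unquoted = '[BLIND-HAMS] Antenna question';    # hypothetical subject
$unquoted =~ s/$remove//;
print "$unquoted\n";    # [LIND-HAMS] Antenna question

# With the single quotes, the pattern also needs a literal ' on each side,
# so a subject without quote characters is left unchanged.
my $quoted = '[BLIND-HAMS] Antenna question';
$quoted =~ s/'$remove'//;
print "$quoted\n";      # [BLIND-HAMS] Antenna question
```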

You probably need to use quotemeta, and to drop the single quotes,
which would otherwise have to appear literally in the subject:

    $subject =~  s/\Q$remove\E//;
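
A sketch of the \Q...\E form applied to the captured tag, without the single
quotes and with trailing whitespace consumed as well.  The \s* and the sample
subject are assumptions added for illustration.

```perl
use strict;
use warnings;

my $remove  = '[BLIND-HAMS]';    # captured list tag
my $subject = 'Re: [BLIND-HAMS] How did you spend your Summer?';    # hypothetical

# \Q...\E applies quotemeta to the interpolated text, so the brackets
# and the hyphen are matched literally instead of forming a character class.
$subject =~ s/\Q$remove\E\s*//;

print "$subject\n";    # Re: How did you spend your Summer?
```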



John
jwkrahn
6/14/2018 6:05:19 PM