regex with HEX ascii chars

I have a text file (created by  pdftotext) that I've imported into my script.

It contains ASCII characters 251 for crosses and 252 for ticks.  If I load the 
file in gvim and do :as

it reports the characters as 

<u> 251, Hex 00fb, Octal 373
<u> 252, hex 00fc, Octal 374

However, when I try to search for it using

if ($line=~/[\xfb|\xfc]/) {

or even just 

if ($line=~/\xfb/) { 

it always fails.  What am I doing wrong?

Gary
0
gary
4/12/2018 4:26:57 PM

> However, when I try to search for it using

if ($line=~/[\xfb|\xfc]/) {

Note that you're mixing character-class syntax, "[ab]", with the alternation
pipe used in a group, "(a|b)". Inside a character class, | is just a literal
pipe character.
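
To make the difference concrete, here is a minimal sketch comparing the three spellings. The sub names and the sample strings are made up for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Inside [...] the pipe is an ordinary character, not an alternation operator.
sub class_with_pipe { my ($s) = @_; return $s =~ /[\xfb|\xfc]/   ? 1 : 0 }
sub plain_class     { my ($s) = @_; return $s =~ /[\xfb\xfc]/    ? 1 : 0 }
sub alternation     { my ($s) = @_; return $s =~ /(?:\xfb|\xfc)/ ? 1 : 0 }

# Hypothetical sample strings: one with a literal pipe, one with chr 252.
my %samples = (
    'literal pipe' => "a|b",
    'chr 252'      => "hi" . chr(252),
);

for my $name (sort keys %samples) {
    my $s = $samples{$name};
    printf "%-12s class_with_pipe=%d plain_class=%d alternation=%d\n",
        $name, class_with_pipe($s), plain_class($s), alternation($s);
}
```

Only the first pattern matches "a|b", while all three match a string that contains chr 252. So the stray pipe makes the class match too much, but it would not by itself explain a match that always fails.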

> or even just

if ($line=~/\xfb/) {

Dunno, works here:
$ perl -e '$line = "hi" . chr 251 . "ho" . chr 252 ; if
($line=~/[\xfb\xfc]/) { print "yep" } print "\n"'
yep
$ perl -e '$line = "hi" . chr 250 . "ho" . chr 253 ; if
($line=~/[\xfb\xfc]/) { print "yep" } print "\n"'
[crickets]


So, I'd guess your $line doesn't have a \xfb or \xfc in it at the time of
the test.
$ perl -e '$line = "hi" . chr 251 . "ho" . chr 253 ; if
($line=~/([\xfb\xfc])/) { print "yep: $1" } print "\n"' | od -c
0000000   y   e   p   :     373  \n
0000007
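
One way to check that guess from inside the script is to dump the code of every character in $line just before the match. This is only a sketch: dump_codes is a hypothetical helper, and the first sample string is an assumption about what a line might look like if the file were UTF-8 encoded and read without any decoding layer.

```perl
use strict;
use warnings;

# Return the hex code of every character in the string, space separated.
sub dump_codes {
    my ($s) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $s);
}

# Assumed sample: "hi", then the two UTF-8 bytes for U+00FB, then "ho".
print dump_codes("hi\xc3\xbbho"), "\n";

# For comparison: the single character the regex /\xfb/ is looking for.
print dump_codes("hi" . chr(251) . "ho"), "\n";
```

If the dump shows a pair such as C3 BB where FB was expected, the string holds encoded bytes rather than the character itself.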




-- 

a

Andy Bach,
afbach@gmail.com
608 658-1890 cell
608 261-5738 wk

0
afbach
4/12/2018 6:01:37 PM
On Thu, 12 Apr 2018 17:26:57 +0100
Gary Stainburn <gary.stainburn@ringways.co.uk> wrote:

> I have a text file (created by  pdftotext) that I've imported into my script.
>
> It contains ASCII characters 251 for crosses and 252 for ticks.  If I load
> the file in gvim and do :as
>
> it reports the characters as
>
> <u> 251, Hex 00fb, Octal 373
> <u> 252, hex 00fc, Octal 374
>
> However, when I try to search for it using
>
> if ($line=~/[\xfb|\xfc]/) {
>
> or even just
>
> if ($line=~/\xfb/) {
>
> it always fails.  What am I doing wrong?
>

Perhaps see http://perldoc.perl.org/perlunitut.html - you may need to read the
file as binary, or decode it from iso8859-1 or whichever encoding it actually
uses. Also see https://github.com/shlomif/how-to-share-code-online and read
what Andy noted.
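
As a rough sketch of what decoding on read looks like: the example below reads the same bytes twice, once raw and once through a decoding layer. It assumes the file is UTF-8 (the thread has not established that at this point), uses an in-memory filehandle as a stand-in for the real pdftotext output, and first_line is a hypothetical helper.

```perl
use strict;
use warnings;

# Assumed stand-in for the file contents: the UTF-8 bytes for "A û ü".
my $file_bytes = "A \xc3\xbb \xc3\xbc\n";

# Read the first line of the in-memory "file" through the given PerlIO layer.
sub first_line {
    my ($bytes, $layer) = @_;
    open(my $fh, "<$layer", \$bytes) or die "open failed: $!";
    my $line = <$fh>;
    close $fh;
    return $line;
}

my $raw     = first_line($file_bytes, ':raw');
my $decoded = first_line($file_bytes, ':encoding(UTF-8)');

# The raw line holds bytes C3 BB and C3 BC; the decoded line holds
# the characters U+00FB and U+00FC, which is what \xfb and \xfc match.
print "raw:     ", ($raw     =~ /[\xfb\xfc]/ ? "match" : "no match"), "\n";
print "decoded: ", ($decoded =~ /[\xfb\xfc]/ ? "match" : "no match"), "\n";
```

With a real file, the same layer would go in the open call, for example open(my $fh, '<:encoding(UTF-8)', $path), where $path is whatever file name the script uses.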

> Gary
>



-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
https://github.com/shlomif/what-you-should-know-about-automated-testing

It’s easier to port a shell than a shell script.
    -- http://en.wikiquote.org/wiki/Larry_Wall

Please reply to list if it's a mailing list post - http://shlom.in/reply .
0
shlomif
4/12/2018 6:53:16 PM
On Thursday 12 April 2018 19:53:16 Shlomi Fish wrote:
> Perhaps see http://perldoc.perl.org/perlunitut.html - you may need to read
> the file as binary or iso8859-1 or whatever. Also see

Thanks for this, Shlomi. I have looked into that briefly before, when doing
HTTP gets and reading office documents, but this time I didn't think I was
going to need it.

> https://github.com/shlomif/how-to-share-code-online and read what Andy
> noted.

I thought the problem was with my concepts rather than with the program itself.
The following code shows that I was wrong.

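#!/usr/bin/perl

use strict;
use warnings;

# There is no "use utf8" here, so the literal below is a string of bytes:
# each accented character is two bytes, \xc3 followed by \xbb or \xbc.
my $line="A û ü  û";
my @arr=($line=~/(\xc3.)/g);   # every two-byte sequence starting with \xc3
my $tick="\xc3\xbc";           # UTF-8 bytes for U+00FC
my $cross="\xc3\xbb";          # UTF-8 bytes for U+00FB

# Print each byte with its hex and decimal value
foreach my $c (split //,$line) {
  printf "%s = %X %d\n",$c,ord($c),ord($c);
}
if ($line=~/\xc3\xbb/) { print "true\n";}
foreach my $pair (@arr) {
  print "start\n";
  if ($pair eq $tick)  { print "tick\n";}
  if ($pair eq $cross) { print "cross\n";}
}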

[root@lou inet]# ./t1
A =3D 41 65
  =3D 20 32
=EF=BF=BD =3D C3 195
=EF=BF=BD =3D BB 187
  =3D 20 32
=EF=BF=BD =3D C3 195
=EF=BF=BD =3D BC 188
  =3D 20 32
  =3D 20 32
=EF=BF=BD =3D C3 195
=EF=BF=BD =3D BB 187
true
start
cross
start
tick
start
cross
[root@lou inet]#=20

When I went back to gvim I noticed that it started showing two column values
as I moved past these characters, which should have given me a clue.

My production code now includes the following working code:

my $tick="\xc3\xbc";
my $cross="\xc3\xbb";

my @ticks=($line=~/(\xc3.)/g);
if (scalar(@ticks) == 5) {
  if ($ticks[0] eq $tick) {$job{sj_mot}='true';}
  if ($ticks[1] eq $tick) {$wuw='true'; $job{sj_wait}=20;}
  if ($ticks[2] eq $tick) {$job{sj_c_car}='true';}
  # 3 = advisor which we don't use
  if ($ticks[4] eq $tick) {$job{sj_wait}=30;}
} else {
  debugprint(1,"incorrect tick/cross count returned");
}
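
For comparison, a possible alternative (not the code above, and only a sketch) is to decode the line first and then match the two characters directly. The pattern (\xc3.) also picks up any other two-byte sequence that starts with \xc3, such as the bytes for an accented e, whereas [\xfb\xfc] on decoded text matches only the tick and the cross. tick_flags is a hypothetical helper and the sample line is made up.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Return one flag per tick/cross found in a UTF-8 encoded line:
# 1 for a tick (U+00FC), 0 for a cross (U+00FB).
sub tick_flags {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);    # bytes -> characters
    return map { $_ eq "\xfc" ? 1 : 0 } $text =~ /([\xfb\xfc])/g;
}

# Made-up sample: cross, tick, cross, then a word ending in the bytes C3 A9,
# which the character class ignores.
my @flags = tick_flags("A \xc3\xbb \xc3\xbc  \xc3\xbb caf\xc3\xa9");
print "@flags\n";
```

The count check from the production code would still apply; it would simply be done on the list this returns.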
0
gary
4/13/2018 8:19:55 AM
On Thu, 2018-04-12 at 17:26 +0100, Gary Stainburn wrote:
> I have a text file (created by  pdftotext) that I've imported into my
> script.
> 
> It contains ASCII characters 251 for crosses and 252 for ticks.

ASCII defines 128 characters so those characters are not ASCII.
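
A small sketch of that boundary in Perl terms; is_ascii is a hypothetical helper, not something from the original script.

```perl
use strict;
use warnings;

# True when every character in the string is in the ASCII range 0..127.
sub is_ascii {
    my ($s) = @_;
    return $s =~ /\A[\x00-\x7F]*\z/ ? 1 : 0;
}

for my $code (127, 128, 251, 252) {
    printf "%3d: %s\n", $code, is_ascii(chr $code) ? "ASCII" : "not ASCII";
}
```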


John
0
jwkrahn
4/13/2018 6:07:31 PM