help with a stat script

Hello,

My website is powered by Apache and PHP. Its access log looks like this:

xx.xx.xx.xx - - [12/Jul/2018:19:29:43 +0800] "GET 
/2018/07/06/antique-internet/ HTTP/1.1" 200 5489 "https://miscnote.net/" 
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 
(KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"

Here is a Perl one-liner that gathers statistics from this log:

tail -f /var/log/apache2/access.log|perl -nle 'next unless m{^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+}; printf "%-20s%-40s%-40s\n",$1,$3,$2'

I am totally confused by it.
What do m{...} and its contents stand for?
Can you help explain it?

thanks in advance.
0
lauren
7/12/2018 11:35:14 AM
2018-07-12 19:35:14 +0800 Lauren C.:
> Hello,
>
> My website is powered by Apache and PHP. Its access log looks like this:
>
> xx.xx.xx.xx - - [12/Jul/2018:19:29:43 +0800] "GET
> /2018/07/06/antique-internet/ HTTP/1.1" 200 5489 "https://miscnote.net/"
> "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML,
> like Gecko) Chrome/67.0.3396.99 Safari/537.36"
>
> Here is a Perl one-liner that gathers statistics from this log:
>
> tail -f /var/log/apache2/access.log|perl -nle 'next unless m{^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+}; printf "%-20s%-40s%-40s\n",$1,$3,$2'
>
> I am totally confused by it.
> What do m{...} and its contents stand for?
> Can you help explain it?

Hi, Lauren

The m{...} is a regular expression (regexp). If you are not familiar with
regexps in Perl, I advise you to read these pages:

- http://perldoc.perl.org/perlintro.html#Regular-expressions
- http://perldoc.perl.org/perlrequick.html

> thanks in advance.
>
> --
> To unsubscribe, e-mail: beginners-unsubscribe@perl.org
> For additional commands, e-mail: beginners-help@perl.org
> http://learn.perl.org/

0
gilmagno
7/12/2018 12:13:32 PM
Hi!

"m{ pattern }" is a regular expression used to parse the log line.

It is equivalent to "/ pattern /". Using a different delimiter is
convenient here because a literal "/" normally has to be escaped with a
backslash "\"; with another delimiter we can leave "/" unescaped, which
makes the regex more readable.

You can explore the regex further on this site: https://regex101.com/r/4CGCcB/2
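
As a minimal sketch of the delimiter point above: the two matches below use the same pattern, once with the default / delimiter and once with m{...}. The sample path and the variable names are illustrative assumptions, not part of the original one-liner.

```perl
use strict;
use warnings;

# Sample path taken from the log line quoted in this thread.
my $path = '/2018/07/06/antique-internet/';

# With the default / delimiter, every literal slash must be escaped.
my $with_slashes = $path =~ /^\/(\d+)\/(\d+)\/(\d+)\// ? "$1-$2-$3" : 'no match';

# With m{...} as the delimiter, the same pattern needs no escaping of "/".
my $with_braces = $path =~ m{^/(\d+)/(\d+)/(\d+)/} ? "$1-$2-$3" : 'no match';

print "$with_slashes\n";
print "$with_braces\n";
```

Both lines print the same year-month-day string, which shows that only the delimiter changed and not the meaning of the pattern.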


0
elcamlost
7/12/2018 12:37:44 PM
thanks Magno. i will check it.

0
lauren
7/12/2018 12:48:48 PM
Thanks for the kind help.
Do you know what the expression inside { } stands for?

^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+



0
lauren
7/12/2018 12:50:22 PM
> On Jul 12, 2018, at 5:50 AM, Lauren C. <lauren@miscnote.net> wrote:
>
> Thanks for the kind help.
> Do you know what the expression inside { } stands for?
>
> ^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+

Here is a breakdown:

^        Anchor the match at the beginning of the string
(\S+)    Match one or more consecutive non-whitespace characters and save them in the $1 variable
' - - '  Match the literal string ' - - ' (space, hyphen, space, hyphen, space)
\[       Match the character '['
(\S+)    Match one or more consecutive non-whitespace characters and save them in the $2 variable
.*       Match zero or more of any character (other than newline)
\]       Match the character ']'
(space)  Match a space character
\"       Match the character '"'
GET      Match the literal string 'GET ' (with a space at the end)
(.*?/)   Match the shortest run of characters that ends in '/' and is followed by whitespace, and save it in $3
\s+      Match one or more consecutive whitespace characters

If all of the above elements match, the regular expression evaluates to
true and the $1, $2, and $3 variables are set to their captured substrings.
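
The breakdown above can be checked with a short script. This is only a sketch: parse_line is a hypothetical helper name, the user-agent field of the sample line is shortened, and the \" from the shell one-liner is written as a plain " because no shell quoting is involved here.

```perl
use strict;
use warnings;

# Sample line from the original post, joined back into a single line
# (user-agent field shortened for readability).
my $line = 'xx.xx.xx.xx - - [12/Jul/2018:19:29:43 +0800] '
         . '"GET /2018/07/06/antique-internet/ HTTP/1.1" 200 5489 '
         . '"https://miscnote.net/" "Mozilla/5.0"';

# parse_line is a hypothetical helper, not part of the original one-liner.
# It returns (client address, timestamp, request path), or nothing when
# the line does not match.
sub parse_line {
    my ($text) = @_;
    return unless $text =~ m{^(\S+) - - \[(\S+).*\] "GET (.*?/)\s+};
    return ($1, $2, $3);
}

my ($ip, $time, $path) = parse_line($line);

# Same column layout as the one-liner: address, path, timestamp.
printf "%-20s%-40s%-40s\n", $ip, $path, $time;
```

The printf call uses the same order as the one-liner ($1, $3, $2), so the request path is printed before the timestamp.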
0
jimsgibson
7/12/2018 2:00:05 PM
2018-07-12 20:50:22 +0800 Lauren C.:
> Thanks for the kind help.
> Do you know what the expression inside { } stands for?
>
> ^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+

Hi, Lauren

This is briefly explained in http://perldoc.perl.org/perlrequick.html#Using-character-classes

\s (lowercase) stands for a "whitespace" character. \S (uppercase) stands
for the opposite of \s. So

$name = "lauren";
if ($name =~ m{\s}) { print 'it matched' }

This will not match, because there is no "whitespace" in the string. But this

$name = "lauren";
if ($name =~ m{\S}) { print 'it matched' }

will match, because the string contains a character which is *not* "whitespace".

For the ^, [] and .*? in the regex, the pages from my previous email will help you.

Best

gil

0
gilmagno
7/12/2018 2:02:15 PM
On Thu, 2018-07-12 at 19:35 +0800, Lauren C. wrote:
> 
> My website is powered by Apache and PHP. Its access log looks like this:
>
> xx.xx.xx.xx - - [12/Jul/2018:19:29:43 +0800] "GET
> /2018/07/06/antique-internet/ HTTP/1.1" 200 5489 "https://miscnote.net/"
> "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36
> (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
>
> Here is a Perl one-liner that gathers statistics from this log:
>
> tail -f /var/log/apache2/access.log|perl -nle 'next unless m{^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+}; printf "%-20s%-40s%-40s\n",$1,$3,$2'
>
> I am totally confused by it.
> What do m{...} and its contents stand for?



m{^

Start with the beginning-of-line anchor (^): the pattern that follows must
match at the beginning of the line.

(\S+)

Match one or more non-whitespace characters and store the match in the
$1 variable.  This matches the "xx.xx.xx.xx" portion of your string.

' - - \['

Match the literal characters SPACE HYPHEN SPACE HYPHEN SPACE LEFT-BRACKET.

(\S+)

Match one or more non-whitespace characters and store the match in the
$2 variable.  This matches the "12/Jul/2018:19:29:43" portion of your
string.

'.*\] \"GET '

Match zero or more non-newline characters followed by the literal
string '] "GET '.

(.*?/)

Match as few as possible non-newline characters followed by a '/'
character and store the match in the $3 variable.  This matches the
"/2018/07/06/antique-internet/" portion of your string.

\s+}

And finally, match one or more whitespace characters so that the
previous non-greedy pattern will match correctly.  The + quantifier is
redundant here, so it could simply be:

\s}
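
A small sketch of that last point, using only the request part of the sample line (the variable names are illustrative assumptions): the capture is the same with \s+ and \s, while dropping the trailing whitespace match entirely lets the non-greedy group stop at the first "/".

```perl
use strict;
use warnings;

# Request field from the sample log line in this thread.
my $request = '"GET /2018/07/06/antique-internet/ HTTP/1.1"';

# A match in list context returns the captured groups.
my ($plus)   = $request =~ m{"GET (.*?/)\s+};   # as written in the one-liner
my ($single) = $request =~ m{"GET (.*?/)\s};    # + quantifier dropped
my ($bare)   = $request =~ m{"GET (.*?/)};      # no trailing whitespace match

print "with \\s+  : $plus\n";
print "with \\s   : $single\n";
print "without   : $bare\n";
```

The first two captures are identical; the third is just "/", because nothing forces the non-greedy group to extend up to the space.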



John
0
jwkrahn
7/12/2018 6:23:09 PM
Thanks Jim, that explains it clearly.

0
lauren
7/13/2018 12:52:10 AM
OK I see, thanks Gil.
I think the main problem is I don't know much about regex.
I will re-learn them today.

0
lauren
7/13/2018 12:53:50 AM
Thanks John.

Those symbols drove me completely crazy.
As you explained, some are regex metacharacters and some are ordinary
characters; which is which was not clear to me because of my poor
knowledge of regex.

Yes, I will study them more.

thanks.

0
lauren
7/13/2018 12:57:57 AM
On 07/12/2018 08:53 PM, Lauren C. wrote:
> OK I see, thanks Gil.
> I think the main problem is I don't know much about regex.
> I will re-learn them today.
heh, relearning regexes will take a lifetime, not just one day! :)

but seriously, regexes are a key feature in perl and most modern 
languages. it is hard to do any text or data processing without them. i 
recommend you read those tutorials mentioned earlier and possibly other 
materials. stay away from most 'perl' or 'regex' tutorials on the net as 
many are very poorly written and full of mistakes.

and if you need more help with regexes, emailing here is a good thing!

uri
0
uri
7/13/2018 3:18:28 AM
Hi Uri,

I was reading this page:
https://www.rexegg.com/regex-lookarounds.html

The content of "Mastering Lookahead and Lookbehind" confuses me.

(?=foo)
(?<=foo)
(?!foo)
(?<!foo)

They are too hard for me to understand.
As far as I know, PHP doesn't have this kind of thing.

What do you think?


On 2018/7/13 Friday AM 11:18, Uri Guttman wrote:
> but seriously, regexes are a key feature in perl and most modern 
> languages. it is hard to do any text or data processing without them. i 
> recommend you read those tutorials mentioned earlier and possibly other 
> materials. stay away from most 'perl' or 'regex' tutorials on the net as 
> many are very poorly written and full of mistakes.
0
lauren
7/13/2018 3:40:29 AM
On 07/12/2018 11:40 PM, Lauren C. wrote:
> Hi Uri,
>
> I was reading this page:
> https://www.rexegg.com/regex-lookarounds.html
>
> The content of "Mastering Lookahead and Lookbehind" confuses me.
>
> (?=foo)
> (?<=foo)
> (?!foo)
> (?<!foo)
>
> They are too hard for me to understand.
> As far as I know, PHP doesn't have this kind of thing.
>
> What do you think?
>
>
i suggest you don't study lookarounds until you are stronger with basic 
regex stuff. they are useful but not needed that often. you should start 
with simpler stuff like character classes and their shortcuts, grouping 
and grabbing and quantifiers (repeat counts). then move on to simple 
zero-width assertions and other stuff. after you are very comfortable 
with all that, there are plenty of deeper things to learn like 
lookaround. walk before you run! :)
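
a rough sketch of those basics, using the timestamp from the log line earlier in the thread (the variable names are illustrative assumptions): a character-class shortcut, an explicit class, repeat counts, capturing groups and two simple zero-width assertions.

```perl
use strict;
use warnings;

# Timestamp captured as $2 from the sample log line in this thread.
my $stamp = '12/Jul/2018:19:29:43';

# \d is a character-class shortcut and [A-Za-z] an explicit class;
# {2}, {3} and {4} are quantifiers giving exact repeat counts.
# Parentheses group and capture; ^ is a simple zero-width assertion.
my ($day, $month, $year) = $stamp =~ m{^(\d{2})/([A-Za-z]{3})/(\d{4})};

# \b is another zero-width assertion: it matches a position between a
# word character and a non-word character, not a character itself.
my @numbers = $stamp =~ m{\b(\d+)\b}g;

print "$day $month $year\n";
print "@numbers\n";
```

the first print shows the three captured date parts; the second shows every run of digits that stands on its own between word boundaries.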

the site you list above seems like it is well written but its ordering 
of lessons is way too fast and wrong IMO.

i highly recommend you read the official perl tutorial on regexes 
(mentioned by someone else earlier)

https://perldoc.perl.org/perlretut.html

it has the right pace and topic order to learn simpler and more common 
things first and builds on those. the site you found is more like a 
firehose and your asking about lookaround is why it isn't a good tutorial.

uri
0
uri
7/13/2018 4:22:40 AM