regexec.c: LGTM analysis warns about two comparisons, but we actually need them

Here is a case where the LGTM.com analysis of the Perl 5 source code 
produces results that are at first plausible but ultimately incorrect.

The following section of regexec.c is cited 
(https://lgtm.com/projects/g/Perl/perl5/alerts/?mode=tree&ruleFocus=2154840804) 
with the warning-level alert "Comparison result is always the same":

#####
  9732 STATIC bool
  9733 S_reginclass(pTHX_ regexp * const prog, const regnode * const n, 
const U8* const p, const U8* const p_end      , const bool utf8_target)
  9734 {
  9735     dVAR;
  9736     const char flags = ANYOF_FLAGS(n);
  9737     bool match = FALSE;
  9738     UV c = *p;
....
  9762     /* If this character is potentially in the bitmap, check it */
  9763     if (c < NUM_ANYOF_CODE_POINTS) {
  9764     if (ANYOF_BITMAP_TEST(n, c))
  9765         match = TRUE;
  9766     else if ((flags
  9767                 & 
ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER)
  9768                   && OP(n) == ANYOFD
  9769           && ! utf8_target
  9770           && ! isASCII(c))
  9771     {
  9772         match = TRUE;
  9773     }
  9774     else if (flags & ANYOF_LOCALE_FLAGS) {
  9775         if ((flags & ANYOFL_FOLD)
  9776                 && c < 256
  9777         && ANYOF_BITMAP_TEST(n, PL_fold_locale[c]))
  9778             {
  9779                 match = TRUE;
  9780             }
  9781             else if (ANYOF_POSIXL_TEST_ANY_SET(n)
  9782                      && c < 256
  9783             ) {
  9784
....
  9827                 }
  9828         }
  9829     }
  9830     }
....
#####

At lines 9776 and 9782, the 'c < 256' statements are flagged as 
problematic because "Comparison is always true because c <= 255."

By inserting debugging statements (as suggested by xenu++) and then 
running the test suite, I confirmed that, at least within the test suite 
the value of 'c' ranges between 0 and 255.

I then created a branch (jkeenan/lgtm-regexec-c; 
https://perl5.git.perl.org/perl.git/commitdiff/1a6c6e69bfa70bc55eb0d1e85538d7420659a652) 
in which I simply removed those two comparisons.  Notwithstanding the 
alleged superfluousness of the two comparisons, I got these failures:

#####
Test Summary Report
-------------------
re/regexp_noamp.t                                                (Wstat: 
0 Tests: 2006 Failed: 4)
   Failed tests:  1830-1831, 1838-1839
re/regexp.t                                                      (Wstat: 
0 Tests: 2006 Failed: 4)
   Failed tests:  1830-1831, 1838-1839
re/regexp_nonull.t                                               (Wstat: 
0 Tests: 2006 Failed: 4)
   Failed tests:  1830-1831, 1838-1839
re/regexp_qr.t                                                   (Wstat: 
0 Tests: 2006 Failed: 4)
   Failed tests:  1830-1831, 1838-1839
re/regexp_notrie.t                                               (Wstat: 
0 Tests: 2006 Failed: 4)
   Failed tests:  1830-1831, 1838-1839
re/regexp_qr_embed.t                                             (Wstat: 
0 Tests: 2006 Failed: 4)
   Failed tests:  1830-1831, 1838-1839
re/regexp_trielist.t                                             (Wstat: 
0 Tests: 2006 Failed: 4)
   Failed tests:  1830-1831, 1838-1839
re/regexp_qr_embed_thr.t                                         (Wstat: 
0 Tests: 2006 Failed: 4)
   Failed tests:  1830-1831, 1838-1839
Files=2630, Tests=1181079, 271 wallclock secs (77.61 usr 10.15 sys + 
555.09 cusr 52.55 csys = 695.39 CPU)
Result: FAIL
*** Error code 32
#####

Each of the failing test files uses fixture data kept in t/re/re_tests:

#####
    8 # pat|string|y/n/etc|expr|expected-expr|skip-reason|comment$
....
1829 # These mostly exercize various paths in the optimizer$
1830 /(?l:a?\w)/|b|y|$&|b$
....
1838 /\s?\s/l|\t|y|$&|\t$
1839 /\s?\d/l|3|y|$&|3$
#####

(In regexec.c, what I have represented as pipe-characters above are 
actually hard-tabs.)

So we cannot remove the two comparisons and must somehow inform LGTM 
that these warnings are spurious.

Can anyone explain why these unit tests should preclude removal of the 
two comparisons?

Thank you very much.
Jim Keenan
0
jkeenan
12/6/2018 3:28:54 AM
perl.perl5.porters 47439 articles. 0 followers. Follow

12 Replies
17 Views

Similar Articles

[PageSpeed] 57

James E Keenan <jkeenan@pobox.com> wrote:
:Here is a case where the LGTM.com analysis of the Perl 5 source code 
:produces results that are at first plausible but ultimately incorrect.

I think the analysis is indeed correct, but it is relying on the stability
of things deliberately hidden behind macros. The question here is not
whether what it has spotted is correct today, but whether relying on that
is a safe thing to do.

The initial check on which it is relying is:

9763:    if (c < NUM_ANYOF_CODE_POINTS) {

... checking against a macro defined in regcomp.h thus:

> /* This give the number of code points that can be in the bitmap of an ANYOF
>  * node.  The shift number must currently be one of: 8..12.  It can't be less
>  * than 8 (256) because some code relies on it being at least that.  Above 12
>  * (4096), and you start running into warnings that some data structure widths
>  * have been exceeded, though the test suite as of this writing still passes
>  * for up through 16, which is as high as anyone would ever want to go,
>  * encompassing all of the Unicode BMP, and thus including all the economically
>  * important world scripts.  At 12 most of them are: including Arabic,
>  * Cyrillic, Greek, Hebrew, Indian subcontinent, Latin, and Thai; but not Han,
>  * Japanese, nor Korean.  (The regarglen structure in regnodes.h is a U8, and
>  * the trie types TRIEC and AHOCORASICKC are larger than U8 for shift values
>  * below above 12.)  Be sure to benchmark before changing, as larger sizes do
>  * significantly slow down the test suite */
> #define NUM_ANYOF_CODE_POINTS   (1 << 8)

So for the current definition, it is entirely true that code guarded by
that check knows c is less than 256, but the contract implied by those
comments are that things should continue to work correctly even if that
definition changes (within the described range).

Now in the supposedly redundant checks, we're comparing against a literal 256,
and it isn't clear why - it is doing so in the first case just before
an ANYOF table lookup, suggesting a table-size guard that you'd think
would use the NUM_ANYOF_CODE_POINTS macro; in the second case it is
just _after_ such a lookup, which is somewhat crazier, but probably
explained by the '(U8) c' cast in the guarded block (though if that
were the reason, the literal 256 should more correctly/usefully be
U8_MAX).

If we're lucky, Karl might understand the intent of those tests and
be able to advise. It is certainly possible that one or both is
redundant.

:By inserting debugging statements (as suggested by xenu++) and then 
:running the test suite, I confirmed that, at least within the test suite 
:the value of 'c' ranges between 0 and 255.
:
:I then created a branch (jkeenan/lgtm-regexec-c; 
:https://perl5.git.perl.org/perl.git/commitdiff/1a6c6e69bfa70bc55eb0d1e85538d7420659a652) 
:in which I simply removed those two comparisons.  Notwithstanding the 
:alleged superfluousness of the two comparisons, I got these failures:

Do those test failures go away for you when you remove the debugging
statements? I think it's those causing your problem rather than the
code change.

But for the reasons above even if the tests were to pass we should not
make this change without understanding the purpose of those checks.

Hugo
0
hv
12/6/2018 8:04:44 AM
On Wed, Dec 05, 2018 at 10:28:54PM -0500, James E Keenan wrote:
> Here is a case where the LGTM.com analysis of the Perl 5 source code
> produces results that are at first plausible but ultimately incorrect.
> 
> The following section of regexec.c is cited (https://lgtm.com/projects/g/Perl/perl5/alerts/?mode=tree&ruleFocus=2154840804)
> with the warning-level alert "Comparison result is always the same":

This was discussed in #133686, the comparison needs to stay to allow
for local configuration of larger values of NUM_ANYOF_CODE_POINTS.

> I then created a branch (jkeenan/lgtm-regexec-c; https://perl5.git.perl.org/perl.git/commitdiff/1a6c6e69bfa70bc55eb0d1e85538d7420659a652)
> in which I simply removed those two comparisons.  Notwithstanding the
> alleged superfluousness of the two comparisons, I got these failures:
> 
> #####
> Test Summary Report
> -------------------
> re/regexp_noamp.t                                                (Wstat: 0
> Tests: 2006 Failed: 4)
>   Failed tests:  1830-1831, 1838-1839
> re/regexp.t                                                      (Wstat: 0
> Tests: 2006 Failed: 4)
>   Failed tests:  1830-1831, 1838-1839

The tests are failing because of the added
_CHECK_AND_OUTPUT_WIDE_LOCALE_CP_MSG(c);, from your commit:

diff --git a/regexec.c b/regexec.c
index eb24567e04..a265cc84cb 100644
--- a/regexec.c
+++ b/regexec.c
@@ -9772,15 +9772,13 @@ S_reginclass(pTHX_ regexp * const prog, const regnode * const n, const U8* const
            match = TRUE;
        }
        else if (flags & ANYOF_LOCALE_FLAGS) {
+            _CHECK_AND_OUTPUT_WIDE_LOCALE_CP_MSG(c);
            if ((flags & ANYOFL_FOLD)
-                && c < 256
                && ANYOF_BITMAP_TEST(n, PL_fold_locale[c]))
             {
                 match = TRUE;


Remove that and they pass.

> So we cannot remove the two comparisons and must somehow inform LGTM that
> these warnings are spurious.
> 
> Can anyone explain why these unit tests should preclude removal of the two
> comparisons?

As was discussed with you in #p5p and mentioned in #133686 these
comparisons are needed to alloww for local configuration of
NUM_ANYOF_CODE_POINTS.

Since NUM_ANYOF_CODE_POINTS is 256 (1<<8) by default LGTM considers
the comparisons superfluous, but if NUM_ANYOF_CODE_POINTS is defined
to 512 (or perhaps larger) to cover the Latin characters) the
comparisons against 256 later in the code would no longer be
superfluous.

The documented LGTM mechanism to suppress a warning is a "// lgtm"
comment[1], but since we support C89 we cannot use such comments.

Since LGTM doesn't provide an admin UI like coverity does for marking
warnings as "working as intended", we just have to live with LGTM
warning about that code.

Tony

[1] https://lgtm.com/help/lgtm/alert-suppression#c-c++
0
tony
12/6/2018 8:16:41 AM
Just a general observation,  sometimes in the regex engine we use a
value like 256 in vars which look like they only hold an octet as a
flag/sentinel for things like end of string.

It is not the case /here/ but it does happen.

Yves
On Thu, 6 Dec 2018 at 09:17, Tony Cook <tony@develop-help.com> wrote:
>
> On Wed, Dec 05, 2018 at 10:28:54PM -0500, James E Keenan wrote:
> > Here is a case where the LGTM.com analysis of the Perl 5 source code
> > produces results that are at first plausible but ultimately incorrect.
> >
> > The following section of regexec.c is cited (https://lgtm.com/projects/g/Perl/perl5/alerts/?mode=tree&ruleFocus=2154840804)
> > with the warning-level alert "Comparison result is always the same":
>
> This was discussed in #133686, the comparison needs to stay to allow
> for local configuration of larger values of NUM_ANYOF_CODE_POINTS.
>
> > I then created a branch (jkeenan/lgtm-regexec-c; https://perl5.git.perl.org/perl.git/commitdiff/1a6c6e69bfa70bc55eb0d1e85538d7420659a652)
> > in which I simply removed those two comparisons.  Notwithstanding the
> > alleged superfluousness of the two comparisons, I got these failures:
> >
> > #####
> > Test Summary Report
> > -------------------
> > re/regexp_noamp.t                                                (Wstat: 0
> > Tests: 2006 Failed: 4)
> >   Failed tests:  1830-1831, 1838-1839
> > re/regexp.t                                                      (Wstat: 0
> > Tests: 2006 Failed: 4)
> >   Failed tests:  1830-1831, 1838-1839
>
> The tests are failing because of the added
> _CHECK_AND_OUTPUT_WIDE_LOCALE_CP_MSG(c);, from your commit:
>
> diff --git a/regexec.c b/regexec.c
> index eb24567e04..a265cc84cb 100644
> --- a/regexec.c
> +++ b/regexec.c
> @@ -9772,15 +9772,13 @@ S_reginclass(pTHX_ regexp * const prog, const regnode * const n, const U8* const
>             match = TRUE;
>         }
>         else if (flags & ANYOF_LOCALE_FLAGS) {
> +            _CHECK_AND_OUTPUT_WIDE_LOCALE_CP_MSG(c);
>             if ((flags & ANYOFL_FOLD)
> -                && c < 256
>                 && ANYOF_BITMAP_TEST(n, PL_fold_locale[c]))
>              {
>                  match = TRUE;
>
>
> Remove that and they pass.
>
> > So we cannot remove the two comparisons and must somehow inform LGTM that
> > these warnings are spurious.
> >
> > Can anyone explain why these unit tests should preclude removal of the two
> > comparisons?
>
> As was discussed with you in #p5p and mentioned in #133686 these
> comparisons are needed to alloww for local configuration of
> NUM_ANYOF_CODE_POINTS.
>
> Since NUM_ANYOF_CODE_POINTS is 256 (1<<8) by default LGTM considers
> the comparisons superfluous, but if NUM_ANYOF_CODE_POINTS is defined
> to 512 (or perhaps larger) to cover the Latin characters) the
> comparisons against 256 later in the code would no longer be
> superfluous.
>
> The documented LGTM mechanism to suppress a warning is a "// lgtm"
> comment[1], but since we support C89 we cannot use such comments.
>
> Since LGTM doesn't provide an admin UI like coverity does for marking
> warnings as "working as intended", we just have to live with LGTM
> warning about that code.
>
> Tony
>
> [1] https://lgtm.com/help/lgtm/alert-suppression#c-c++



-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
0
demerphq
12/6/2018 8:32:35 AM
On Thu, Dec 06, 2018 at 08:04:44AM +0000, hv@crypt.org wrote:
> James E Keenan <jkeenan@pobox.com> wrote:
> :Here is a case where the LGTM.com analysis of the Perl 5 source code 
> :produces results that are at first plausible but ultimately incorrect.
> 
> I think the analysis is indeed correct, but it is relying on the stability
> of things deliberately hidden behind macros. The question here is not
> whether what it has spotted is correct today, but whether relying on that
> is a safe thing to do.
> 
> The initial check on which it is relying is:
> 
> 9763:    if (c < NUM_ANYOF_CODE_POINTS) {
> 
> .. checking against a macro defined in regcomp.h thus:
> 
> > /* This give the number of code points that can be in the bitmap of an ANYOF
> >  * node.  The shift number must currently be one of: 8..12.  It can't be less
> >  * than 8 (256) because some code relies on it being at least that.  Above 12
> >  * (4096), and you start running into warnings that some data structure widths
> >  * have been exceeded, though the test suite as of this writing still passes
> >  * for up through 16, which is as high as anyone would ever want to go,
> >  * encompassing all of the Unicode BMP, and thus including all the economically
> >  * important world scripts.  At 12 most of them are: including Arabic,
> >  * Cyrillic, Greek, Hebrew, Indian subcontinent, Latin, and Thai; but not Han,
> >  * Japanese, nor Korean.  (The regarglen structure in regnodes.h is a U8, and
> >  * the trie types TRIEC and AHOCORASICKC are larger than U8 for shift values
> >  * below above 12.)  Be sure to benchmark before changing, as larger sizes do
> >  * significantly slow down the test suite */
> > #define NUM_ANYOF_CODE_POINTS   (1 << 8)
> 
> So for the current definition, it is entirely true that code guarded by
> that check knows c is less than 256, but the contract implied by those
> comments are that things should continue to work correctly even if that
> definition changes (within the described range).
> 
> Now in the supposedly redundant checks, we're comparing against a literal 256,
> and it isn't clear why - it is doing so in the first case just before
> an ANYOF table lookup, suggesting a table-size guard that you'd think
> would use the NUM_ANYOF_CODE_POINTS macro; in the second case it is
> just _after_ such a lookup, which is somewhat crazier, but probably
> explained by the '(U8) c' cast in the guarded block (though if that
> were the reason, the literal 256 should more correctly/usefully be
> U8_MAX).
> 
> If we're lucky, Karl might understand the intent of those tests and
> be able to advise. It is certainly possible that one or both is
> redundant.

In both cases the code point is used where it's limited to 8 bits.

In the first case it's used as an index in PL_fold_locale[], which is
an array of 256 folded code points.

In the second case it's used as a parameter to Perl_isFOO_lc() for
which the character parameter is a U8.

It might be preferable to make the first c < 256 into a comparison
against a symbolic constant (so a possible local change could be
extending PL_fold_locale to 512 or more fold entries, though the type
would also need to change from unsigned char[] to unsigned short[].

The second might be better to compare against a cast value, eg.
something like c == (U8)c

Tony
0
tony
12/6/2018 9:28:24 AM
On 12/6/18 3:04 AM, hv@crypt.org wrote:
> James E Keenan <jkeenan@pobox.com> wrote:
> :Here is a case where the LGTM.com analysis of the Perl 5 source code
> :produces results that are at first plausible but ultimately incorrect.
> 
> I think the analysis is indeed correct, but it is relying on the stability
> of things deliberately hidden behind macros. The question here is not
> whether what it has spotted is correct today, but whether relying on that
> is a safe thing to do.
> 
> The initial check on which it is relying is:
> 
> 9763:    if (c < NUM_ANYOF_CODE_POINTS) {
> 
> .. checking against a macro defined in regcomp.h thus:
> 
>> /* This give the number of code points that can be in the bitmap of an ANYOF
>>   * node.  The shift number must currently be one of: 8..12.  It can't be less
>>   * than 8 (256) because some code relies on it being at least that.  Above 12
>>   * (4096), and you start running into warnings that some data structure widths
>>   * have been exceeded, though the test suite as of this writing still passes
>>   * for up through 16, which is as high as anyone would ever want to go,
>>   * encompassing all of the Unicode BMP, and thus including all the economically
>>   * important world scripts.  At 12 most of them are: including Arabic,
>>   * Cyrillic, Greek, Hebrew, Indian subcontinent, Latin, and Thai; but not Han,
>>   * Japanese, nor Korean.  (The regarglen structure in regnodes.h is a U8, and
>>   * the trie types TRIEC and AHOCORASICKC are larger than U8 for shift values
>>   * below above 12.)  Be sure to benchmark before changing, as larger sizes do
>>   * significantly slow down the test suite */
>> #define NUM_ANYOF_CODE_POINTS   (1 << 8)
> 
> So for the current definition, it is entirely true that code guarded by
> that check knows c is less than 256, but the contract implied by those
> comments are that things should continue to work correctly even if that
> definition changes (within the described range).
> 
> Now in the supposedly redundant checks, we're comparing against a literal 256,
> and it isn't clear why - it is doing so in the first case just before
> an ANYOF table lookup, suggesting a table-size guard that you'd think
> would use the NUM_ANYOF_CODE_POINTS macro; in the second case it is
> just _after_ such a lookup, which is somewhat crazier, but probably
> explained by the '(U8) c' cast in the guarded block (though if that
> were the reason, the literal 256 should more correctly/usefully be
> U8_MAX).
> 
> If we're lucky, Karl might understand the intent of those tests and
> be able to advise. It is certainly possible that one or both is
> redundant.
> 
> :By inserting debugging statements (as suggested by xenu++) and then
> :running the test suite, I confirmed that, at least within the test suite
> :the value of 'c' ranges between 0 and 255.
> :
> :I then created a branch (jkeenan/lgtm-regexec-c;
> :https://perl5.git.perl.org/perl.git/commitdiff/1a6c6e69bfa70bc55eb0d1e85538d7420659a652)
> :in which I simply removed those two comparisons.  Notwithstanding the
> :alleged superfluousness of the two comparisons, I got these failures:
> 
> Do those test failures go away for you when you remove the debugging
> statements? I think it's those causing your problem rather than the
> code change.

No, the debugging statements were put into the code before I 
experimented with removal of the two comparisons.  They were then 
removed when I created the branch.  So they cannot be the cause of the 
problem.

> 
> But for the reasons above even if the tests were to pass we should not
> make this change without understanding the purpose of those checks.
> 
> Hugo
> 
0
jkeenan
12/6/2018 1:15:25 PM
On 12/6/18 3:16 AM, Tony Cook wrote:
> On Wed, Dec 05, 2018 at 10:28:54PM -0500, James E Keenan wrote:
>> Here is a case where the LGTM.com analysis of the Perl 5 source code
>> produces results that are at first plausible but ultimately incorrect.
>>
>> The following section of regexec.c is cited (https://lgtm.com/projects/g/Perl/perl5/alerts/?mode=tree&ruleFocus=2154840804)
>> with the warning-level alert "Comparison result is always the same":
> 
> This was discussed in #133686, the comparison needs to stay to allow
> for local configuration of larger values of NUM_ANYOF_CODE_POINTS.
> 

Would it be okay if I inserted a comment to that effect into regexec.c 
so that we don't stumble upon this again in the future?

Thank you very much.
Jim Keenan
0
jkeenan
12/6/2018 1:20:50 PM
On Thu, Dec 06, 2018 at 08:20:50AM -0500, James E Keenan wrote:
> On 12/6/18 3:16 AM, Tony Cook wrote:
> > On Wed, Dec 05, 2018 at 10:28:54PM -0500, James E Keenan wrote:
> > > Here is a case where the LGTM.com analysis of the Perl 5 source code
> > > produces results that are at first plausible but ultimately incorrect.
> > > 
> > > The following section of regexec.c is cited (https://lgtm.com/projects/g/Perl/perl5/alerts/?mode=tree&ruleFocus=2154840804)
> > > with the warning-level alert "Comparison result is always the same":
> > 
> > This was discussed in #133686, the comparison needs to stay to allow
> > for local configuration of larger values of NUM_ANYOF_CODE_POINTS.
> > 
> 
> Would it be okay if I inserted a comment to that effect into regexec.c so
> that we don't stumble upon this again in the future?

That sounds sensible to me regardless of systems like LGTM. You *could*
even use the LGTM suppression syntax (in addition to a human-readable
explanation); this would mean that if in the future we can implement
/* */ suppression comments in LGTM, they would be recognised.

You can watch this topic for more updates on that feature request:

https://discuss.lgtm.com/t/rfe-support-for-supression-comments-in-comment-blocks/1364

In case you weren't aware, if you think that a certain type of alert
will *never* be interesting for a given code-base you can disable it
entirely in .lgtm.yml:

https://lgtm.com/help/lgtm/showing-hiding-query-results

Cheers,
Dominic.
0
dom
12/6/2018 1:30:58 PM
On 12/6/18 3:16 AM, Tony Cook wrote:
> On Wed, Dec 05, 2018 at 10:28:54PM -0500, James E Keenan wrote:
>> Here is a case where the LGTM.com analysis of the Perl 5 source code
>> produces results that are at first plausible but ultimately incorrect.
>>
>> The following section of regexec.c is cited (https://lgtm.com/projects/g/Perl/perl5/alerts/?mode=tree&ruleFocus=2154840804)
>> with the warning-level alert "Comparison result is always the same":
> 
> This was discussed in #133686, the comparison needs to stay to allow
> for local configuration of larger values of NUM_ANYOF_CODE_POINTS.
> 
>> I then created a branch (jkeenan/lgtm-regexec-c; https://perl5.git.perl.org/perl.git/commitdiff/1a6c6e69bfa70bc55eb0d1e85538d7420659a652)
>> in which I simply removed those two comparisons.  Notwithstanding the
>> alleged superfluousness of the two comparisons, I got these failures:
>>
>> #####
>> Test Summary Report
>> -------------------
>> re/regexp_noamp.t                                                (Wstat: 0
>> Tests: 2006 Failed: 4)
>>    Failed tests:  1830-1831, 1838-1839
>> re/regexp.t                                                      (Wstat: 0
>> Tests: 2006 Failed: 4)
>>    Failed tests:  1830-1831, 1838-1839
> 
> The tests are failing because of the added
> _CHECK_AND_OUTPUT_WIDE_LOCALE_CP_MSG(c);, from your commit:
> 
> diff --git a/regexec.c b/regexec.c
> index eb24567e04..a265cc84cb 100644
> --- a/regexec.c
> +++ b/regexec.c
> @@ -9772,15 +9772,13 @@ S_reginclass(pTHX_ regexp * const prog, const regnode * const n, const U8* const
>              match = TRUE;
>          }
>          else if (flags & ANYOF_LOCALE_FLAGS) {
> +            _CHECK_AND_OUTPUT_WIDE_LOCALE_CP_MSG(c);
>              if ((flags & ANYOFL_FOLD)
> -                && c < 256
>                  && ANYOF_BITMAP_TEST(n, PL_fold_locale[c]))
>               {
>                   match = TRUE;
> 
> 
> Remove that and they pass.
> 

You are correct, as I now see in this newer branch: 
jkeenan/2nd-lgtm-regexec-c.

I don't know how that line got into my original branch.  Sorry for the 
confusion.

Thank you very much.
Jim Keenan
0
jkeenan
12/6/2018 7:48:34 PM
On 12/6/18 6:30 AM, Dominic Hargreaves wrote:
> On Thu, Dec 06, 2018 at 08:20:50AM -0500, James E Keenan wrote:
>> On 12/6/18 3:16 AM, Tony Cook wrote:
>>> On Wed, Dec 05, 2018 at 10:28:54PM -0500, James E Keenan wrote:
>>>> Here is a case where the LGTM.com analysis of the Perl 5 source code
>>>> produces results that are at first plausible but ultimately incorrect.
>>>>
>>>> The following section of regexec.c is cited (https://lgtm.com/projects/g/Perl/perl5/alerts/?mode=tree&ruleFocus=2154840804)
>>>> with the warning-level alert "Comparison result is always the same":
>>>
>>> This was discussed in #133686, the comparison needs to stay to allow
>>> for local configuration of larger values of NUM_ANYOF_CODE_POINTS.
>>>
>>
>> Would it be okay if I inserted a comment to that effect into regexec.c so
>> that we don't stumble upon this again in the future?

I have pushed a branch with this commit that changes those numbers to 
symbols to make things clear

https://perl5.git.perl.org/perl.git/commitdiff/ef5a8af164793929880381dcf36b727440050d44
> 
> That sounds sensible to me regardless of systems like LGTM. You *could*
> even use the LGTM suppression syntax (in addition to a human-readable
> explanation); this would mean that if in the future we can implement
> /* */ suppression comments in LGTM, they would be recognised.
> 
> You can watch this topic for more updates on that feature request:
> 
> https://discuss.lgtm.com/t/rfe-support-for-supression-comments-in-comment-blocks/1364
> 
> In case you weren't aware, if you think that a certain type of alert
> will *never* be interesting for a given code-base you can disable it
> entirely in .lgtm.yml:
> 
> https://lgtm.com/help/lgtm/showing-hiding-query-results
> 
> Cheers,
> Dominic.
> 
0
public
12/6/2018 7:54:11 PM
On 12/6/18 12:54 PM, Karl Williamson wrote:
> On 12/6/18 6:30 AM, Dominic Hargreaves wrote:
>> On Thu, Dec 06, 2018 at 08:20:50AM -0500, James E Keenan wrote:
>>> On 12/6/18 3:16 AM, Tony Cook wrote:
>>>> On Wed, Dec 05, 2018 at 10:28:54PM -0500, James E Keenan wrote:
>>>>> Here is a case where the LGTM.com analysis of the Perl 5 source code
>>>>> produces results that are at first plausible but ultimately incorrect.
>>>>>
>>>>> The following section of regexec.c is cited 
>>>>> (https://lgtm.com/projects/g/Perl/perl5/alerts/?mode=tree&ruleFocus=2154840804) 
>>>>>
>>>>> with the warning-level alert "Comparison result is always the same":
>>>>
>>>> This was discussed in #133686, the comparison needs to stay to allow
>>>> for local configuration of larger values of NUM_ANYOF_CODE_POINTS.
>>>>
>>>
>>> Would it be okay if I inserted a comment to that effect into 
>>> regexec.c so
>>> that we don't stumble upon this again in the future?
> 
> I have pushed a branch with this commit that changes those numbers to 
> symbols to make things clear
> 
> https://perl5.git.perl.org/perl.git/commitdiff/ef5a8af164793929880381dcf36b727440050d44 

The branch has not been pushed to blead as g540fa6e13e.  Hopefully this 
fixes the LGTM notice, as the hard-coded numbers have been replaced by 
the appropriate mnemonic.
> 
>>
>> That sounds sensible to me regardless of systems like LGTM. You *could*
>> even use the LGTM suppression syntax (in addition to a human-readable
>> explanation); this would mean that if in the future we can implement
>> /* */ suppression comments in LGTM, they would be recognised.
>>
>> You can watch this topic for more updates on that feature request:
>>
>> https://discuss.lgtm.com/t/rfe-support-for-supression-comments-in-comment-blocks/1364 
>>
>>
>> In case you weren't aware, if you think that a certain type of alert
>> will *never* be interesting for a given code-base you can disable it
>> entirely in .lgtm.yml:
>>
>> https://lgtm.com/help/lgtm/showing-hiding-query-results
>>
>> Cheers,
>> Dominic.
>>
> 
0
public
12/8/2018 4:20:30 AM
On Fri, Dec 07, 2018 at 09:20:30PM -0700, Karl Williamson wrote:
> On 12/6/18 12:54 PM, Karl Williamson wrote:
> > I have pushed a branch with this commit that changes those numbers to
> > symbols to make things clear
> > 
> > https://perl5.git.perl.org/perl.git/commitdiff/ef5a8af164793929880381dcf36b727440050d44
> 
> The branch has not been pushed to blead as g540fa6e13e.  Hopefully this
> fixes the LGTM notice, as the hard-coded numbers have been replaced by the
> appropriate mnemonic.

I don't expect it to prevent LGTM warning about the code, but
hopefully the next person who sees the warning will see the note on
NUM_ANYOF_CODE_POINTS and understand that the two comparisons that
LGTM complains about aren't necessarily redundant.

Tony
0
tony
12/9/2018 11:08:25 PM
On 12/9/18 4:08 PM, Tony Cook wrote:
> On Fri, Dec 07, 2018 at 09:20:30PM -0700, Karl Williamson wrote:
>> On 12/6/18 12:54 PM, Karl Williamson wrote:
>>> I have pushed a branch with this commit that changes those numbers to
>>> symbols to make things clear
>>>
>>> https://perl5.git.perl.org/perl.git/commitdiff/ef5a8af164793929880381dcf36b727440050d44
>>
>> The branch has not been pushed to blead as g540fa6e13e.  Hopefully this
>> fixes the LGTM notice, as the hard-coded numbers have been replaced by the
>> appropriate mnemonic.
> 
> I don't expect it to prevent LGTM warning about the code, but
> hopefully the next person who sees the warning will see the note on
> NUM_ANYOF_CODE_POINTS and understand that the two comparisons that
> LGTM complains about aren't necessarily redundant.
> 
> Tony
> 

s/not pushed to blead/now pushed to blead/
0
public
12/10/2018 1:46:12 AM
Reply: