There is a problem with how the regexp engine handles certain kinds of escapes combined with strings in different encodings. For instance:

    perl -wle'$x=qq(\x{DF}); $x=~/$x|\x{100}/ and print qq(ok)'

produces the following:

    Malformed UTF-8 character (unexpected non-continuation byte 0x7c, immediately after start byte 0xdf) in regexp compilation at -e line 1.

As far as I can tell, in blead this happens because when the \x{100} is parsed during the sizing phase, it sets the pattern's is-utf8 flag but doesn't upgrade the string to utf8. On the second pass the engine then tries to read the string as utf8 and fails.

The attached patch fixes this: when it notices that this might happen, it upgrades the string to utf8 and then redoes[2] the sizing phase, since the recoding might have altered the required allocation. The old behaviour could have caused a buffer overrun.[1]

    D:\dev\perl\ver\zoro\t\win32>..\perl -wle"$x=qq(\x{DF}); $x=~/$x|\x{100}/ and print 'ok'"
    ok

\x{DF} is ß, by the way. Pesky thing.

As a bonus, this patch includes two bug fixes that I came across while working out the utf8 encoding problem. One is in the trie code's charclass logic, which was doing the wrong thing under utf8; the other is in some debugging output code that was using the wrong utf8 flag. Not a bad number of bugs for a single test case, really. :-)

Yves

[1] I almost wonder whether this could have been responsible for the sizing bug in the XML code from a while back. I'll have to try reverting that patch with this patch applied and see.

[2] This is far from the most efficient way to deal with it. It would be nice to make the parse fail fast somehow, so that as little work as possible is done in the first pass once we know we have to upgrade the string.
That point could be deep in the parse recursion stack, though, so it's a bit difficult to do.

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"

[Attachment: encoding_madness.patch, base64-encoded in the original message. It patches regcomp.c, regexec.c, pod/perlreguts.pod and t/op/pat.t, and adds the test case above to t/op/pat.t as a regression test.]
On 20/03/07, demerphq <demerphq@gmail.com> wrote:
> As a bonus this patch includes two bug fixes which I came across while
> working out the utf8 encoding problem. One is for the trie code
> charclass logic which was doing the wrong thing under utf8 and the
> other was in some debugging output code that was using the wrong utf8
> flag.

Thanks, applied as #30647, although I removed an assignment to a cast, which might be illegal for some setups:

    (U8*)exp=Perl_bytes_to_utf8(aTHX_ (U8*)exp, &len);

replaced by

    exp = (char*)Perl_bytes_to_utf8(aTHX_ (U8*)exp, &len);
On 3/20/07, Rafael Garcia-Suarez <rgarciasuarez@gmail.com> wrote:
> On 20/03/07, demerphq <demerphq@gmail.com> wrote:
> > As a bonus this patch includes two bug fixes which I came across while
> > working out the utf8 encoding problem. One is for the trie code
> > charclass logic which was doing the wrong thing under utf8 and the
> > other was in some debugging output code that was using the wrong utf8
> > flag.
>
> Thanks, applied as #30647, although I removed an assignment to cast,
> which might be illegal for some setups:
>
>   (U8*)exp=Perl_bytes_to_utf8(aTHX_ (U8*)exp, &len);
>
> replaced by
>
>   exp = (char*)Perl_bytes_to_utf8(aTHX_ (U8*)exp, &len);
>

Gah, I did that again, eh? That's the second time. Sorry. :-(

But cheers for applying the patch. :-)

Yves