Duplicate strings in mozilla.org project

From: Merike Sell <merikes@gmail.com>
Newsgroups: mozilla.dev.l10n.web
Subject: Duplicate strings in mozilla.org project

Hi

Just wanted to ask if it's just me getting a bit annoyed or what the
story is with this one.

I see repeating strings in whatsnew files for example. Is there a
technical limitation somewhere or could these be in a shared file
instead? et doesn't use Pontoon for www translations but for example the
"Congrats! You’re using the latest version of Firefox." didn't show me
anything in history section before submitting a translation on git
unless it was because for et the project is set to read-only?

Just yesterday I was confused by strings in tracking protection tour
where the only difference was the ending quotes character used in the
string? Understandably locales might want to not follow English
quotation style but could we at least have a localization note in such
cases stating that only quotation was changed? Does Pontoon give
suggestions for cases when only quotation changed? It's definitely hard
to see that in a file when related strings are not sequential and no note
is present either.

In short can the situation be improved such that less duplicate work is
needed and it's easier to notice strings being very similar to already
translated ones?

Best,
Merike



Merike
10/6/2018 3:57:57 PM

Hello Merike,

Thank you for sharing your thoughts on the current translation flow and
some issues you encountered.

On Sat, Oct 6, 2018 at 9:00 AM Merike Sell <merikes@gmail.com> wrote:

> Hi
>
> Just wanted to ask if it's just me getting a bit annoyed or what the
> story is with this one.
>
> I see repeating strings in whatsnew files for example. Is there a
> technical limitation somewhere or could these be in a shared file
> instead?

We do have a few shared files; some apply to all files and others sit under
a subcategory. We try to limit the number of strings in a shared file such
as main.lang because it is already very big.

> et doesn't use Pontoon for www translations but for example the
> "Congrats! You’re using the latest version of Firefox." didn't show me
> anything in history section before submitting a translation on git
> unless it was because for et the project is set to read-only?
>
You should be able to see the same or similar strings under the "Machinery"
tab. "History" shows the earlier iterations of one string and that one string
only. Machinery pulls matches and fuzzy matches from different sources.
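
For localizers who work on git rather than in Pontoon, a fuzzy lookup of this kind can be approximated offline. The sketch below is only an illustration and not how Machinery is implemented: it ranks previously seen source strings by similarity using Python's standard difflib module. The function name, the sample strings, and the 0.8 cutoff are assumptions chosen for the example.

```python
import difflib


def similar_strings(new_string, known_strings, cutoff=0.8):
    """Return up to three known strings that closely resemble new_string.

    The cutoff is a similarity ratio between 0 and 1; 0.8 is an arbitrary
    starting point, not a value taken from any Mozilla tool.
    """
    return difflib.get_close_matches(new_string, known_strings, n=3, cutoff=cutoff)


if __name__ == "__main__":
    # Hypothetical set of source strings that already have a translation.
    known = [
        "Congrats! You\u2019re using the latest version of Firefox.",
        "Download Firefox",
    ]
    # Same sentence, but typed with a plain ASCII apostrophe.
    print(similar_strings("Congrats! You're using the latest version of Firefox.", known))
```

A lower cutoff surfaces looser matches at the cost of more noise, so the value is worth tuning against real files.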

>
> Just yesterday I was confused by strings in tracking protection tour
> where the only difference was the ending quotes character used in the
> string? Understandably locales might want to not follow English
> quotation style but could we at least have a localization note in such
> cases stating that only quotation was changed?

Even the slightest change, whether punctuation or letter case, triggers a
new string. Depending on the language, the localized string can remain
unchanged. It still has to be submitted as a new translation, even though it
is the same as before. Copy and paste will do for et.
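
One way to spot such cases before translating is to compare the old and new source strings with their quotation marks normalized. The following is a minimal sketch under that assumption; the helper names and the list of quote characters are illustrative and not part of any Mozilla tooling.

```python
# Map common typographic quotation marks to plain ASCII quotes.
_QUOTE_TABLE = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as apostrophe)
    "\u201a": "'",  # single low-9 quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark
    "\u00ab": '"',  # left-pointing guillemet
    "\u00bb": '"',  # right-pointing guillemet
})


def normalize_quotes(text: str) -> str:
    """Replace typographic quotes with their ASCII equivalents."""
    return text.translate(_QUOTE_TABLE)


def differs_only_in_quotes(old: str, new: str) -> bool:
    """True when two different strings become equal once quotes are normalized."""
    return old != new and normalize_quotes(old) == normalize_quotes(new)


if __name__ == "__main__":
    old = "Click \u201cNext\u201d to continue"
    new = 'Click "Next" to continue'
    print(differs_only_in_quotes(old, new))
```

When the check returns True, the existing translation can usually be copied over unchanged.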

> Does Pontoon give
> suggestions for cases when only quotation changed? It's definitely hard
> to see that in a file when related string are not sequential and no note
> is present either.
>
Under the Machinery tab, you should find matching or fuzzy-matching strings.


>
> In short can the situation be improved such that less duplicate work is
> needed and it's easier to notice strings being very similar to already
> translated ones?
>
Understood. We used to run a script that filled in strings already
localized in other files. Because localizers work in time zones all around
the globe, it is tricky to find a window when no one is working on a
localized file, and running it at any other time would cause conflicts.
With the feature improvements in Pontoon, Machinery helps solve the problem.
If you want to reuse the same string, you can click on it and it will fill
the translation field. In et's case, you can copy and paste it to git.
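
For a locale that works directly on git, the same idea can be sketched locally. The example below is not the script mentioned above. It assumes a simplified .lang layout in which each source string is on a line starting with ";", its translation is on the next non-empty line, lines starting with "#" are comments, and an untranslated entry simply repeats the English source. The function names and the sample Estonian translation are made up for illustration.

```python
def parse_lang(text):
    """Map each source string to its translation (assumed .lang layout)."""
    entries = {}
    source = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(";"):
            # A new source string; its translation is expected next.
            source = line[1:].strip()
        elif source is not None and line and not line.startswith("#"):
            entries[source] = line
            source = None
    return entries


def reusable_translations(done_text, todo_text):
    """Find untranslated entries in todo_text that done_text already translates."""
    done = parse_lang(done_text)
    todo = parse_lang(todo_text)
    return {
        src: done[src]
        for src, translation in todo.items()
        # Untranslated here, but translated (not just repeated) elsewhere.
        if translation == src and done.get(src, src) != src
    }


if __name__ == "__main__":
    done = ";Download Firefox\nLaadi Firefox alla\n\n;Privacy\nPrivaatsus\n"
    todo = ";Download Firefox\nDownload Firefox\n\n;New string\nNew string\n"
    print(reusable_translations(done, todo))
```

The function only reports candidates and leaves the files untouched, so it avoids the conflict problem described above; the localizer decides what to paste.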



> Best,
> Merike
>
>
> _______________________________________________
> dev-l10n-web mailing list
> dev-l10n-web@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-l10n-web
>
Peiying
10/7/2018 5:57:50 AM
On 06.10.18 at 17:57, Merike Sell wrote:
> Hi
> 
> Just wanted to ask if it's just me getting a bit annoyed or what the
> story is with this one.
> 
> I see repeating strings in whatsnew files for example. Is there a
> technical limitation somewhere or could these be in a shared file
> instead? et doesn't use Pontoon for www translations but for example the
> "Congrats! You’re using the latest version of Firefox." didn't show me
> anything in history section before submitting a translation on git
> unless it was because for et the project is set to read-only?

Yes, there are technical limitations and also process challenges.

The code that extracts strings from django templates into, eventually,
.lang files doesn't deal well with shared files. The technical
background of this isn't pretty: we use tools that are built
for gettext, and then transform their output into .lang and explicit files.
We have https://bugzilla.mozilla.org/show_bug.cgi?id=1486686 on file,
among other bugs. See the tracker that it blocks.

On the process side, putting strings into shared files poses some 
challenges.

Firstly, you need to know that you'll actually use this string often. 
For boilerplate that's added to a couple of pages at the same time, 
that's easy to know. For pages that will be re-created in the future 
like whatsnew, that's quite a bit harder.

Then you have the old gettext problem: when does the en-US content
change context, and how do you add one? On mozilla.org, we're trying to
use individual files for individual pages as much as possible, and
they provide the context. As long as you don't have duplicate
strings on the same page, I guess.

And then there's the question about what should happen if we change the 
wording around a particular phrase in a localization. Should we actually 
update old whatsnew pages, or just use a new phrasing for pages going 
forward? I bet there are good examples for both.

> Just yesterday I was confused by strings in tracking protection tour
> where the only difference was the ending quotes character used in the
> string? Understandably locales might want to not follow English
> quotation style but could we at least have a localization note in such
> cases stating that only quotation was changed? Does Pontoon give
> suggestions for cases when only quotation changed? It's definitely hard
> to see that in a file when related string are not sequential and no note
> is present either.

Generally, we assume that localizers have tools with translation memory 
(TM). TM helps in quite a few ways to make translation quality better, 
as it encourages consistency. But it also allows us to not over-rotate 
on technical solutions that try to avoid any duplicate translation effort.

In the past, I wrote editor hooks for that, talking to transvision APIs. 
But I haven't looked into that in a couple of years now.

> In short can the situation be improved such that less duplicate work is
> needed and it's easier to notice strings being very similar to already
> translated ones?

I think it's OK to expect the developers to come up with comments to
describe the string. OTOH, describing change management in ways that
aren't confusing to many localizers poses a challenge. I don't
think I'd want to burden our web devs with that. Blocking each landing
on our l10n project managers to make an assessment on what's in many
localizations wouldn't scale either, I think.

As for duplicate work, I think that more often than not, string re-use 
leads to bugs, and duplication is a feature.

Axel

> Best,
> Merike
> 
> 

Axel
10/8/2018 2:07:19 PM
From: Merike Sell <merikes@gmail.com>
Newsgroups: mozilla.dev.l10n.web
Subject: Re: Duplicate strings in mozilla.org project

I see. This sounds very much like I'll end up writing myself a utility
script for the cases that text file translation and grepping don't
easily cover. That is, if I can find the time to do it.

I just don't see how Pontoon would be more efficient use of my time in
situations where string difference is big enough for TM to not kick in
and navigating between untranslated strings for a single project and all
strings across projects using the same term/phrase is tedious to say the
least. And that sort of term search is very much a necessity when you're
used to the situation that a single quick grep in a folder will give you
near immediate answers. Not to mention the usefulness of having all
related strings fully visible at the same time and easy to copy-paste
and navigate (by keyboard) in a text editor with no network delays.

Merike

On 08.10.18 at 17:07, Axel Hecht wrote:
> [...]



Merike
10/9/2018 5:00:54 PM