The web-platform-tests testsuite has just landed on Mozilla-Central. It is an import of a testsuite collated by the W3C [1], which we intend to keep up-to-date with upstream. The tests are located in /testing/web-platform/tests/ and are now running in automation. Initially the testsuite, excluding the reftests, is running on Linux 64 opt builds only. If it doesn't cause problems there, it will be rolled out to other configurations once we are confident they will be equally stable. The jobs are indicated on tbpl and treeherder by the symbols W1-W4; the reftests will be Wr once they are enabled.

== How does this affect me? ==

Because web-platform-tests is imported from upstream, we can't make assumptions like "all tests will pass". Instead we explicitly store the expected result of every test that doesn't just pass in an ini-like file with the same name as the test and a .ini suffix in /testing/web-platform/meta/. If you make a change that affects the result of a web-platform-test, you need to update the expected results or the testsuite will go orange. Instructions for performing the updates are in the README file [2]. There is tooling available to help in the update process.
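To make that concrete, here is a rough sketch of what one of these expectation files might look like (the path, test filename, and subtest name are invented for illustration; see the README [2] and the wptrunner docs [7] for the actual format and the update tooling):

  testing/web-platform/meta/dom/example-feature.html.ini:

    [example-feature.html]
      [Hypothetical subtest that Gecko currently fails]
        expected: FAIL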
== OK, so how do I run the tests? ==

Locally, using mach:

  mach web-platform-tests

or, to run only a subset of tests:

  mach web-platform-tests --include=dom/

To run multiple tests at once (at the expense of undefined ordering and greater nondeterminism), use the --processes=N option.

The tests are also available on Try; the trychooser syntax is

  -u web-platform-tests

Individual chunks can also be run, much like for mochitest.

It's also possible to just start the web server and load tests into the browser, as long as you add the appropriate entries to your hosts file. These are documented in the web-platform-tests README file [3]. Once these are added, running python serve.py in testing/web-platform/tests will start the server and allow the tests to be loaded from http://web-platform.test:8000.

== What does it mean if the tests are green? ==

It means that there are no "unexpected" results. These expectations are set based on the existing behaviour of the browser. Every time the tests are updated, the expectations will be updated to account for changes in the tests.

It does *not* mean that there are no tests that fail. Indeed there may be tests that have even worse behaviour, like hanging or crashing; as long as the behaviour is stable, the test will remain enabled (this can occasionally interact awkwardly with the tbpl UI; when looking at jobs, unexpected results always start with TEST-UNEXPECTED-).

So far I haven't spent any time filing bugs about issues found by the tests, but there is a very basic report showing those that didn't pass at [4]. I am very happy to work with people who have some insight into what bugs have already been filed to get new issues into Bugzilla. I will also look at making a continually updated HTML report. In the longer term I am hopeful that this kind of reporting can become part of the Treeherder UI, so it's easy to see not just where we have unexpected results but also where there are expected failures indicating buggy code.

== What kinds of things are covered by these tests? ==

web-platform-tests is, in theory, open to any tests for web technologies. In practice most of the tests cover technologies in the WHATWG/W3C stable, e.g. HTML, DOM, various WebApps specs, and so on. The notable omission is CSS; for historical reasons the CSS tests are still in their own repository. Convergence here is a goal for the future.

== We already have mochitests; why are we adding a new testsuite? ==

Unlike mochitests, web-platform-tests are designed to work in any browser. This means that they aren't just useful for avoiding regressions in Gecko, but also for improving cross-browser interop; when developing features we can run tests that other implementers have written, and they can run tests we have written. This will allow us to detect compatibility problems early in a feature's life-cycle, before they have the chance to become a source of frustration for authors. With poor browser compatibility being one of the main complaints about developing for the web, improvements in this area are critical for the ongoing success of the platform.

== So who else is running the web-platform-tests? ==

* Blink run some of the tests in CI ([5] and various other locations scattered through their tree)
* The Servo project are running all the tests for spec areas they have implemented in CI [6]
* Microsoft have an Internet Explorer-compatible version of the test runner.

In addition we are using web-platform-tests as one component of the FirefoxOS certification suite.

The harness [7] we are using for testing Gecko is browser-agnostic, so it's possible to experiment with running tests in other browsers. In particular it supports Firefox OS, Servo and Chrome, and Microsoft have patches to support IE. Adding support for other browsers that support some sort of remote-control protocol (e.g. WebDriver) should be straightforward.

== Does this mean I should be writing web-platform-tests? ==

Yes. When we are implementing web technologies, writing cross-browser tests is generally better than writing proprietary tests. Having tests that multiple vendors run helps advance the mission by providing a concrete way of assessing spec conformance and improving interop. It also provides short-term wins, since we will discover compatibility issues closer to the time that the code is originally written, rather than having to investigate broken sites later on. This also applies to other vendors of course; by encouraging them to run tests that we have written, they are less likely to introduce bugs that manifest as compatibility issues which, in the worst case, lead to us having to "fix" our implementation to match their mistakes.

But. At the moment, the process for interacting with web-platform-tests requires direct submission to the upstream GitHub repository. In the near future this workflow will be improved by adding a directory for local modifications or additions to web-platform-tests in the Mozilla tree (e.g. testing/web-platform/local). Once landed in m-c, any tests there will automatically be pushed upstream during the next web-platform-tests sync (as long as the test has r+ in Bugzilla it doesn't need to be reviewed again to land upstream). This, combined with the more limited feature set and platform coverage of web-platform-tests compared to mochitest, means that this email is explicitly *not* a call to change any policy around test formats at this time.

== I'm feeling virtuous! Where's the documentation for writing tests? ==

The main documentation is at Test The Web Forward [8]. I am in the process of updating this to be more current; for now the most up-to-date documentation is in my fork of the website at [9]. This will be merged upstream in the near future.
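To give a flavour of what a typical test looks like, here is a minimal testharness.js test (a sketch only; the title and assertion are invented for illustration, and the documentation above covers the full API):

  <!DOCTYPE html>
  <meta charset="utf-8">
  <title>Example: document.title reflects the title element</title>
  <script src="/resources/testharness.js"></script>
  <script src="/resources/testharnessreport.js"></script>
  <script>
  // One synchronous subtest; the string passed as the second argument is the
  // subtest name that appears in the results and in any .ini expectation file.
  test(function() {
    assert_equals(document.title,
                  "Example: document.title reflects the title element");
  }, "document.title matches the title element");
  </script>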
For tests that require server-side logic, web-platform-tests uses a custom Python-based server which allows test-specific behaviour through simple .py files. Documentation for this is found at [10]; a minimal example handler is sketched after the links at the end of this message. If you have any questions, feel free to ask me.

== How do I write tests that require non-web-exposed APIs? ==

One of the disadvantages of cross-browser testing is that you are limited to APIs that work in multiple browsers. This means that tests in web-platform-tests can't use, for example, SpecialPowers. For anything requiring this you will have to write a mochitest, as today. In the future we plan to integrate WebDriver support into web-platform-tests, which will make some privileged operations, and simulation of user interaction with the content area, possible.

== You didn't answer my question! ==

If you have any further questions I'm very happy to answer them, either here, by email, or on IRC (#ateam or #testing on irc.mozilla.org).

[1] https://github.com/w3c/web-platform-tests/
[2] https://hg.mozilla.org/mozilla-central/file/tip/testing/web-platform/README.md (or formatted: https://github.com/mozilla/gecko-dev/blob/master/testing/web-platform/README.md)
[3] https://github.com/mozilla/gecko-dev/blob/master/testing/web-platform/tests/README.md
[4] http://hoppipolla.co.uk/web-platform-tests/gecko_failures_2014-08-28.html
[5] https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/LayoutTests/w3c/web-platform-tests/
[6] https://travis-ci.org/servo/servo/ (see the AFTER_BUILD=wpt jobs)
[7] http://wptrunner.readthedocs.org/en/latest/
[8] http://testthewebforward.org
[9] http://jgraham.github.io/docs/
[10] http://wptserve.readthedocs.org/en/latest/
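P.S. As promised above, here is a minimal sketch of a wptserve .py handler (based on the wptserve documentation [10]; the file name and response body are purely illustrative):

  # e.g. saved as echo-hello.py alongside the tests that request it
  def main(request, response):
      # wptserve calls main() once per request to the .py file; returning a
      # (headers, body) pair is one of the supported return forms.
      return [("Content-Type", "text/plain")], "Hello from a .py handler\n"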
On 9/5/14, 11:55 AM, James Graham wrote:
> The web-platform-tests testsuite has just landed on
> Mozilla-Central.

This is fantastic. Thank you!

Does this obsolete our existing "imptests" tests, or is this a set of tests disjoint from those?

-Boris
On 05/09/14 18:00, Boris Zbarsky wrote:
> On 9/5/14, 11:55 AM, James Graham wrote:
>> The web-platform-tests testsuite has just landed on
>> Mozilla-Central.
>
> This is fantastic. Thank you!
>
> Does this obsolete our existing "imptests" tests, or is this a set of
> tests disjoint from those?

I think Ms2ger has a better answer here, but I believe it obsoletes most of them, except a few that never got submitted to web-platform-tests (the editing tests are in that class, because the spec effort sort of died). I've filed bug 1063632 to remove the imptests once we have better platform coverage from web-platform-tests.
On 9/5/14, 11:55 AM, James Graham wrote:
> Instructions for performing the updates are in the README file
> [2]. There is tooling available to help in the update process.

Is there a way to document the spec or test suite bugs in the expectations file? E.g. if I want to add an "expected: FAIL" and link to https://github.com/w3c/web-platform-tests/issues/1223 as an explanation for why exactly we're failing it.

-Boris
On Fri, Sep 5, 2014 at 8:23 PM, James Graham <james@hoppipolla.co.uk> wrote:
> I think Ms2ger has a better answer here, but I believe it obsoletes most
> of them, except a few that never got submitted to web-platform-tests
> (the editing tests are in that class, because the spec effort sort of died).

FWIW, the editing tests are still very useful for regression-testing. They often catch unintended behavior changes when changing editor code, just because they test quite a lot of code paths. I think it would be very valuable for web-platform-tests to have a section for "tests we don't know are even vaguely correct, so don't try to use them to improve your conformance, but they're useful for regression testing anyway." That might not help interop, but it will help QoI, and it makes sense for browsers to share in that department as well.

(This is leaving aside the fact that the editing tests are pathologically large and should be chopped up into a lot of smaller files. I have a vague idea to do this someday. They would also benefit from only being run by the Mozilla testing framework on commits that actually touch editor/, because it's very unlikely that they would be affected by code changes elsewhere that don't fail other tests as well. I think.)
On 06/09/14 05:05, Boris Zbarsky wrote:
> On 9/5/14, 11:55 AM, James Graham wrote:
>> Instructions for performing the updates are in the README file
>> [2]. There is tooling available to help in the update process.
>
> Is there a way to document the spec or test suite bugs in the
> expectations file? E.g. if I want to add an "expected: FAIL" and link
> to https://github.com/w3c/web-platform-tests/issues/1223 as an
> explanation for why exactly we're failing it.

There isn't anything at the moment, but it seems like a good idea to invent something. The easiest thing would be a new key-value pair like

  expected-reason: Some reason string

Do you have a preferred syntax here?
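To illustrate, an expectation file using a key along those lines might look roughly like this (the test and subtest names are invented, and the key itself is only the proposal above, not something the harness understands today):

  [example-feature.html]
    [Hypothetical subtest that Gecko currently fails]
      expected: FAIL
      expected-reason: https://github.com/w3c/web-platform-tests/issues/1223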
On 07/09/14 12:34, Aryeh Gregor wrote:
> On Fri, Sep 5, 2014 at 8:23 PM, James Graham <james@hoppipolla.co.uk> wrote:
>> I think Ms2ger has a better answer here, but I believe it obsoletes most
>> of them, except a few that never got submitted to web-platform-tests
>> (the editing tests are in that class, because the spec effort sort of died).
>
> FWIW, the editing tests are still very useful for regression-testing.
> They often catch unintended behavior changes when changing editor
> code, just because they test quite a lot of code paths. I think it
> would be very valuable for web-platform-tests to have a section for
> "tests we don't know are even vaguely correct, so don't try to use
> them to improve your conformance, but they're useful for regression
> testing anyway." That might not help interop, but it will help QoI,
> and it makes sense for browsers to share in that department as well.

Well, it would also make sense to have interop for editing of course :)

I would certainly be in favour of someone pushing those tests through review so that they can land in web-platform-tests, but historically we haven't been that successful in getting review for large submissions where no one is actively working on the code (e.g. [1], which has a lot of tests, mostly written by me, for document loading). I don't really know how to fix that other than to say "it's OK to land stuff no one has looked at because we can probably sort it out post-hoc", which has some appeal, but also substantial downsides if no one is making even basic checks for correct usage of the server or for patterns that are known to result in unstable tests.

> (This is leaving aside the fact that the editing tests are
> pathologically large and should be chopped up into a lot of smaller
> files. I have a vague idea to do this someday. They would also
> benefit from only being run by the Mozilla testing framework on
> commits that actually touch editor/, because it's very unlikely that
> they would be affected by code changes elsewhere that don't fail other
> tests as well. I think.)

In the long term I'm hopeful that we can end up with a much smarter testing system that uses a combination of human input and recorded data to prioritise the tests most likely to break for a given commit. For example, a push to Try that only changes code in editor/ might, with default settings, first run the editing tests and then, once those passed, run some additional tests from, say, dom/, or whatever else turns out to be likely to regress for broken patches in the changed code. On inbound a somewhat larger set of tests would run, and then on m-c we'd do a full testrun. Obviously we're a long way from that at the moment, but it's a reasonable thing to aim for and I think that some of the pieces are starting to come together.

[1] https://critic.hoppipolla.co.uk/r/282
On 9/7/14, 10:21 AM, James Graham wrote:
> There isn't anything at the moment, but it seems like a good idea to
> invent something. The easiest thing would be a new key-value pair like
>
>   expected-reason: Some reason string
>
> Do you have a preferred syntax here?

Nope. Pretty much anything works for me.

-Boris
On Sun, Sep 7, 2014 at 5:49 PM, James Graham <james@hoppipolla.co.uk> wrote:
> Well, it would also make sense to have interop for editing of course :)

Not a single major browser has significant resources invested in working on their editing code. Until that changes, nothing much is going to happen.

> I would certainly be in favour of someone pushing those tests through
> review so that they can land in web-platform-tests, but historically we
> haven't been that successful in getting review for large submissions
> where no one is actively working on the code (e.g. [1] which has a lot
> of tests, mostly written by me, for document loading). I don't really
> know how to fix that other than say "it's OK to land stuff no one has
> looked at because we can probably sort it out post-hoc", which has some
> appeal, but also substantial downsides if no one is making even basic
> checks for correct usage of the server or for patterns that are known to
> result in unstable tests.

I think unreviewed tests should still be run by browsers' automated testing framework (obviously unless they take too long, are unreliable, etc.). They just shouldn't be counted toward any claims of conformance. Even if the expected values are entirely silly, which they probably aren't, they'll still help regression testing. There's already an external set of tests that Mozilla runs (browserscope) which I think is wrong in a number of its expected results, but it's still been useful for catching regressions in my experience.
On 2014-09-08, 6:47 AM, Aryeh Gregor wrote:
> On Sun, Sep 7, 2014 at 5:49 PM, James Graham <james@hoppipolla.co.uk> wrote:
>> Well, it would also make sense to have interop for editing of course :)
>
> Not a single major browser has significant resources invested in
> working on their editing code. Until that changes, nothing much is
> going to happen.
>
>> I would certainly be in favour of someone pushing those tests through
>> review so that they can land in web-platform-tests, but historically we
>> haven't been that successful in getting review for large submissions
>> where no one is actively working on the code (e.g. [1] which has a lot
>> of tests, mostly written by me, for document loading). I don't really
>> know how to fix that other than say "it's OK to land stuff no one has
>> looked at because we can probably sort it out post-hoc", which has some
>> appeal, but also substantial downsides if no one is making even basic
>> checks for correct usage of the server or for patterns that are known to
>> result in unstable tests.
>
> I think unreviewed tests should still be run by browsers' automated
> testing framework (obviously unless they take too long, are
> unreliable, etc.). They just shouldn't be counted toward any claims
> of conformance. Even if the expected values are entirely silly, which
> they probably aren't, they'll still help regression testing. There's
> already an external set of tests that Mozilla runs (browserscope)
> which I think is wrong in a number of its expected results, but it's
> still been useful for catching regressions in my experience.

Yeah, I second this. There is a lot of value in having tests that detect the changes in Gecko's behavior.
On 08/09/14 19:42, Ehsan Akhgari wrote:
>> I think unreviewed tests should still be run by browsers' automated
>> testing framework (obviously unless they take too long, are
>> unreliable, etc.). They just shouldn't be counted toward any claims
>> of conformance. Even if the expected values are entirely silly, which
>> they probably aren't, they'll still help regression testing. There's
>> already an external set of tests that Mozilla runs (browserscope)
>> which I think is wrong in a number of its expected results, but it's
>> still been useful for catching regressions in my experience.
>
> Yeah, I second this. There is a lot of value in having tests that
> detect the changes in Gecko's behavior.

Yes, I agree too. One option I had considered was making a suite "web-platform-tests-mozilla" for things that we can't push upstream, e.g. because the APIs aren't (yet) undergoing meaningful standardisation. Putting the editing tests into this bucket might make some sense.
On 2014-09-09, 8:44 AM, James Graham wrote:
> On 08/09/14 19:42, Ehsan Akhgari wrote:
>>> I think unreviewed tests should still be run by browsers' automated
>>> testing framework (obviously unless they take too long, are
>>> unreliable, etc.). They just shouldn't be counted toward any claims
>>> of conformance. Even if the expected values are entirely silly, which
>>> they probably aren't, they'll still help regression testing. There's
>>> already an external set of tests that Mozilla runs (browserscope)
>>> which I think is wrong in a number of its expected results, but it's
>>> still been useful for catching regressions in my experience.
>>
>> Yeah, I second this. There is a lot of value in having tests that
>> detect the changes in Gecko's behavior.
>
> Yes, I agree too. One option I had considered was making a suite
> "web-platform-tests-mozilla" for things that we can't push upstream e.g.
> because the APIs aren't (yet) undergoing meaningful standardisation.
> Putting the editing tests into this bucket might make some sense.

That sounds good to me. As long as we recognize and support this use case, I'd be happy to leave the exact solution to you. :-)
On Tue, Sep 9, 2014 at 3:44 PM, James Graham <james@hoppipolla.co.uk> wrote:
> Yes, I agree too. One option I had considered was making a suite
> "web-platform-tests-mozilla" for things that we can't push upstream e.g.
> because the APIs aren't (yet) undergoing meaningful standardisation.
> Putting the editing tests into this bucket might make some sense.

That definitely sounds like a great idea, but I think it would be even better if upstream had a place for these tests, so we could share them with other engines (and hopefully they would reciprocate). Anyone who's just interested in conformance test figures would be free not to run these extra tests, of course. I don't see why upstream would mind hosting these tests.

In the longer term, I think it would be very interesting if all simple mochitests were written in a shareable format, and if other engines did similarly. I imagine we'd find lots of interesting regressions if we ran a large chunk of WebKit/Blink tests as part of our regular test suite, even if many of the tests will expect the wrong results from our perspective.
On 10/09/14 19:32, Aryeh Gregor wrote:
> On Tue, Sep 9, 2014 at 3:44 PM, James Graham <james@hoppipolla.co.uk> wrote:
>> Yes, I agree too. One option I had considered was making a suite
>> "web-platform-tests-mozilla" for things that we can't push upstream e.g.
>> because the APIs aren't (yet) undergoing meaningful standardisation.
>> Putting the editing tests into this bucket might make some sense.
>
> That definitely sounds like a great idea, but I think it would be even
> better if upstream had a place for these tests, so we could share them
> with other engines (and hopefully they would reciprocate). Anyone
> who's just interested in conformance test figures would be free not to
> run these extra tests, of course. I don't see why upstream would mind
> hosting these tests.

I tend to agree, but I suggest that you bring this up on public-test-infra.

> In the longer term, I think it would be very interesting if all simple
> mochitests were written in a shareable format, and if other engines
> did similarly. I imagine we'd find lots of interesting regressions if
> we ran a large chunk of WebKit/Blink tests as part of our regular test
> suite, even if many of the tests will expect the wrong results from
> our perspective.

Yes, insofar as "written in a sharable format" means "written in one of the formats that is accepted into wpt". We should strive to make sharing our tests just as fundamental a part of our culture as working with open standards is today.