Can you spot the difference?

Everything you need to know about Visual Regression Testing in 2022

David Xu
14 min read · Mar 18, 2022


Introduction

Visual Regression Testing has come a long way and it’s growing increasingly popular. Could 2022 be the year where it really takes off? And if you want to implement it in your project, what tools do you choose? Fear not, for in this article we will give you a comprehensive overview of all there is to know about Visual Regression Testing.

What is Automated Visual Regression testing?

Visual testing is about checking for unintended changes to an application’s look and feel.

At a high level, automated visual regression testing follows these steps:

  • Step 1: Open a browser
  • Step 2: Simulate user interaction on the app
  • Step 3: Take screenshots of the app
  • Step 4: Compare screenshots with the stored baseline image
  • Step 5: Present comparison results to dev/tester

Visual regression tests typically check things such as an app’s background colour, element positions, content overflow and animations.

What makes visual testing difficult to automate?

Steps 1–3 above can be achieved quite easily with many browser testing tools such as Selenium, Playwright or Cypress.

The challenge comes in Step 4 — screenshot comparison.

It’s easy to compare two images pixel by pixel, but the trouble is that browsers don’t always render every pixel the same way. Minute differences can occur when the page is rendered on a different monitor or graphics card, or with a different browser version or OS. This means that if we perform an exact 1:1 pixel comparison, two screenshots that look identical to a human could still fail because some of their pixel values differ slightly. For example:

Source: https://applitools.com/blog/visual-regression-testing-developers/

You can learn more about the difficulty of comparing screenshots on the Applitools blog.

All the solutions described below, both free and commercial, attempt to solve this through a mix of controlling the hardware and software the screenshots are generated with, and applying a tolerance threshold below which differences are considered acceptable.
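The tolerance idea can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions, not the algorithm any particular tool uses: the helper names diffRatio and withinTolerance, the flat RGBA byte layout and the 1% default tolerance are all hypothetical.

```typescript
// Hypothetical helper: fraction of pixels that differ between two same-sized
// screenshots, each stored as a flat array of RGBA bytes (4 bytes per pixel).
function diffRatio(baseline: Uint8Array, current: Uint8Array): number {
  if (baseline.length !== current.length || baseline.length % 4 !== 0) {
    throw new Error("Screenshots must be RGBA buffers of the same size");
  }
  const pixelCount = baseline.length / 4;
  if (pixelCount === 0) return 0;

  let differing = 0;
  for (let p = 0; p < pixelCount; p++) {
    const o = p * 4;
    if (
      baseline[o] !== current[o] ||
      baseline[o + 1] !== current[o + 1] ||
      baseline[o + 2] !== current[o + 2] ||
      baseline[o + 3] !== current[o + 3]
    ) {
      differing++;
    }
  }
  return differing / pixelCount;
}

// The comparison passes when the share of differing pixels stays at or below
// the tolerance (assumed default: 1% of all pixels).
function withinTolerance(
  baseline: Uint8Array,
  current: Uint8Array,
  tolerance = 0.01
): boolean {
  return diffRatio(baseline, current) <= tolerance;
}
```

With a tolerance of 0, this degenerates into the exact 1:1 comparison described above, which is why a single mis-rendered pixel would fail the test.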

Free/Community solutions

There are many solutions for visual testing with Playwright, Cypress and Storybook. We will go into detail on each one below. It’s worth noting, however, that apart from slightly different APIs they all behave the same way: they launch the Chrome browser and generate a diff image, typically using a pixel-matching library called pixelmatch. We go into more detail about pixelmatch in the Playwright section.

Another common thread among the free solutions is the recommendation to generate screenshots in a Docker environment to reduce variability.

Now let’s go into a little more detail about each one.

Playwright native API

Playwright provides a page.screenshot() API right out of the box, and you can get up and running with just:

import { test, expect } from '@playwright/test';

test('example test', async ({ page }) => {
  await page.goto('https://playwright.dev');
  expect(await page.screenshot()).toMatchSnapshot('landing.png');
});

As mentioned earlier, Playwright uses pixelmatch under the hood, a pixel-level image comparison library, to compare the screenshots. This library lets the developer configure a threshold value ranging from 0 to 1. The smaller the value, the more sensitive the comparison.

For example, with the threshold set to 0.5, if a pixel in the baseline has a value of #ffffff and the same pixel in the new image has a value of #fafafa, the comparison will still pass even though the values differ, because the sensitivity is low. With the threshold set to 0.01, the comparison will fail.
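To make the effect of the threshold concrete, here is a simplified sketch of a per-pixel comparison. It is not pixelmatch’s actual code: pixelmatch uses a perceptual colour metric, whereas this sketch assumes a plain Euclidean RGB distance scaled to 0..1, and the function names hexToRgb, colourDistance and pixelMatches are made up for illustration.

```typescript
type RGB = [number, number, number];

// Parse a "#rrggbb" string into its red, green and blue components.
function hexToRgb(hex: string): RGB {
  if (!/^#[0-9a-f]{6}$/i.test(hex)) {
    throw new Error(`Invalid colour: ${hex}`);
  }
  return [
    parseInt(hex.slice(1, 3), 16),
    parseInt(hex.slice(3, 5), 16),
    parseInt(hex.slice(5, 7), 16),
  ];
}

// ASSUMPTION: Euclidean distance in RGB space, scaled so that
// 0 means identical colours and 1 means black versus white.
function colourDistance(a: RGB, b: RGB): number {
  const squared =
    (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2;
  return Math.sqrt(squared) / Math.sqrt(3 * 255 ** 2);
}

// A pixel "matches" when the colour distance does not exceed the threshold,
// so a smaller threshold means a more sensitive comparison.
function pixelMatches(
  baselineHex: string,
  currentHex: string,
  threshold: number
): boolean {
  return colourDistance(hexToRgb(baselineHex), hexToRgb(currentHex)) <= threshold;
}
```

Under this simplified metric, #ffffff and #fafafa are roughly 0.02 apart, so pixelMatches('#ffffff', '#fafafa', 0.5) passes while pixelMatches('#ffffff', '#fafafa', 0.01) fails, mirroring the behaviour described above.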

The library also tries to handle differences caused by different anti-aliasing.

You can learn more about Playwright’s visual testing solution by checking out Playwright’s documentation.

Cypress plugins

With Cypress, there is an API for taking screenshots, but there is no out-of-the-box API for visual testing. To perform visual testing with Cypress, you need to install one of the dozens of available plugins. Two of the most popular in terms of GitHub stars are cypress-visual-regression and cypress-image-snapshot.

These two plugins have slightly different APIs but are very similar in configuration and usage.

Like Playwright, cypress-visual-regression uses pixelmatch under the hood as its diffing engine.

cypress-image-snapshot is a little different. cypress-image-snapshot uses jest-image-snapshot under the hood, which uses pixelmatch as the default diffing engine. The default threshold is set to 0.01. There is also an option to use ssim.js as the diffing engine instead. Instead of pixel comparison, SSIM (Structural Similarity Index Measure) does a structural similarity comparison, which reduces false positives and has higher sensitivity to actual changes in the image. Using SSIM is experimental for now but may become the new default in the future.
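For intuition about how a structural comparison differs from a pixel comparison, the sketch below computes SSIM over a whole grayscale image as a single window. This is a simplification made for illustration: libraries such as ssim.js compute the index over many small windows and average the results, and the function name globalSsim is hypothetical. The constants follow the commonly used defaults of the SSIM formula (K1 = 0.01, K2 = 0.03, dynamic range 255).

```typescript
// Single-window ("global") SSIM for two equally sized grayscale images whose
// pixel values are in the range 0..255. Returns 1 for identical images and
// lower values as the structure diverges.
function globalSsim(x: number[], y: number[]): number {
  if (x.length !== y.length || x.length === 0) {
    throw new Error("Images must be non-empty and the same size");
  }
  const n = x.length;

  // Mean luminance of each image.
  let sumX = 0;
  let sumY = 0;
  for (let i = 0; i < n; i++) {
    sumX += x[i];
    sumY += y[i];
  }
  const muX = sumX / n;
  const muY = sumY / n;

  // Variances (contrast) and covariance (shared structure).
  let varX = 0;
  let varY = 0;
  let cov = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - muX;
    const dy = y[i] - muY;
    varX += dx * dx;
    varY += dy * dy;
    cov += dx * dy;
  }
  varX /= n;
  varY /= n;
  cov /= n;

  // Stabilising constants for 8-bit images.
  const c1 = (0.01 * 255) ** 2;
  const c2 = (0.03 * 255) ** 2;

  return (
    ((2 * muX * muY + c1) * (2 * cov + c2)) /
    ((muX * muX + muY * muY + c1) * (varX + varY + c2))
  );
}
```

A uniform brightness shift, which would change every pixel in a pixel-by-pixel diff, only lowers this score slightly, whereas inverting a pattern drives it far below 1.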

Storybook addons

If you just want to test your Storybook, there are a couple of addons you could use. The Storybook documentation recommends Chromatic, but that is a paid service so we’ll talk about it later. The free options are StoryShots and Loki.

StoryShots uses Puppeteer to launch the Chrome browser and take screenshots, and uses jest-image-snapshot (described above) to perform the comparisons.

Loki is a little more powerful in that it supports running Chrome in Docker, which reduces the variability of the screenshots generated. It also supports React Native running in the iOS simulator and Android emulator. Like all the other free options, Loki uses pixelmatch for pixel comparison. There is an option to use GraphicsMagick (gm) or looks-same as the diffing engine instead. gm is faster but less accurate, so you should also lower the tolerance threshold to compensate. looks-same is slower but gives a better result, especially if the images being compared have different pixel densities.

Commercial Solutions

There are four major contenders in the commercial space for visual testing: Applitools, Percy, Happo and Chromatic. There are some common themes across the offerings:

  • All of them are cloud-based offerings, which means they can ensure that screenshots are always rendered with the same hardware and the same configuration.
  • All support cross-platform screenshot capturing.
  • All require very little configuration; just enough to send the HTML/CSS/JavaScript assets to the cloud.

Chromatic

Chromatic is the simplest solution of the four. It is created by the same team behind Storybook and works only with Storybook. You do not need to do anything special other than writing normal Storybook stories. Chromatic will take care of everything.

Famous companies that use Chromatic include Adobe, Auth0, Seek and more.

Test Framework Integration

Storybook

Workflow

The way Chromatic works is that every time the code is pushed, the storybook for the project gets published onto Chromatic’s CDN. The publishing can be done manually via CLI as well as automatically through CI/CD integration.

Once the Storybook is published, Chromatic renders screenshots for all stories and compares them to the baseline. A list of changes is shown, and the developer needs to decide whether each change is intentional or not. This is known as the UI Test. After the UI Test, the developer can assign these changes to teammates, usually designers and product owners, for a UI Review.

Chromatic generates screenshots using Chrome, Firefox, and IE11.

Diffing engine

Chromatic uses pixel comparison to determine diffs. It’s possible, and even likely, that it uses the same pixelmatch library under the hood as the free options. Chromatic exposes the diffThreshold parameter for tuning the sensitivity of the diff engine. diffThreshold is a number between 0 and 1, where 0 is the most accurate and 1 is the least accurate. The default threshold is 0.063.
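As a rough sketch, diffThreshold is set through a story’s parameters under the chromatic key. The Button title and the value 0.2 below are purely illustrative, and a small local type stands in for Storybook’s own types so that the snippet is self-contained.

```typescript
// Local stand-in for the relevant slice of Storybook's story parameters.
type StoryParameters = {
  chromatic?: { diffThreshold?: number };
};

// Hypothetical metadata for a Button story file. In a real project this
// object would be the default export of the story file.
const buttonStoryMeta: { title: string; parameters: StoryParameters } = {
  title: "Button",
  parameters: {
    // Loosen the comparison for this component only; values closer to 1
    // tolerate larger differences (the article quotes 0.063 as the default).
    chromatic: { diffThreshold: 0.2 },
  },
};
```

Setting the parameter per component rather than globally keeps the rest of the Storybook at the stricter default.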

CI Platforms

Chromatic provides documentation for integration with GitHub, GitLab, Bitbucket, CircleCI, Travis, Jenkins and Azure. It’s possible to configure Chromatic for other providers too, but there are no official docs and you’ll need to contact Chromatic for assistance.

Source Code Integration

Chromatic works with GitHub, Bitbucket and GitLab by default. If you are using another source control service, a custom CI script will be needed to add a check for Chromatic.

Chromatic makes three PR checks available: Storybook Publish, UI Tests and UI Review. These can be configured to block a PR from merging until the UI changes are tested and reviewed. You do not need to use all of them and can pick and choose the checks you want.

Chromatic uses git branches and git history to decide how to check stories for changes in both UI Tests and UI Review. This means that if person A, working on branch alpha, publishes the Storybook and approves their changes, those changes will not be reflected for person B working on branch beta until alpha has been merged into master and B is trying to merge beta into master.

You can get more detail on how Chromatic manages branches and baselines in their documentation.

Pricing

Chromatic has a free tier that offers 5,000 snapshots per month, and its paid plans start at $149/month for 35,000 snapshots, with each extra snapshot costing $0.005.

Happo

Happo is another cross-platform, cross-browser, screenshot testing tool. Happo is very much focused on component testing only. While there are ways to do full page screenshots, it is not what this service is built for.

Famous companies that use Happo include Patreon, Lottie, Brigade and more.

Test Framework Integration

Happo has its own way of writing test suites, called Happo Examples. It also integrates with Storybook, Cypress, Playwright, Stencil, Full-page, and Native Apps.

However, Happo focuses more on component screenshots as opposed to full-page screenshots. So even when using things like Cypress and Playwright, it recommends selecting only an element at a time to snapshot.

With Happo Examples, the syntax is similar to Storybook. For example, if you have a component called Button, you would create a Button-happo.js file and put in something like this:

import React from 'react';
import Button from './Button';
export const primary = () => <Button type="primary">Primary</Button>;
export const secondary = () => <Button type="secondary">Secondary</Button>;

When the code is pushed, Happo will render all the files that end in -happo.js and compare their screenshots.

Alternatively, Happo also works with Cypress and Playwright. As mentioned earlier, Happo is designed for visually testing components, so with tools like Cypress and Playwright you always need to select a child element before you can take a screenshot. For example:

describe('Home page', function () {
  it('loads properly', function () {
    cy.visit('/');
    cy.get('.header').happoScreenshot();
  });
});

CI/CD Integration

Assuming a pull-request model, Happo provides official scripts for Travis, CircleCI and GitHub Actions. It can potentially be used with any CI environment via a generic happo-ci script.

Source Code Integration

Happo can post a status back to the PR if you are using GitHub or Bitbucket. Happo makes only one PR status available: whether there is a diff or not. Happo generates the diff between the current commit and the HEAD commit of the same branch.

Workflow

At a high level, the flow to use Happo is quite similar to Chromatic too. Code gets pushed, Happo is run in CI, screenshots are taken and compared with previous versions and depending on if there are any changes, the check either passes or fails.

Happo supports Chrome, Firefox, IE11, Edge, Safari and iOS Safari.

Diffing engine

By default, Happo uses a technique called bitmap hashing, which means all the pixels in an image are used to generate a hash, and it’s the final hash that is compared instead of the individual pixels. This means that if two screenshots differ by even a single pixel, they will be flagged as a diff.
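The idea behind bitmap hashing can be sketched as follows. Which hash function Happo actually uses is not covered here, so this example assumes FNV-1a, a simple non-cryptographic 32-bit hash, purely as a stand-in; the function names hashBitmap and screenshotsDiffer are hypothetical.

```typescript
// ASSUMPTION: FNV-1a is used only for illustration; any hash over the full
// pixel buffer demonstrates the same property.
function hashBitmap(pixels: Uint8Array): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < pixels.length; i++) {
    hash ^= pixels[i];
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime (32-bit)
  }
  return hash >>> 0; // convert to an unsigned 32-bit value
}

// Two screenshots are treated as different as soon as their hashes differ.
// There is no notion of "how different": one changed byte changes the hash.
function screenshotsDiffer(baseline: Uint8Array, current: Uint8Array): boolean {
  return hashBitmap(baseline) !== hashBitmap(current);
}
```

Comparing two hashes is much cheaper than walking both images pixel by pixel, but it offers no tolerance at all, which is why a more detailed comparison is sometimes needed on top.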

There is an additional deep comparison that can be applied if needed. There are three parameters to configure the deep comparison:

  • Compare threshold: How different an individual pixel is allowed to be (Happo recommends 0.002)
  • Ignore threshold: Ignore pixels if the other pixels around them are fine (Happo recommends allowing 5 pixels to exceed the compare threshold in a 1000x500 pixel image)
  • Apply blur: This can help smooth out some rough edges that may cause diffs (only recommended for high-contrast screenshots where element edge alignment issues cause diffs)

Pricing

Happo does NOT have a free tier. Instead, it has a 30-day free trial for all of its plans, and the lowest tier starts at $125/month for 10,000 snapshots, with each extra snapshot costing $0.012.

Percy

Percy describes itself as an “all-in-one visual testing and review platform”. At a glance, it is very similar to Happo, but in addition to component testing, it also supports testing full pages. There are also more ways to integrate Percy, and more out-of-the-box CI support too.

Famous companies that use Percy include Google, Shopify, Canva and more.

Test Framework Integration

Percy has dozens of SDKs for different platforms. Additionally, Percy can also be integrated into Ember, Rails, Storybook, Cypress, Puppeteer, Playwright, Selenium, Nightmare, Nightwatch, Protractor and more. It even works with static sites like Gatsby and Jekyll.

But the easiest way to get started is to use the CLI command percy snapshot. Unlike Happo Examples, where you need to create individual files and import the components, with percy snapshot you can simply define the routes you want to test in a snapshots.yml file like:

http://localhost:8080
http://localhost:8080/two

And then run the command percy snapshot snapshots.yml to snapshot test the given URLs.

Integration with end-to-end testing tools is similar to Happo, except that with Percy you are allowed to take screenshots of the whole page, so you do not need to select an individual element first:

describe('Integration test with visual testing', function() {
  it('Loads the homepage', function() {
    // Load the page or perform any other interactions with the app.
    cy.visit(<URL under test>);
    // Take a snapshot for visual diffing
    cy.percySnapshot();
  });
});

Percy can even be integrated into static sites such as Gatsby by adding gatsby-plugin-percy to the gatsby-config.js file. After that, when running gatsby build, the plugin will query the pages and take screenshots using Percy, if Percy is running.

CI/CD Integration

Percy provides documentation for integration with AppVeyor, Buildkite, CodeShip, Drone, Netlify, Semaphore, GitHub, GitLab, Bitbucket, CircleCI, Travis, Jenkins and Azure. It’s possible to configure Percy for other providers too. There’s a very generic guide, but if it doesn’t work you’ll need to contact Percy for assistance.

Source Code Integration

Percy integrates with GitHub, GitLab, Bitbucket, Enterprise firewalls and Azure DevOps. It facilitates a two-way sync between Percy builds and pull/merge requests. By default, Percy approvals aren’t required before merging, but this can be easily changed.

Percy can update a PR’s commit status when:

  • A build is processing
  • A build fails due to missing resources, rendering timeout, or if no snapshots were uploaded
  • Visual changes are detected and ready for review
  • A build finishes processing and has been auto-approved
  • A build is approved
  • Changes have been requested within a build
  • Previously requested changes have been carried forward to a build

Workflow

The basic workflow of Percy is the same as Happo and Chromatic. No matter what kind of integration you use, code gets pushed, Percy runs in CI, and screenshots are taken and compared with previous versions.

Percy supports Chrome, Firefox, and Safari.

Percy will automatically match and group snapshots that have the same visual change. For example, if the header changed, the change might appear on all the pages. In those cases, you could approve the change to the header once and apply it to all the pages. (The lack of this feature MIGHT be why Chromatic and Happo chose to focus on only individual components rather than checking entire screens)

Diffing engine

Percy uses a pixel-by-pixel comparison of pages and components to decide if there’s been a diff. The sensitivity of diffs can be adjusted for each project to match the desired error tolerance.

Percy also provides several features to stabilise the generated screenshots. These include:

  • Freezing animations: Percy will freeze GIFs on the first frame, and freeze most CSS animation and transition styles.
  • Ignoring 1px diffs: If only a single pixel differs within a 7px surrounding circle, it will not be marked as a visual change.
  • Video handling: Percy will freeze all <video> elements and show only the video’s static thumbnail. However, if you call video.play() from JavaScript and set enable-javascript: true in the Percy config, you must handle these manually because Percy cannot freeze the video.
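The 1px rule in the list above can be illustrated with a small sketch. It is not Percy’s implementation: it assumes the differing pixels are already available as a list of coordinates, and it interprets the 7px surrounding circle as a radius of 7 pixels. The name ignoreIsolatedDiffs is hypothetical.

```typescript
type Point = { x: number; y: number };

// Keep only the differing pixels that have at least one other differing
// pixel within `radius`; lone pixels are treated as rendering noise.
// ASSUMPTION: the "7px circle" is read as a radius of 7 pixels.
function ignoreIsolatedDiffs(diffPixels: Point[], radius = 7): Point[] {
  return diffPixels.filter((p, i) =>
    diffPixels.some(
      (q, j) =>
        i !== j && (p.x - q.x) ** 2 + (p.y - q.y) ** 2 <= radius ** 2
    )
  );
}
```

A page would then only be flagged as changed when the filtered list is non-empty, so a single stray pixel in an otherwise identical screenshot would not trigger a review.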

Pricing

Percy has a free tier that offers 5,000 snapshots per month, and its paid plans start at $149/month for 25,000 snapshots, with each extra snapshot costing $0.012.
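The entry-level paid plans quoted in this article for Chromatic, Happo and Percy can be compared with a little arithmetic. The sketch below uses only the figures given in the pricing sections; the Plan type and monthlyCost function are hypothetical names, and real invoices may differ (for example through higher tiers or negotiated rates).

```typescript
type Plan = {
  name: string;
  monthlyFee: number; // USD per month
  includedSnapshots: number; // snapshots covered by the monthly fee
  extraPerSnapshot: number; // USD for each snapshot beyond the allowance
};

// Entry-level paid plans as quoted in this article (March 2022).
const plans: Plan[] = [
  { name: "Chromatic", monthlyFee: 149, includedSnapshots: 35000, extraPerSnapshot: 0.005 },
  { name: "Happo", monthlyFee: 125, includedSnapshots: 10000, extraPerSnapshot: 0.012 },
  { name: "Percy", monthlyFee: 149, includedSnapshots: 25000, extraPerSnapshot: 0.012 },
];

// Monthly cost = base fee + overage for snapshots beyond the allowance.
function monthlyCost(plan: Plan, snapshots: number): number {
  const extra = Math.max(0, snapshots - plan.includedSnapshots);
  return plan.monthlyFee + extra * plan.extraPerSnapshot;
}
```

For a project taking 40,000 snapshots a month, this works out to roughly $174 on Chromatic, $485 on Happo and $329 on Percy, so the per-snapshot overage rate matters more than the base fee once usage exceeds the allowance.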

Captures the DOM snapshot and sends it to the server and renders on multiple browsers

Applitools Eyes

Applitools is arguably the most powerful testing suite on this list. It consists of two core products, Applitools Eyes and Ultrafast Grid. Applitools Eyes is the Visual AI algorithm for visual testing. Ultrafast Grid is Applitools’ online compute platform for running cross-browser, cross-device and cross-viewport tests in parallel.

In addition to visual testing, the platform can be used for functional testing, accessibility testing, PDF testing, native mobile testing and more.

Famous companies that use Applitools include Microsoft, Salesforce, Intuit and more.

Test Framework Integration

Selenium, Cypress, WebdriverIO, Storybook, TestCafe, Watir, Protractor, Playwright, Appium, Espresso, ImageTester PDF, Robot Framework

CI/CD Integration

Applitools provides documentation for integration with AppVeyor, GitHub, GitLab, Bitbucket, CircleCI, Jenkins, Azure, Semaphore, TeamCity, Bamboo, Jira, Rally and Slack.

Workflow

The basic workflow of Applitools is the same as other paid offerings. Code gets pushed, Applitools is run in CI, screenshots are taken and compared with previous versions.

Applitools supports Chrome, IE, Firefox, Safari, and Edge.

Diffing engine

The most impressive and different thing about Applitools is the Diffing engine. Applitools claims that:

“Applitools Eyes is powered by Visual AI, the only AI powered computer vision that replicates the human eyes and brain to quickly spot functional and visual regressions. Tests infused with Visual AI are created 5.8x faster, run 3.8 more stable, and catch 45% more bugs vs traditional functional testing” ~ Applitools

According to their website, the AI uses machine learning to understand the meaning of a page, e.g. separating the foreground from the background, looking at an item to determine whether it’s an image or an icon, identifying a table, etc.

This is useful because other services can only compare screenshots of the previous and current pages for the same browser, tolerating minor differences due to hardware or browser version. Applitools, by contrast, can be used to compare a screenshot from Chrome with a screenshot from IE and understand which differences are important and which are negligible to a human.

Applitools also comes with four different match algorithms: exact, strict, content and layout. Each picks up different things, so you can choose one depending on what you are looking for.

Pricing

There is no pricing information on their website. But there is a free account with a limit of 1 user and 100 checkpoints.

Summary and Recommendations

The main challenge with visual regression testing is reproducibility due to differences in browser versions/hardware configuration/display settings etc.

In terms of identifying differences, the free/community libraries are all wrappers around pixelmatch, an image diffing library. They offer different APIs and different integrations, but ultimately they are pretty much the same in terms of performance. They are all cheap to use but can be quite brittle. To ensure reproducibility, it’s best to use a Docker container.

In terms of identifying screenshot differences, I suspect the commercial products are doing the same thing as the free options. The only stand out in terms of visual difference identification is Applitools which uses AI and attempts to see the page more like a human. But it’s debatable how useful that is.

So for solo developers or small-team projects, a free/community project is the way to go. As long as Docker is properly set up and maintained, and the developers talk to each other, these options should be sufficient.

For larger teams, especially if you want a platform where designers and product owners can jump in and review the visual changes, a commercial product is the way to go. The main selling point of the commercial offerings is their platform and user/developer experience. While the diffing engine itself is not too different from the free ones, the commercial options run in the cloud, which saves developers the trouble of setting up and maintaining Docker containers and browser versions. They are also easily accessible to designers and POs, so non-technical team members and stakeholders can formally comment and sign off on changes without cloning the repo, setting up Docker, etc.

And when choosing the commercial products, if the team is focused on maintaining a component library and design system with a storybook then Chromatic is the way to go. If the goal is to test the whole application journey then Percy seems to be the most polished platform for that. And if you want all the bells and whistles as well as the unique AI visual comparison tool then Applitools is the product of choice.
