How to Test Visual Design

June 29, 2018



Summary: When evaluating fonts, colors, and other visual details, assess both aesthetic impressions and behavioral effects.

Visual details like fonts, colors, alignment, and images are increasingly expected not just to create a usable experience, but also to express complex brand traits such as friendliness, reliability, or innovation.

Many teams begin by defining the target brand traits; then designers and stakeholders select the visual details they believe will best convey those traits. This approach assumes that the opinions of designers and stakeholders will accurately predict users’ reactions. It’s a great first step, but it does not guarantee that what the designers thought looked ‘friendly’ will be perceived as such by users.

When organizations have a lot to gain from effective branding, aesthetic choices and their impact on users’ attitudes should be assessed through a rigorous, data-driven approach.

How to Test Perceptions of Visual Design

As with any form of UX research, recruit test participants who are representative of your target audience. They don’t need to have any design expertise — people don’t need training in visual design to know whether they like something; in fact, users can reliably rate how much they like visuals in less than a tenth of a second (according to one study by Gitte Lindgaard and her colleagues). However, knowing whether someone likes a design doesn’t indicate whether the design conveys the right brand qualities.

(And for good measure, let’s emphasize that it’s also not a valid criterion whether you like the design or whether you think it expresses the targeted traits. You are not the user, and neither are the other members of your team nor your management.)

To measure brand perceptions, instead of just asking people whether they like a design, use a more structured approach including 2 main parts:

  1. Exposure to the visual stimulus: Show study participants the visual design, which could be a static image, a prototype, or a live interactive website or application.
  2. Assessment of user reactions to the stimulus: Measure users’ reactions to the design using either open-ended or strictly controlled questions.

Presenting the Visual-Design Test Stimulus

The “test stimulus” (that is, the visual representation of the design) you use can be easily adapted to work with several different types of research studies. When conducting in-person visual-design evaluations, you can simply show people a static image, either printed on paper or displayed on a screen. Printed pages should be a realistic size, and pages that are longer than 2 screens are typically better evaluated in a digital form, since printing them out would show users far more content at once than they would ever actually see on a screen. Use static images if you want to ensure that you get feedback about immediate first impressions of a specific visual design.

Aesthetic and brand impressions can also be assessed using remote, unmoderated methods, which allow testing with users who are difficult to meet in person, or with large groups of users (useful when you need a high degree of certainty in the findings). Any survey tool that can display images works for remote assessments.

If you’re interested in first impressions, present the visual stimulus to the user for a short amount of time. There are two ways in which you can achieve this goal:

  1. 5-second test:  With this type of test, you show the stimulus for 5 seconds (or for another short period of time). This approach is best for accurately capturing people’s ‘gut reaction.’ 5 seconds of viewing time is too short for reading copy or for noticing details like specific fonts or colors, but it is enough for forming an impression which accurately reflects the visual style.
  2. First-click test:  You give participants a specific instruction (such as “Find out more about this organization”) before they are exposed to the design and stop them after they click the location on the screen where they could complete that task. Most users will still spend only a few seconds on this type of test, but instead of intentionally looking at the whole page, they will search for a specific task-related feature or link, and only view the rest of the design peripherally. This test is best suited if you expect your users to already have a specific goal in mind the first time they encounter your site.

These two tests are easiest to administer remotely, using services such as 5 Second Test and Userzoom (for 5-second tests) or Chalkmark (for first-click tests).

Keep in mind that, with a first-click test, the exact task instructions you provide will certainly influence what participants notice and remember about the visual design. If your users are likely to have a variety of goals on your site, randomly assign users to one of several different task instructions, or stick to the more neutral 5-second test.

Comparing Multiple Design Variations

Frequently, showing users more than one possible visual design helps them identify what they like (or dislike) about each variation. If you ask participants to assess more than one design, be sure to vary the order in which they see the alternatives, since part of people’s response may be influenced by which version they see first. (For example, if one version is easier to understand, those who see that one first will have learned about the content and will be less confused by the other variation.) Keep track of which version each person sees first, so you can take it into account when analyzing responses.
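Varying presentation order and recording who saw what first can be done with a simple counterbalancing scheme. This is a minimal sketch, not part of any testing tool; the participant IDs and version names are invented for illustration:

```python
import itertools
import random

def assign_orders(participant_ids, versions=("A", "B")):
    """Counterbalance presentation order: cycle through every ordering
    of the versions so each version is seen first equally often, and
    record the order shown to each participant."""
    orderings = list(itertools.permutations(versions))
    random.shuffle(orderings)  # vary which ordering the cycle starts on
    return {pid: orderings[i % len(orderings)]
            for i, pid in enumerate(participant_ids)}

orders = assign_orders(["p1", "p2", "p3", "p4"])
# Each value is a tuple such as ("B", "A"); its first element is the
# version that participant sees first -- keep it for the analysis.
```

With two versions and four participants, each version appears first exactly twice, so a first-seen effect can be checked during analysis rather than silently skewing the results.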

Also, when asking a user to evaluate different versions of the same design, the differences must be significant enough to be immediately detectable to a layperson. Small changes such as minor variations in font sizes or substitutions of similar fonts may be obvious to a visual designer but are often undetectable to the average user. Asking people to consciously identify and evaluate these subtle details will most likely just confuse them and waste your time. (Even worse, you may fall prey to the query effect, where users make up an answer simply to satisfy the question, even though they don’t really feel differently about the two overly similar versions.)

Assessing User Reactions: Open-Ended vs. Structured 

Once participants have been exposed to the design, the next step is to measure their responses. People’s aesthetic impressions can be very idiosyncratic and will need to be systematically analyzed to identify meaningful trends. This can be done with open-ended feedback, but using a slightly more structured approach makes it easier to understand overall patterns. Here are a few techniques that can be used, ranging from completely open-ended to highly structured:

  • Open-ended preference explanation: Ask users to explain why they like a design
  • Open word choice: Ask users to list 3 to 5 words that describe the design
  • Closed word choice (desirability testing): Provide users with a list of terms and ask them to pick the words which best describe the design
  • Numerical ratings: Collect numerical ratings about how much the design exhibits specific brand qualities

Open-Ended Preference Explanation

The first method, simply asking people to explain why they like (or don’t like) a design, can work well for in-person sessions with highly motivated and articulate users. This question casts the broadest net and can be useful if you don’t know much about your audience’s expectations and want to discover what matters to them. It can also help identify opinions that are based on personal idiosyncrasies (such as “I like purple”), which can be screened out so you can focus on more substantive factors. The drawback of this approach is that you may get only brief or irrelevant responses if the participant is not motivated or just not very articulate. This method is especially risky in an unmoderated remote setting (such as a survey), since you won’t be able to ask follow-up questions if someone gives a vague response such as ‘It’s nice.’

Open Word Choice

A slightly more structured approach to assessing user perceptions is to ask test participants to list several words that describe the design. This format ensures you get at least some specific feedback, while still keeping the question open-ended to discover factors you may not have considered, but which are significant to your audience. You may get a wide range of descriptors back and will need to analyze them carefully to identify meaningful themes. A good approach for this analysis is to categorize terms as generally positive, negative, or neutral, then group terms with similar meanings and evaluate whether they match your target brand attributes. For example, the table below shows descriptors provided about a business-to-business website whose brand goal was to be trustworthy, contemporary, and helpful. None of these target traits were specifically named by the study participants as descriptors, but many users described the design as simple (with both positive and negative connotations).

  • Positive: Simple, Bold; Professional, Neat
  • Neutral: 3 parts
  • Negative: Bland, Typical, Safe; Too Simple; Simple, Generic; Too Much Information

Open-ended word-choice questions elicit a broad range of descriptors, which must be analyzed to determine whether they effectively express the desired brand traits.
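The categorize-and-group step can be sketched as a simple tally. This assumes each descriptor has already been hand-labeled with a sentiment during analysis; the words and labels below are invented for illustration, loosely following the table above:

```python
from collections import Counter

# Descriptors collected from participants, each hand-labeled with a
# sentiment (positive / neutral / negative) during analysis.
responses = [
    ("simple", "positive"), ("bold", "positive"),
    ("professional", "positive"), ("neat", "positive"),
    ("3 parts", "neutral"),
    ("bland", "negative"), ("typical", "negative"), ("safe", "negative"),
    ("too simple", "negative"), ("generic", "negative"),
]

# Overall sentiment balance across all descriptors.
sentiment_counts = Counter(label for _, label in responses)

# Group the raw terms by sentiment so similar words can be clustered
# and compared against the target brand attributes.
by_sentiment = {}
for word, label in responses:
    by_sentiment.setdefault(label, []).append(word)
```

Even this rough tally makes the overall balance visible at a glance (here, negative descriptors outnumber positive ones), which is the signal to compare against the target brand traits.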

Structured Word Choice

Requiring users to choose descriptors from a list of terms you provide is a controlled variation of the word-choice method. By supplying users with a limited set of words, this method focuses specifically on whether the target brand attributes are perceived by participants. The brand traits you hope to convey should be included in your list of terms, along with other choices that describe contradictory or divergent qualities. Structured word choice (also known as “desirability testing”) is less sensitive than open word choice at discovering new points of view, but makes it easier to compare different versions of a design, or the reactions of different audience groups to the same design. This technique works well in an in-person study, where you can ask users follow-up questions and let them refer to the design as they explain their reasoning for selecting each term. It can also be used in a remote study, but don’t combine it with a 5-second test format: working through a long list of words may take so much time that, by the time users reach the end, they don’t recall much about a design they saw for only 5 seconds. Instead, use a survey tool that lets people see the design while they are choosing words from the list.
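Because every participant picks from the same fixed list, structured word-choice data is easy to compare across design versions. A minimal sketch of that comparison, with invented version names, word lists, and selections:

```python
from collections import Counter

# Words each participant selected from the fixed list, per version.
# All names and data here are illustrative.
selections = {
    "version_a": [["trustworthy", "modern"], ["modern", "busy"], ["trustworthy"]],
    "version_b": [["dated", "trustworthy"], ["dated"], ["busy", "dated"]],
}

# The brand traits the design is supposed to convey.
target_traits = {"trustworthy", "modern", "helpful"}

on_brand_share = {}
for version, picks in selections.items():
    counts = Counter(word for participant in picks for word in participant)
    total = sum(counts.values())
    hits = sum(n for word, n in counts.items() if word in target_traits)
    # Fraction of all selections that matched a target brand trait.
    on_brand_share[version] = hits / total
```

The per-version "on-brand" share gives a single comparable number, while the underlying counts show which specific traits (or contradictory qualities) drove it.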

Numerical Ratings of Brand Perceptions

Finally, the most controlled approach is to collect numerical ratings of how well each brand trait is expressed by the design. To avoid prohibitively long test sessions, pick the 3–5 most important brand qualities and ask people to rate how well each of them is captured by the design. (The more questions you have, the more difficult the questionnaire, and the higher the chance of random answers.) Because this paradigm limits the ability to discover different perspectives and reactions, numerical ratings are appropriate only if you’ve figured out the most common perceptions in previous research and simply want to assess the relative strength of each quality.
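Analysis of numerical ratings is correspondingly simple: compute each trait's mean rating and rank the traits to see their relative strength. The 1–7 scale, trait names, and numbers below are invented for illustration:

```python
# Ratings (1 = not at all, 7 = very much) of how well the design
# expresses each brand trait; all values are illustrative.
ratings = {
    "trustworthy": [6, 5, 7, 6, 5],
    "contemporary": [4, 3, 5, 4, 4],
    "helpful": [6, 6, 5, 7, 6],
}

# Mean rating per trait.
means = {trait: sum(vals) / len(vals) for trait, vals in ratings.items()}

# Rank traits by mean to see which qualities come through strongest.
ranked = sorted(means, key=means.get, reverse=True)
```

Here the design would read as helpful and trustworthy but noticeably weaker on "contemporary" — exactly the kind of relative-strength comparison this method is suited for.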

Finally, a word about focus groups: although they can be used to capture user preferences, this method is risky if you don’t have a talented, experienced focus-group facilitator available. Capturing detailed feedback about a visual design from each participant in a group conversation is difficult. One tactic that can help is to ask participants to write down their own perceptions before discussing them as a group, and to collect these written comments for later analysis. Also, focus groups don’t capture any behavioral information.

Assessing Visual-Design Aesthetics Within Usability Testing

All the methods described above focus specifically on visual impressions; but the reality is that people do not encounter visual design in isolation, but rather as part of a holistic experience which also includes content and interaction. Each dimension of the user experience affects the other dimensions: more aesthetically appealing designs are often perceived as more usable. Likewise, users’ perceptions about brand traits are influenced by interaction-design choices: a design that appears simple and welcoming at first glance can quickly become confusing and frustrating if people can’t understand how to use it.

Although first impressions are important, they don’t tell the whole story. You should also assess how the visual design affects users’ behavior and task success when they actually interact with the system. In fact, the effects of subtle changes, such as slightly increased header sizes, may only be apparent when people actually use a system; at first glance they may not even notice a difference, but when skimming an article, larger headers may make it easier to jump quickly to a specific section. This changed behavior may in turn improve users’ ability to find relevant information and make them like the site much more. People may even say that the writing is better (because they read more information of interest) even when the actual copy remained constant and the only change was to the typography.

Luckily, typical usability-test protocols can be easily modified to incorporate assessments of visual design. You can include specific questions about visual impressions and even word-choice exercises into a regular usability-testing session. However, instead of trying to capture the users’ first impression, these aesthetic assessments should happen after the behavioral usability portion of the study is complete.

The sequence is important because if you ask someone’s opinion about the visual design at the beginning of the session, you run the risk of biasing the behavioral portion of the study. Especially if users have seen multiple versions and picked a ‘favorite,’ they are likely to ignore or minimize any problems they experience with their ‘favorite’ version throughout the rest of the session.

Instead of asking users about their visual-design perceptions at the beginning of the session, have people complete the behavioral tasks first, and pay attention to actions or spontaneous comments that relate to the visual design. For example, in a recent test of a prototype of our company’s website, we asked users to complete normal usability tasks such as finding content. While attempting a task, one user casually commented that the new navigation menu at the top of the page was helpful. This menu was not actually a new feature of the design — it was the same menu present on the website that this user had visited regularly in the past, now displayed in a lighter font and without uppercase styling.

A study that compared two fonts for the global navigation using normal usability-testing methods found that the navigation menu, originally presented in a heavy, all-caps font (top), became more discoverable and was perceived as a ‘new’ addition to the site when presented in a lighter, title-case font (bottom).

Once the task-based portion of the study is complete, you can shift to assessing users’ perceptions of the brand traits. Their answers won’t be based exclusively on visual impressions like they would be in a 5-second test, but the impression formed from the combination of visuals, content, and interaction is actually closer to how users react in the real world.

Should Visual Impressions Be Tested in Isolation, or as Part of Usability Testing?

For interactive systems, assessing visual preferences should never be done instead of usability testing. If you only have time and resources to do one test, make it a usability test with added techniques to assess the effects of visual design.

Consider using the standalone methods described in this article when:

  • Time and resources permit multiple types of testing
  • Visual and brand perceptions may significantly influence the success of the product
  • You want to compare divergent visual approaches before testing an interactive prototype
  • You want to confirm findings with a larger sample of users after testing an interactive prototype


Lindgaard, G., Fernandes, G., Dudek, C., and Brown, J. “Attention Web Designers: You Have 50 Milliseconds to Make a Good First Impression!” Behaviour & Information Technology, 25(2), 2006.

Rohrer, Christian. “Desirability Studies: Measuring Aesthetic Response to Visual Designs.” October 28, 2008.
