Remote Usability Testing

Remote Usability Testing for Mr.One

Last month, I ran a set of remote usability tests to improve the usability of our company’s first product. As the only UX person in a startup, I found it challenging, but I managed to run the tests remotely with very limited resources and got some useful insights. I’m writing down the process here for my own reference.

Time: 2017.4 - 2017.6

Role: User researcher

Method & Skill: Cognitive Walkthrough, Remote Usability Testing, Think Aloud


1. Planning

1.1 Goals

There were tons of things to prepare before actually testing with potential customers, and good preparation would help me get actionable information from the tests efficiently. At this stage, we already had a fully functional product and 5 participants who had volunteered to take the tests. Everyone on the team had different thoughts on what problems our product had: some came from internal testing and some were collected from demos for potential customers. So the two main goals of the tests were:

1. To validate our existing assumptions about usability issues.
2. To discover useful features that might be missing from the current product.

1.2 Task Flows

Since we already had our product built, a summative, task-based usability test suited our goals well. As our participants were in different locations, we planned a remote moderated test to avoid the hassle and cost of commuting. Though it wouldn’t be as good as an in-person usability test, I would still be able to observe and interact with participants remotely, so it would give us enough insights as a starting point for product improvement.

To structure all the assumptions and to come up with a proper task flow, I systematically did a detailed cognitive walkthrough of 3 different possible routes. This also gave me a clear picture of the different task routes a user could take. Then I picked 2 task flows for our two target user groups: the business users and the data scientists. The flow for business users focused mainly on creating a project, selecting a template, simple data handling and simple modeling, while the flow for data scientists focused on advanced variable settings and modeling settings. Together, the two task flows covered all the potential issues we thought users might encounter.

Figure 1: One of the task flows I mapped during the cognitive walkthrough.

1.3 Task table and rating system

I used a spreadsheet with five columns to document the tasks, how to describe them to users, the expected action sequences for each task, whether participants finished the task, and whether participants were confused or needed help. The tasks column documented both how we described each task within our team and how we would describe it to users during the test. There were two reasons for distinguishing these: 1. to communicate clearly within our team; 2. to avoid reusing text that already appeared on our product’s interface, so that the task description wouldn’t lead the participant. There was always more than one action route a user could take to finish a task, and sorting them all out before testing made note-taking easier during and after the test. The second-to-last column was used to document each participant’s actual route and a performance rating, and the last column was for observers to write down notes during the tests.

A binary rating system is usually applied in usability tests with a large number of participants, where it is used to gather and analyse quantitative results. However, I chose a three-level scale to answer the question “Did the participant finish the task at ease?”: ‘Yes’, ‘Pass’ and ‘No’. There were two reasons: first, it has stronger descriptive power for recording participants’ behavior; second, the three levels can reveal the severity of an issue and thus help us prioritize what to solve first. I also had a column for participants’ self-rating of the ease of each task in v1 of my table, but dropped it in v2 because of two concerns: 1. I would have only 5 participants, so their ratings would not have statistical significance and would amount to personal opinions that might be less helpful for us; 2. asking a question after each task would add to the total testing time, which is undesirable for both sides. Not collecting participants’ attitudes on each task let me focus on observing their real behavior, which is more likely to reveal actionable insights for product improvement.

Table 1: Example of a task-based preparation table
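
For readers who prefer to see the table as a data structure, here is a minimal Python sketch of one row. It is only an illustration, not the spreadsheet I actually used: the field names, the example task wording and the interface steps are all hypothetical, and the meaning I give to ‘Pass’ (finished, but with confusion or help) is my reading of the scale rather than a formal definition.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Rating(Enum):
    """Three-level answer to 'did the participant finish the task at ease?'"""
    YES = "Yes"    # finished without visible confusion or help
    PASS = "Pass"  # assumed meaning: finished, but was confused or needed help
    NO = "No"      # did not finish the task


@dataclass
class TaskRow:
    """One row of the preparation table (field names are hypothetical)."""
    internal_name: str                # how the team refers to the task
    user_description: str             # wording read to the participant
    expected_routes: List[List[str]]  # every action sequence that completes the task
    observed_route: List[str] = field(default_factory=list)  # filled in during the test
    rating: Optional[Rating] = None   # filled in during the test
    observer_notes: str = ""          # free-form notes from observers


# Hypothetical example row; the steps do not describe the real interface.
row = TaskRow(
    internal_name="Create project",
    user_description="Set up a new workspace for your sales data.",
    expected_routes=[
        ["Home", "New project", "Enter name", "Confirm"],
        ["Project list", "Add", "Enter name", "Confirm"],
    ],
)

# Filled in while watching the session.
row.observed_route = ["Home", "New project", "Enter name", "Confirm"]
row.rating = Rating.PASS
row.observer_notes = "Hesitated before finding the button; asked for a hint."
```

Keeping the internal name and the user-facing description as separate fields mirrors the point above: the wording read to participants should not repeat the labels on the interface.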

1.4 Tools

There is a variety of remote user testing software available; Remote UX Research has compiled a detailed list of remote UX research tools. However, I chose Zoom, a remote communication tool for team collaboration, instead. It offers video conferencing, screen sharing, video recording and remote control. The first two functions are already enough for a moderated remote test: they let me communicate with participants during the test while observing how they interact with our app on their own screen. Video recording supports more detailed qualitative analysis afterwards and is therefore optional, and remote control enables testers to help participants when they encounter technical errors. Zoom offers a free 45-minute video session to anyone, and participants can join the remote meeting just by clicking a link, without registering a Zoom account.

After I had finished all the preparations for the test, one important thing was still left: emailing my participants ahead of time with the schedule, a briefing on the test agenda, the IT requirements and the materials needed (a sample data file). I also made it clear in the email that it was our product being tested, not the participants.

2. Conducting the Test

2.1 Observation, probing, taking notes

The tests we ran were moderated, which meant I joined the online meeting at the same time as the participant, greeted him/her, watched him/her do the tasks while thinking aloud, took notes and offered help. Moderated tests can gather more useful information because they provide contextual information: observing a participant’s facial expressions and actions helped me understand whether he/she was confused. At some points I would probe further based on the participant’s think-aloud comments to understand the “why”, and then note the answers down in the data table I had prepared earlier.

For each test, I also invited one team member to join me in watching how potential users interacted with our product. They could help answer more specialized questions, and having them watch also saved me a lot of effort after the tests in persuading my team to make the changes.

My colleague and I also offered help when participants couldn’t finish a task or asked for help. When I saw a participant do something unexpected, I asked why and let the participant explain his/her reasoning. I noted down the reasons, and these notes were especially helpful afterwards.

When participants had gone through all the tasks, I had a short discussion with them about the whole process and thanked them for participating. Asking “Is there anything you want to mention that I didn’t cover?” was magically useful and gave me valuable insights that we hadn’t thought of before.

3. After the Test

3.1 Summarizing findings and scores into actionable insights

After each test, I quickly summarized my notes in a Google spreadsheet while my memory was still fresh. I recorded details such as which route the user took, at which step the user got stuck and why he/she got confused. Being specific down to the individual step helped me easily locate the areas that needed improvement in the corresponding interface. And there was no better proof than a comparison of what users actually did with what we expected them to do.
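
The comparison of expected and actual routes can also be expressed as a small Python sketch. This is only an illustration under my own assumptions: I did the comparison by hand in the spreadsheet, the function name is hypothetical, and the step labels are made up rather than taken from the real interface. It compares the observed route against a single expected route and reports the first step where the two part ways.

```python
from typing import List, Optional


def first_divergence(expected: List[str], observed: List[str]) -> Optional[int]:
    """Return the index of the first expected step the participant did not follow.

    Returns None when the observed route starts with the whole expected route,
    i.e. the participant completed every expected step in order.
    """
    for i, step in enumerate(expected):
        # Either the participant stopped early or did something else here.
        if i >= len(observed) or observed[i] != step:
            return i
    return None


# Hypothetical step labels, for illustration only.
expected = ["Open projects", "Click new", "Name project", "Confirm"]
observed = ["Open projects", "Click new", "Open settings"]

stuck_at = first_divergence(expected, observed)
if stuck_at is not None:
    print(f"Diverged at step {stuck_at + 1}: expected '{expected[stuck_at]}'")
```

When a task has several valid routes, the same check could be run against each of them, keeping the route that matches the longest before diverging.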

To gain a holistic understanding of the test results, I summarized the ratings in another table. Comparing the success rate of each task across all 4 tests (5 participants, 2 of whom did a test together), I marked the tasks with more ‘No’ and ‘Pass’ ratings than ‘Yes’ ratings as the ones to look into further.

Table 2: Example of a rating summary table

Then I went back to my earlier table of detailed tasks and checked my notes to understand why participants couldn’t finish these tasks. I added a column called “Areas of Improvement”. Under this column, for each task with fewer ‘Yes’ ratings, I traced the errors to one specific action step, which made the areas needing improvement clear.
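
The rating summary itself is simple enough to sketch in a few lines of Python. Again, this is an illustration rather than what I actually ran: the ratings below are invented, the function names are hypothetical, and the rule “flag a task when ‘Pass’ and ‘No’ together outnumber ‘Yes’” is my assumed reading of how I marked tasks for a closer look.

```python
from collections import Counter
from typing import Dict, List


def summarize(ratings_by_task: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Count how often each rating was given for each task across all sessions."""
    return {task: Counter(ratings) for task, ratings in ratings_by_task.items()}


def needs_review(counts: Counter) -> bool:
    """Assumed rule: flag a task when 'Pass' and 'No' together outnumber 'Yes'."""
    return counts["Pass"] + counts["No"] > counts["Yes"]


# Invented ratings from 4 sessions, for illustration only.
ratings_by_task = {
    "Create project": ["Yes", "Yes", "Pass", "Yes"],
    "Select template": ["Pass", "No", "Yes", "Pass"],
    "Advanced variable settings": ["No", "Pass", "Yes", "Yes"],
}

summary = summarize(ratings_by_task)
flagged = [task for task, counts in summary.items() if needs_review(counts)]
print(flagged)
```

With only 4 sessions the counts are a way to decide where to look first in the notes, not a statistic to report.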

4. Reflections and Takeaways

4.1 Reflections

Since I had gone through the process many times, both in the cognitive walkthrough and in pilot testing by myself, I was very familiar with the two different routes and with each task users would do in the tests. This helped me easily spot problematic areas while watching users use the product. Also, being specific about each step in a single task enabled us to summarize the results quickly and turn them into improvement suggestions.

Still, there are several things I wish I could do better next time. When I chose the dataset and designed the scenarios, I simply picked one that was available and easy to understand. However, our data scientist later pointed out that it was not a usual prediction case for either business people or data scientists in the business world; it would be better to use a more common case that resembles a real situation. Informing participants about the technical specifications earlier is also very important to avoid unexpected situations. It’s good to invite colleagues to join the tests, but not too many at one time: too many testers facing one participant could make the participant feel stressed, and then we might not get as many useful insights from the test as we could have. Finally, when a test runs for a very long time, it becomes less likely to yield useful information as participants get fatigued.

4.2 Some takeaways

This was the first time I ran remote usability tests. It’s similar to traditional usability testing but still a bit different. Below are the things I would pay more attention to the next time I run a remote usability test:

- Cognitive walkthrough is helpful in structuring task flows and identifying possible issues.

- Depending on your goal, choose reasonable rating criteria.

- Be specific about the technical requirements (browsers, OS, etc.) and let your participants know them ahead of time.

- Be courteous and don’t let the meeting last too long (ideally 45 - 60 min, no longer than one hour).

- It’s very important to go through the whole process by yourself and with your colleagues several times before the real testing.

- Inviting your colleagues to join the testing is beneficial, but not too many at once.

That’s all from my first remote usability testing. I’d love to hear about your experience with remote usability testing, and you are more than welcome to discuss any related questions with me.

-- Thanks for reading --
If you have any questions or simply want to chat with me about remote usability testing, feel free to drop me an email.