Overview
Mission HydroSci is a 3D educational game designed to teach middle school students water science and scientific argumentation. During its development, I conducted usability testing to evaluate whether players could successfully complete tasks, to identify interaction challenges, and to assess the overall efficiency and satisfaction of the experience. My work ensured that player behaviors aligned with design intent, driving improvements to engagement and usability.

Problem Statement
The project addressed the challenge of ensuring that gameplay and UI features aligned with user needs and functioned as intended. Usability testing during development aimed to identify and resolve design issues early, optimizing the player experience for effective classroom implementation.
Users & Audience
The primary audience for this application is students aged 11 to 14, particularly those at risk of underperformance in science education due to systemic inequities, such as underfunded schools, socioeconomic challenges, or disengagement from science learning. The secondary audience includes teachers who will integrate the game into their classroom instruction.

Roles & Responsibilities
As the lead UX researcher and project manager, I collaborated with a cross-functional team of 20+ developers, designers, and artists to drive the user experience strategy. I managed the timeline for usability testing, ensuring that features underwent thorough quality assurance before being handed off for testing. After conducting an audit of each feature, I developed detailed testing protocols and mentored a junior researcher on the necessary procedures and steps required to successfully complete the testing tasks. Once testing was complete, I analyzed the findings, including usability issues, video footage, and screen captures, and synthesized actionable insights to present to leadership and the design team, driving design improvements.

Scope & Constraints
In my role as scrum master, product owner, and lead UX researcher, I was responsible for balancing competing priorities, which often impacted the usability testing schedule. The project had a tight timeline focused on delivering a minimum viable product (MVP), which left little time for extensive internal iterative testing before features were passed on for usability testing. As a result, some desired design polish had to be de-prioritized in favor of meeting deadlines and ensuring core functionality. This tight scope influenced both the testing process and the feedback cycle.

Process
I collaborated closely with the development team to ensure the readiness of specific gameplay features for usability testing. Instead of testing entire builds, we focused on targeted gameplay sequences, such as 3D puzzles (e.g., navigating a dungeon with knowledge of water flow and sediments) and 2D user interface elements (e.g., categorizing parts of an argument in an argumentation sequence).
To standardize the testing process, I created feature-specific usability testing protocols for a junior researcher to follow. Each protocol was tailored to the feature being evaluated. For example, testing a 3D puzzle emphasized problem-solving pathways and ease of navigation, while testing a UI element focused on the clarity of labels and the responsiveness of controls.
The junior researcher managed participant recruitment. After IRB approval, participants were recruited on a first-come, first-served basis through a university-hosted announcement website and email system, reaching both university-affiliated individuals and community members.

Testing Procedure
Each session involved four participants, who played individually. The junior researcher introduced the study before gameplay began, ensuring participants understood the process. We used Panopto, a video recording platform, to capture gameplay, facial expressions, and interview responses, providing a comprehensive view of the user experience. The researcher also transcribed participant feedback as close to verbatim as possible during the interviews.
During each session, feature-specific questions tailored to the feature under investigation were asked first. For example, when testing the argumentation sequence, questions might include:
Was it clear how to categorize the parts of an argument? Why or why not?
Once all features had been tested, core questions were asked to gather general feedback across the session, such as:
Did anything surprise you while playing this game today?
Was there anything you thought would happen but didn’t?
If you had a magic wand to change one thing about the game, what would it be?
Would your friends enjoy this game? Why or why not?
Would you be excited to play this in class? What about outside of class?
Prioritization & Metrics
To evaluate the usability of the gameplay features, I tracked both time on task and the frequency of usability incidents. This data was essential for understanding participant behavior and identifying areas for improvement. To ensure a clear focus on the most critical issues, I implemented a structured prioritization framework, which categorized incidents into three levels:
High Priority: Incidents that significantly impeded the participant’s ability to progress through a curriculum-heavy task, leading them to request assistance. These issues directly hindered task completion and were addressed as the top priority for immediate resolution. High priority incidents typically affected all participants.
Medium Priority: Incidents related to suggestions or usability concerns that did not obstruct task progression but were encountered by two or more participants. These issues were important for enhancing the user experience but did not necessitate immediate action, given the overall usability flow.
Low Priority: Minor suggestions based on personal preferences or small changes that did not substantially impact gameplay. Due to time constraints, these were categorized as low priority and were not pursued unless there was remaining capacity or they became more prominent in later testing stages.
This structured approach allowed us to focus on the most critical usability improvements, ensuring that our limited testing time was spent addressing the issues with the greatest potential impact on player experience and task completion.
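To make the triage rule concrete, the sketch below shows how an individual incident report might be classified under this framework. The data structure and field names are hypothetical, introduced only for illustration; the thresholds themselves (blocking incidents as high priority, two or more affected participants for medium priority, four participants per session) follow the framework described above.

```python
# Minimal sketch of the incident triage rule described above.
# The Incident fields are illustrative assumptions, not the actual study tooling.
from dataclasses import dataclass

SESSION_SIZE = 4  # participants per session

@dataclass
class Incident:
    description: str
    blocked_progress: bool      # participant could not continue without help
    participants_affected: int  # out of SESSION_SIZE
    preference_only: bool       # purely a personal-preference suggestion

def triage(incident: Incident) -> str:
    """Assign a priority level following the framework above."""
    if incident.blocked_progress:
        return "High"    # impeded task progression; addressed immediately
    if not incident.preference_only and incident.participants_affected >= 2:
        return "Medium"  # recurring concern, but task flow was preserved
    return "Low"         # minor or preference-based; revisit if capacity allows

# Example: a hint-related incident that forced every participant to ask for help
hint_issue = Incident("Spatial directions in dialogue were missed", True, SESSION_SIZE, False)
print(triage(hint_issue))  # -> "High"
```

In practice this triage was applied by hand during analysis; the sketch simply captures the decision logic that kept prioritization consistent across sessions.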
Analysis & Reporting
After each session, the junior researcher compiled a summary of findings. I reviewed the transcripts, gameplay videos, and screen recordings to identify usability challenges. These findings were documented with supporting evidence (screenshots, anonymized video clips, and transcripts) and formatted into a Google Slides presentation for the design and development teams.

Following each presentation, I facilitated a discussion with the design team to determine final usability decisions. Given the team's tight timeline, Miro was used to record High Priority decisions. Any necessary handoffs to the art and development teams were tracked in Helix Plan, which served as our task tracker and backlog management system.

Outcomes & Lessons Learned
The following are sample findings from usability testing conducted on specific gameplay features, including the Topographic Character Hunt and the Topographic Glyph Game. While these examples highlight key usability issues identified during the sessions, they represent only a portion of the findings from the testing process. Other observations and insights were also shared with the design team, but for brevity, this summary includes the most significant outcomes.
Topographic Character Hunt
During usability testing, we identified several issues in the Topographic Character Hunt gameplay sequence, particularly around providing adequate hints to guide players through the map and topographic challenges. Players were required to use their map and topographic knowledge to locate stranded characters based on directional and distance clues, but many needed additional guidance to interpret the instructions effectively. All four participants encountered difficulties interpreting the spatial directions delivered through in-game dialogue. P5 mentioned that the information about “northwest and 90 ft” was helpful, but that the way it was presented could be more intuitive. P6 expressed a similar need for more explicit guidance, saying, “Anderson is literally telling us where.”

Prioritization: Given the frequency of player struggles and the impact on gameplay progression, this issue was categorized as High Priority. All four participants needed help at some point, and the design team was advised to integrate hint reminders into the UI to ensure smoother navigation and task completion without disrupting the player's flow.
Product Revision: Based on participant feedback, one recurring suggestion was to display hints directly in the UI, especially when selecting the waypoint on the map. For example, one player proposed showing the hint at the top of the screen, providing real-time visual guidance. This could help players who missed verbal hints or became confused while navigating.

Topographic Glyph Game
During user testing, participants initially struggled with a matching task, which required pairing top-down views of topographic maps with their corresponding side views. Instead of making these connections, they attempted to match the most visually similar symbols, aligning top-down views with other top-down views and side views with other side views.

Participant feedback highlighted this confusion. One student (P5) stated, “I think it was kind of confusing, because, at first, I thought you were supposed to match the pictures exactly–but it was a challenge, so–” Similarly, P6 expressed, “I thought I was supposed to match literally.” Another participant (P7) noted that the introduction did not include side-facing images, making it unclear that those represented different perspectives. P8 pointed out the difficulty of interpreting topography when only given the top-down perspective. Additionally, lab notes confirmed that a student initially tried to match images based on visual similarity rather than topography, but after receiving a hint, they found the task more manageable.
Prioritization: Addressing this issue was a high priority, as the misunderstanding interfered with players’ ability to engage with the intended learning objectives. Without clear differentiation between top-down and side views, students were unable to develop the necessary spatial reasoning skills, reducing the educational effectiveness of the game.
Product Revision: To mitigate confusion and reinforce correct associations, two key design changes were implemented: (1) Physical Layout Adjustment – All top-down view pieces were placed on the wall, while all side view pieces were positioned on the ground. This separation emphasized that the perspectives were distinct and should not be matched based on visual similarity alone. (2) Instructional Enhancement – The pre-game video lesson was modified to include additional side-facing topography images. This provided players with clearer examples of how top-down and side views corresponded, helping them develop the correct mental model before gameplay began.

Conclusion & Next Steps
These testing sessions revealed several usability challenges that directly impacted players’ ability to progress and enjoy the game. The two highlighted issues, insufficient hint reminders and players’ instinct to match visually similar images during the matching exercise, were both categorized as high priority due to their significant effect on gameplay flow.
I worked closely with the design team to ensure that these issues were addressed through targeted UI changes, enhanced visual clarity, and improved interaction design. These insights, captured through detailed feedback and observation, were essential in refining the gameplay experience and ensuring that students could engage with the educational content without frustration.