
Generative AI is transforming how we design, test, and improve digital services. As a user-centred design team working with major government organisations, we’ve seen how AI disrupts traditional design and user research methods, and we’ve taken the opportunity to explore how teams can adapt so they continue to produce meaningful, actionable insights. This article draws on a recent experience of working with a major government agency to implement its first governed agentic solution. The solution was developed with Azure AI Foundry and Copilot Studio, integrated within Microsoft Teams, to generate tailored guidance from specific, trusted, and decentralised sources.

With One Big Thing 2025: AI for all and the Artificial Intelligence Playbook for the UK Government, it’s essential that civil servants are placed at the centre of AI agent design and development, so that AI is integrated seamlessly into service design to solve problems rather than create new ones. By prioritising user needs, we ensure that AI services genuinely free up civil servants’ time, allowing them to focus on complex, high-value work. Building this confidence in AI’s capabilities is key to making digital transformation meaningful, empowering staff to deliver better outcomes for the public and shaping a modern, responsive Civil Service.

Why Generative AI Changes the Usability Testing Game 

In classic usability testing, the researcher knows the user journey inside out. Success criteria are clear, the service’s behaviour is predictable, and how users start and end their journeys is consistent. With generative AI, that predictability vanishes. Users interact with the service in highly variable ways: what they type, how they phrase prompts, and what they expect can differ dramatically. Users approach AI services with shifting goals and intentions, and they decide for themselves when their experience has ended, rather than being served a green success banner as in traditional government transactional services. The AI’s responses are equally variable, shaped by complex models, vast data sources and what users have prompted.

This variability means that “success” is no longer a fixed point. For some users, success might mean getting a direct answer or decision from the AI. For others, it’s about receiving guidance that helps them make their own decisions. Experience level and openness to AI play a huge role: seasoned users may navigate ambiguity with ease, while newcomers might feel frustrated if the AI doesn’t behave as expected, trying a few prompts before losing confidence and giving up.

Redefining Success: Collaborating with the Product Team and Listening to User Needs

Faced with this challenge, user researcher Dr. Jianjian Mou led our service team in collaboratively redefining what success looks like for both users and the department. Together, we agreed on a set of key measures to bring some stability to our testing (a simple way of capturing them is sketched after the list below):

  • Time on task: How long does it take users to complete their goal? 
  • Task success rate: Are users able to achieve what they set out to do? 
  • Number of interactions: How many exchanges with the AI before the task ends? 
  • Trust in the AI: Do users feel confident that the AI can support their needs? 
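As an illustration only, these four measures can be recorded and aggregated with a very small amount of code. The sketch below is a hypothetical Python example: the session log format, field names and trust scale are assumptions made for illustration, not the department’s actual research tooling.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    """One participant's attempt at a task with the AI assistant (hypothetical log format)."""
    seconds_on_task: float   # time from first prompt to the point the user declares they are done
    task_succeeded: bool     # did the participant achieve what they set out to do?
    exchanges: int           # number of prompt/response turns before the task ended
    trust_rating: int        # post-task confidence score, assumed here to run 1 (low) to 5 (high)


def summarise(sessions: list[Session]) -> dict[str, float]:
    """Aggregate the four agreed measures so rounds of testing can be compared."""
    return {
        "mean_time_on_task_s": mean(s.seconds_on_task for s in sessions),
        "task_success_rate": sum(s.task_succeeded for s in sessions) / len(sessions),
        "mean_interactions": mean(s.exchanges for s in sessions),
        "mean_trust_rating": mean(s.trust_rating for s in sessions),
    }


# Example: three observed sessions from one hypothetical round of AI experience testing
round_one = [
    Session(210.0, True, 4, 4),
    Session(540.0, False, 9, 2),
    Session(300.0, True, 6, 5),
]
print(summarise(round_one))
```

However the measures are captured in practice, keeping the aggregation this simple makes it straightforward to set rounds of testing side by side, even when the underlying journeys look nothing alike.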

These metrics gave us a framework to compare sessions and spot patterns, even when individual journeys varied widely. We also paid close attention to user intent, recognising that what one person deems “successful” might differ from another, depending on their expectations and experience. As Jianjian noted, “With GenAI, every user journey is unique. The variability isn’t a problem to fix, but a behaviour to understand. Our goal was to understand how users form trust and confidence, not just whether they complete a task.” This mindset has since shaped how our UCD team approaches AI experience testing across other services.

From the first round of usability testing, or what we refer to as ‘AI experience testing’, we identified 13 actionable insights, 8 of which were actioned by the team. This demonstrates the department’s commitment to a user-centred approach to delivering AI.

Iterating for Clarity and Trust 

One key insight concerned the GenAI assistant’s role as a source of guidance. It was designed to empower civil servants to do their tasks more effectively and efficiently, not to make decisions for them, in line with the GDS AI Playbook’s principles.

As the content designer, I used every opportunity during the onboarding of the assistant to inform users of the AI’s capability to guide. During testing, most users interpreted the agent’s description and welcome message as intended, but during the tasks some still expected the AI to make decisions for them, prompting for a definitive answer.  

When the AI skirted direct requests and offered only considerations, users felt frustrated that the assistant wasn’t listening. This feedback led the service team to prioritise clearer agent instructions so that the assistant could respond to users more effectively. Using this evidence, I created a taxonomy for the agent to follow whenever it is asked to make a decision. The taxonomy defines a pattern for the AI to use when generating a response that is direct, empathetic and helpful. As a result, users are reminded of the AI’s capabilities in the flow of using the service, rather than being expected to remember their very first interactions.
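To make the idea concrete, here is a minimal sketch of what such a pattern could look like if expressed in code. The category names, wording and helper function are hypothetical illustrations; the department’s actual agent instructions were authored within the solution itself (built with Azure AI Foundry and Copilot Studio), not in Python.

```python
# Hypothetical sketch of a response pattern for when a user asks the assistant to decide for them.
# The step names and wording are illustrative assumptions, not the real agent instructions.
DECISION_REQUEST_PATTERN = {
    "acknowledge": "Recognise the user's request for a decision in plain language.",
    "clarify_role": "Briefly restate that the assistant guides rather than decides.",
    "offer_guidance": "Summarise the most relevant considerations from trusted sources.",
    "next_step": "Suggest a concrete next action the user can take to reach their own decision.",
}


def build_instruction(pattern: dict[str, str]) -> str:
    """Turn the pattern into a single instruction block an agent prompt could include."""
    steps = "\n".join(f"{i}. {text}" for i, text in enumerate(pattern.values(), start=1))
    return "When the user asks you to make a decision for them:\n" + steps


print(build_instruction(DECISION_REQUEST_PATTERN))
```

Whatever form the instructions take, the value is in giving the agent a consistent, named shape to fall back on at the moment a user pushes for a definitive answer, rather than relying on the user remembering the onboarding message.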

In summary 

Conducting user research with GenAI services means embracing variability, collaborating to define success, and iterating to build trust. By focusing on clear measures and user intent, teams can surface actionable insights, even when the journey is anything but predictable.  

It is a key reminder that while the look and feel of GenAI is far from that of the traditional government services designed and delivered over the past decade, humans are still at the centre, and designing for them and testing with them is critical for AI’s success.

Dr. Kate O’Leary

Author Spotlight

Dr. Kate O’Leary

Dr. Kate O’Leary is an expert in human-computer interaction, bringing extensive research experience across education, academia, the civil service, and consultancy. She has led user research at the Department for Education, Defra, His Majesty’s Treasury, and arm’s length bodies including the Environment Agency, Education and Skills Funding Agency, and the Office for Students. Kate is dedicated to upholding research ethics and is passionate about designing inclusive, accessible services that meet the needs of all. Her career is distinguished by a commitment to generating positive impact, recognised by the Internet Society and the British Council.