Simplified Agent Install

My Role

I led a Product Design team of two, drawn from my team of five direct reports, as part of a larger project team. I was responsible for planning, requirements gathering, strategy, wireframing, research, customer interviews, and executive presentations. My team partnered with the UX Research team and Product Managers on research, scope, and roadmap planning, and I worked with Engineering leads on project feasibility and resource assignments.

I was primarily involved during the discovery and definition phase, and acted in a supportive role as the team carried the project into the development phase.

Agent Install

AppDynamics is like an alarm system: the product monitors application performance and tells people when there’s a problem. Traditionally, finding that problem takes time, and the longer it takes, the longer the app is down and the more money customers lose. Finding the problem fast means they don’t have to spend time hunting, and their apps experience minimal downtime.

People monitor their applications by installing what we call “agents”. The agents sit on the server where the code lives and watch the code. If there’s a problem, the agents report it. We have a different agent for each programming language an application is written in. Installing these agents is not easy; in fact, it’s quite cumbersome. When I started this project, installation took hundreds of steps, required administering complicated code, and demanded a detailed up-front plan.

Further Down the Rabbit Hole

Additionally, people usually have different versions of their applications, hosted in different environments on different servers, for testing and security reasons. Installing many agents on many servers cannot be done all at once; it usually requires customer support and a lot of time-intensive custom naming structures. It has taken some customers around six months to define all the custom naming and complete installation. We’re losing customers due to this complexity, and it’s one of the biggest pain points we hear.

The scope of this project was to completely redesign the agent installation experience.

Persona

Aparna, DevOps

Aparna works on a DevOps team. She started her career as an IT admin and used that experience in her new DevOps role to write Python scripts for simple automation tasks. During the development of the new platform, which was a long and often painful learning experience for both the engineers and DevOps, she wrote dozens of scripts to stitch together the pieces and automate the deployment of a heterogeneous system.

What she likes: Every day is different. Learning and playing with new technologies, being in control of that complex landscape, improving her tool set continuously, problem solving.

What she dislikes: Routine work, such as pulling reports for management.

We can do better for Aparna

One of the most painful experiences within AppDynamics is agent installation. Agent install is a very manual process. In 2018 we lost significant contracts due to the difficulty of agent install. The agents need to be downloaded one at a time. When our customers want to monitor multiple applications, or even one application running different programming languages, they have to download each agent individually. There’s no way to download multiple agents all at once.

The agents cannot be managed in the product UI, and there’s no way to pause or resume monitoring in the UI. Updating the agents is also a manual process with no ability to trigger an update across all the agents.

Before an agent is deployed, it must first be mapped to an app and tier. This is a manual process that requires significant up-front planning. All this effort takes time away from diving right in and monitoring the environment.

It’s too complicated to install agents, and we’re falling behind the industry

It’s a painful experience, and the people that rely on AppDynamics to do their jobs are starting to look elsewhere.

Goals

  1. Reduce manual effort: the installer can automatically detect which processes are running and decide whether or not they should be monitored.

  2. Provide clarity in an interface: instructions are clear and easy to understand.

  3. Reduce time to set up: the user can install multiple agents in fewer steps, without changing context.

There has to be a better way!

When I began working on this project, I immediately partnered with the lead Product Manager to make sense of this mess. We started by working backwards. We wanted to create a vision of the best possible experience and figure out what it would take to make it a reality. We were losing customers due to how difficult and time intensive our agent installation process was, and we knew there had to be a better way.

Understanding the competitive landscape

I wanted to better understand where the industry was, so we conducted a competitive analysis of Dynatrace and New Relic. I pored over their technical documentation, taking it all in and looking for opportunities to improve. Dynatrace has a single-agent installation experience, so I knew the technology was within our grasp. What would it take for us to do better? To understand this, I started talking to Engineers.

The Engineer had my ear

What I learned is that it’s basically a grouping and sorting exercise. People had objects, in this case processes, that were part of a virtual host environment, and those processes needed to be grouped into apps and tiers for monitoring. All of this sat on a physical server.

Part of what took so long during setup was that before an agent could be deployed, it had to be manually mapped to a specific app and tier. Any additional categorization had to be encoded into a lengthy naming structure. Each agent had to be downloaded one at a time, with no way to download all the agents at once. Some companies have thousands of apps across hundreds of servers.
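
To make that grouping exercise concrete, here is a minimal sketch in Python; the names and structure are hypothetical illustrations, not AppDynamics’ actual data model.

    from dataclasses import dataclass

    # Hypothetical illustration of the grouping exercise, not the real data model.
    @dataclass
    class Process:
        pid: int
        command: str  # e.g. "java -jar checkout-service.jar"

    @dataclass
    class AgentMapping:
        process: Process
        application: str  # the app this process belongs to
        tier: str         # the tier within that app

    # Before this project, every one of these mappings had to be defined by hand,
    # encoded into a custom naming structure, before an agent could be deployed.
    mapping = AgentMapping(
        process=Process(pid=4211, command="java -jar checkout-service.jar"),
        application="ECommerce",
        tier="Checkout",
    )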

Strategy workshop

Once I had a working knowledge of the problem space, the PM and I met with a Lead Engineer and started workshopping a strategy. We wanted our vision to be grounded in reality, so we brainstormed approaches. Could we bundle our agents into a package that could automatically detect all the processes running on the host environment, then automatically suggest how they should be grouped? Could we also suggest human-readable names based on the processes that were detected, taking out all the guesswork?
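
As a rough illustration of what we were imagining, here is a hypothetical sketch of that kind of name suggestion; the heuristic below is an assumption for the sake of example, not the logic that shipped.

    import re

    # Hypothetical heuristic: guess a human-readable tier name from a detected
    # process command line instead of asking the user to define a naming structure.
    def suggest_tier_name(command: str) -> str:
        match = re.search(r"([\w-]+)\.(?:jar|war|dll|js)", command)
        if match:
            # "checkout-service.jar" -> "Checkout Service"
            return match.group(1).replace("-", " ").title()
        return "Unclassified"

    detected_commands = [
        "java -jar checkout-service.jar",
        "node inventory-api.js",
    ]
    suggestions = {cmd: suggest_tier_name(cmd) for cmd in detected_commands}
    # {'java -jar checkout-service.jar': 'Checkout Service',
    #  'node inventory-api.js': 'Inventory Api'}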

Categorizing the dream

We knew from observational inquiries that people needed to differentiate the host environments on the server being monitored. How could we make that simpler? We needed to create some way for people to categorize and sort their processes during and after setup.

Some of the things people need to categorize by are service, application, business unit, geography, and environment, among others. We landed on a tagging concept. Once the agent was deployed on the server, it could theoretically detect all the processes running and suggest the apps and tiers each one belonged to. It could even suggest tags. We could present all of this in human-readable language in a UI that allowed the user to choose which of the processes they wanted to monitor. The system could learn, and it could all be automated on additional servers. That was the dream.
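
Here is a small sketch of how a tagged, detected process might be represented, purely for illustration; the field names below are assumptions, not the product’s schema.

    # Hypothetical representation of one detected process with suggested tags.
    # Free-form key/value tags let people slice the same process by whatever
    # dimensions matter to them: environment, business unit, geography, and so on.
    detected_process = {
        "command": "java -jar checkout-service.jar",
        "suggested_app": "ECommerce",
        "suggested_tier": "Checkout Service",
        "suggested_tags": {
            "environment": "staging",
            "business_unit": "Payments",
            "geography": "us-west",
        },
    }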

Mapping the possibilities

Once agents start reporting, they are visually mapped onto a topology we call the flow map. I wondered whether we could make this flow map part of the agent install experience, to visually confirm that the agents installed correctly and to show how the environment was connected.

Once we had a vision we aspired to, we set out to build it. I pulled in the Product Designers on my team and we started drafting wireframes while the Engineering team built a proof of concept.

A grounded concept

The concept is grounded in the ability to take one package, called the agent installer, and simply run it on the target host. All the user would have to do is copy and paste two commands. From there, they could go to the UI, see everything that could be instrumented, and toggle the instrumentation status. It would all be presented with human-readable names, not a long, complex naming structure.
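
To make the “toggle the instrumentation status” idea concrete, here is a hypothetical sketch of the interaction the UI would drive behind the scenes; the names and structure are illustrative assumptions, not the shipped product.

    # Hypothetical illustration: once the installer reports what it found,
    # the UI simply flips a per-process monitoring flag.
    detected = [
        {"name": "Checkout Service", "tier": "Checkout", "monitor": False},
        {"name": "Inventory Api", "tier": "Inventory", "monitor": False},
    ]

    def toggle_instrumentation(processes, name, enabled):
        """Turn monitoring on or off for a process by its human-readable name."""
        for proc in processes:
            if proc["name"] == name:
                proc["monitor"] = enabled

    toggle_instrumentation(detected, "Checkout Service", enabled=True)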

From prototype to stage

I needed reinforcements, so I pulled in two of my direct reports. We revised the wireframes, fleshed out the concepts, and put together a prototype.

I developed a narrative around it and presented it to executive leadership. I also presented it alongside the lead Product Manager to an audience during our annual product kickoff conference.

We had some concerns

Coming out of the conference, there were some concerns about how the flow map would perform, and about our ability to support tagging with the current tech stack. Once the Engineering team came back with the proof of concept, we had an idea of what would be possible.

Narrowing in on the vision

We were ready to make the dream a reality. I asked the team to explore how the experience would change without the flow map or tagging. Meanwhile, the Product Manager started scoping requirements for the engineering team. With a narrowed scope and requirements, the team developed revisions that we could build into the current product as a beta version.

UX Research results

With a new prototype, we partnered with the UX Research team and conducted a small usability study with 4 participants. It tested high in ease of use, but relatively low in expected functionality.

The biggest concerns were around scale and the ability to search, filter, and save filter sets. There was also some confusion around pagination, and people wanted more visibility into agent upgrades, specifically for upgrades to be an automated process.

This iteration was lacking filtering and pagination.

Revisionist history

My design team revised the mockups based on what we learned from research, improving filtering and searching and adding a robust pagination solution. Through our partnership with the Engineering team, we also learned that we could include automatic upgrades.

During iterative agile cycles, the team built and launched a limited beta experience. The beta bundles two agents, includes the first version of automatic tier naming, and can instrument all or a subset of processes on the target host.

Based on user feedback we included the filter on the left, and pagination to handle scale on the bottom right.

Some takeaways

We successfully launched a beta of the simplified agent install experience, supporting automatic upgrades for two of our most popular agents, the Machine Agent and Java, with .NET, Node.js, Windows, and Linux on the way. With the beta in the hands of select customers, the team will continue to collect feedback and make incremental improvements as we prepare for a general-audience release. Tagging is still something the development team is working to support. Two of the target metrics the team will be watching are a 10% increase in user adoption and a decrease in hours to deployment, which currently stands at over two hours. Agent install no longer has to be such a painstaking, manual process. Whew!