When Good Quality Ideas Don't Stick: Lessons from a Failed Framework That I Loved

Red neon sign reading "THERE IS A BETTER WAY" mounted on a brick wall

In my ongoing pursuit of organizing all the knowledge I have stored across tools and apps, I've been decluttering my digital “stuff” and recently came across some old docs from a quality assessment framework we used when I started at Dropbox back in 2016.

I first learned about the Quality Maturity Evaluation (QME) framework, developed by Dropbox’s founding QA Engineer Alex Hoffer, after I had been with the company for a month or two. My onboarding buddy, also on the QA team, explained that I was to assess the quality maturity of each product team I worked with, write detailed reports, and assign each a score based on their practices.

I thought it was brilliant.

…But, it ultimately didn't survive the evolution of our org. Some teams found it useful but also very heavyweight, and as the QA team shrank over time it required more dedicated time than we could spare. We tried evolving the process to account for these changes by creating a "QME Lite" version and even retro-style formats, but none of them stuck either. Still, there was so much there that helped teams think critically about and improve their quality practices.

So what helped? Looking back, these are the pieces that stood out as genuinely useful.

What Actually Worked

Asking about effectiveness, not just existence. The QME evaluated teams across 10 categories: Release Process, Triage Process, Bug/Tech Debt, Manual Test Coverage, Automated Test Coverage, Testability, Communication, Field Bug Capture, Spec Review, and Inspection. But instead of asking "do you have automated tests?" we'd ask "how do your automated tests actually prevent regressions from reaching users?" or "how does your triage process ensure serious issues get fixed quickly?". This pushed people to think about whether their practices were actually working.

The team discussions were way more valuable than the scores. Instead of QA writing assessments about teams, we'd get everyone in a room - PMs, engineers, QA, design, CX, etc. - and have them talk through the team’s practices. We’d dig into questions like "how do you handle incoming bugs and decide what gets fixed when?" and "what's your process for deciding if a feature is ready to release?". Teams consistently told us these conversations were useful even when they hated filling out the assessment.

We also evolved this approach over time based on feedback from the teams. The process started with QA engineers writing assessments about teams, moved to facilitated team discussions, then tried survey-style "QME Lite" versions, and even experimented with retro-style "liked/lacked/longed for" formats. Each iteration taught us that teams resist assessment but embrace reflection when you change how you frame it.

Context was everything. Teams could weight each assessment category based on what mattered for their situation. A team with lower automation coverage might be doing exactly the right thing for their stage, while a mature product team with the same score would have real gaps - the scoring reflected that.
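
To make the weighting concrete, here's a minimal sketch of context-weighted scoring - not the actual QME tooling, just an illustration. The category names come from the list above; the weights and scores are made up:

```python
# Hypothetical sketch of context-weighted scoring (not the real QME tooling).
# Categories are from the framework; the weights and scores below are illustrative only.

def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Average the category scores, weighted by what matters for this team's context."""
    total_weight = sum(weights.get(cat, 1.0) for cat in scores)
    return sum(score * weights.get(cat, 1.0) for cat, score in scores.items()) / total_weight

# An early-stage team might deliberately down-weight automation and up-weight release process...
early_stage = weighted_score(
    scores={"Automated Test Coverage": 2, "Release Process": 4, "Triage Process": 4},
    weights={"Automated Test Coverage": 0.5, "Release Process": 2.0, "Triage Process": 1.0},
)

# ...while a mature product team with the same raw scores is held to a different standard.
mature_team = weighted_score(
    scores={"Automated Test Coverage": 2, "Release Process": 4, "Triage Process": 4},
    weights={"Automated Test Coverage": 2.0, "Release Process": 1.0, "Triage Process": 1.0},
)

print(f"early-stage: {early_stage:.2f}, mature: {mature_team:.2f}")  # same inputs, different picture
```

With the same raw scores, the early-stage team lands around 3.7 while the mature team lands at 3.0 - the weights are doing the contextual judgment.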

Patterns across teams showed things no individual team could see. When you're running these across 20+ teams, you start noticing patterns like "all teams in this area identified technical debt as their biggest challenge" or "teams that improved their project kick-offs consistently saw fewer post-release bugs". We'd spot things like team reorgs consistently hurting quality scores (who’d have thunk it?), and automation investment patterns that predicted which teams would struggle with releases. That kind of insight was super valuable to leadership.

It was also surprisingly effective at pointing new teams toward useful quality practices they should stand up early. By using the QME as a forward-looking framework, teams could easily identify which mature quality practices made sense to adopt for their own context and stage.

Why It Didn't Work

QMEs were great and all, but they weren’t perfect. The full evaluation was a lot: 10 categories, detailed scoring, formal write-ups with action items. Teams found the discussions valuable but the process exhausting. We're talking several hours of team time plus prep, research, writing, and follow-up for the QAE leading the evaluation.

Even our lighter versions needed more facilitation than we could manage as the QA team shrank and priorities shifted. Without dedicated support, many teams started skipping them entirely. The teams that didn't skip them would sometimes rush through the evaluations just to “check the box”, which resulted in less valuable insights, which led to the argument that they weren't helpful… you see where this is going.

Plus the framework couldn’t keep up with how teams actually worked. The biggest example: our automation questions focused on basic stuff like coverage when teams were actually struggling with feature flag complexity, deployment pipeline reliability, and gating strategies. Unless you had a senior QAE running the evaluation who knew to dig deeper, you'd miss the real challenges teams faced.

Honestly, though, I think the biggest reason they didn't really work is that we couldn't consistently get teams to prioritize the resulting action items. Teams would identify the same problems quarter after quarter: flaky tests, unclear release criteria, poor incident response. But without a commitment to actually fix these things, the assessments became an exercise in documenting known problems rather than solving them. Super demoralizing for everyone involved.

What Keeps Me Coming Back

Now, I wouldn’t blame you if you're reading this and wondering why I’d want to do any of this again 😅

The thing that keeps me coming back is that the teams who bought into the process would surface learnings that eventually benefited the broader org. One team started reviewing specs together at project kick-off as a result of their QME discussions. Engineers found it so valuable that when they reorged onto other teams without QE support, they’d champion the process themselves. Another team began collaborative test case brainstorming for complex features, which helped the whole team think through edge cases they would have missed otherwise. That practice was then shared across teams as part of our self-serve testing toolkit.

We were surprised to find that teams began using QMEs to decide which quality practices to stand up early instead of learning the hard way - they'd look at what mature teams were doing and figure out which practices made sense to adopt. The teams that followed through on action items (tracking them in Jira, making them part of sprint planning, hearing leadership follow up on them) saw measurable improvements in their confidence about shipping quality software. Some teams even shifted to holding frequent quality-focused retrospectives based on the QME discussions.

These teams figured out that the real value was in using the assessment data to identify gaps, then building shared understanding about what quality meant for their specific context and using that insight to guide their own improvements. That's what I want to rebuild - that data-driven model where teams reflect on their practices together and actually improve them. But how do you get all of that context together for meaningful quality discussions across teams without requiring a QA engineer/coach/unicorn to lead every conversation?

If I Joined a New Company Tomorrow

I don't have the perfect framework yet (is there one? 🫠), but I know that I'd start with:

  1. embedding these questions into existing team rhythms (retros, planning sessions, post-incident reviews),
  2. looking for teams that are already having quality conversations and figuring out how to amplify what's working, and
  3. making sure insights actually lead to organizational support for change

Some ideas are worth preserving even when their original implementation doesn't survive. This framework taught me that teams genuinely want to reflect on their quality practices - they just need the right format and support to actually fix what's broken. Because assessment without action is just really expensive organizational therapy 🤷‍♀️
