Making sense of practice levels

24 Jun, 2025

Two of the most common questions we get asked when someone is thinking about starting a new activity are: "How often should I do this?" and "How long do I need to do it for?"

These are great questions!

Unfortunately, they don't have simple answers.

When we started building Bearmore, we assumed we'd find clear research-backed guidelines for how to practice meditation, exercise, journaling, or any of the 40+ activities we wanted to include. What we found instead was a fascinating mess of specific studies, evidence gaps, and the complex reality of translating academic research into practical guidance.

This is the story of how we built our "Practice Levels" feature, a framework that tells users how long and how often to practice different wellbeing activities. It's also an honest account of what we discovered about the state of wellness research, and why giving people actionable advice is harder than it looks.

Extracting (actionable) evidence at scale

To handle this systematically across dozens of activities, we developed an AI-powered research process. Rather than relying on human researchers to spend 6-18 months on a traditional systematic review of each activity, we built structured prompts that guide AI assistants through the evidence extraction process.

The AI examines our top-rated peer-reviewed papers for each activity, extracting specific mentions of practice duration and frequency, noting where this information appears, and documenting any outcomes associated with different practice levels. It then maps this evidence (or lack thereof) to our five-level framework: Brief, Short, Optimal, Long, and Exhaustive for duration; New or rarely, Infrequent, Optimal, Frequent, and Maxing out for frequency.

Crucially, every recommendation is clearly labelled with its evidence strength: "Direct evidence," "Indirect evidence," or "No direct evidence in our research database." We're not trying to hide the gaps. We're making them visible so users can make informed decisions about where to experiment.
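To make this concrete, here's a rough sketch of how the framework could be modelled in code. The class and field names are illustrative, not our actual schema, and the example record at the end is made up rather than drawn from a real study:

```python
from dataclasses import dataclass
from enum import Enum

class DurationLevel(Enum):
    """How long a single session lasts (the five duration levels)."""
    BRIEF = "Brief"
    SHORT = "Short"
    OPTIMAL = "Optimal"
    LONG = "Long"
    EXHAUSTIVE = "Exhaustive"

class FrequencyLevel(Enum):
    """How often the activity is practiced (the five frequency levels)."""
    NEW_OR_RARELY = "New or rarely"
    INFREQUENT = "Infrequent"
    OPTIMAL = "Optimal"
    FREQUENT = "Frequent"
    MAXING_OUT = "Maxing out"

class EvidenceStrength(Enum):
    """How directly a recommendation is supported by our paper database."""
    DIRECT = "Direct evidence"
    INDIRECT = "Indirect evidence"
    NONE = "No direct evidence in our research database"

@dataclass
class PracticeRecommendation:
    """One extracted recommendation for a single activity."""
    activity: str
    duration: DurationLevel
    frequency: FrequencyLevel
    evidence: EvidenceStrength
    source_note: str  # which paper, protocol, or reasoning the mapping came from

# Illustrative record only; the values are invented for this example.
example = PracticeRecommendation(
    activity="Breath work",
    duration=DurationLevel.SHORT,
    frequency=FrequencyLevel.OPTIMAL,
    evidence=EvidenceStrength.DIRECT,
    source_note="placeholder: cite the specific study or inference here",
)
```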

The problem: evidence exists, but it's scattered and specific

Here's what surprised us most: the evidence actually exists.

After analysing 43 activities through our AI-assisted research process, we found at least some evidence supporting about 62-68% of our practice recommendations. Researchers have studied the wellbeing benefits of practicing our activities at these levels. The problem isn't that researchers haven't studied the questions, it's that their answers are incredibly specific and often don't translate neatly into general guidance.

Take acts of kindness. We found studies showing that people who performed "3 kind acts in a single day" or "5 acts on one day per week" experienced meaningful improvements in positive mood and mental wellbeing. That's wonderfully specific, but incredibly narrow. What about 2 acts? What about spreading them across different days? The studies don't tell us.

Or consider breath work, where we found evidence for specific protocols: "4-7-8 breathing for 4 cycles" or "box breathing for 10 minutes." But if you want to do 6 cycles instead of 4, or practice for 8 minutes instead of 10, you're extrapolating beyond what the research has actually tested.

This specificity creates a translation problem. Users don't want to know that "3 acts of kindness on Tuesday" were effective in a lab study. They want to know: "If I'm just starting out, how often should I aim to practice? If I want the best practice-to-benefit ratio, what does that look like? How much is enough? How much is too much?"

When evidence is missing: the 30% problem

About 32-38% of our recommendations had to be inferred from indirect evidence or logical reasoning. This isn't because researchers are being lazy. There are systematic reasons why certain questions remain unanswered:

1) Small sample sizes and short follow-up periods. Wellbeing outcomes change slowly and are influenced by countless life factors. You need large, long-term studies to detect meaningful differences between, say, doing yoga 3 times per week versus 5 times per week. These studies are expensive and take years to complete.

2) Lack of industry funding. Unlike pharmaceuticals or sports equipment, there's no commercial sponsor paying for multi-arm, dose-finding trials of gardening or gratitude practices. Academic researchers are often working with limited budgets on shorter timescales.

3) Ethical and expectancy effects. When participants know they're "doing something good for their wellbeing," it boosts placebo responses and makes it harder to compare different "doses" of the same activity. You can't exactly give someone a placebo meditation session.

Take AI therapy as another example. We found papers describing digital mental health tools and their usage patterns, but no studies directly testing whether 3 sessions per week works better than 5, or whether 20-minute sessions are more effective than 45-minute ones. We had to infer recommendations from typical trial designs and clinical reasoning. When this is the case, we will clearly note it: e.g., "No direct evidence in our research database; inferred from typical usage patterns in digital mental health interventions."

Moving beyond fixed "Levels"

Our first attempt at practice levels used fixed combinations ("beginner," "intermediate," "advanced") with set frequencies and durations. It didn't work. The evidence was too diverse, and real-world practice too varied, to fit into neat categories.

We switched to a matrix approach that combines two independent factors: how often you practice (frequency) and how long each session lasts (duration). This creates 25 possible combinations rather than 5 fixed levels, allowing us to accommodate the specificity of research while giving users flexible guidance.

For example, someone might practice breath work at an "Optimal" frequency (6-9 sessions per two weeks) but with "Brief" duration (under 10 minutes per session), landing them in our "Moderate" practice level. Someone else might do "Infrequent" sessions (2-5 per two weeks) but with "Long" duration (30-45 minutes), also reaching "Moderate" level through a different path.
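We haven't spelled out all 25 cells here, but one simple way to implement the mapping is to rank each axis from 1 to 5 and bucket the combined score. The sketch below is illustrative rather than our production matrix (the overall level names other than "Moderate" are placeholders), and it reproduces both breath work examples above:

```python
from enum import IntEnum

class Frequency(IntEnum):
    """Sessions per two weeks, ordered from least to most."""
    NEW_OR_RARELY = 1
    INFREQUENT = 2      # 2-5 sessions per two weeks
    OPTIMAL = 3         # 6-9 sessions per two weeks
    FREQUENT = 4
    MAXING_OUT = 5

class Duration(IntEnum):
    """Length of a single session, ordered from shortest to longest."""
    BRIEF = 1           # under 10 minutes
    SHORT = 2
    OPTIMAL = 3
    LONG = 4            # 30-45 minutes
    EXHAUSTIVE = 5

def practice_level(freq: Frequency, dur: Duration) -> str:
    """Collapse the 5x5 frequency/duration grid into an overall level.

    The bucket boundaries and every name except "Moderate" are
    placeholders for illustration, not our real matrix.
    """
    score = int(freq) + int(dur)  # ranges from 2 to 10
    if score <= 3:
        return "Light"      # placeholder name
    if score <= 6:
        return "Moderate"
    if score <= 8:
        return "High"       # placeholder name
    return "Maximal"        # placeholder name

# The two breath work examples land on "Moderate" via different paths:
print(practice_level(Frequency.OPTIMAL, Duration.BRIEF))     # Moderate
print(practice_level(Frequency.INFREQUENT, Duration.LONG))   # Moderate
```

An additive score is just one option; an explicit 5x5 lookup table works equally well if the evidence for an activity suggests a non-additive mapping.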

This matrix approach let us preserve the specific findings from research while creating a framework flexible enough to accommodate how people actually practice.

Current limitations and future plans

We're transparent about what this system can and can't do. Most importantly, these recommendations don't account for "activity stacking": they assume you're doing one activity in isolation, because that's how research typically works. But real users often combine multiple practices (e.g., meditation plus journaling plus exercise), and we don't know how that changes optimal frequencies or durations.

We're also working with research that typically studies relatively homogeneous populations. Real-world effectiveness might vary based on your specific circumstances, goals, and how activities interact with each other.

The exciting development is that we're now tracking user outcomes. Every Bearmore user reports their mood, energy, and sense of calm before and after activities. Over time, this data will let us validate our research-based recommendations against real-world results, and adjust them where the evidence and user outcomes diverge.

What we've learned

Building this system taught us that evidence-based practice guidance is both more possible and more complex than we initially thought. The research exists, but it's specific, scattered, and often silent on the practical questions users most want answered.

The key insight is that perfect evidence may never exist for many wellness practices, and that's okay. What matters is being systematic about finding the evidence that does exist, transparent about where we're making inferences, and honest about the limitations. Users can then make informed decisions about where to follow the research closely and where to experiment based on their own experience.

Most importantly, we learned that "evidence-based" doesn't mean "evidence-complete." There will always be gaps. The goal is to map the territory honestly, marking where the paths are well-travelled, where they're faint, and where you might need to do some exploring of your own.

The Practice Levels feature is available across all activities in Bearmore, and we'll continue to update our guidance as new research emerges and user outcome data accumulates.
