Acacia Parks, Ph.D., the Chief Scientist at digital therapeutics company Happify Health, recently gave a presentation in Boston that focused on strategies and approaches to clinical studies for digital therapeutics companies.
Parks broke down how, in Happify’s experience, the expectations and demands of the FDA differ from what payers and employers would like to see in outcomes studies.
Given how popular this talk was at the event and the feedback from a number of attendees who told me it was the highlight of the event for them, I’ve written up her talk in full with a transcript of the session. Read on below for Parks’ talk in her own words:
This slide (below) shows results from a clinical trial that my team and I published that has been appealing to the payers we have worked with. It’s a randomized control study for a group of people using Happify. We have people using a comparison group. A sham digital intervention. We call it a psychoeducational comparison group. So they are learning about wellbeing but they are not being told how to improve it. We are looking at symptoms of depression and anxiety. We are looking at an index of resilience. We are looking at both statistical significance and clinical significance and they were reported on in this study. It was published, and in general, I would say when we give a sales presentation to a payer and we are trying to make the effectiveness piece the argument, not the cost savings but the part where we convince people what we are doing is working, this kind of clinical trial has been suitable for that purpose.
Things are quite different when you start talking to the FDA. There was this idea in the last presentation that you might do some work in the payer space and then submit what you have learned so far to the FDA. But that is sort of like an inside joke, right? Because unless you are gearing what you are doing to the FDA, the likelihood that what you were doing is going to meet their standards is slim. Arguably, the FDA likes to hear what you are going to do in advance, anyway. To say, well, we already did this and we are hoping you’ll say OK, might not necessarily set a good tone for an FDA conversation. I have published multiple studies but what the FDA wants is so completely different that we have to do different studies.
This talk will cover four particular design features of digital therapeutics studies. I am going to infuse in that the difference between when you are thinking about payers and employers vs. when you are thinking about the FDA. I am also going to infuse some of the lessons we have learned by working with the FDA and some of their opinions on these topics.
As others at this conference have mentioned, at some point, progress needs to move forward. We can’t all be learning little snippets and then hiding them in our little cookie jar and not sharing them. We need to share with each other so we can all make progress quickly.
Clinical endpoints. One key difference concerns the payer space and the employer space, which I sort of lump together because in both we are talking about per person per month (PPPM) type licensing models. In that study I showed you, all the outcomes were self-reported. There was no problem publishing that. We don’t typically get serious, earth-shattering concerns about using self-reported scales with payers, but in the FDA world there is some suspicion about self-report scales. Which is a problem in a digital therapeutic where we are often trying to track people’s outcomes. How are we getting those outcomes from people every couple of weeks? By self-report. We are asking them to fill out a mood scale. There are other ways to do it, but I’d say that is a pretty typical approach.
When you speak to the FDA and your primary outcome is self-report you are going to get pushback, because they want to hear clinician reports. They want to hear that a clinician has interacted with the patient and verified what the self-report is saying. You can have a fully formed clinical trial that is publishable — and that you can use with a variety of audiences — but when you try to bring it to the FDA, [it won’t work because] there is no clinician report. So that kind of thing, it is preferred in the clinical psychology world and that has allowed us to sort of serve the payer space, but in the FDA world the clinician rating is really important, particularly with psychiatric initiatives.
Psychometrics is also a thing in clinical psychology. I was always trained that when you do a study, you should always look at what other studies have done and do your best to do what other studies have done. Do your best to find one or two ways to be different but you don’t want to reinvent the wheel. So if you are picking measures for a study, you pick a similar study and you use the same measures that they used. Not so much, necessarily, in the digital therapeutics case.
Many of us are in different therapeutic areas. So I can’t look at a predecessor and do exactly what I am trying to do. We wouldn’t be trying to do it, if somebody was already doing it. In clinical psychology, I might be studying depression and I’ll say, ‘oh what measures did this other person use?’ Many of our comparison cases are in pharma studies where they study drugs.
The test that is going to best measure the impact of a digital therapeutic from a psychometric perspective might not be the same test as the one that is going to do that best for a drug. It might address different aspects of the disorder. There’s a whole set of considerations that you want to add when you are thinking about measures. It goes beyond just what was in the most recently published study. You also have to think about whether it is going to serve you because what you are going to be doing as a digital therapeutic is fundamentally different from whatever the similar drug trial might have done.
I don’t publish papers without clinical significance because I believe clinical significance is important, but if I kept searching for the right journal I think I could get away with publishing an efficacy trial that doesn’t include a clinical significance index. I haven’t done that, but I expect it to be possible. In the FDA world, they kind of only care about clinical significance. That’s been my impression. Statistical significance is nice, but you have to show that the amount of change people are experiencing is enough that we care about it. They have particular opinions about how they want you to look at clinical significance, depending on your outcome. It might be hard to know what that is without meeting with the FDA first. It is high-priority to have clinical significance, and it is something that they have some strong opinions on.
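To make the distinction between statistical and clinical significance concrete, here is a minimal sketch of one index commonly used in clinical psychology, the Jacobson-Truax Reliable Change Index (RCI). It is offered only as an illustration: the talk does not say which index Happify reports or which one the FDA prefers, and the scores, baseline standard deviation, and reliability below are hypothetical values.

```python
import math


def reliable_change_index(pre, post, sd_baseline, reliability):
    """Jacobson-Truax RCI: observed change divided by the standard
    error of a difference score on this scale."""
    # Standard error of measurement for a single administration.
    sem = sd_baseline * math.sqrt(1.0 - reliability)
    # Standard error of the difference between two administrations.
    s_diff = math.sqrt(2.0) * sem
    return (post - pre) / s_diff


def is_reliable_improvement(pre, post, sd_baseline, reliability,
                            lower_is_better=True):
    """True when the change exceeds what measurement error alone
    would plausibly produce (|RCI| >= 1.96) in the improving direction."""
    rci = reliable_change_index(pre, post, sd_baseline, reliability)
    return rci <= -1.96 if lower_is_better else rci >= 1.96


# Hypothetical symptom scores where lower means fewer symptoms.
print(is_reliable_improvement(pre=18, post=9, sd_baseline=5.0, reliability=0.85))
print(is_reliable_improvement(pre=18, post=16, sd_baseline=5.0, reliability=0.85))
```

A group-level result can be statistically significant even when few individual participants clear a threshold like this one, which is why the two kinds of significance are reported separately.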
So when it comes to clinical endpoints, when I first started off in research psychology, it was very much looking at previous research and trying to do what they are doing. That is not necessarily the case here because it’s about what is going to work for you. If there is one lesson I’ve learned here, it is that the FDA is making sure that what you test, the way you test your solution, is going to demonstrate that it is safe and effective. Meaning if it says your solution is effective, they will believe it. That does not mean that they are advising you on the best possible way for your study to work. That is not their role. So you might pick a measure that is not going to yield good results for you, and the FDA is not going to comment on that necessarily. It is your role to pick something that is going to maximize your chances of success. They want to see a measure of clinical significance, but it is up to you to figure out what the best measure is going to be to make that strong argument to them.
OK, comparison groups. I wrote a whole paper about this. I could talk all day about this. I’m going to try not to get lost down this rabbit hole. The main point I want to make here is that, although the published paper above and all the other papers that we have published use an active control group, there are many solutions in the digital health space that use things like: no comparison group at all, comparison groups where the groups aren’t randomized and people get to choose what they want to do, or a waitlist control group where people just receive nothing (which the evidence shows makes people worse). These are comparison groups that exist all over clinical psychology and all over the wellness space.
You will note that I particularly chose to not use those types of comparison groups, even though comparison groups like this can make an effect look quite large so they are appealing from an experimental design perspective. Compared to nothing? In some cases treatment as usual really is nothing. Lots of people, about two-thirds of people with a mental disorder get no treatment for it. So there is an argument to be made that offering a person nothing is, realistically, what they would really get. For example, if it were something like an employer setting, they might not choose to use any of the mental health services offered to them.
I’m not bad-mouthing these approaches. There are reasons people use them, but in the FDA world there is just no chance. This cannot stand.
One of the two types of comparison groups we have heard bandied about in conversations with the FDA is treatment as usual, which could mean so many different things. It could mean in-person treatment if you are doing something like a digital adaptation of an in-person therapy, which is what we hope we are doing. It could be psychoeducation. A doctor might explain to you about your heart condition and what things you could do to improve your heart condition, but that is the end of the treatment that you would get. It could be a medication. It could be all sorts of things. Treatment as usual typically requires some kind of physician oversight, and it is a much stronger comparison group in the sense that if your intervention is equivalent to (if that is what you are looking for) or better than treatment as usual, you feel really convinced that that’s real. By contrast, if you find something that is superior to a waitlist group, you might think, well, if I had them read Harry Potter books, they could also have that level of improvement. You don’t have any sense of comparison for just being asked to do anything or any of those placebo effects.
That leads me to the last thing, which is that, in my impression, the FDA has become pretty passionate about the use of placebo and sham interventions in digital therapeutics, which can be daunting because there is not really guidance about what those should look like. Two years ago you didn’t need it. Now you do. At some point the FDA made the decision that a digital therapeutic has to be compared to a placebo or a sham. There has to be something that looks and feels like the digital intervention you are testing, but contains inert content. It has to be used at the same frequency, and people using it can’t have the sense this isn’t real.
There are all sorts of reasons this is great. It lets you have blinded experimenters. They are getting a digital intervention but you don’t know which one. It’s also complex because there is a whole R&D process that might have to go into designing a faux-digital therapeutic. But if there is one thing I tell you that you might not have known today, because we learned it from the FDA in the past year, it is that you have to use a placebo or a sham.
That is definitely something that is not true in clinical psychology generally. That is not something I’d get as feedback from a payer: ‘Oh whoa, you used psychoeducation and not a sham? Get real!’ That kind of feedback doesn’t come, at least to me, from the payer side but the FDA is pretty passionate about it.
Of course, the appropriate comparison group comes from what you are trying to make a claim about.
If you want to know, for example, if you have a digital CBT, you might want to know whether it is not worse than in-person CBT. A non-inferiority type claim. In that case, a comparison group is going to be in-person CBT, treatment as usual comparison.
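As a rough illustration of what a non-inferiority claim tests, the sketch below checks whether the lower confidence bound on the difference in mean improvement (digital minus in-person) stays above a prespecified margin. Everything in it is an assumption for illustration: the data are made up, the margin is arbitrary, and the normal approximation stands in for whatever analysis a real trial would prespecify with the agency.

```python
import math
from statistics import NormalDist, mean, variance


def non_inferiority(digital, in_person, margin, alpha=0.025):
    """Scores are symptom improvement (higher is better).

    Returns (lower_bound, is_non_inferior), where lower_bound is the
    one-sided lower confidence bound on mean(digital) - mean(in_person)
    and non-inferiority holds when that bound is above -margin.
    """
    diff = mean(digital) - mean(in_person)
    # Standard error of the difference between two independent means.
    se = math.sqrt(variance(digital) / len(digital)
                   + variance(in_person) / len(in_person))
    z = NormalDist().inv_cdf(1.0 - alpha)
    lower_bound = diff - z * se
    return lower_bound, lower_bound > -margin


# Hypothetical improvement scores for two small arms.
digital_cbt = [8, 10, 9, 11, 7, 10, 9, 8]
in_person_cbt = [9, 10, 10, 11, 8, 9, 10, 9]

# The margin (hypothetical here) is the largest shortfall still
# considered clinically acceptable; it must be fixed before the trial.
lower, ok = non_inferiority(digital_cbt, in_person_cbt, margin=2.0)
print(f"lower bound = {lower:.2f}, non-inferior = {ok}")
```

Note that the conclusion depends entirely on the margin: the same data that pass with a generous margin can fail with a stricter one, which is one reason to agree on it in advance.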
If you are trying to argue just that your digital therapeutic is safe and effective, you would have some more wiggle-room there. You just want some kind of comparable condition where you are monitoring safety and effectiveness. It could be many different things.
If you want to know if your digital intervention adds anything on top of standard care, you have to be pairing it with standard care. And then comparing it to standard care with a placebo, not standard care alone. The “not alone” part is key. “The FDA wants to know about placebo effects now” is kind of my summary of that slide.
But it is a moving target. That has changed in the last year or two. Chances are it will change again — what they expect when it comes to comparison groups. That’s always a part of the puzzle.
I’m running out of time, so I am going to fast-forward a little bit. I want to talk about engagement for a minute because, if you get into non-inferiority trials and you are trying to show that your digital intervention is just as good as something else, it is important, especially as you get to talking to payers, that you show how it is better.
If it is equivalent (or, the FDA’s formal term is non-inferior), what that means is on primary outcomes they are not different. You want to make the argument that your intervention may not be better on primary outcomes, but it is better on engagement, accessibility, cost. Although the FDA isn’t necessarily asking you to think about those factors, you probably want to be thinking about those factors.
Engagement can be a potential differentiator. If you have two studies and they have an equally-sized effect on the primary endpoint, you want to be able to show something like engagement as the differentiator.
The last one, and I think this has been a theme of the conference so far, is innovation. Two years ago, the thing that the agency changed was the type of control group or comparison group we were expected to have. We have had a lot of conversations about digital clinical trials at this conference. In general, when you are talking to a payer, the tolerance for innovation is a little bit higher in the sense that if a journal will publish it then you have an argument. If you go too far off the rails in the payer situation they may not believe it.
Everybody is nervous because the FDA can only absorb so much innovation at once. There is a lot of conversation about, well, people use these things digitally so should the trials be digital? Isn’t that more realistic? Here’s the thing, if there is another lesson I took away from the FDA it is that they care about realism — in the abstract — a lot. They don’t want to be doing things that don’t generalize to the real world.
If you ask me, despite the fact that the FDA can only tolerate so much innovation at once, sometime in the next few years, the digital clinical trial thing is going to happen. Somebody is going to convince them, and then that is what the rest of us are going to be expected to do. That is my personal opinion because that is the realistic direction of testing things in their natural environment. While the FDA can only handle so much innovation at once, this is probably an area we should be prepared to innovate in. That might not be all at the same time. It might start with hybrid trials with remote visits, and then the FDA may suggest we do the remote visits at pre- and post. No one has declared that they are going to be the ones that make this leap, but we are going to need to do it I think.
There are many ways to do this. One is just to say we want to use self-report measures in our digital therapeutic, we understand the FDA doesn’t trust them, but what about validating them against clinician measures? That’s a very tiny step in a positive direction toward making digital modalities more acceptable.
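One simple way to picture validating a self-report measure against a clinician rating is to collect both on the same patients and look at how closely they track each other. The sketch below computes a Pearson correlation and the average gap between the two. It assumes, purely for illustration, that both ratings are on the same numeric scale and uses made-up scores; a real validation study would use a prespecified agreement analysis.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_bias(self_report, clinician):
    """Average of (self-report minus clinician rating); a positive value
    means patients rate themselves higher than clinicians do."""
    return mean(s - c for s, c in zip(self_report, clinician))


# Hypothetical paired ratings for the same eight patients.
self_report = [14, 9, 20, 6, 17, 11, 3, 15]
clinician = [13, 10, 18, 7, 15, 12, 4, 13]

print(f"correlation = {pearson_r(self_report, clinician):.2f}")
print(f"mean bias   = {mean_bias(self_report, clinician):.2f}")
```

A high correlation alone is not enough, since two measures can move together while one runs consistently higher, which is why the bias is reported alongside it.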
As I said, there are remote visit options and someday we are going to get to the point where the FDA will be looking at fully online trials and taking them seriously.
The FDA didn’t say that to me — just so we are clear.
This is just a summary slide.
The last thing I want to say here is I am talking about the FDA, but there are other regulatory bodies in other countries and when we design products, it is important we don’t just think about our immediate little situation and the sample we are working with right now.
So this is just an example of some work we’re doing with our consumer database. Seeing whether the dose-response curve of Happify, which is just the more activities that you use the more you respond in terms of wellbeing, whether that works in a version that we translated into German and marketed to people in Germany. A version that we translated into Canadian French and made available in Canada. And so on. We are starting to see some generalizations of that dose curve, but it is important to check! What if I looked at this data and realized, ‘this doesn’t work in China’! It is easy for us to get excited and think about digital as something that is very easy to disseminate, to just assume we can. It is important to check and make sure that is the case.
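The kind of check described here can be sketched as fitting the same dose-response slope separately for each localized version and comparing the results. The locale labels and numbers below are invented for illustration and are not Happify data; the least-squares slope is just one simple way to summarize "more activities, more improvement."

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y on x: the average change in wellbeing
    per additional completed activity."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


# Hypothetical (activities completed, wellbeing change) pairs per version.
records = {
    "English": ([2, 5, 8, 12, 16], [1.0, 2.1, 2.9, 4.2, 5.1]),
    "German": ([1, 4, 9, 13, 15], [0.6, 1.8, 3.0, 4.1, 4.9]),
    "Canadian French": ([3, 6, 7, 11, 14], [1.2, 1.9, 2.6, 3.5, 4.6]),
}

for version, (activities, change) in records.items():
    print(f"{version}: slope = {ols_slope(activities, change):.3f}")
```

If one version showed a slope near zero while the others did not, that would be the signal to stop and investigate before assuming the intervention generalizes.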
While I am all about progress and moving forward as quickly as possible, also some guardrails should be there to make sure we are being responsible as we aim to take over the world with digital therapeutics.
I love being here because of the cooperation between all the companies, so here is a picture of my husband and me cooperating as we scuba dive. Thank you for the opportunity to speak and to listen to me and now we have a couple of minutes for questions.
Moderator Edward Cox, Founder of Dthera: To start, I am going to agree with you on one point and disagree with you on another, to bring a little edge to the discussion. You said that maybe the most important thing is that before you start your clinical trials, go sit with the FDA. That’s right. Don’t be dumb. Don’t go spend millions of dollars. And the problem is, there is an arrogance or ignorance that happens that is harmful to our entire industry. If you roll up to the FDA and say, hey I’m this cool tech company from San Francisco and you, the FDA, should be fine with this, right?
Parks: That is a good point because we are making an impression as an industry. What one company does will impact all of the other companies as well.
Cox: Absolutely. The part that I am going to push back on is the control groups and shams. The FDA is not a hive-mind. It depends on your indication. What we are seeing is it depends on what indications and what medical claims you are trying to make. There are medical devices approved all the time with passive controls. If what you are doing is slip-streaming behind them in a 510(k), then that might be exactly what you do.
Parks: What I was talking about [with shams] is what happened during a discussion about a 510(k).
Cox: I agree with you, but I think this is one of the things we should bring back to what we agreed on before: go ask the FDA. Each of our indications is going to be different and will be reviewed by different groups at the FDA. Your friend at one company says, ‘this was my exact interaction with the FDA so follow these steps, beat-by-beat,’ and then you go to the agency and have a wildly different experience. It’s because diabetes and Alzheimer’s are different. They are different groups and they are going to interact with us in different ways. This is, effectively, a false pushback, but I do think it reinforces your point that shams are the way of the future (they are), but also to ask the FDA before you run your studies.
Parks: This may be a little controversial, but — depending on whether your goal is speed or accuracy, in other words, if you wanted to do the leanest most experimental design possible — you could go to the FDA and not offer a sham and find out if they say, ‘well, what about a sham?’ And then you make a sham. But here is the problem with that. Then you have to design a sham and come back to the FDA, which adds another several months to your timeline in addition to the actual sham. So you kind of have to do a risk management assessment there and think, ‘well, should we just use a sham so maybe we can go to the FDA and they will say this sham is acceptable and we can move forward?’ or maybe they will say you don’t need a sham and you can move forward. But if they do, you will have that setback.
Athena Robinson, Chief Clinical Officer at Woebot Labs: I’m curious about payer expectations and FDA expectations in terms of follow-up. Meaning: follow-up timepoints, sustained clinical efficacy, etc.
Parks: I have been shocked by the general lack of requirement for follow-up everywhere. In clinical psychology, I would say, there is the expectation that you follow up at one year and no one wants to hear about your intervention if you haven’t done that. I have not found that so much in any of the contexts that we have discussed here today. It is much more about the immediate effect and safety during the active intervention period. I’ve never been asked to do long-term follow-up, but I do it anyway because I want to know. In the context of an FDA study, that is not an area where we have gotten pushback.
Cox: Thanks, we are out of time!