Signs of depression can be difficult to spot, even for a trained professional. But could you teach an AI to reliably pick up on subtle changes in mood and behavior?
CompanionMX, a Boston-based startup, has created a mental health tool to do exactly that. Its platform uses AI to help users suffering from depression track changes in their emotional state over time.
To do so, co-founder Skyler Place and his team had to overcome one of emotion AI’s biggest hurdles — building an accurate data set for training. After all, if you collect the wrong data or get sloppy with the labeling, the resulting algorithms can quickly do more harm than good.
The trend of emotion AI has been met with skepticism by many, for exactly that reason. In some cases, these concerns tie back to the limitations of relying on facial recognition, which some experts say results in a much-too-simplistic understanding of human emotions.
Holding Ph.D.s in psychology and cognitive science, Place knows as well as anyone that a simplistic approach won’t do.
To avoid bias and spurious correlations, CompanionMX rooted its analysis in known symptoms of depression. Through the app, the user records a 30-second audio diary commenting on their day, which an algorithm analyzes for vocal patterns that indicate energy level. Meanwhile, the app collects data on texting frequency, outgoing phone calls and distance traveled via GPS — proxies for real-world behaviors like social engagement and physical activity.
Combined, these data points form a well-rounded picture of a patient’s mental state while minimizing the barrier to participation for users, Place said. Here’s what his team learned from translating those data points into a reliable model.
Training AI to Detect Emotion
- The source and quality of your training data matters as much as the quantity. A large data set can be a good thing, but not if it comes at the cost of introducing inaccurate data.
- Develop a hypothesis for the data you want to track, based on what you know about the subject and technology used. For CompanionMX, that meant predicting how a person with depression might operate on their phone.
- A diversity of annotators across gender, race and age is important in mitigating bias, but it’s also important to ensure they are qualified to annotate the data. Because CompanionMX evaluates mental health with technology, its annotators needed to be trained in psychology and engineering.
- Building a model on a mix of passive data sources and active ones can reduce bias and the burden on users. Measuring the steps a user takes via their phone’s GPS or their rate of texts can offer just as much insight into their emotional state as vocal recordings.
- Model clarity can help build trust in the algorithm — especially if it’s used in the medical field. If a doctor doesn’t understand why a data point is relevant, odds are they won’t trust the system.
Sometimes, less is more — if your data is accurate
Seven years ago, Place laid the foundation for how his team would collect emotional data.
At the time, CompanionMX and its parent company, Cogito, worked in partnership with DARPA to develop an algorithm that could track mental illness changes in veterans with post-traumatic stress disorder (PTSD). While the easiest and most affordable route to collect that data would be to conduct a self-report study, people tend to be unreliable in reporting on their own emotions, Place said.
How a person feels can change depending on the external factors around them that have nothing to do with their overall emotional state. For instance, the user might report higher levels of stress if they’re filling out a questionnaire while stuck in a long line at Starbucks. Moreover, if a person suffers from depression or trauma, they may be reluctant to talk about it.
“If your source of truth is inaccurate or fuzzy, like a self-report, it doesn’t matter how good your input data is,” Place said. “It’s almost like you’ve created a situation where it’s impossible to win, and that’s been a limitation in the field.”
Instead, CompanionMX opted to conduct a clinical study. While Place said doing so required both time and money, it allowed them to gather a more accurate picture of their users’ emotions that psychologists would trust.
For the study, the team recruited 73 participants who represented a mix of gender, race and veteran and civilian status, and who reported at least one symptom of PTSD or depression verified by medical professionals. They started with the theory that behavioral patterns could be identified in real time through vocal recordings and digital trace data logged on a smartphone.
“That gave us a very robust, one-of-a-kind data set to build some of the most accurate and predictive models of behaviors that relate to mental illness.”
Each participant received a new phone, completed a baseline questionnaire and had their clinical symptoms measured through a validated interview with a trained clinician. For the next 12 weeks, participants recorded audio logs of how they were doing and had their phone data collected. The CompanionMX team tracked anything it felt could be linked to a user’s behavioral state, including how many calls or texts went in and out, how the phone was handled as measured by the gyroscope, the location of the device, screen time and battery status.
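Raw signals like these are typically rolled up into simple per-day behavioral features before modeling. Here is a minimal sketch of that aggregation step, using an invented event schema (CompanionMX’s actual pipeline and field names are not public):

```python
from dataclasses import dataclass
from datetime import date
from collections import defaultdict

# Hypothetical raw event record; the real schema is an assumption here.
@dataclass
class PhoneEvent:
    day: date
    kind: str           # e.g. "call_out", "text_out", "gps_km"
    value: float = 1.0  # 1 per call/text; kilometers for "gps_km"

def daily_features(events):
    """Roll raw phone events up into per-day behavioral totals."""
    days = defaultdict(lambda: defaultdict(float))
    for e in events:
        days[e.day][e.kind] += e.value
    return {d: dict(kinds) for d, kinds in days.items()}

events = [
    PhoneEvent(date(2020, 1, 6), "text_out"),
    PhoneEvent(date(2020, 1, 6), "text_out"),
    PhoneEvent(date(2020, 1, 6), "gps_km", 3.2),
    PhoneEvent(date(2020, 1, 7), "call_out"),
]
features = daily_features(events)
# features[date(2020, 1, 6)] == {"text_out": 2.0, "gps_km": 3.2}
```

Per-day totals like these are what a downstream model would consume, rather than the raw event stream.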
At the end, the participants went through a Structured Clinical Interview for DSM Disorders (SCID) with a clinical psychologist, a standard method used in psychology to assess a person’s mental health. By running the study this way, the team ended up with more than 50 million data points from a variety of inputs, along with 20 different diagnoses and sub-symptoms of depression.
“That gave us a very robust, one-of-a-kind data set to build some of the most accurate and predictive models of behaviors that relate to mental illness,” Place said.
Feature engineering can help determine what data to track
How does phone usage change with depression? That’s one of the questions CompanionMX needed to answer to make sense of its data.
To sort through its trove of information, the team used a hypothesis-driven process called feature engineering, Place said. Based on what is known in psychology about depression, the team created theories for how typical symptoms would manifest through phone usage patterns. Important symptoms included depressed mood most of the day, diminished interest in activities, fatigue, and avoidance of activities, places and people.
For diminished interest, the team predicted that a person would send fewer text messages and travel less from their home. To measure fatigue, they hypothesized that a person would make fewer calls and send fewer texts.
“It’s a process where we try different features, evaluate their predictive power and eliminate the features that were not predictive.”
In total, the team started with 14 of these input features based on the digital trace data. From there, CompanionMX began the trial-and-error process of building its algorithm models.
“That’s where the model building runs through an evaluation and an iterative process,” Place said. “It’s a process where we try different features, evaluate their predictive power and eliminate the features that were not predictive.”
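That try-evaluate-eliminate loop can be approximated as a univariate screen: score each candidate feature by how well it alone separates symptomatic from non-symptomatic records, then drop the weak ones. Everything below (the feature names, the data, the 0.6 cutoff) is illustrative, not CompanionMX’s actual procedure:

```python
def rank_auc(scores, labels):
    """AUC via the rank identity: P(random positive outranks random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def prune_features(rows, labels, min_auc=0.6):
    """Keep only features whose univariate AUC clears min_auc."""
    kept = {}
    for name in rows[0]:
        auc = rank_auc([r[name] for r in rows], labels)
        # A feature predictive in either direction counts (AUC far from 0.5).
        if max(auc, 1 - auc) >= min_auc:
            kept[name] = auc
    return kept

rows = [  # one row per participant-week, invented values
    {"texts_out": 40, "km_traveled": 12.0},
    {"texts_out": 35, "km_traveled": 10.5},
    {"texts_out": 8,  "km_traveled": 1.2},
    {"texts_out": 5,  "km_traveled": 0.8},
]
labels = [0, 0, 1, 1]  # 1 = symptom present
kept = prune_features(rows, labels)
```

In practice, a team would also check feature stability and redundancy rather than rely on a single-score cutoff, but the shape of the loop — score, compare, eliminate — is the same.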
The initial approach predicted symptoms using all 12 weeks of data, but the results showed it was more accurate to focus on the week immediately before a user’s clinical assessment, according to the report.
Through extensive data exploration, the team also identified additional features that didn’t accurately represent user behavior. For example, the team eliminated any information regarding battery life and screen time because each device measured the information differently.
“When you’re designing your data set, you need to ask yourself where the variability is coming from,” Place said. “Is the variability coming from human behavior, which you’re trying to predict, or is it variability in some other attribute you can’t control for? Screen off and on seem like simple data points, but it was actually being measured differently by different device manufacturers. So the same human behavior would lead to different outcomes.”
Keeping your model simple can help build trust
One of the final rules Place gave his team when it came time to build the algorithm was that it use features based on patterns humans can easily recognize.
When you’re working with millions of data points, it can be easy to create a model that is statistically accurate but impossible to explain. Since CompanionMX aimed to be a resource for psychologists and psychiatrists, Place said it was crucial that their model made sense to them.
“One lesson we learned was that you don’t need to jump to the most complicated inputs or throw the whole kitchen sink into it at the start,” Place said. “Working from data that is as clean as possible — and that you understand — can be beneficial to modeling outcomes.”
The team landed on three digital trace features, all easy to understand in relation to human behavior: text message count, distance traveled and outgoing phone calls.
“Working from data that is as clean as possible — and that you understand — can be beneficial to modeling outcomes.”
“What success looked like for us was not just having an accurate model, but having a model that we could explain to a psychologist in a way that they would feel comfortable and confident in using to make decisions about treating someone for mental illness,” Place said.
Building the right team to label vocal data
To train its second algorithm, CompanionMX needed to label 847 audio diary clips of at least 30 seconds in length.
Within those recordings, the team focused on the rate at which a person talked and their energy level.
While speaking rate could be quantified by measuring the number of syllables per minute, interpreting energy levels required a team of data labelers. To reduce the risk of bias, CompanionMX assembled a mix of 10 data scientists, both men and women, with backgrounds in psychology and engineering, to label the data.
“It gave us the data necessary to make corrections and adjustments to account for the variability in human perception and human performance.”
The labelers listened to the vocal recordings and ranked pitch variation and vocal effort on a Likert scale of one to five. Since the data required human interpretation, the shared scale helped the team achieve high inter-rater reliability across the ratings.
“It gave us the data necessary to make corrections and adjustments to account for the variability in human perception and human performance,” Place said.
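One simple way to quantify that agreement is the mean pairwise correlation between raters’ Likert scores. Published studies more often use measures like the intraclass correlation or Krippendorff’s alpha, so treat this as a rough sketch, with invented ratings:

```python
from statistics import mean, pstdev

def pearson(a, b):
    """Pearson correlation of two equal-length score lists."""
    ma, mb, sa, sb = mean(a), mean(b), pstdev(a), pstdev(b)
    cov = mean((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sa * sb)

def mean_pairwise_agreement(ratings):
    """ratings: one list of Likert scores per rater, same clip order."""
    n = len(ratings)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return mean(pearson(ratings[i], ratings[j]) for i, j in pairs)

raters = [
    [1, 2, 4, 5, 3],  # rater A's vocal-effort scores for five clips
    [2, 2, 5, 5, 3],  # rater B
    [1, 3, 4, 4, 3],  # rater C
]
agreement = mean_pairwise_agreement(raters)  # near 1.0 means strong agreement
```

A low agreement score would be the signal to retrain labelers or tighten the rating rubric before feeding the labels to a model.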
With the data labeled, the team built an algorithm that could estimate a user’s mean pitch variance and mean vocal effort from an audio diary. That information is then fed into the team’s final model to identify whether the person exhibits a “depressed mood most of the day.”
Take a rigorous approach to testing
With the final algorithms in place, CompanionMX measured its model’s accuracy through 10-fold cross-validation with area under the curve (AUC) testing — a commonly used method for measuring how well a model distinguishes cases from non-cases, where a score of 0.5 is no better than chance.
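A minimal, pure-Python sketch of this kind of evaluation: split the data into k folds, score each held-out fold with a model fit on the remaining folds, and average the per-fold AUCs. The trivial nearest-class-mean “model” and the synthetic data here are stand-ins, not CompanionMX’s method:

```python
import random

def rank_auc(scores, labels):
    """P(random positive outranks random negative); 0.5 = chance."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # fold contains a single class; treat as uninformative
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_val_auc(xs, ys, k=10, seed=0):
    """Average AUC over k held-out folds of a one-feature class-mean scorer."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    aucs = []
    for fold in folds:
        train = [i for i in idx if i not in fold]
        mean1 = sum(xs[i] for i in train if ys[i] == 1) / max(1, sum(ys[i] for i in train))
        mean0 = sum(xs[i] for i in train if ys[i] == 0) / max(1, sum(1 - ys[i] for i in train))
        # Score held-out items by closeness to the symptomatic-class mean.
        scores = [abs(xs[i] - mean0) - abs(xs[i] - mean1) for i in fold]
        aucs.append(rank_auc(scores, [ys[i] for i in fold]))
    return sum(aucs) / k

# Synthetic feature: fewer outgoing texts when the symptom is present.
xs = [5, 6, 4, 7, 3, 38, 41, 35, 44, 40] * 3
ys = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0] * 3
auc = cross_val_auc(xs, ys)
```

Holding each fold out of training guards against the model simply memorizing the people it was fit on, which is why the technique is standard before claiming an accuracy number.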
The model cleared CompanionMX’s 50 percent threshold, identifying each of the four depression symptoms better than chance. But Place didn’t want to stop there. To earn the trust of clinical professionals, the team set out to prove the tool’s usefulness through a clinical trial.
“It’s going to be a tool that’s going to allow your doctor and the patients to be able to make better, more informed decisions.”
The results of the clinical trial haven’t been published yet, but the process involved testing the product with a group of doctors and patients against a control group. The team is also working on expanding CompanionMX to identify other mental health conditions, like bipolar disorder.
When it comes to understanding how you really feel, however, Place knows the only people equipped to answer that question are you and your doctor. CompanionMX is just there to provide another data point.
“I don’t think we’ll be in a world where AI is going to replace your doctor,” Place said. “It’s going to be a tool that’s going to allow your doctor and the patients to be able to make better, more informed decisions.”