Data science projects are implemented with built-in uncertainty. When you start, you usually don’t know if the project will succeed. You can teach the machine, and the machine will learn, but you can never be certain of the final accuracy. You don’t know whether you will reach your goals and solve the problem, or reach a solution that isn’t good enough to go live. As a leader, how do you navigate the complications that naturally arise in any given data science project?

7 boxes on your data science project success checklist

  1. Define the product goal.
  2. Discuss the level of time and energy you’re willing to put in.
  3. Understand how much data you need from the start of the project and how much you already have.
  4. Check if you can extract more data from the data you already have.
  5. Define a readiness test for the model before the project begins.
  6. Build a system that checks the data’s validity.
  7. Continuously check the accuracy level of the model as you go.


Don’t Blame Your Team. Stay Communicative.

Your primary concern is the data: Is it similar to the data you will encounter in production? Is it tagged well enough? Do you have a big enough data set so that your algorithms can learn with high accuracy and create a great classifier?

One frustrating reality in working with data is that you might work really hard and take all the right actions, and still end up with poor data that results in a low accuracy level. In other cases, you might get amazing results without putting in any special effort. Developers who work with data science engineers around data science problems should be versed in the details of the project, what input the algorithm needs, and what output is expected. This helps avoid misunderstandings and wasted time along the way.

Another way to avoid misunderstandings and unnecessary complications is to involve the data science team in decision-making so that their voice is heard early on. For example, data science engineers often work with data that the development team collected from different sources and put into a specialized database. In such cases, involve the data science team before development arranges the data, to make sure it is stored in the most usable way possible.

Unlike many development features, a data science algorithm can work amazingly well when the data scientist runs it locally but terribly in production. Don’t immediately blame the algorithm or the data scientist who worked on it. The algorithm might perform badly because the data in production is very different from the data tagged months earlier to teach it. Or the data might have been tagged badly, so the algorithm learned the wrong things. To preserve your relationship with the data science team, check all possible reasons for the production result before you blame anyone.

In some cases, the data science team will need to consult experts in another field. For example, your team might be experts in natural language processing, and at some point, they might find themselves needing expertise in another field like image processing. Becoming knowledgeable enough could take significant time, so it might be best to consult an outside expert who can direct the team to the right area of problems to study.

 

Data Science Project Success Checklist

When you start a data science project, you can take certain steps to increase the probability of success. I am giving you my own checklist, but your specific topics and area may require additional actions.

  1. Define the product goal for the newly discussed problem. As discussed earlier, make sure to define the accuracy level in advance. Sometimes this will make the project a nonstarter.
  2. Discuss how much time and effort you are willing to put into the project. Once you understand the target accuracy level, decide if your data and tagging will enable you to reach this goal, and if not, how much extra effort will be required to get you there. Understanding the desired accuracy, the current data, and the time and effort required will help reduce uncertainty.
  3. Understand how much data you need and how much you have when the project starts. The sooner the topic of data is addressed, the better vision you will have of the project. If you know you need three months’ worth of human tagging to teach the algorithm, then it’s best to start with that now and not with the work of the data scientists.
  4. Check whether you can extract additional data from the data you already have. This is called feature extraction. For example, when you have an address in your data, you can enrich the data with the neighborhood name. Knowing the name allows more uses of the data. For instance, an insurance company could sell home insurance according to the neighborhood’s quality and history of home invasions. Some companies supply dedicated solutions designed to enrich data. See if you can use them to raise the accuracy of your results.
  5. Define a readiness test for the model in advance. After you implement a model based on the algorithm’s learning, you have to test the model’s accuracy. Create tests that have obvious answers so that it is clear when the model fails. For example, let’s say your model classifies text for sentiment — positive, negative, and neutral. Your readiness test might say, “I am happy.” If the model identifies the text as negative or neutral, you know you have a huge problem.
  6. Build a system that checks the validity of the data. If your input is speech, for example, the number of words you get in a minute should fall within a specific range, and you should be alerted if this changes. If the data has a specific distribution, make sure the results your algorithm produces in production keep roughly that distribution. If they don’t, you might need to train your model again. For example, in the sentiment problem, you might conclude that 20 percent of the written text is negative, 15 percent is positive, and 65 percent is neutral. If the distribution changes drastically on certain days (for instance, 50 percent positive, 50 percent negative, and no neutral), then you need to verify that this is a real change in the data and not a sign that the algorithm has started to fail.
  7. Continuously check the accuracy level of the model in production. Over time, the accuracy often decreases. When this happens, retrain the model either automatically or manually.
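As a rough illustration of the readiness test in step 5, the sketch below runs a handful of obvious-answer cases through a classifier and reports any it gets wrong. The function toy_classify is a hypothetical keyword-based stand-in, not a real model, and the extra test sentences beyond “I am happy.” are assumptions added for the example. A real project would pass in its own model’s prediction function.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a trained sentiment model. The keyword rules
# exist only so the example runs; a real project would call its own model.
def toy_classify(text: str) -> str:
    lowered = text.lower()
    if any(word in lowered for word in ("happy", "love", "great")):
        return "positive"
    if any(word in lowered for word in ("sad", "hate", "terrible")):
        return "negative"
    return "neutral"

# Cases with obvious answers, agreed on before the project begins.
# Only "I am happy." comes from the text; the others are illustrative.
READINESS_CASES: List[Tuple[str, str]] = [
    ("I am happy.", "positive"),
    ("I hate this.", "negative"),
    ("The meeting is at noon.", "neutral"),
]

def run_readiness_test(
    classify: Callable[[str], str],
    cases: List[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return every failed case as (text, expected label, actual label)."""
    failures = []
    for text, expected in cases:
        actual = classify(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures

if __name__ == "__main__":
    failures = run_readiness_test(toy_classify, READINESS_CASES)
    print("ready" if not failures else f"not ready: {failures}")
```

An empty failure list means the model passed every obvious case. Any entry in the list signals the kind of huge problem the step describes, and it should block the model from going live.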

If you follow these steps and track the various aspects of the project, you will increase the chances of its success. 
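To make the distribution check from step 6 a little more concrete, here is a minimal sketch that compares one day’s predicted labels with the expected shares from the sentiment example (20 percent negative, 15 percent positive, 65 percent neutral). The distance measure (total variation distance) and the alert threshold of 0.25 are assumptions chosen for illustration, not values from the text. Your own data would dictate both.

```python
from collections import Counter
from typing import Dict, Iterable

# Expected label shares, taken from the sentiment example in the checklist.
BASELINE: Dict[str, float] = {"negative": 0.20, "positive": 0.15, "neutral": 0.65}

def label_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Convert a collection of predicted labels into fractional shares."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels to check")
    return {label: count / total for label, count in counts.items()}

def total_variation_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Half the sum of absolute share differences: 0 is identical, 1 is disjoint."""
    all_labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in all_labels)

def distribution_shifted(
    labels: Iterable[str],
    baseline: Dict[str, float] = BASELINE,
    threshold: float = 0.25,  # assumed alert level; tune it on your own data
) -> bool:
    """Return True when the day's labels drift beyond the threshold."""
    return total_variation_distance(label_shares(labels), baseline) > threshold

if __name__ == "__main__":
    normal_day = ["negative"] * 20 + ["positive"] * 15 + ["neutral"] * 65
    odd_day = ["positive"] * 50 + ["negative"] * 50
    print("normal day shifted:", distribution_shifted(normal_day))
    print("odd day shifted:", distribution_shifted(odd_day))
```

An alert from a check like this does not say whether the data really changed or the model started to fail. It only tells you that someone needs to look.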


A Balancing Act

I won’t lie; working in the middle between management and data science is hard. The best way to handle this balancing act is clear communication with both sides.

Represent the data science team as best you can to management. Remember that managers are not immersed in the details of the product, so they might not follow you when it comes to timelines and accuracy levels. Communicate to management what the data scientists are working on, why it’s complicated, why it takes time, and why they can’t commit to specific results. Remind management that getting and improving results sometimes takes time.

On the other hand, explain to data scientists what the company’s goals are and why something is or isn’t acceptable. They might get a 76 percent accuracy level for a feature and think it’s amazing because it’s the best in the field. However, that might not cut it from the customer’s perspective. Data scientists think theoretically, and your job involves making sure they are connected with reality.

Excerpted from the book Woman Up!: Your Guide to Success in Engineering and Tech by Anat Rapoport. Lioncrest Publishing (May 31, 2023).
