Formulating business goals as modeling tasks is a real measure of a data scientist

Think of problem formulation as a two-column diagram: for each business task in the left column, the goal is to find the best method in the right column. Some tasks can be solved in many ways, and some methods may not suit any task. Image by the author.

Problem formulation is the process of designing a data science solution to a business problem. In this post I assume that the business problem is already defined and given; a different but related problem is finding ways to add value to the business from existing analytics or modeling.

  • “What information is used”
  • “What to do with the forecast”

The first thing we face is calibration. Many binary classifiers are trained and tuned in a way that depends only on the ordering of the prediction scores, not on their actual values.¹ Suppose there are 3 users in the validation set, with predictions from two models, A and B:
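To make the difference between ranking and calibration concrete, here is a minimal sketch. The labels and scores are made-up illustrative values, not figures from the post, and the helper functions `auc` and `brier` are hypothetical names written for this example. Both models order the three users identically, so a ranking metric cannot tell them apart, while a metric that looks at the predicted values can.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(labels, scores):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels)


# Hypothetical validation set: 3 users, only the last one is a positive.
labels = [0, 0, 1]
model_a = [0.1, 0.2, 0.9]  # assumed scores, same ordering as model B
model_b = [0.6, 0.7, 0.8]  # assumed scores, same ordering as model A

auc_a, auc_b = auc(labels, model_a), auc(labels, model_b)
brier_a, brier_b = brier(labels, model_a), brier(labels, model_b)

print(f"AUC   A={auc_a:.2f}  B={auc_b:.2f}")
print(f"Brier A={brier_a:.3f}  B={brier_b:.3f}")
```

In this toy setup the two AUC values match because the ordering is the same, but model B's scores sit much further from the observed outcomes, which matters as soon as the scores are read as probabilities.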

OK, we have solved the calibration problem and now have a flawless churn prediction model. What should the customer success team do with it?

  • What if each user reacts differently to the customer success team's actions? Does it matter whether a user's churn probability is 20% or 80% if our intervention has no effect on that user?
  • What if the customer success team contacts every customer anyway? In that case our prediction model is useless; we should instead have learned which intervention is most effective.
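The first question above can be illustrated with a small sketch. All numbers are invented, and the field names `p_churn` and `p_churn_if_contacted` are hypothetical; in practice the second quantity is not observed directly and would have to be estimated, for example from an experiment. The point is only that ranking users by churn risk and ranking them by the expected effect of an intervention can produce different priorities.

```python
# Hypothetical users with an assumed churn probability with and without outreach.
users = [
    {"id": "u1", "p_churn": 0.80, "p_churn_if_contacted": 0.80},  # high risk, no effect
    {"id": "u2", "p_churn": 0.20, "p_churn_if_contacted": 0.05},  # low risk, large effect
    {"id": "u3", "p_churn": 0.50, "p_churn_if_contacted": 0.40},  # medium risk, some effect
]


def uplift(user):
    """Expected reduction in churn probability if the team contacts this user."""
    return user["p_churn"] - user["p_churn_if_contacted"]


# Priority list if we only look at the churn prediction.
by_risk = [u["id"] for u in sorted(users, key=lambda u: u["p_churn"], reverse=True)]

# Priority list if we look at what the intervention is expected to change.
by_uplift = [u["id"] for u in sorted(users, key=uplift, reverse=True)]

print("by churn risk:", by_risk)
print("by uplift:    ", by_uplift)
```

Under these assumed numbers the highest-risk user ends up last in the uplift ordering, because contacting them is not expected to change anything.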

Our article on trade-offs in conversion rate modeling shows an example of why choosing the prediction target matters. The ultimate task is to understand how well users convert from one step of a sequence (e.g., a funnel) to the next. We can formulate this as a data science task in two ways:

  1. Model how long it takes users to convert. This is a right-censored numeric target.
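A right-censored target means that for users who have not converted yet, we only know that their time to conversion is at least as long as the time we have observed them. A minimal sketch of building such a target follows; the function name `censored_target`, the dates, and the observation cutoff are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical end of the observation window.
OBSERVATION_END = date(2024, 3, 31)


def censored_target(start, converted_on, observation_end=OBSERVATION_END):
    """Return (duration_days, observed) for one user.

    observed is True when the conversion happened inside the window;
    otherwise the duration is censored at the end of the window.
    """
    if converted_on is not None and converted_on <= observation_end:
        return (converted_on - start).days, True
    return (observation_end - start).days, False


# A user who converted 10 days after entering the funnel step.
converted = censored_target(date(2024, 3, 1), date(2024, 3, 11))

# A user who has not converted by the end of the window.
still_open = censored_target(date(2024, 3, 1), None)

print(converted)
print(still_open)
```

The pair of duration and event flag is the usual input shape for survival analysis methods, which are designed to use censored observations rather than discard them.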

Excellent problem formulation is one of the clearest hallmarks of a strong data scientist and one of the most important things I look for in an interview. What makes someone good at it?

  • Integrity, with ourselves and our audience. We should be upfront about the limitations of our methods and the correct interpretation of the results, especially when we know that an approach does not completely solve a particular problem.
  • Breadth of knowledge, gained through a lot of hands-on experience with real, applied problems and through reading about the experiences of other practitioners. To evaluate alternative formulations, we need to be aware of the alternatives.
  • Anticipation: imagining a roadmap and architecture for each potential solution and identifying the pros and cons before committing to one option.

  1. Logistic regression is a notable exception; it is calibrated by construction.
  2. The AUC is also identical for the two model outputs because the true positive and false positive rates are the same at every threshold.

