Alan Jacobson, Alteryx’s Chief Data and Analytics Officer, talks about leading data scientists who build tools for other data scientists.
Alan Jacobson, Alteryx’s Chief Data and Analytics Officer, joined us on the Data Science Mixer podcast to talk about leading teams of data scientists who themselves build tools used by other data scientists, including the Alteryx platform and the open source Python libraries EvalML, Featuretools, and Woodwork.
Alan shared with us what makes a solid data science team, how he thinks about model interpretability, and how to communicate clearly about data science. Here are three key takeaways from the conversation that will get you thinking about these big questions in the industry.
Some of the best data scientists I’ve ever worked with have had incredibly different backgrounds.
Building a successful team – and this isn’t just about data science; it’s true, I would say, for most, if not all, teams – part of the art is building a very diverse group of people. And the science is very clear: diversity produces better results for teams.
There is no doubt that when you deal with the mix of problems we deal with on a daily basis, having a wide variety of people will certainly help. Some of the best data scientists I’ve ever worked with have had incredibly different backgrounds: geologist, engineer, English major. They all came from different experiences. I think that’s one of the keys to building great teams: having a wide range of abilities to leverage.
Which plane would you rather board?
Let’s say you’re getting on a plane, and I can show you all the math behind the model we designed the plane with. I can show you, completely openly, all the formulas and all the math. Great. Or I can tell you that we have flown a million times and the model has worked 100% of the time. Do you want to be on the 1,000,001st flight?
You can choose either a plane that has flown a million times and has never failed, or one that has never been flown before. It has no history, but I can show you all the math. Which plane would you rather board? Personally, I would take the one that has flown a million times and has worked every time.
Machine learning is somewhat along those lines: it uses a lot of historical data and builds models that fit that historical data, compared to perhaps more of a statistical or econometric approach that works from formulas. So there are different approaches. But when it comes to real transparency once you’ve built a model, machine learning makes it fairly easy to understand how the model works and what is in it. You can see the formulas if you want to see the formulas. I don’t know that seeing a formula necessarily makes it more understandable. I don’t think the art is transparency, in the sense of “can I see everything in the box?” The question is instead: have I made it understandable enough for you to really grasp what’s going on?
Take out the jargon and use examples that we can all relate to in our daily lives.
Data science is really not that hard. There are, of course, concepts in the field that are more difficult than others. But most of what practicing data scientists do is not thermodynamics. Thermodynamics was a very difficult course. At least for me, it was a really difficult course. Multivariable calculus, too. It’s a pretty abstract, hard-to-describe thing.
I think most of the principles of data science are things that you can explain pretty clearly. I have two children. They are high school students, and they can understand these concepts. Often when I teach these concepts, I try to take out the jargon and use examples that we can all relate to in our daily lives.
I feel really passionate that data science is a field that should ultimately be part of what everyone does. It’s not just for Ph.D. data scientists. It’s math. It’s for everyone. That’s not to say there aren’t profound, complex things you need a data scientist to do, but I’d really like to see everyone able to take advantage of it in their daily work.