While many companies pride themselves on having a data science team, the field is still relatively new. As such, best practices haven’t yet been developed around how teams and individuals in this field should be managed.
But from our experience, it’s imperative that data science research be focused, with clearly defined goals. This is especially true because research involves great uncertainty: data scientists don’t know what they’ll uncover and will have to adjust according to what they find. With this in mind, data professionals must find the balance between in-depth research and delivering impactful results. All too often, they spend a long time on research that lacks focus, sacrificing the business impact their work could make.
Why Are Data Science and Data Scientists Important for Startups?
Data scientists in the professional world often come from an academic setting, where far more time can be allocated to research than in industry. There’s a good reason for this: academia encourages trailblazing research in uncharted waters, while industry encourages problem-solving and proofs of concept (POCs). As a result, data scientists often bring into their professional lives a habit of spending extended periods on research, and ultimately fail to deliver impactful results in the needed timeframe.
Moreover, data scientists in the startup world—specifically in smaller startups—wear a variety of hats. Among other things, a data scientist in a small startup may be tasked with:
- Developing tools
- Working on ETLs
- Researching new features
- Modeling data
- Providing insights from data or building a report or dashboard
How Data Scientists Can Be Most Effective in a Startup Setting
In order to ensure that the work of data scientists is well-managed and delivers impactful results, our approach focuses on defining a set of possible outcomes for each data science task. The set of possible outcomes might vary across different companies, company sizes, industries, etc., but the overall framework remains the same. By defining outcomes for every task, data scientists have a clear understanding of what’s expected of them, what their timeframe is, and what tools they need to use to complete the task. This helps ensure their success in delivering impactful results.
Here are four possible outcomes that might be expected from a data scientist’s work:
Feature
A feature is a measurable characteristic of something that’s being observed. Features might be discriminating or informative, and identifying features is crucial for research. A task with a Feature outcome deals with finding, building, extracting, or engineering features. The result of this task is a new feature or set of features that can be used in a machine learning model, and a rough assessment of the feature’s relation to existing features. For example, what features is it correlated with? Does it convey any new information to the model, or does it capture the same information your other features capture? For the sake of keeping your eye on the ball when creating a feature, it’s always important to check its relative contribution to the prediction of the target variable (in a classification or regression setting), or to any other metric you are using in other settings.
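To make the "rough assessment" above concrete, here is a minimal sketch of how a candidate feature might be compared with existing features and with the target. It assumes numeric columns held in plain Python lists and uses Pearson correlation, which only captures linear relationships; the function names `pearson` and `assess_feature` are hypothetical and not part of any library.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant column is reported as 0.0 rather than undefined.
        return 0.0
    return cov / sqrt(var_x * var_y)


def assess_feature(candidate, existing, target):
    """Correlate a candidate feature with each existing feature and the target.

    candidate: list of numbers (the new feature)
    existing:  dict mapping feature name -> list of numbers
    target:    list of numbers (the target variable)
    """
    return {
        "correlation_with_existing": {
            name: pearson(candidate, column) for name, column in existing.items()
        },
        "correlation_with_target": pearson(candidate, target),
    }


# Made-up illustrative data, not real measurements.
report = assess_feature(
    candidate=[1, 2, 3, 4],
    existing={"a": [2, 4, 6, 8], "b": [4, 3, 2, 1]},
    target=[1, 2, 3, 5],
)
```

A candidate that is almost perfectly correlated with an existing feature (like `a` here) probably adds little new information, even if it also correlates well with the target.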
Modeling
A modeling task is the “end game” task for data scientists. The purpose of this type of task is to advance the modeling effort. For example, this might involve introducing new features to an existing model or modeling a new problem, and finding an algorithm that is best suited for solving it. This could also involve hyperparameter tuning for an existing model; adding or improving the preprocessing stages in a model’s pipeline; sampling the data in some new way; anomaly detection; and the list goes on.
Tool
Data scientists might be tasked with building a tool that is reusable and handy. This might be a monitoring tool for a model running in production, a tool for scanning data sets and comparing them to some ground truth, or a tool that automatically QAs the data being written with some ETL. The outcome of this type of task is the development of a working tool.
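As an illustration of what a small, reusable QA tool could look like, the sketch below checks rows written by an ETL for missing required fields and out-of-range values. The row format (a list of dicts), the function name `qa_rows`, and the field names are all assumptions made for this example.

```python
def qa_rows(rows, required_fields, ranges=None):
    """Return a list of (row_index, problem) tuples for rows failing basic checks.

    rows:            list of dicts, one per record written by the ETL
    required_fields: field names that must be present and not None
    ranges:          optional dict mapping field name -> (low, high) inclusive bounds
    """
    ranges = ranges or {}
    problems = []
    for i, row in enumerate(rows):
        # Check that every required field has a value.
        for field in required_fields:
            if row.get(field) is None:
                problems.append((i, f"missing {field}"))
        # Check that numeric fields fall inside their allowed range.
        for field, (low, high) in ranges.items():
            value = row.get(field)
            if value is not None and not (low <= value <= high):
                problems.append((i, f"{field}={value} outside [{low}, {high}]"))
    return problems


# Made-up rows for illustration.
rows = [
    {"user_id": 1, "price": 2.5},
    {"user_id": None, "price": -1.0},
]
problems = qa_rows(rows, ["user_id", "price"], {"price": (0, 100)})
```

Returning a list of problems rather than raising on the first one lets the same function feed a report, a dashboard, or an alert.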
Intelligence
Product managers often have many questions they’re seeking to answer. Each new activity, client, or product—even the day-to-day activities of the operation—brings up new questions that need to be answered by the data the company collects.
Often, the company’s data team is the one capable of answering these questions. It’s the data professionals who have the needed skills to properly analyze data and extract meaningful insights from it.
Data scientists often need to answer a specific question or set of questions. For example:
- How many times better does one algorithm perform than another? And than a baseline?
- Do media prices follow a cyclical pattern? For example, are weekends more expensive than weekdays?
- What types of customers are more likely to react to a specific type of ad?
- Are users from certain states in the US more likely to perform a certain action in a client’s app than users from other states? And if so, which states?
The outcome of an Intelligence task is providing a detailed answer to a specific question. This is often accompanied by a report describing how the answer was obtained, what data was used in the research, and what filters and manipulations were made on the data in order to reach these conclusions.
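For instance, the weekend-versus-weekday pricing question above could start from something as simple as the following sketch. It assumes prices are available as a mapping from calendar date to a single daily price; the function name `weekend_vs_weekday` and the sample numbers are invented for illustration.

```python
from datetime import date
from statistics import mean


def weekend_vs_weekday(prices_by_day):
    """Compare average weekend and weekday prices.

    prices_by_day: dict mapping datetime.date -> price for that day.
    Returns (weekday_mean, weekend_mean, weekend_to_weekday_ratio).
    """
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    weekday = [p for d, p in prices_by_day.items() if d.weekday() < 5]
    weekend = [p for d, p in prices_by_day.items() if d.weekday() >= 5]
    if not weekday or not weekend:
        raise ValueError("need at least one weekday and one weekend price")
    weekday_mean = mean(weekday)
    weekend_mean = mean(weekend)
    # Assumes the weekday average is non-zero.
    return weekday_mean, weekend_mean, weekend_mean / weekday_mean


# Made-up prices; 2024-01-01 was a Monday.
sample = {
    date(2024, 1, 1): 10,   # Monday
    date(2024, 1, 2): 10,   # Tuesday
    date(2024, 1, 6): 15,   # Saturday
    date(2024, 1, 7): 15,   # Sunday
}
weekday_mean, weekend_mean, ratio = weekend_vs_weekday(sample)
```

A real answer would also need to say which date range and filters were used, and whether the difference holds up across several weeks, which is exactly what the accompanying report is for.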
Conclusion
To ensure that data scientists spend their time on work that will have a business impact, their tasks must be focused and well-defined. By limiting the number of possible outcomes any data science task might have, you can create standardization across the company, as well as between product managers and the data scientists who are executing their tasks. Not only does this support a positive outcome, it also helps data scientists understand the desired impact of their work before they do it and prepare for it accordingly.
This approach helps minimize the likelihood that a product manager will, for example, expect a simple answer to a question while the data scientist believes he or she has been tasked with developing a new tool that answers that question every morning at exactly 8:00 am. The goal is for product managers and data scientists to share a clear understanding of a task’s desired outcome, so that data scientists avoid wasting precious time on unimpactful work.