Successful data projects are built on solid foundations. What happens when we’re misled or unaware of what a solid foundation for data teams means? When a data team is missing or understaffed, the entire project is at risk of failure.
This talk will cover the importance of a solid foundation and what management should do to fix a weak one. To do this, I’ll share a real-life analogy to show how we can be misled and what that means for our success rates.
We will talk about the teams within a data team: data science, data engineering, and operations. This includes what each team is, what it does, and the unique skills it requires. We will also cover what happens when one team is missing and the effect on the other teams.
The analogy comes from my own experience with a house that had major cracks in the foundation. We were going to simply remodel the kitchen. We were never told about the cracks, and the house needed a completely new foundation. In a similar way, most managers think adding advanced analytics such as machine learning is a simple addition (remodeling the kitchen). However, management is never told that you need all three data teams to do it right. Instead, management has to go all the way back to the foundation and fix it. If they don’t, the house (team) will crumble under the strain.
Q&A
Question: The house metaphor extends to “tech debt” too.
If I want to add one more level (feature) to this building, what is the cost of renovating versus rebuilding it from scratch with the new number of levels?
Renovating is a very different question depending upon the existing foundations.
Answer: It really depends on how well you built your foundation. A foundation built with duct tape and hope will be different than a solid one.
Question: The picture of the poorly engineered house was the image I have been needing to describe to management why a “simple renovation” is going to cost more than simply rebuilding a new foundation at this point in time.
Answer: There's a mountain of technical debt somewhere. It's delicate surgery to get things fixed. Make sure to fix the org issues that caused the original problem or you're doomed to repeat it.
Question: What are your thoughts on cross-functional data teams versus more isolated platform/internal product data teams? Have you seen any great examples of journeys that transition between these subtypes of data teams to create value?
Answer: I talk about this in the book in the Data Ops chapter. IMHO cross-functional/DataOps teams are the highest and best usage. They create the optimal value relative to cost. However, it is an advanced configuration. IMHO you should only embark on this journey once you've established a solid foundation and friction is your biggest daily issue.
Question: Particularly interested in any experience and ideas you have working with companies that are consultancies - i.e. My company consults to other companies to solve specific problems, we develop data science models and deploy in a way the client can use them. We work with their data, but do everything else in-house. It seems different from a lot of the focus of the talks at this conference which are all about in-house data teams. What kind of different problems have you seen? How does your 3-pillar approach translate to this business model?
Answer: I think it's the same, with the possibility that operations isn't there. Theoretically, your client is doing the operations. IMHO you still need data science and data engineering to do this right. Another big issue will be the communication between the client and your team.
Question: On a similar question, wondering if the need for operations peeps in a data team is reduced at companies where there are data platform teams offering central platforms for data teams to use?
Answer: It's reduced but never eliminated. It's a similar thing with the cloud. You can't fire the ops team, but there is a reduction. I've had discussions with companies that have done central platforms. I even interviewed one for the book. There are still ops questions: Who is responsible when something breaks? Who figures out what broke? Did the framework/hardware fail? Or did the code you wrote fail? Ops people have to sort this out.
Foundations of Data Teams
Jesse Anderson
Data Engineer, Creative Engineer and Managing Director
Big Data Institute