I shall perhaps start this article by distinguishing (albeit in a broad manner) between a discipline and a philosophy.
A discipline is a branch of knowledge, a field of study, or an area of expertise. Philosophy, by contrast, is the pursuit of wisdom, and thus a philosophy is a set of concepts, beliefs, teachings, doctrines, ideologies…
I think you’re already getting the gist of where I am going with this, right? Agile is a philosophy. It is not a methodology.
Agile is a philosophy that guides decision-making. It is not a methodology, much less a workflow or process.
Agile was developed in the 1990s as a counter-movement to Waterfall’s restrictions in software development. This culminated in the Agile Manifesto in 2001.
The Manifesto stresses principles such as collaboration, adaptability, and simplicity. Data science, for its part, rests on scientific foundations, such as empiricism, testability, and cumulative knowledge, that guide how it analyzes data and extracts insights from it. Ergo, one can imagine many ways in which Agile's principles can be assimilated into data science practices.
But if you sift through online discussion forums or talk to data scientists in your circle, you'll see and hear some of them sharing horror stories about Agile. So where does the rift come from?
A sea of misinterpretations
I’ve come across many misinterpretations, and used to hold some myself. In my opinion, a lot of misunderstandings and faulty implementations stem from them. In this article, I’ll just list some common ones:
- Agile means doing things faster, hence skipping proper problem framing, project specifications, and requirements.
- Going agile will help deliver more data science results. Going agile does not solve data value chain bottlenecks, and promising that can lead to big expectation misalignment.
- Breaking down data science work into finely granular tickets. The result is backlog overload and a lot of time wasted on synchronizing the team's work instead of focusing on value streams. Remember: Agile is neither a euphemism for micro-management nor for laissez-faire.
- Emphasis on continuous delivery. Focusing on continuous delivery without taking a holistic approach to bridging Agile principles with data science workflow specifications can lead to delivering faulty products, which in turn might lead to mistrusting the data science team and its deliverables.
- Leadership dictating scope alone because data scientists are “just too technical”. This leads to a misaligned vision and a misunderstanding of what’s feasible. It can also lead to a frustrated and disengaged data team.
When implementing Agile methodologies, perhaps the crucial first step is minimizing misconceptions around them. By educating leaders, organizations can ensure that those in steering positions understand what these methodologies entail and where they start and stop benefiting their goals, their leadership and their teams.
With that being said, and as I hinted above, some Agile principles can indeed be adopted in data science work.
Can Agile be adopted by data scientists?
I don't see why not! The adaptive and iterative approach of Agile can certainly complement and enhance the data-driven and experimental approach of data science. So I do think there is a lot of common ground we can leverage as data scientists and/or as data science managers.
In each card below, I'll summarize an Agile principle or belief, contrast it with the specifics of data science practices and projects, and then suggest some approaches that can combine the two.
Agile
Recognizes that it's very difficult to plan all aspects of complex software upfront.
Data Science
Data projects might have high degrees of uncertainty.
Data Science + Agile
- Deploy early and update continuously
- Minimum viable product (MVP) approach
C.1
Agile
Promotes frequent delivery of working software with a preference for shorter timescales.
Data Science
Empiricism is at the core of the discipline.
Relies on iteration, feedback loops and incremental improvements.
Data Science + Agile
- Decompose the problem at hand and continuously test results.
- Develop incremental improvement processes adjusted to the team’s and project’s context.
- Maintain high code quality standards and invest in modular code.
C.2
Agile
Builds projects around motivated individuals, and emphasizes the need to provide them with the environment and support they need, and to trust them to get the job done.
Data Science
Data projects necessitate specialized expertise, data processes, infrastructure, tools and technologies, security and privacy measures, etc.
Data Science + Agile
- Recruit motivated and passionate people.
- Provide the right environment.
- Trust the experts to handle the complexities of data science projects.
- Understand the specifics of data science projects and where their complexities lie.
C.3
Agile
Recommends face-to-face conversation as the most efficient and effective method of conveying information within and to a development team.
Data Science
Workflows are continuously adjusted and updated to accommodate new findings or new modeling decisions and/or requirements.
Data Science + Agile
- Give priority to cultivating a trusted environment where communication can emerge organically.
- Grow a culture that embraces challenges and failures.
- Set up regular team meetings for work progress, demonstrations, and knowledge-sharing presentations.
- Drop the stand-ups as the predetermined channel of communication for all project phases. They do not align with every phase of a data science project's life cycle.
C.4
Agile
Promotes sustainable development practices, in order for sponsors, developers, and users to maintain a constant pace indefinitely.
Encourages teams to regularly reflect on how to become more effective, then tune and adjust their behavior accordingly.
Data Science
Data projects can have long cycles before producing tangible results.
Relies on iteration and feedback loops.
Every problem has its own characteristics.
Is a fast-evolving discipline (especially the machine learning part).
Data Science + Agile
- Keep stakeholders involved throughout the process.
- Keep negotiating with stakeholders throughout.
- Aim to maintain stakeholder interest through frequent delivery of working products.
- Regularly assess whether the implemented methods are optimal.
- Keep the focus on the problem at hand (what worked in one project might not work in another).
- Promote and encourage autonomous learning, and hire people who believe in learning and actively seek to know more about all aspects of the discipline.
C.5
Agile
Enhances agility through continuous attention to technical excellence and good design.
Believes the best architectures, requirements, and designs emerge from self-organizing teams.
Data Science
Data projects demand precision, meticulous attention, careful consideration and consistency.
Data science is also a socio-technical discipline.
Data Science + Agile
- Acknowledge that some phases take time.
- Empower your team to navigate and process them better.
- Don’t be rigid with methodologies and framework application.
Perhaps this is more of a general recommendation: methodologies and frameworks should work for the context and not constrain it.
C.6
Where does Agile adoption in data science projects stop working?
As we've seen, one can envision and implement many possible ways to assimilate Agile principles into data science practices.
However, there are still some areas where I don't see data science work benefiting from Agile methodologies.
When Data science + Agile becomes a methodological nightmare
Just for fun, I asked ChatGPT to suggest titles similar to mine. Some of its suggestions, I think, illustrate what I'll talk about in this section:
Prompt: Suggest more titles like this: "When Data science + Agile becomes a methodological nightmare"
Picked results:
"Agile vs. Data Science: Balancing Speed with Accuracy"
"Scaling Data Science in Agile: When Flexibility Turns to Frustration"
"Navigating the Agile-Data Science Paradox: When Innovation Slows Down"
Catchy titles aside, there are elements I see as forced fits or ill fits when it comes to assimilating Agile principles into data science project cycles or workflows:
- Forcing timelines to fit into fixed-length sprints;
- Adopting a product approach throughout all phases of a data science project cycle;
- Exclusively assessing quality proactively;
- Prioritizing delivery over comprehensive and extensive documentation.
I'll elaborate on each element one by one.
Unpredictable timelines vs Fixed-length sprints
The data science value chain includes many phases that can be unpredictable.
Problem understanding, for example, can sometimes span weeks because of the need to bring together stakeholders from different business units (BUs) and the difficulty of aligning on a scope for the project, especially when BUs have different priorities.
Another issue I typically observe is the difficulty of scoping data science projects. In my experience, the modeling aspect of data science projects makes it difficult for stakeholders, especially those who haven't worked closely with data teams before, to quickly grasp the nuances that go into the scoping phase.
Also, the stochastic nature of data analysis, data exploration, modeling, and model training does not fit with the short and fixed-length nature of sprints. Sticking with such time constraints can pressure the data team to deliver under-developed or faulty products and foster a culture of delivery over quality.
Exploration & Research vs a Product approach
Agile methodologies were developed to orient product development in software value chains. While data teams strive to deliver data products as well, research and exploration underlie a lot of the work being done. This can heavily impact the project's roadmap.
The product approach can also foster a culture of risk aversion, because data scientists will avoid experimenting with techniques, approaches or methods that can lead to better results but require time.
In addition, data scientists often need longer periods of uninterrupted focus to dive deep into a project's specifics and complexities. Rituals such as stand-ups, as well as unscheduled face-to-face communication, can heavily disrupt this process, undermining the very efficiency and productivity that Agile is meant to improve.
Quality engineering vs Quality assurance
One of the practices of data science is to build quality into the project or product from the start. This is why data science revolves around quality engineering instead of quality assurance.
The unpredictable nature of data projects requires a more active and/or reactive approach rather than the proactive one of quality assurance.
Documentation maintenance & knowledge management
Agile prioritizes working products over complete and comprehensive documentation. In my opinion, this would simply be an irresponsible choice in data science projects, where many decisions are made about data engineering, feature engineering, modeling assumptions, experiment design, feature selection, modeling choices, etc.
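To make the point about documenting decisions a bit more concrete, here is a minimal sketch in Python of what a lightweight decision log could look like. Everything in it is an assumption made for illustration: the `DecisionRecord` fields, the `log_decision` and `load_decisions` helpers, and the choice of a JSON Lines file are hypothetical and not a prescribed format.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import date
from pathlib import Path


@dataclass
class DecisionRecord:
    """One documented project decision (hypothetical schema, for illustration)."""
    area: str       # e.g. "feature engineering", "modeling assumptions"
    decision: str   # what was decided
    rationale: str  # why it was decided
    decided_on: str = field(default_factory=lambda: date.today().isoformat())


def log_decision(record: DecisionRecord, path: Path) -> None:
    """Append the record as one JSON line, so earlier entries are never overwritten."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


def load_decisions(path: Path) -> list[DecisionRecord]:
    """Read every record back, in the order it was logged."""
    with path.open(encoding="utf-8") as f:
        return [DecisionRecord(**json.loads(line)) for line in f if line.strip()]
```

A team could call `log_decision(DecisionRecord("feature selection", "...", "..."), Path("decisions.jsonl"))` whenever a modeling choice is made. An append-only file kept next to the code is only one option, and a wiki or an experiment tracker may serve the same purpose better in a given context.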
Conclusion
Agile principles and methodologies can offer benefits to data science teams and projects, but they can easily turn into a "be careful what you wish for" situation if applied rigidly and not made to align with the specifics of data science work.
Leaders and teams should choose and adapt frameworks and methodologies to their context and continuously educate themselves, learn and test what best works for them.
Frameworks and methodologies are means to an end; puristic approaches will only hinder innovation and progress.
Read more on the subject
Jurney, R. (2017). Agile Data Science 2.0
Baijens, J., Helms, R. W., & Velstra, T. (2020). Towards a Framework for Data Analytics Governance Mechanisms. In Proceedings of the 28th European Conference on Information Systems (ECIS2020) Article 81 AIS Electronic Library. https://aisel.aisnet.org/ecis2020_rp/81/
Cordeiro, R., Alves, I., Alves, S., Goldman, A. (2024). Being Agile in a Data Science Project. In: Kruchten, P., Gregory, P. (eds) Agile Processes in Software Engineering and Extreme Programming – Workshops. XP 2022, XP 2023. Lecture Notes in Business Information Processing, vol 489. Springer, Cham. https://doi.org/10.1007/978-3-031-48550-3_6
Citation
Please cite this article as:
NNZ. (Oct 2024). Agile and Data science. NonNeutralZero.
https://www.nonneutralzero.com/blog/engineering-7/data-science-and-agile-a-love-hate-relationship-19
or
Data Science and Agile: a love-hate relationship ?