Agile had its beginnings in technology with the Manifesto for Agile Software Development: a simple yet comprehensive statement that explains the four core values of agile, as developers defined them, to the broader public.
What are the benefits of agile in data science? The relatively new field of data science revolves around the analysis of data and draws on methods close to development and programming, such as statistics, machine learning and data processing. Its essence, however, is delivering data-based insights that empower the business.
Agile methods build on short iteration loops that enable your data science team to respond to new insights and requirements from multiple stakeholders while visualizing progress. Frequent updates increase the transparency of both the effort and the results it yields. Team retrospectives support continuous improvement of the process while ensuring a steep learning curve for all stakeholders.
However, the benefits of agile cannot yet be fully realized in data science. The challenges mainly stem from applying agile planning instruments, especially given the sequential nature of many tasks and the difficulty of fitting models to frequently changing datasets. This is why we have adapted the agile manifesto.
Hypotheses and experiments
over processes
Oftentimes, data science teams are ultimately measured by the success of their prediction and decision models. This is where the benefits of agile can truly be realized for data science teams, since building those models requires testing hypotheses and generating insights, both areas where agile approaches work well. However, we have seen resource and capacity conflicts arise when trying to follow agile methods for process-oriented tasks, such as collecting and maintaining data.
Defined quality objectives
over striving for optimization
We observe rather academic working habits in data science, where optimizing a model is prioritized over a pragmatic focus on results. This is why clearly defined quality criteria, such as “a confidence level of X%” or “withdrawal after X iterations”, should be used to ensure the on-time execution needed in a business setting.
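As an illustration, here is a minimal sketch of how such quality criteria can be turned into explicit stopping rules. The names and thresholds (TARGET_SCORE, MAX_ITERATIONS, evaluate_candidate) are illustrative assumptions, not values from the manifesto itself.

```python
# Illustrative sketch: defined quality objectives as explicit stopping rules.
# TARGET_SCORE and MAX_ITERATIONS are assumed placeholders that a team would
# agree on up front, instead of optimizing indefinitely.
TARGET_SCORE = 0.90      # e.g. the agreed confidence/accuracy level
MAX_ITERATIONS = 20      # withdraw after this many tuning rounds

def tune_model(evaluate_candidate):
    """Stop as soon as a candidate meets the quality bar,
    or after a fixed budget of iterations."""
    best_score = float("-inf")
    for iteration in range(1, MAX_ITERATIONS + 1):
        score = evaluate_candidate(iteration)  # hypothetical evaluation callback
        best_score = max(best_score, score)
        if best_score >= TARGET_SCORE:
            return best_score, f"target reached after {iteration} iterations"
    return best_score, f"withdrawn after {MAX_ITERATIONS} iterations"
```

Making the exit conditions explicit in code (or in the sprint plan) turns “good enough” from a matter of taste into a shared, testable agreement.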
Applicability and problem understanding
over methodological excellence
Data is rarely structured, correct or complete. A model that used to work well on one dataset may stop delivering results once the data scope changes; in statistics, this failure to generalize is known as overfitting. We recommend applying simple models that work reliably in many environments rather than the “optimal” model that may only function under lab conditions.
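To make the trade-off concrete, here is a minimal sketch (assuming scikit-learn and NumPy, with synthetic data invented for the example) that compares a simple linear model with a deliberately over-flexible polynomial model. Under cross-validation, the simple model typically generalizes better on data that changes fold to fold.

```python
# Illustrative sketch: a simple model versus an over-flexible one on the
# same (synthetic) roughly linear data. The flexible model chases noise
# and tends to score worse under cross-validation.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(40, 1))
y = 2.0 * X.ravel() + rng.normal(scale=0.2, size=40)  # linear trend plus noise

simple = LinearRegression()
flexible = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())

for name, model in [("linear", simple), ("degree-15 polynomial", flexible)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean cross-validated R^2 = {scores.mean():.2f}")
```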
Periodic review and learning
over finished software and models
Once implemented, many models are never reviewed and improved again. This is frequently justified with higher consistency and better comparability over time. Given how quickly both the technology and the underlying data evolve, however, a regular and ideally automated review should be part of every data science team's calendar.
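One possible shape for such a review is sketched below, with illustrative names and thresholds (REVIEW_FLOOR, review_model are our assumptions): re-score the deployed model on fresh labelled data on a schedule and flag it when performance drops below an agreed floor.

```python
# Illustrative sketch of an automated, periodic model review.
# REVIEW_FLOOR is an assumed minimum acceptable score agreed by the team.
from datetime import date

REVIEW_FLOOR = 0.85

def review_model(model, fresh_X, fresh_y) -> bool:
    """Return True if the model still meets the quality floor
    on freshly collected, labelled data."""
    score = model.score(fresh_X, fresh_y)  # scikit-learn-style scoring API
    print(f"{date.today()}: score on fresh data = {score:.2f}")
    return score >= REVIEW_FLOOR
```

A check like this can run from a scheduler (e.g. cron), so the review happens regularly without relying on anyone remembering to trigger it.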
We are convinced that by following this adjusted manifesto, the benefits of agile can truly be realized in data science, and that the field becomes even more fun to work in.
This piece was written by Josef Korte and Jan Ortmann