paul vidal - pragmatic big data nerd

Computer Science needs the scientific method

by paul

Software development is basically the wild-west

With the exception of a small portion of academia, Computer Science is generally equated with software development. When I talk about software development, I mean it in the broadest sense possible, from scripting Unix cron jobs to building a mobile application. Thus, computer science in the colloquial sense is an adolescent engineering practice that operates in a free, wild-west-like open world.

We basically develop new technologies through trial and error, then either iterate or choose another starting point when we hit a wall.

A good example is the evolution of data storage and analysis: the relational model was extremely successful until it no longer scaled, so we changed the paradigm to distributed systems.

This method of development has been successful for one good reason: the outcome of software is 0 or 1. The data scales or it doesn't. The software works or it doesn't. To put it in better terms, the systems we implement and study in Computer Science are simple enough to be finite and fully evaluated.

Open Source isn’t Open Science

This also explains the success of the open source model. Crowdsourcing is easy when you can easily say whether something is correct or not.

However, Computer Science is different today. Machine Learning and Artificial Intelligence are changing the system: the predictability of results is now statistical instead of binary, and the algorithms used can be black boxes.

This means that, depending on the assumptions you made while developing your software, its outcome can differ without you being able to know whether you are achieving what you were after.

For instance, you could create deep learning software that uses facial recognition, natural language processing and data mining to determine whether someone is inclined to like orange juice. There is no easy way for you to know whether the predictions of your software are true. You can know whether your model is accurate, and optimize for that, but you don't know its real-world truth value.
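The gap between "accurate on held-out data" and "true in the real world" can be made concrete with a toy simulation. Everything here is a made-up assumption for illustration: a hidden "taste" variable stands in for the real-world truth, and a spurious proxy feature stands in for whatever signal the model latched onto.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hidden ground truth we never observe directly: does this person like orange juice?
taste = rng.random(n) < 0.5

# In the collected dataset, a proxy feature happens to agree with taste 90% of the time.
# A model that simply predicts the proxy looks great on any held-out split of this data.
proxy_train = np.where(rng.random(n) < 0.9, taste, ~taste)
train_acc = np.mean(proxy_train == taste)

# In deployment the spurious correlation weakens to 55% -- the proxy was never causal.
proxy_deploy = np.where(rng.random(n) < 0.55, taste, ~taste)
deploy_acc = np.mean(proxy_deploy == taste)

print(f"held-out accuracy:   {train_acc:.2f}")   # close to 0.90
print(f"real-world accuracy: {deploy_acc:.2f}")  # close to 0.55
```

The metric you can optimize (held-out accuracy) says nothing about whether the correlation your model exploits survives contact with the world.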

This is scary. Especially when you’re not trying to predict how much one likes orange juice.

Today’s guard rails are inadequate

The AI/ML industry brandishes model drift evaluation, model explainability and even ethics to address this fundamental shift. However, while these methods are necessary, they fail to address the lack of methodology inherent to the increased complexity of the software development realm.
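To make "model drift evaluation" concrete, here is a minimal sketch of one common building block: comparing a feature's distribution at training time against what the model sees in production, using a two-sample Kolmogorov-Smirnov statistic computed by hand. The data and thresholds are illustrative assumptions, not a production recipe.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the empirical CDFs of samples a and b."""
    all_vals = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), all_vals, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), all_vals, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, 5000)  # feature distribution at training time
live_ok   = rng.normal(0.0, 1.0, 5000)  # same distribution: no drift
live_bad  = rng.normal(0.7, 1.0, 5000)  # shifted mean: drift

print(ks_statistic(reference, live_ok))   # small gap
print(ks_statistic(reference, live_bad))  # large gap -> input drift detected
```

Note what this does and does not tell you: it flags that the inputs have changed, but it cannot say whether the model's predictions were ever true in the first place, which is exactly the gap a scientific methodology would need to cover.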

Indeed, before developing any software we should follow a typical scientific process: publish our methodology and hypotheses prior to development, and have the software peer reviewed.

To get started applying this methodology, the open science taxonomy proposed by the Center for Open Science is a good starting point.

References:

  • Center for Open Science: https://cos.io
  • Open Science on Wikipedia: https://en.wikipedia.org/wiki/Open_science
