paul vidal - pragmatic big data nerd

Tag Archives

2 Articles

Making data analytics operational

by paul 0 Comments
Making data analytics operational
I refuse to use the term-that-should-not-be-used when describing stale data lakes.

After 6 months of silence, I finally take the time to get back behind my keyboard. I would like to say that I used these 6 months to reflect upon my writing, the current data market and came out of this hiatus a better, more informed and well versed person, but that would be a lie. And, despite the current pace at which the social fabric of our society is moving towards considering lies as acceptable and moral, I prefer not to. I don’t really know why I stopped writing for a bit, but most likely because I had nothing to say. So today, brace yourselves for a semi-informed opinion piece on data analytics, because I actually changed my opinion a bit on it through real-life experience.

My opinion then: analytics are a fringe use case of data management

In my article “Why data driven companies should stop investing in data analytics” I argued for the death of dashboards. I still stand by that point of view, as too often the Business Intelligence (BI) platforms are an end point of the data life cycle. Countless data replication processes, ETL, busses and other goldengate push data into data warehouses or data lakes where data scientists pat themselves on the back by showing dashboards that could potentially contain information to be integrated in the current business processes. Quick aside and nugget of knowledge from my PhD friends: if your title contains “science” in it, you’re not a real scientist. Shots fired. Moving on, while I still stand knee deep in stale data lakes despite being on my soapbox, there is one thing I did not consider enough: Machine Learning algorithms. There are two main reasons why the existence of machine learning algorithms as they are implemented now changes my opinion. First and foremost, the problem I describe of BI being the end of the data chain and its outcome only being driven by humans trying to improve business process can be alleviated with analytics automation via these algorithms (to some extent at the moment, but will be more and more true as the technology progresses). Secondly, ML needs access to data lakes, not operational big data. The algorithms need to be able to train using any data sets, looking at data from any angle in order to make usable predictions.

My opinion now: analytics need to be better integrated in the data life cycle

Consequently, here my proposal to the data world. We need to envision an architecture where data warehouses are not the raiders of the lost ark type but more the amazon type: they need to be an inherent part of the data life cycle. Drilling a bit further in the architecture I contemplate, your data as a service layer would feed current data sets to your data warehouse, where ML would run asynchronously, but the outcome of these analytics would then feed back the rules of data manipulation embedded in your DaaS layer. If you manage a constant feedback loop of the kind, your end user application served by your DaaS will constantly get fed more accurate and relevant data, which in turn can enable the next generation of platforms: Information as a Service. But that’s for another day.

Data As A Microservice: the future of data architecture

by paul 0 Comments
Data As A Microservice: the future of data architecture

Let me preface this article with an understatement: sometimes, enterprise architecture can be complicated. Large companies run thousands of applications, multiplied by dozens of environments replicated for testing, user testing, sandboxing, accumulated over years of acquisitions, re-architecturing (yes, it is a word I made up), and experiments all with the purpose of driving business forward. Like any complex system, human beings have been trying to make sense out of it by conceptualizing models and architectures aimed at simplifying the system thus making it more efficient, robust, scalable, secure, and spiritually vertuous (OK, maybe not the last part, although can a piece of software be inherently virtuous? A question for another day). With all this in mind, I would like to take some time to reflect on one of these concepts: Micro-Services and how this concept can apply in the realm of data management.

Microservices VS Enterprise Service Bus

First introduced in 2011 during workshop of software architects held near Venice in May 2011, Microservice Architecture is defined by James Lewis as follows:

The term “Microservice Architecture” has sprung up over the last few years to describe a particular way of designing software applications as suites of independently deployable services. While there is no precise definition of this architectural style, there are certain common characteristics around organization around business capability, automated deployment, intelligence in the endpoints, and decentralized control of languages and data.

Microservice architecture is a subset of a Service Oriented Architecture (SOA), aiming at distributing microcomponents to deploy applications as opposed to a centralized application integraiton layer, often called Enterprise Integration Layer (EAI) or Enterprise Service Bus (ESB). Leaving aside the obvious angry developer argument stating that all of this is marketing jargon and rebranding of the same products, it’s interesting to take note of a fundamental trend I covered before in this blog: enterprise are looking to implement agile enviroment, which extremely granular elements in order to ensure business reactivity. The dream of the all-integrated all-consolidated entreprise layer is fading.

Data As A Microservice

In a very similar manner, the idea of single source of truth containing all the enterprise data is coming to an end. And, unlike some data lakes proponents would like to make you believe, it is not because of the pitfalls of traditional technologies that can’t handle large volume of data or distribute it efficiently. Building a single centralized source of data is an utopia. Instead, companies are now shifting their focus towards platforms enabling rapid agnostic data integration, agile data schema modification, and complete distribution. These platforms can then be used in a microservice architecture, making them Data As A Microservice platforms. I’ll admit, I may have made that term up too because it sounds cool, but it is very important to note for you data vendor, data scientist or data consumer (CIOs and CTOs organization). The future of data microservice-like agility, not monolithic unification.

References:

http://martinfowler.com/articles/microservices.html
http://stackoverflow.com/questions/25501098/difference-between-microservices-architecture-and-soa
https://www.voxxed.com/blog/2015/01/good-microservices-architectures-death-enterprise-service-bus-part-one/
https://en.wikipedia.org/wiki/Microservices