paul vidal - pragmatic big data nerd

Making data analytics operational

by paul
I refuse to use the term-that-should-not-be-used when describing stale data lakes.

After 6 months of silence, I am finally taking the time to get back behind my keyboard. I would like to say that I used these 6 months to reflect upon my writing and the current data market, and came out of this hiatus a better, more informed and well-versed person, but that would be a lie. And, despite the current pace at which the social fabric of our society is moving towards considering lies as acceptable and moral, I prefer not to lie. I don’t really know why I stopped writing for a bit, but most likely it was because I had nothing to say. So today, brace yourselves for a semi-informed opinion piece on data analytics, because I actually changed my opinion on it a bit through real-life experience.

My opinion then: analytics are a fringe use case of data management

In my article “Why data driven companies should stop investing in data analytics” I argued for the death of dashboards. I still stand by that point of view, as too often Business Intelligence (BI) platforms are the end point of the data life cycle. Countless data replication processes, ETL jobs, buses and the likes of GoldenGate push data into data warehouses or data lakes, where data scientists pat themselves on the back by showing dashboards that could potentially contain information to be integrated into current business processes. Quick aside and nugget of knowledge from my PhD friends: if your title contains “science” in it, you’re not a real scientist. Shots fired. Moving on: while I still stand knee-deep in stale data lakes despite being on my soapbox, there is one thing I did not consider enough: Machine Learning algorithms. There are two main reasons why the existence of machine learning algorithms, as they are implemented now, changes my opinion. First and foremost, the problem I describe, of BI being the end of the data chain and its outcome only being driven by humans trying to improve business processes, can be alleviated by automating analytics with these algorithms (only to some extent at the moment, but more and more as the technology progresses). Secondly, ML needs access to data lakes, not operational big data. The algorithms need to be able to train on any data set, looking at data from any angle, in order to make usable predictions.

My opinion now: analytics need to be better integrated in the data life cycle

Consequently, here is my proposal to the data world. We need to envision an architecture where data warehouses are not the Raiders of the Lost Ark type but more the Amazon type: they need to be an inherent part of the data life cycle. Drilling a bit further into the architecture I contemplate, your data-as-a-service (DaaS) layer would feed current data sets to your data warehouse, where ML would run asynchronously, and the outcome of these analytics would then feed back into the rules of data manipulation embedded in your DaaS layer. If you manage a constant feedback loop of this kind, the end-user applications served by your DaaS will constantly get fed more accurate and relevant data, which in turn can enable the next generation of platforms: Information as a Service. But that’s for another day.
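
To make the shape of that feedback loop concrete, here is a minimal sketch in Python. Everything named in it is hypothetical and exists only for illustration: `DaasLayer`, `Warehouse`, `min_score`, `feedback_cycle`, and the toy "model", which simply picks the lowest score at which a customer converted. A real ML job would replace `run_analytics` and would run asynchronously rather than inline.

```python
from dataclasses import dataclass, field


@dataclass
class DaasLayer:
    """Hypothetical DaaS layer: serves records according to an embedded rule."""
    records: list
    min_score: float = 0.0  # the rule: only serve records scoring at least this

    def serve(self):
        """Records the end-user application actually receives."""
        return [r for r in self.records if r["score"] >= self.min_score]

    def export_snapshot(self):
        """Current data set pushed to the warehouse."""
        return list(self.records)


@dataclass
class Warehouse:
    """Hypothetical warehouse: stores snapshots and runs analytics on them."""
    snapshots: list = field(default_factory=list)

    def load(self, snapshot):
        self.snapshots.append(snapshot)

    def run_analytics(self):
        """Stand-in for an ML job: lowest score among converted customers."""
        latest = self.snapshots[-1]
        converted = [r["score"] for r in latest if r["converted"]]
        return min(converted) if converted else 0.0


def feedback_cycle(daas, warehouse):
    """One turn of the loop: DaaS -> warehouse -> analytics -> DaaS rule."""
    warehouse.load(daas.export_snapshot())
    daas.min_score = warehouse.run_analytics()


if __name__ == "__main__":
    daas = DaasLayer(records=[
        {"id": 1, "score": 0.2, "converted": False},
        {"id": 2, "score": 0.6, "converted": True},
        {"id": 3, "score": 0.9, "converted": True},
    ])
    warehouse = Warehouse()
    print([r["id"] for r in daas.serve()])  # [1, 2, 3]
    feedback_cycle(daas, warehouse)
    print([r["id"] for r in daas.serve()])  # [2, 3]
```

The point of the sketch is only the direction of the arrows: the warehouse is not where data goes to rest, because its output changes what the DaaS layer serves on the next request.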

Why data driven companies should stop investing in data analytics

by paul

Despite what the blatantly click-baitish and oxymoronic title of this post might lead you to believe, I actually have a point that I think is worth expressing (which I suppose is the reason I write any post, really). As is often the case, this point comes from an accumulation of real-life situations I encounter at my job. A lot of my work consists of rapidly enabling access to data that companies have a really hard time accessing, and distributing it in a very efficient big data architecture. The funny thing is, when I go to a customer and tell them that they will get access to all this data within a few days, they are unprepared as to what to do with it. Indeed, they (and we as an industry) have been focusing so much on the integration and exposure of data that we forgot to think about the value it can bring us. What’s the use case everyone thinks of when their data access is enabled? Analytics. “We’ll use [Name your favorite BI tool] to build dashboards and gain insight.” This, to me, is incomplete and extremely short-sighted.

How analytics became synonymous with Big Data

Before I dive into the reasons I suddenly want to make a good chunk of the big data industry hate me, let me try to express how we got to equate the term big data, or data in general, with Business Intelligence or analytics. Note that I am trying to describe trends here; I am aware that there are outliers to these trends. With this said, and now that I have a license to express completely unverified facts since I apologized in advance, here is what I observed. When databases were first created, they were an enablement layer for software. We just needed a way to store data more efficiently in order for your application to run faster. Eventually, data changed from being a necessary evil to being a source of value. Indeed, once we realized that every activity we engage in involves a piece of software, the data entrenched in each and every one of these pieces of software became the best way to understand our own (and our users’) behavior. This was the advent of data warehouses and BI tools. Then we realized there was a lot of data (I think the technical term is a shit ton), so we started developing big data lakes. In this transition, big data’s primary use became what data warehouses were used for: data analytics.

The death of dashboards

However, data analytics is only a very small percentage of the data use cases. Remember, data layers were designed to enable applications, not to show you what they contain. Yes, dashboards and graphics are pretty, but what is their goal? Their goal is to give you an idea of what whoever interacts with your software is doing, so that you can design solutions to the problems you find. Somehow, however, analytics became an end in itself. Companies invest an insane amount to achieve data analytics. This is extremely misguided. To be fair, part of the reason why analytics became an end in itself is the limitations of data lakes, but that’s another topic.

Using data proactively

Solving the data integration, consolidation, distribution and exposure problem is not easy, but it is being solved (I can say that with confidence since I am on the front of that battle every day, though I would not say I put my life in danger every day, so the battle analogy stops here). My advice is to think beyond analyzing the data as a use case once you have access to it. Instead of trying to identify trends, think about how to change them. Instead of trying to build an individualized snapshot of your customer, think about what action you should take based on that snapshot. Instead of getting a consolidated view of all of your systems, think about how to better orchestrate data flows between these systems to minimize the need for consolidation. I am purposely not listing specific examples, because they require deep industry expertise, which is not what I am trying to highlight here (my expertise being in data, not a specific industry). So, next time you are confronted with someone building a dashboard, ask yourself: Why? Why am I building BI on top of my data? Is BI going to give me insight into a problem to solve? If so, what is that problem? Once you have the answers to those questions, try to build a platform that identifies and solves that problem rather than a platform that only allows you to identify it.