paul vidal - pragmatic big data nerd

Making data analytics operational

by paul
I refuse to use the term-that-should-not-be-used when describing stale data lakes.

After 6 months of silence, I am finally taking the time to get back behind my keyboard. I would like to say that I used these 6 months to reflect upon my writing and the current data market, and came out of this hiatus a better, more informed and well-versed person, but that would be a lie. And, despite the current pace at which the social fabric of our society is moving towards considering lies as acceptable and moral, I prefer not to lie. I don’t really know why I stopped writing for a bit, but most likely it was because I had nothing to say. So today, brace yourselves for a semi-informed opinion piece on data analytics, because I actually changed my opinion a bit on it through real-life experience.

My opinion then: analytics are a fringe use case of data management

In my article “Why data driven companies should stop investing in data analytics” I argued for the death of dashboards. I still stand by that point of view, as too often Business Intelligence (BI) platforms are an end point of the data life cycle. Countless data replication processes, ETL jobs, buses and other GoldenGates push data into data warehouses or data lakes, where data scientists pat themselves on the back by showing dashboards that could potentially contain information to be integrated into current business processes. Quick aside and nugget of knowledge from my PhD friends: if your title has “science” in it, you’re not a real scientist. Shots fired. Moving on: while I still stand knee-deep in stale data lakes despite being on my soapbox, there is one thing I did not consider enough: Machine Learning algorithms. There are two main reasons why the existence of machine learning algorithms, as they are implemented now, changes my opinion. First and foremost, the problem I describe, BI being the end of the data chain with its outcome driven only by humans trying to improve business processes, can be alleviated by automating analytics with these algorithms (only to some extent at the moment, but more and more as the technology progresses). Secondly, ML needs access to data lakes, not operational big data. The algorithms need to be able to train on any data set, looking at data from any angle, in order to make usable predictions.

My opinion now: analytics need to be better integrated in the data life cycle

Consequently, here is my proposal to the data world. We need to envision an architecture where data warehouses are not of the Raiders of the Lost Ark type but more of the Amazon type: they need to be an inherent part of the data life cycle. Drilling a bit further into the architecture I contemplate, your data as a service (DaaS) layer would feed current data sets to your data warehouse, where ML would run asynchronously, and the outcome of these analytics would then feed back into the rules of data manipulation embedded in your DaaS layer. If you manage a constant feedback loop of this kind, the end user application served by your DaaS layer will constantly get fed more accurate and relevant data, which in turn can enable the next generation of platforms: Information as a Service. But that’s for another day.
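To make that loop a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only there for illustration: the class names, the `min_score` rule and the `train_rule` function are my own stand-ins, and the "ML" is nothing more than an average, run in-line rather than asynchronously. It only shows the shape of the cycle: the DaaS layer feeds the warehouse, analytics run on the warehouse, and the outcome is written back into the DaaS rules.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class DaaSLayer:
    """Hypothetical data-as-a-service layer: serves records filtered by a rule."""
    min_score: float = 0.0  # the "rule of data manipulation", updated by analytics

    def serve(self, records):
        # Only expose records that pass the current rule to the end user application.
        return [r for r in records if r["score"] >= self.min_score]


@dataclass
class Warehouse:
    """Hypothetical warehouse: accumulates whatever the DaaS layer feeds it."""
    rows: list = field(default_factory=list)

    def load(self, records):
        self.rows.extend(records)


def train_rule(warehouse):
    # Stand-in for an ML job: here, simply the mean score seen so far.
    return mean(row["score"] for row in warehouse.rows)


def feedback_cycle(daas, warehouse, records):
    warehouse.load(records)                 # DaaS feeds current data to the warehouse
    daas.min_score = train_rule(warehouse)  # analytics outcome feeds back into the DaaS rules
    return daas.serve(records)              # application is served under the updated rule


daas, warehouse = DaaSLayer(), Warehouse()
served = feedback_cycle(daas, warehouse, [{"score": 1.0}, {"score": 3.0}])
```

In a real system the training step would run on its own schedule and the rule would be far richer than a single threshold; the point is only that the warehouse sits inside the loop rather than at the end of it.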

What I Don’t Talk About When I Don’t Talk About Running

by paul
The Track. My old nemesis.

Today’s post will differ quite a bit from my previous ones. While I enjoy discussing the current state of affairs in the fascinating world of data management, I would like to take the opportunity to write a very personal piece. So very personal, in fact, that I have hesitated a long time to publish it. To put things into context, at this point in my life, I strive to be a very logical person. One of the things I loathe the most in this world is the proportion of emotion-driven decisions that occur. It’s been a soapbox of mine for quite a while now, upon which I stand and promote logic and critical thinking, comfortably sitting (or I suppose in this case, standing) in my echo chamber of like-minded skeptics. That being said, there are really only two domains in my life in which I do not strive to apply that very rational framework and let my emotions get the best of me: my family, and running (obviously to a greater extent with regard to my family, I am not a monster). Here is the thing about very emotional topics: I do not feel comfortable talking about them, which is why I so seldom share anything about my love for my family; and that’s not going to start in this blog post either, let me make things clear. No, today, I’m going to talk about my relationship with running. Once. Then I will go back to not talking about it, except on rare occasions and with selected people.

I do not run to be healthy

Before I start pouring out my logorrhea of deeply intimate and potentially completely uninteresting facts about the place that running has in my life, let me do a little bit of housekeeping. Yes, exercise is good for you. The list of benefits is never-ending, from better longevity and easier weight management to improved cognitive functions, stress reduction and much more. This is NOT the running I’m going to talk about here. As a by-product of running I may be healthier, but that’s really not the goal or the attitude I have towards running. I don’t need an end goal to run. I run because this is one of the things I love the most in life.

I am in a relationship with running

Indeed, while I recognize that I’m still a “new” runner, having only started seriously about 5 years ago, I can tell you that running is part of my life. As soon as I started, I sincerely fell in love with it. I’m choosing the word love carefully, meaning that I am emotionally involved with running. The joy and pain I feel at getting the opportunity to spend time running, not even running itself, just the idea that I can go on a run, are extremely potent for me. To give you a recent example, about a week ago, I got a free bib to run a race. The day of the race, while warming up, I realized I was sick, something completely out of my control. In any other domain, things that I cannot control do not affect me greatly. But that day, as I realized I wasn’t going to be able to run hard, and that it made more sense for me not to race, I sat down behind a tree and cried. In hindsight, this reaction was completely disproportionate to the situation. The bib was free, I signed up less than a week in advance, I did not really prepare for the race, I could not control being sick or not, my family still loves me, the sun is going to rise tomorrow, etc. I realize that this type of emotional response is completely alienating to my entourage, from people who don’t care about my last workout splits to my wife having to deal with my training schedule and nervous breakdowns. However, this relationship I am in is not unhealthy, not like it was when I started. I know when to take time off. Reluctantly, sure, but I take it nonetheless. I even questioned the very nature of it, to see whether the emotions I feel about running are a coping mechanism for deeper underlying issues, but I ultimately arrived at the conclusion that they aren’t. I stepped into despair from the pit of dread, and as described by Kierkegaard I emerged trying to know myself. And the person that I am is a very lucky man, who gets to really be passionate about running.
If running were taken away from me, it would really suck, but I would still be me.

I am not a social runner

Don’t think for a minute that I consider my case exceptional in any way. I’m sure people feel very passionate about many things, running included. Here is one thing that I am not though: a social runner. The community of runners is perhaps one of the most amazing things I get to witness on a regular basis. The camaraderie and motivation that groups offer, from people who are just starting out to people who are constantly striving to better themselves, are amazing, and reach far beyond the act of running itself. Running promotes charity, inclusiveness, and self-worth. When you run in a group, you are always welcome, no matter where you come from, how fast you’re going, how long you’ve been running or whether you will ever go and run with this group again. It took me a while to admit this but I, however, have very little need for belonging to a group. Note that this is how I currently feel, and that it may be subject to change in the future, but I think it is a mature position that I hold, which is why I think it is fair for me to put it in black and white. I enjoy running with people I like (and I like most of the people I meet while running), but I do not look to be part of a community (running group or beyond). I do not seek out runners to go out for a run. I do not feel like waving a flag saying to the world that I am a runner. I do not wear running clothes outside of a running-related event. I do not post on a regular basis in running groups. I do not like to talk about running all the time while I’m running. I enjoy a little bit of it, but only a little. I confess that I have unfollowed many groups and friends that constantly post about running on Facebook. And I think that I figured out why. First, I have psychopathic tendencies and therefore have never really felt the appeal of community. I don’t care about identifying myself one way or the other.
Yes, I am a runner, as a matter of fact I am very passionate about running, but I don’t need group recognition. But to some extent, I think I am a jealous lover. I do not want to share running with anyone else. Running is very intimate to me, so much so that sometimes I hear songs describing a relationship and I feel they are describing my relationship with running. So you will understand why it is so hard for me to see pictures of people having an orgy of good feelings during a race I just ran and with whose results I am not completely satisfied.

Running is my Sisyphus’ boulder

Which brings me to my next point: I am a competitive person. And the person I love to beat the most is myself. Running is the best avenue for me to compete with myself. And I thoroughly enjoy the process. Training every day to gain little edges of fitness is the most gratifying process in which I get to engage. The process is complex and dependent on many variables, which is why I am lucky to have a coach who is almost as passionate and knowledgeable about coaching as I am about running. This by no means ensures that I am always making the optimal training choice every day, but leaning on people’s expertise is a good rule of thumb if you want to improve at a particular skill set. The great thing about self-competition is that you can always find ways to improve, whether it is getting better at a certain type of workout or distance, or acknowledging the person that you are and knowing your limitations. More than improvement, training to be a better runner teaches me to have short and long term goals, all the while giving me a constant in my ever-changing life. No matter where I am, I know that I can train. Which is why days like today, where I am unfortunately sick and it is advisable not to run at all, are very hard. Regardless, I know that the task of training will never finish, and that makes me pretty happy, because that means I get to run more!

Final thoughts

As I mentioned at the beginning of this post, the strings of words that make up this article are a departure from my normal writing. They may in fact not even be coherent. Why publish them at all? Partly to experiment and see people’s reactions, but mostly to silence the voice inside me that begs me to write about the subject; just once, so that you understand how deeply and sincerely attached I am to running, and how much it humbles me. Running is so important to me that I try to advertise it as little as possible.

Why data driven companies should stop investing in data analytics

by paul

Despite what the blatantly click-baitish and oxymoronic title of this post might lead you to believe, I actually have a point that I think is worth expressing (which I suppose is the reason I write any post, really). As is often the case, this point comes from an accumulation of real-life situations I encounter at my job. A lot of my work consists of rapidly enabling access to data that companies have a really hard time accessing, and distributing it in a very efficient big data architecture. The funny thing is, when I go to customers and tell them that they will get access to all this data within a few days, they are unprepared as to what to do with it. Indeed, they (and we as an industry) have been focusing so much on the integration and exposure of data that we forgot to think about the value it can bring us. What’s the use case everyone thinks of when their data access is enabled? Analytics. “We’ll use [Name your favorite BI tool] to build dashboards and gain insight.” This, to me, is incomplete and extremely short-sighted.

How analytics became synonymous with Big Data

Before I dive into the reasons I suddenly want to make a good chunk of the big data industry hate me, let me try to express how we came to equate the term big data, or data in general, with Business Intelligence or analytics. Note that I am trying to describe trends here; I am aware that there are outliers to these trends. With this said, and now that I have a license to express completely unverified facts since I apologized in advance, here is what I observed. When databases were first created, they were an enablement layer for software. We just needed a way to store data more efficiently in order for your application to run faster. Eventually, data changed from being a necessary evil to being a source of value. Indeed, once we realized that every activity that we engage in involves a piece of software, the data entrenched in each and every one of these pieces of software became the best way to understand our own (and our users’) behavior. This was the advent of data warehouses and BI tools. Then we realized there was a lot of data (I think the technical term is a shit ton), so we started developing big data lakes. In this transition, big data’s primary use became what data warehouses were used for: data analytics.

The death of dashboards

However, data analytics is only a very small percentage of the data use cases. Remember, data layers were designed to enable applications, not to show you what they contain. Yes, dashboards and graphics are pretty, but what is their goal? Their goal is to give you an idea of what whoever interacts with your software is doing, so that you can design solutions to the problems you find. Somehow, however, analytics became an end in itself. Companies invest an insane amount to achieve data analytics. This is extremely misguided. To be fair, part of the reason why analytics became an end in itself is the limitations of data lakes, but that’s another topic.

Using data proactively

Solving the data integration, consolidation, distribution and exposure problem is not easy, but it is being solved (I can say that with confidence since I am on the front line of that battle every day, though I would not say I put my life in danger every day, so the battle analogy stops here). My advice is to think beyond analyzing the data as a use case once you have access to it. Instead of trying to identify trends, think about how to change them. Instead of trying to build an individualized snapshot of your customer, think about what action you should take based on that snapshot. Instead of getting a consolidated view of all of your systems, think about how to better orchestrate data flows between these systems to minimize the need for consolidation. I am purposely not listing specific examples, because they require deep industry expertise, which is not what I am trying to highlight here (my expertise being in data, not a specific industry). So, next time you are confronted with someone building a dashboard, ask yourself: Why? Why am I building BI on top of my data? Is BI going to give me insight into a problem to solve? If so, what is that problem? Once you have the answer to that question, try to build a platform that identifies and solves that problem rather than a platform that only allows you to identify it.

The Big Data market in 2017

by paul
Buildings always look badass from the ground and with a black and white filter

Accepting reality is not trivial. As we sit in our echo chambers, particularly exacerbated by our social networks, preference algorithms and suggested searches, our cognitive biases betray our picture of the world around us. Add to that the fact that everyone other than us is an idiot with whom we should not engage in a conversation (or if you prefer, call this mild social anxiety), and pretty soon you can convince yourself of anything. With this in mind, I decided to take the time today to lay out my understanding of the reality of the big data market in 2017 for large enterprises. While it is inherently biased, arguably like any piece of writing, I did try to do my research by reading a lot of white papers recently (some of which are referenced below), but mostly, this is my domain of expertise, which means I’m confronted with it every day. Of course, this categorization is subject to discussion and constructive criticism, which I always welcome.

Out, or of very little relevance

  • Data Lakes: The fascination for unlimited data distribution has passed. Enterprises struggle to find a use for their data lakes, and the layers written on top of them to make them useful seem like too much effort for little reward.
  • Pure data analytics: The terms Data Analytics or Business Intelligence encapsulate a vast number of concepts that will always be useful one way or another in our data driven world. What is losing momentum nowadays are solutions that make analytics the end goal (analyzing trends, population sub group preferences, etc.). BI is a very small portion of what Big Data offers, and if the end goal of a solution is to give you trend analytics, it is too reductive.
  • SOA, ESBs, Convergent applications: This has been dead for a while but is worth mentioning. The idea of a single convergent enterprise solution to encapsulate all data and functionalities is not feasible in practice (too much market change, too much cost, too much complexity, too little agility).

Extremely relevant for 2017

  • Agile data platforms: At the opposite end from ESBs and massive consolidation into one data system is the micro-services architecture. The architecture enables extremely rapid and agile deployment of applications to respond to an ever-changing market where end customers have more choices than ever and thus are very hard to retain. The bottleneck of micro-services architectures is often data. Being able to rapidly consolidate, cleanse and expose data anywhere is a complicated proposition, but some platforms can do it. If a platform is able to integrate from multiple sources, consolidate and expose data rapidly, then it enables today’s hottest use cases: digital transformation, agile test data management, micro-services implementation, and more.
  • Cloud enablement: More than ever, and despite previous reticence vis-a-vis security (just like older generations are still reluctant to use their credit card numbers on the web), the movement to cloud applications and platforms is under way. Enabling the cloud, that is, not only exposing/migrating data to cloud applications but also ensuring security, compliance and control over the data exposed, is therefore a very important market trend.
  • Data Personalization: We live in a world where everyone expects their experience to be catered to them. Having to repeat your identity while being tossed from department to department on a help desk line is one of the most infuriating experiences (after the complete loss of human rights and dignity one experiences in an airport). Seriously though, enabling the understanding of the individual is crucial, whether that individual is a person, a product or a machine in IoT use cases.

Not quite there yet

  • Predictive Individual Analytics: We already see some implementations of this in ad personalization or preference settings, but being able to predict what an entity (a person, a machine, a car, a product) will do, and what it wants and needs, is going to open the door to systems that give answers instead of responding to questions. It requires the problem of data personalization to be solved beforehand though.
  • Smart Data Discovery: Once agile data platforms are in place, the use case of automated data mining will explode. Too many systems with too few experts on those systems will give birth to solutions that will enable the enterprise to recover a fair percentage of the relevant data without human intervention.
  • Expert AI systems: Finally, and most exciting of all, are expert AI systems. This is software that will replace the way that data is currently fed to our everyday software (CRM, machine monitoring, marketing analytics, etc.). The use cases are still not clear in my head, but I know that finding the point where human intervention is the most costly (where it requires pattern recognition), and replacing it with automated AI, will be a game changer.

Some references

  • Is the Cloud Secure? (Gartner)
  • Marketing data management (Ascend2)
  • Seizing the Digital Advantage in Banking and Financial Services (Cognizant)
  • The Big data workbook (Informatica)
  • Agile Test Data Management: The New Must-Have (Forrester)