paul vidal - pragmatic big data nerd


What is a data scientist and do I need one?

by paul

A good friend once told me: “If your profession is not represented by a cartoon animal, your job description is made up and society does not need you”. This is when I realized that my life was a lie and I was condemned to eternal despair. But I digress. The data scientist is a role that has only recently been introduced to the marketplace, so I think it’s important to ask ourselves what this role actually is and who can benefit from one.

The evolution of data

In recent years, data has experienced profound changes. Not only has the technology behind data storage and management dramatically evolved from standard relational models to distributed solutions, but the place of data in the enterprise and in people’s minds has changed. Suddenly data is a sexy buzzword instead of a necessary evil. Indeed, data has become its own entity within business organizations, with entire teams dedicated to it. Companies no longer ask “do I really need to keep this data?” but “how can I make sure that I keep all the data I have?”. With this shift, new roles started to emerge, and this is when “Data Scientists” were introduced.

Buzzword or actual role?

Many argue that data scientists are just a fancier replacement for business/data analysts; “A Data Scientist is a Data Analyst Who Lives in San Francisco”, as you can read in this article (a very good read, I might add). I agree to a certain extent: data scientists are people who dive into software to get results that will ultimately help make business decisions. Company leadership has always relied on this type of analysis from experts called business analysts. Business analysts even use business intelligence software to do data mining, generate statistics, and guide business decisions, which are some of the principal prerogatives of data scientists.

But I do think there is a fundamental change to be considered: data platforms are now a separate piece of software. Before the advent of big data, software used data layers. Nowadays, data lakes, data virtualization layers, and real-time data warehouses are their own entities. Using these platforms requires a combined set of skills: knowing how to use data platforms intimately (skills formerly owned by database administrators) and being able to generate business intelligence out of them (skills formerly owned by business analysts).

As such, I think that a new designation for this combined set of skills is fair; and it looks like Wikipedia agrees with me by calling data science an interdisciplinary field: “Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics, similar to Knowledge Discovery in Databases (KDD).”

Do I need to hire a data science team?

I think there is a better question to ask: “what am I doing with my data?”. Don’t get me wrong, the trend of wanting to accumulate as much data as possible is great. Especially great for me, who works for a company that provides data management solutions. But I have seen implementations of massive data lakes take years and months with very little use coming out of them, and this is a shame.

New data platforms give businesses a tremendous opportunity. Instead of relying on the wisdom of visionaries or accumulated experience to make difficult business decisions, we get to gather evidence and make informed decisions. But you need to know what you want to know first. Once you do, you can decide which platform is right for you and what type of data scientist you should hire. This will give you much more tangible results than buying a huge data platform, hiring an army of data scientists, and doing fundamental data research. OK, I made that last part up, fundamental data research is not a real thing… yet!

5 reasons why software consolidation always fails

by paul

Let’s start with a dare: I dare you to go to any large corporation, find an IT architect, and ask them to give you a diagram of their complete architecture. I honestly think that they will politely ignore you, but for the sake of argument, let’s assume they are able to gain access to this end-to-end architecture and that it is accurate (and that you can find a screen or a piece of paper big enough to fit all of it on one page); by looking at this diagram, you will quickly understand why software consolidation is a very appealing proposition: multiple pieces of software serving the same purpose, duplicated teams, disparate processes… Think of all the money you could save if you bought one giant universal platform that everyone would use and that would give you complete control over your IT!

Except that never happens. This giant convergent platform never gets implemented, even if it is restricted to a certain functional vertical (e.g. billing, ERP, etc.). So why can’t we consolidate pieces of software into one? Let me give you my two cents.

Note: Hopefully the example I gave speaks for itself, but let me clarify the context of this article: I am specifically addressing software consolidation for very large organizations; of course, if your organization employs 10 people and you’re all using Google Apps, then this does not apply to you.

1. Large systems are complicated

This goes without saying, but it’s better to say it: the answer to the ultimate question of life, the universe, and everything is fictional. Seriously though, the idea that one could design a solution catering to the needs of every company and every use case is ludicrous.

2. Enterprise software is outdated

“While we can all agree that a universal solution is a utopia, that does not mean you can’t create a solution that covers a large percentage of the need” is what the smart guys at big enterprise software companies must have thought. To cater to the remaining few percent, customization can be added (for a fee, charged by the software provider itself). And they have. These large enterprise software implementations have become colossi (at least I think that’s the plural of colossus) that are really hard to move: they are gigantic, expensive, slow to respond, and use backend technologies from the 70s.

As a result, these platforms become engorged, and most of the innovation around them is about managing them more efficiently rather than offering a competitive advantage against the rest of the market. Let’s be clear, I’m not saying big enterprise software is dead; it is necessary.

But in an established competitive environment, you distinguish yourself by fighting for the edges, which means reacting fast, which is incompatible with these outdated massive implementations.

3. Companies need solutions, not platforms

How does one find one’s competitive edge? By implementing efficient, targeted solutions. And as far as I can tell, this trend does not seem to be slowing down, quite the contrary (which I believe is a very healthy response). However, the multiplication of targeted solutions makes the consolidation problem both more complicated and more necessary.

4. Budget and learning curves are real constraints

Again, this might seem banal, but it is worth saying. An enterprise is driven by a team of people, each with their own expertise, responding to the demands of the market. Any change has a cost upfront and downstream, especially when replacing a well-known piece of software as part of a consolidation effort.

5. Consolidation software isn’t business driven

In this realm where a single solution does not exist and businesses tend to purchase more and more specific solutions, data consolidation platforms flourish. Unfortunately, in order to cater to the complexity of the systems we’re dealing with, they are often driven by the underlying technology rather than the business requirements.

This sounds a lot like business jargon, so let me explain with an example: your software relies on its data back-end, and if you have ever tried to consolidate multiple back-end systems, whether on a traditional or distributed data platform, the first thing you end up doing is designing the data schema of the consolidation platform, then implementing a way for the data to move from the multiple backends into this system.

This is not the way your business wants to see consolidation. Your business has a clear idea of what the most important entity is, the one from which they can gain insight (for example, analyzing user or customer behavior). This means that your consolidation platform’s schema needs to always be able to adapt to your business, not your business to try and fit into a schema.
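To make the schema-first trap concrete, here is a minimal sketch of the technology-driven approach described above, using SQLite in-memory databases as stand-ins for backends (all table and column names are invented for illustration): the consolidated schema is fixed up front, and every source needs its own copy job to fit into it.

```python
import sqlite3

# Two hypothetical source backends, each with its own customer layout.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE clients (client_id INTEGER, full_name TEXT)")
crm.execute("INSERT INTO clients VALUES (1, 'Ada Lovelace')")

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE accounts (acct_no INTEGER, holder TEXT)")
billing.execute("INSERT INTO accounts VALUES (42, 'Alan Turing')")

# Schema-first consolidation: the warehouse schema is decided up front...
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customers (source TEXT, source_id INTEGER, name TEXT)"
)

# ...and each backend needs a bespoke copy job to squeeze into it.
# Any change in what the business wants to see means a schema migration
# plus a rewrite of every one of these jobs.
for sid, name in crm.execute("SELECT client_id, full_name FROM clients"):
    warehouse.execute("INSERT INTO customers VALUES ('crm', ?, ?)", (sid, name))
for sid, name in billing.execute("SELECT acct_no, holder FROM accounts"):
    warehouse.execute("INSERT INTO customers VALUES ('billing', ?, ?)", (sid, name))

print(warehouse.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 2
```

Notice that the business entity (“customer”) only exists as a by-product of the warehouse schema; the design conversation started with tables and data movement, not with the insight the business wanted.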

So what’s next?

Software consolidation has tremendous potential to give insight to any business owner. But it needs to be a solution, not a generalized overhaul of the IT eco-system. Therefore, I think it requires a good data virtualization solution. This solution must have at least the following qualities:

  1. Be business oriented
  2. Be able to publish fresh data on demand
  3. Be flexible enough to interface with any new element of the IT eco-system
  4. Be able to handle any amount of data
  5. Be able to publish results using known methods (using standard connectors/languages)
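As a toy illustration of qualities 1–3, here is a sketch of a business-oriented virtual view (again with invented names, and SQLite in-memory databases standing in for real backends): nothing is copied ahead of time, each query hits the sources on demand so the data is fresh, and plugging in a new system is just one more registered mapping rather than a schema migration.

```python
import sqlite3

# Hypothetical backends, same stand-ins as before.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE clients (client_id INTEGER, full_name TEXT)")
crm.execute("INSERT INTO clients VALUES (1, 'Ada Lovelace')")

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE accounts (acct_no INTEGER, holder TEXT)")
billing.execute("INSERT INTO accounts VALUES (42, 'Alan Turing')")

# The virtual "customer" entity: each source registers how its own
# schema maps onto the business entity. No central warehouse schema.
SOURCES = {
    "crm": (crm, "SELECT client_id, full_name FROM clients"),
    "billing": (billing, "SELECT acct_no, holder FROM accounts"),
}

def customers():
    """Publish fresh rows on demand by querying every registered source."""
    for source_name, (conn, query) in SOURCES.items():
        for row_id, person in conn.execute(query):
            yield {"source": source_name, "id": row_id, "name": person}

rows = list(customers())
print(len(rows))  # 2 — adding a new backend is just one more SOURCES entry
```

A real data virtualization platform does far more (query pushdown, caching, standard SQL/ODBC connectors), but the design choice is the same: the business entity drives the view, and the sources adapt to it.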

Of course, I work for a company that provides all these capabilities, but that does not make my analysis unfounded. I would not work for a company if I didn’t believe it provided something truly unique and needed by the market. I genuinely believe that this type of solution will be the cement of future IT eco-systems.