paul vidal - thoughts about tech and more


The Big Data market in 2017

by paul 0 Comments
Buildings always look badass from the ground and with a black and white filter

Accepting reality is not trivial. As we sit in our echo chambers, particularly exacerbated by our social networks, preference algorithms and suggested searches, our cognitive biases betray our picture of the world around us. Add to that the fact that everyone other than us is an idiot with whom we should not engage in conversation (or if you prefer, call this mild social anxiety), and pretty soon you can convince yourself of anything. With this in mind, I decided to take the time today to lay out my understanding of the reality of the big data market in 2017, for large enterprises. While it is inherently biased, arguably like any piece of writing, I did do my research, reading a lot of white papers recently (some of which are referenced below), but mostly, this is my domain of expertise, which means I am confronted with it every day. Of course, this categorization is subject to discussion and constructive criticism, which I always welcome.

Out, or very little relevancy

  • Data Lakes: The fascination with unlimited data distribution has passed. Enterprises struggle to find a use for their data lakes, and the layers written on top of them to make them useful seem like too much effort for too little reward.
  • Pure data analytics: The terms Data Analytics and Business Intelligence encapsulate a vast number of concepts that will always be useful one way or another in our data-driven world. What is losing momentum nowadays are solutions that make analytics the end goal (analyzing trends, population subgroup preferences, etc.). BI is a very small portion of what Big Data offers, and if the end goal of a solution is to give you trend analytics, it is too reductive.
  • SOA, ESBs, convergent applications: This has been dead for a while but is worth mentioning. The idea of a single convergent enterprise solution encapsulating all data and functionality is simply not feasible (too much market change, too much cost, too much complexity, too little agility).

Extremely relevant for 2017

  • Agile data platforms: At the opposite end from ESBs and massive consolidation into one data system is the microservices architecture. This architecture enables extremely rapid and agile deployment of applications in response to an ever-changing market where end customers have more choices than ever and are thus very hard to retain. The bottleneck of microservices architectures is often data. Being able to consolidate, cleanse and rapidly expose data anywhere is a complicated proposition, but some platforms can do it. If a platform is able to integrate from multiple sources, consolidate, and expose data rapidly, then it enables today’s hottest use cases: digital transformation, agile test data management, microservices implementation, and more.
  • Cloud enablement: More than ever, and despite previous reticence vis-a-vis security (just as older generations are still reluctant to use their credit card numbers on the web), the movement toward cloud applications and platforms is accelerating. Enabling the cloud, that is, not only exposing/migrating data to cloud applications but also ensuring security, compliance and control over the data exposed, is therefore a very important market trend.
  • Data Personalization: We live in a world where everyone expects their experience to be catered to them. Having to repeat your identity while being tossed from department to department on a help desk line is one of the most infuriating experiences (after the complete loss of human rights and dignity one experiences in an airport). Seriously though, enabling the understanding of the individual is crucial, whether that individual is a person, a product or a machine in IoT use cases.

Not quite there yet

  • Predictive Individual Analytics: We already see some implementations of this in ad personalization or preference settings, but being able to predict what an entity (a person, a machine, a car, a product) will do, what it wants and what it needs is going to open the door to systems that give answers instead of responding to questions. It requires the problem of data personalization to be solved first, though.
  • Smart Data Discovery: Once agile data platforms are in place, the use case of automated data mining will explode. Too many systems with too few experts on them will give birth to solutions that enable the enterprise to recover a fair percentage of the relevant data without human intervention.
  • Expert AI systems: Finally, and most exciting of all, are expert AI systems: software that will replace the way data is currently fed to our everyday software (CRM, machine monitoring, marketing analytics, etc.). The use cases are still not clear in my head, but I know that finding the point where human intervention is the most costly (where it requires pattern recognition) and replacing it with automated AI will be a game changer.

Some references

  • Is the Cloud Secure? (Gartner): link
  • Marketing data management (Ascend2): link
  • Seizing the Digital Advantage in Banking and Financial Services (Cognizant): link
  • The Big Data Workbook (Informatica): link
  • Agile Test Data Management: The New Must-Have (Forrester): link

Data As A Microservice: the future of data architecture

by paul 0 Comments

Let me preface this article with an understatement: sometimes, enterprise architecture can be complicated. Large companies run thousands of applications, multiplied by dozens of environments replicated for testing, user testing and sandboxing, accumulated over years of acquisitions, re-architecturing (yes, it is a word I made up) and experiments, all with the purpose of driving business forward. Like any complex system, human beings have been trying to make sense of it by conceptualizing models and architectures aimed at simplifying the system, thus making it more efficient, robust, scalable, secure, and spiritually virtuous (OK, maybe not the last part, although can a piece of software be inherently virtuous? A question for another day). With all this in mind, I would like to take some time to reflect on one of these concepts, Micro-Services, and how it can apply in the realm of data management.

Microservices VS Enterprise Service Bus

First introduced during a workshop of software architects held near Venice in May 2011, Microservice Architecture is defined by James Lewis as follows:

The term “Microservice Architecture” has sprung up over the last few years to describe a particular way of designing software applications as suites of independently deployable services. While there is no precise definition of this architectural style, there are certain common characteristics around organization around business capability, automated deployment, intelligence in the endpoints, and decentralized control of languages and data.

Microservice architecture is a subset of Service Oriented Architecture (SOA), aiming to distribute microcomponents to deploy applications, as opposed to a centralized application integration layer, often called Enterprise Application Integration (EAI) or an Enterprise Service Bus (ESB). Leaving aside the obvious angry-developer argument that all of this is marketing jargon and a rebranding of the same products, it is interesting to note a fundamental trend I have covered before in this blog: enterprises are looking to implement agile environments with extremely granular elements in order to ensure business reactivity. The dream of the all-integrated, all-consolidated enterprise layer is fading.

Data As A Microservice

In a very similar manner, the idea of a single source of truth containing all the enterprise data is coming to an end. And, unlike what some data lake proponents would like you to believe, it is not because of the pitfalls of traditional technologies that cannot handle large volumes of data or distribute it efficiently. Building a single centralized source of data is a utopia. Instead, companies are now shifting their focus toward platforms enabling rapid agnostic data integration, agile data schema modification, and complete distribution. These platforms can then be used in a microservices architecture, making them Data As A Microservice platforms. I’ll admit, I may have made that term up too because it sounds cool, but it is very important to note whether you are a data vendor, a data scientist or a data consumer (CIO and CTO organizations). The future of data is microservice-like agility, not monolithic unification.
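To make the idea less abstract, here is a minimal sketch of a “data microservice”: it consolidates two hypothetical source systems at request time and exposes the unified record over HTTP, with no central bus in sight. Everything here (the sources, the names, the in-memory dictionaries standing in for a real platform) is an assumption for illustration, not a reference implementation.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical source systems, each owning one fragment of the customer.
CRM = {"42": {"name": "Ada"}}
BILLING = {"42": {"balance": 12.5}}

class CustomerDataService(BaseHTTPRequestHandler):
    """A tiny data microservice: consolidate at request time, expose
    the unified record, own nothing centrally."""

    def do_GET(self):
        customer_id = self.path.rstrip("/").split("/")[-1]
        # Consolidation step: merge whatever each source knows.
        record = {**CRM.get(customer_id, {}), **BILLING.get(customer_id, {})}
        self.send_response(200 if record else 404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps({"id": customer_id, **record}).encode())

if __name__ == "__main__":
    # GET http://localhost:8000/customers/42
    # -> {"id": "42", "name": "Ada", "balance": 12.5}
    HTTPServer(("localhost", 8000), CustomerDataService).serve_forever()
```

The point is the shape, not the code: the service is independently deployable, and the consolidated view lives behind its endpoint instead of inside a monolithic hub.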

References:

http://martinfowler.com/articles/microservices.html
http://stackoverflow.com/questions/25501098/difference-between-microservices-architecture-and-soa
https://www.voxxed.com/blog/2015/01/good-microservices-architectures-death-enterprise-service-bus-part-one/
https://en.wikipedia.org/wiki/Microservices

What is the most underrated aspect of software development and why is it measurability?

by paul 0 Comments

Designing and developing software is complicated. I have heard there might even be a full industry gathering experts in this domain, and that it could be doing well. Not sure if it will ever be a thing. All joking aside, theories about the optimum way to approach software development are numerous and constantly evolving, which is excellent. Today, however, I want to talk about an underrated concept, especially within the realm of software development: measurability. Despite what online dictionaries return, I’m pretty sure I just made up that word, or at least the concept attached to it vis-a-vis software development, so let me define it.

What do you mean by measurability and why should I care about it?

Within the realm of software development, measurability belongs in the same category as other transversal, high-level concepts that must be considered at each and every step of the development process, such as user experience, performance, scalability, re-usability and security. Measurability in this sense is the idea that each and every feature you develop for your software can be measured for popularity and efficacy, in order to ultimately evaluate its necessity. That is a lot of y-ending words, which should have convinced you already. Hoping it didn’t, let me explain why it is important to consider. First, I believe that the importance of these types of high-level concepts needs no further justification: we have all witnessed software failures when development ignored one of these key concepts, security being the one making the front page most often. The impact of measurability is more subtle but nonetheless crucial. Without measurability, the decisions you make about feature prioritization or design become irrational. For instance, if you are developing an API that offers multiple methods of access and you are unable to measure their popularity or efficacy, you will end up either with features that are costly to maintain for no benefit to your end user, or with features that are massively used by necessity while incrementally building your end user’s frustration. This is a very simple example, but it illustrates an underlying notion that we rarely see in the world of zeros and ones: irrationality. Indeed, a piece of software is usually extremely rational and quantifiable, which makes evaluating performance, scalability, security or even re-usability a relatively easy mathematical problem. With the popularization of software, user experience has been at the forefront of Agile development, making customer feedback a key part of feature releases. What I am proposing here is to go one step further. Whenever developing a feature for your software, one should ask: how will I know if this feature is necessary or not? How will I test for it?
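As a toy version of the API example above (all names hypothetical, Python assumed), here is what turning “how will I know if this feature is necessary?” into a number can look like: each access method reports into a counter, so popularity becomes measurable instead of guessed.

```python
from collections import Counter
from functools import wraps

# Hypothetical usage counter shared by every access method of the API.
feature_usage = Counter()

def measured(feature_name):
    """Count every call to the decorated access method."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            feature_usage[feature_name] += 1
            return func(*args, **kwargs)
        return wrapper
    return decorator

@measured("get_by_id")
def get_by_id(records, record_id):
    return records.get(record_id)

@measured("full_scan")
def full_scan(records):
    return list(records.values())

records = {1: "a", 2: "b"}
get_by_id(records, 1)
get_by_id(records, 2)
full_scan(records)
print(feature_usage)  # Counter({'get_by_id': 2, 'full_scan': 1})
```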

Implementing measurability

Implementing measurability means acknowledging that you are operating in an uncertain environment, which inherently makes the implementation itself uncertain. That being said, a good starting point is to measure each feature’s use and performance and then compare it to the other features you develop. This measurement and analysis can be done using trace or audit mechanisms, which, bonus, you should implement anyway to cater to security. A more robust approach is to first select the metrics you want to measure for each software feature and have a dedicated module implement measurability over those metrics. You may think it’s overkill, but with the advent of scalable and cheap storage, why not do it?
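A sketch of that more robust approach, under assumptions of my own: the selected metrics are call count and latency, and a dedicated module owns them so features can be compared side by side rather than measured ad hoc.

```python
import time
from collections import defaultdict

class FeatureMetrics:
    """Dedicated measurability module: one place that owns the metrics
    (here, call count and cumulative latency) for every feature."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.seconds = defaultdict(float)

    def record(self, feature, elapsed):
        self.calls[feature] += 1
        self.seconds[feature] += elapsed

    def report(self):
        # Rank features by popularity, with average latency alongside.
        for feature, count in sorted(self.calls.items(), key=lambda kv: -kv[1]):
            avg_ms = 1000 * self.seconds[feature] / count
            print(f"{feature}: {count} calls, {avg_ms:.2f} ms avg")

metrics = FeatureMetrics()

def timed(feature, func, *args, **kwargs):
    """Route any feature call through the measurement module."""
    start = time.perf_counter()
    try:
        return func(*args, **kwargs)
    finally:
        metrics.record(feature, time.perf_counter() - start)

timed("search", sorted, range(100_000))
timed("search", sorted, range(100_000))
timed("export", list, range(10))
metrics.report()
```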

Beyond software development

Big Data, monitoring, analytics, data science: all of these concepts are designed to increase the world’s measurability, and they are definitely what everyone talks about now. And while the idea of being data-driven has spread to every aspect of our lives, from corporate management to personal fitness, it has yet to really make an impact within the realm of software development itself; at the very least, tools dedicated solely to measurability are scarce. That being said, making rational decisions does not seem to be as appealing to the rest of the world as it is to me, which could explain this scarcity.

Who decided that stored procedures should not be commented?

by paul 0 Comments

I’ve been spending the past couple of weeks working on stored procedures. Looking back at my career so far, I realize how much stored procedures are the backbone of many organizations dealing with data. Stored procedures are something of a potpourri of magic behavior, bespoke black boxes, and the sedimentation of code layers accumulated over years of feature additions implemented by a battalion of sometimes well-intentioned PL/SQL programmers with tight deadlines. Furthermore, stored procedures, more than any other type of data manipulation, are what actual live production systems rely upon. It is not uncommon for a piece of software to have hundreds of stored procedures essential for it to work, and for good reason. Indeed, stored procedures are extremely efficient. So much so that even unoptimized pieces of code harboring redundant tests and an unreasonable number of nested outer joins still run in a few milliseconds. Efficient they are. But you know what they are not? Commented. Seriously, the packages I worked with recently contain tens of thousands of lines of code but never more than 10 lines of comments, mostly along the lines of “-- 10/10/2014 added by Jay” or “-- requirement R3045”. And as far as I can remember, relying solely on my flawed memory and anecdotal evidence, this is the case with the vast majority of stored procs. Therefore, after spending some time curled into a ball crying, I asked myself: “why?”.

Common consensus about commenting code

Childishly, I first assumed that every piece of code should be commented, and that the only reasons for not commenting code would be laziness, lack of time, lack of understanding, or hatred for whomever would read your code in the future. I was obviously misguided, as one often is when assuming anything to be simple. Indeed, there are many cases where commenting renders your code less readable, or is an excuse for bad coding. This article in particular, Common Excuses Used To Comment Code and What To Do About Them, does an excellent job of highlighting when commenting is sub-optimal:

    The code is not readable without comments. Or, when someone (possibly myself) revisits the code, the comments will make it clear as to what the code does. The code makes it clear what the code does. In almost all cases, you can choose better variable names and keep all code in a method at the same level of abstraction to make it easy to read without comments.
    We want to keep track of who changed what and when it was changed. Version control does this quite well (along with a ton of other benefits), and it only takes a few minutes to set up. Besides, does this ever work? (And how would you know?)
    I wanted to keep a commented-out section of code there in case I need it again. Again, version control systems will keep the code in a prior revision for you – just go back and find it if you ever need it again. Unless you’re commenting out the code temporarily to verify some behavior (or debug), I don’t buy into this either. If it stays commented out, just remove it.
    The code is too complex to understand without comments. I used to think this case was a lot more common than it really is. But truthfully, it is extremely rare. Your code is probably just bad, and hard to understand. Re-write it so that’s no longer the case.
    Markers to easily find sections of code. I’ll admit that sometimes I still do this. But I’m not proud of it. What’s keeping us from making our files, classes, and functions more cohesive (and thus, likely to be smaller)? IDEs normally provide easy navigation to classes and methods, so there’s really no need to scan for comments to identify an area you want to work in. Just keep the logical sections of your code small and cohesive, and you won’t need these clutterful comments.
    Natural language is easier to read than code. But it’s not as precise. Besides, you’re a programmer, you ought not have trouble reading programs. If you do, it’s likely you haven’t made it simple enough, and what you really think is that the code is too complex to understand without comments.

Why this consensus does not apply to stored procedures

As much as these arguments make sense, I don’t think they apply to stored procedures:

    “you can choose better variable names and keep all code in a method at the same level of abstraction”: You can’t easily change table fields and names, nor can you cut a big nested SQL statement gracefully.
    “Version control does this quite well”: Version control is almost never implemented for stored procedures.
    “I wanted to keep a commented-out section of code there in case I need it again.”: OK, that’s just BS.
    “[complex code] is extremely rare.”: Nested SQL queries are inherently complex and MUCH less readable than traditional code.
    “Markers to easily find sections of code.”: I never saw a problem with that.
    “you ought not have trouble reading programs”: Except queries are the opposite of natural language. Please, please, please, SQL developer, let me know why you are doing this particular join.

To summarize, I still don’t understand why stored procedures are generally not commented, when it would seem they are the type of code that could benefit the most from comments. Maybe NoSQL will change this, but in the meantime, I will start this crusade and make sure people explain their code, yo!
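To end on something concrete, here is the kind of commenting I am crusading for. The schema and query are hypothetical, and I am using Python with sqlite3 only to keep the example runnable; the habit itself (every non-obvious join or filter says why it exists) is what a stored procedure deserves:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EMEA');
    INSERT INTO orders VALUES (100, 1, 42.0), (101, 2, 10.0);
""")

# The point is not the query, it is the comments: each non-obvious
# choice states WHY, not just what.
query = """
    SELECT c.region, SUM(o.amount) AS total
    FROM orders o
    -- LEFT JOIN (not INNER): orders whose customer was purged by the
    -- retention job must still count toward revenue totals.
    LEFT JOIN customers c ON c.cust_id = o.cust_id
    GROUP BY c.region
"""
for row in conn.execute(query):
    print(row)  # ('EMEA', 42.0) and (None, 10.0); row order may vary
```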

Choosing the correct DBMS – Gartner Reprint review

by paul 0 Comments
What do you mean my speed isn't linearly scalable?

If you haven’t had the chance to look at it yet, I encourage you to read this Gartner reprint: Critical Capabilities for Operational Database Management Systems. This report is extremely interesting if you’re into data at all. I’m in too deep, so I’m going to talk about this report for the whole length of this marvelous post.

The importance of specific use-cases

The first thing that jumped out at me was Gartner’s use of the word DBMS, highlighting the fact that the dichotomy between traditional relational database management systems and what has been labelled “NoSQL” is fading out. Instead, Gartner advises to “Classify the use cases under consideration and map them to the costs, deployment options and skills requirements of the products evaluated here.” This is extremely important and a departure from some of the preconceptions I witness amongst my fellow professionals. Often enough, I am confronted with consultants trying to categorize DBMS by capabilities (distribution capabilities, support of languages, etc.). More importantly, these platforms are marketed through those capabilities. As I have argued before, and as this report confirms, end users, consultants and software providers alike should select, recommend and market platforms according to the use cases at which they excel, instead of the capabilities inherent to a specific platform (see previous post: The importance of specialization in software sales).

Evaluation criteria

But enough surrendering to my own confirmation biases in order to pat myself on the back and delude myself into thinking that my observations may be going in the same direction as Gartner’s. The report evaluates different vendors (themselves selected based on a set of inclusion criteria), using the following criteria:

  • High-Speed Ingest and Processing
  • ACID Support
  • Tunable Consistency
  • Multimodel Support
  • Automated Data Distribution
  • Cloud/Hybrid Deployment
  • Programmability for HTAP
  • Administration and Management
  • Security

These criteria are then weighted according to four different use cases:

  • Traditional Transactions
  • Distributed Variable Data
  • Lightweight Events and Observations
  • Hybrid Transactional/Analytical Processing (HTAP)

I’m not going to spend time describing the criteria; Gartner put up very readable charts to compare the different vendors. In short, it seems that Oracle is leading the traditional transaction world while DataStax is leading the distributed one. On a personal note, I’m super excited for DataStax: I get to work with many members of their team, and the company I work for leverages their solution, so it’s excellent recognition.

I would, however, perhaps have added two more criteria: integration ecosystem and cost. Regarding the latter, I would have created two sets of charts: one considering cost and one not considering it. Of course, I understand cost is a delicate and fluctuating subject, and I understand Gartner’s decision. Integration ecosystem, however, is very important. Being able to evaluate how easy it is to integrate and use data once it is in these DBMS is extremely important when making an architecture choice.

Personal Conclusion

I’m always impressed by the conciseness of Gartner reports. This one does not fail in that regard, and it gives a very good basis to anyone evaluating data management systems. That being said, and to make sure that horse is dead: think of your use case before going for an RFP. Many DBMS can do many things, but few excel at all use cases.

Should all your data move to the cloud?

by paul 0 Comments

I have recently, on multiple occasions, engaged in conversations about whether or not Fortune 500 organizations are ready to move all their data to the cloud. While I’m not arguing about the benefits of distributed systems, I did encounter a significant number of organizations that are not ready to move to a SaaS model. Beyond the obvious security reasons, I think maintaining control over the core of your business to drive innovation is crucial (see the Tesla example). Furthermore, many organizations’ strategies seem to be moving toward building IaaS/PaaS, and eventually SaaS, within their own IT. These tendencies lead me to believe the dichotomy between SaaS and traditional in-house implementation isn’t absolute. Therefore, the market will see the advent of solutions enabling control over internal data while leveraging SaaS functionalities.

Since I work for a company offering one of these solutions, I wrote a white paper about it, so here it is. Enjoy the read!


The future of Data is Augmentation, not Disruption

by paul 1 Comment

I'm disrupting the light bulb market by enabling wireless. I call it "Photoshop"

I spent last week enjoying the Cassandra Summit, so much so that I did not take the time to write a blog post. I had a few ideas, but I chose quality over quantity. That being said, something interesting happened at the summit: we coined the term “augmentation” for one of my company’s key go-to-market use cases, instead of data layer modernization or digitalization. I even got the opportunity to try both terms on the different people visiting our booth. In this extremely small sample, people really tended to have a much better degree of understanding when I used the word augmentation, which got me thinking. I even read a very interesting article by Tim O’Reilly called Don’t Replace People. Augment Them., in which he argues against technology fully replacing people. Could this concept of augmentation be applied on a broader scale to understand our data technology trends? Maybe; at least that’s what I’m going to try to lay out in this article.

Technological progress relies on augmentation

That’s the first thing that struck me when I pondered augmentation in our world, and more specifically when it comes to software. With the exception of very few, the platforms, apps and tools that we use are all based on the augmentation of existing basic functions. Amazon? Augmentation of the store using technology. Uber? Augmentation of taxis. Chatbots? Augmentation of chat clients. Slack? Augmentation of email plus chat. Distributed/cloud applications? Augmentation of legacy applications. To some extent, even Google is an augmentation of a manual filing system. I will admit that listing examples confirming an idea I already had is close to a logical fallacy, so I tried to find counter-examples, i.e. software solutions that introduce completely new concepts, but could not think of any. Of course, we could argue over semantics in defining what constitutes true innovation versus augmentation of an existing technology, but ultimately I think it is fair to say that the most successful technologies augment our experience rather than being completely disruptive, despite what most of my field would argue. Therefore, augmentation must at least be considered part of the future of any software industry, including the Big Data industry.

Augmentation is better than transformation

Human nature needs comfort; that’s why most of us prefer augmentation over disruption. By disruption, I’m talking about transforming or replacing existing systems, not adding features: selling unpaired socks over the internet is not disrupting the sock industry, despite what the TED talks would like me to believe. Seriously, when you have existing technologies, as every company does, a replacement or transformation is a hard pill to swallow: loss of investment, knowledge, process, etc. It is especially risky and complex when talking about data layer transformation, as I have argued before in this very blog. So when given a choice, augmenting existing data layers is the obvious choice for risk-averse IT organizations.

Augmentation drives innovation

Perhaps the most convincing argument for acknowledging that augmentation is the future of data is an analysis of the most innovative big data software solutions: machine learning, neural networks and all of these extremely complex systems whose behaviors are almost impossible to predict, even for experts. These systems are designed to augment their own capabilities, instead of following a set of deterministic rules. Indeed, these systems are designed to approach the capabilities of complex biological systems and therefore incorporate their “messiness”. We can’t think of big data systems using physics thinking (i.e. here is an algorithm, here is a set of parameters, this is the result expected); we should rather rely on biology thinking (i.e. what results do I get if I input this parameter?). A great example of this type of thinking is Netflix’s Chaos Monkey, a service running on AWS to simulate failures and understand the behavior of their architecture. Self-augmentation is the principle upon which the technologies of the future are built. We understand the algorithms we input but not necessarily the outcome, which can sometimes have unintended consequences (see: Microsoft Tay), but it is ultimately a better pathway to intelligent technologies. I’m a control freak, and not being able to understand a system end to end drives me nuts, but I’m willing to relinquish my sanity for the good of Artificial Intelligence.
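For flavor, here is a toy, in-process sketch of that biology-style thinking. Netflix’s real Chaos Monkey terminates actual AWS instances; this hypothetical version only kills fake “instances” in a list, but it shows the stance: inject random failure, then observe what the system does rather than predict it.

```python
import random

class Instance:
    """Hypothetical stand-in for a service instance in a fleet."""
    def __init__(self, name):
        self.name = name
        self.alive = True

def chaos_step(fleet, kill_probability=0.2):
    """Randomly terminate instances, Chaos-Monkey style."""
    for instance in fleet:
        if instance.alive and random.random() < kill_probability:
            instance.alive = False
            print(f"chaos: killed {instance.name}")

fleet = [Instance(f"node-{i}") for i in range(10)]
for step in range(3):
    chaos_step(fleet)
    # Biology thinking: we observe the outcome instead of deriving it.
    survivors = sum(i.alive for i in fleet)
    print(f"step {step}: {survivors}/10 instances alive")
```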

Conclusion

With software augmentation being part of our everyday life, a safer and easier way to add features to existing data layers, and the core concept behind machine learning, I think it is fair to say that it is the future of Data. Did I convince myself? Yes, which is good, because my opinion is usually my first go-to when it comes to figuring out what I think. Seriously though, what do you think? As always, I long to learn more and listen to everyone’s opinions!

Is patience overrated? How real-time big data affects our behavior

by paul 0 Comments

soon.

If you haven’t figured it out by now, I’m fairly action-driven. One of the skills whose lack is most often pointed out to me is patience. If you want an illustration of my personality, I encourage you to read this comic:

[Comic: “Woah woah woah, slow down friend, dontcha know that sometimes you have to stop and smell the flowers?” (Flower Smelling Champion, Type A vs. Type B)]

Thank you @shenanigansen. Seriously, I have a problem. My cousin would tell me I need to do yoga, many of my friends would tell me I should practice mindfulness, and my Dad would tell me I should be patient. Here is my take on it: I think patience is overrated, and I think that this is a result of the technology we have at our disposal.

The advent of real-time in Big Data

A couple of years ago, the selling point of big data was the big in big data. Being able to store practically unlimited amounts of data was a game changer. But if you look at the recent trends (see a few excerpts here, here, and here), real-time and speed are the selling points. People want access to their data quickly, and I can tell you it is a major part of every data pitch I make. To be fair, the shortening of every part of your life is a trademark of the modern era, as much as hipsters are trying to fight it (typewriters, anyone?). However, I do think that accelerating big data access and storage has been, and will continue to be, one of the trends that impacts that acceleration the most. Indeed, with the luxury of recording everything in our lives, through IoT or simply by being a normal being who spends a significant portion of their time in the virtual realm (a.k.a. surfing the web or playing video games), real-time is the next game changer after personalization.

What that means for us, the end user

We are already a product, sold by every social media site, fitness tracker and video game. And we already see the outcome of this in targeted ads, suggestions and so on. But these suggestions can be a bit off at times (think suggestions for something you already bought), partly because the algorithms need more iterations, but also because of insufficient data, not from a pure lack of it but from the latency of gathering all these pieces of data together. Imagine what speed can add to these phenomena. The accuracy of the suggestions will at times be frightening, but mostly, we will become more impatient. And we already see results of that. A recent example would be the reaction of retailers to chip readers, due to their processing time. We’re talking about a few seconds of difference, but it matters to us, the end user. Personally, if I can’t use contactless payment and have to pull out my card like an animal, I’m annoyed. And this trend will continue folks, make no mistake.

Conclusion & Limitations

So why should I be patient? Why should I have to wait for a specific outcome? The frustration comes from the fact that many situations for which you are impatient are no longer limited by logistics but by inaction, at least in the business world. But my point is the following: the world of data, and therefore to some extent our personal world, is moving to real-time. You can decide to be an outsider, and there is of course value in this, or you can adapt. The value of waiting for a possibly different situation is overrated. For instance, let’s say you have to make a life-changing decision. Chances are that the amount of data you have to make that decision now versus three weeks from now is going to be roughly similar. So why not make the decision now? Why be patient?
Of course, this may sound like I’m advocating for having results now now now now, like a 3-year-old (and I talk from experience). It is not: valuing hard incremental work toward a long-term goal is extremely important, but patience as an excuse for inaction isn’t.

5 reasons you should go to the Cassandra Summit 2016

by paul 0 Comments

I did a search for summit on royalty free images and this is what I got.

For those who don’t know, I’ll be attending the Cassandra Summit 2016 in San Jose (possibly talking, but this is still in the works). The Cassandra Summit is organized by DataStax, the Cassandra enterprise company with which my company, K2View, is a partner.
I’m super excited about this summit, and I participated in last year’s edition. I thought I’d share the excitement by writing a total click-bait of an article expressing my genuine feelings of excitement. Seriously, I am excited about this summit. Of course, my judgement is biased by the fact that I am part of the show, but I would not be working where I work now if I were not honestly passionate about this technological environment and the events that surround it. So allow me the right to be a nerd and share this with you: 5 reasons you should go to the Cassandra Summit 2016.

1. To learn about market-leading technologies

Like the paradoxical man would say, it goes without saying, but it’s better said than not. Obviously, this should be the first thing you look for when attending this kind of summit. First, Cassandra and DataStax Enterprise are used by companies that are the leaders of our day-to-day technological life (e.g. Netflix, Apple): at this summit you get to talk to the people who implemented these clusters, and understanding their deployments is always fascinating. Perhaps even more interestingly, you get to learn about new companies leveraging Cassandra in use cases you never thought about. If you play your cards right, you should be able to overload your brain with new information, which is always a good feeling.

2. To listen to people who are smarter than you

Granted, this is not very hard in my case. Take a look at the conference agenda and the speaker list, though. I have a professional crush on Patrick McFadin, Chief Evangelist at DataStax, who was my first encounter with Cassandra. I really enjoy his delivery and always have fun listening to him, and he is just one of many at that conference.

3. To genuinely connect with other data nerds

With our (professional) lives going at 100 miles per hour, we rarely get a chance to stop and tell someone: the gossip protocol is one of the coolest things. If you try to tell that to someone who does not work in the field, they probably won’t know what you’re talking about; if you try to tell it to someone in your field, it either comes out as a platitude or you simply never get time to enjoy a very nerdy conversation. You get to do that at the Cassandra Summit. If you’re participating, grab a beer and a snack and come talk to me about anything you find cool; I’ll listen.

4. To witness cool logistic hacks

Two words for you: whiteboard tables. This blew my mind last year; being able to doodle with a marker on the very table you sit at is amazing. Why isn’t every conference room table a whiteboard? I will never know. I can’t wait to see what cool things the organizers will come up with this year.

5. To have fun

Look, work is arguably the largest part of our lives outside of sleeping. It is not every day that we get to be in an environment full of new, exciting information, surrounded by extremely intelligent and passionate people, where everything has been thought out to the last detail. I like to think of it as an all-inclusive resort for data nerds. I’ll be damned if I don’t enjoy every minute of it, and so should you. So please, enjoy yourself!

Becoming intimate with Big Data

by paul 0 Comments

Come on guys, we are all made of blue glass inside.

About a year ago, I had the chance to have a discussion with one of the smartest people I’ve ever met, currently a board member of our company. This man has not only built his fortune out of nothing by being able to identify trends in the market and position his companies accordingly, he is also a genuine human being who commands admiration. But I digress. During this conversation, he mentioned that one of the things that helped him succeed was his capacity to understand the intrinsic values that define a generation. As an example, he mentioned that his generation, during the 90s, was all about financial success. The following generation, the 2000s kids, was all about fame (Big Brother, anyone?). Then he told me that he had yet to figure out what my generation was all about. Since then, I have been trying to understand what makes my generation tick. After about a year of poking around, I think I found the answer: mine is the selfish generation. We are all selfish and think about our individuality. Look around: it’s selfies, freedom above all, my Facebook or my privacy, my right to an opinion, my right to an outlet to express my ideas. I’m including myself in this, of course; I am writing a blog, after all. What’s interesting about this realization is understanding the consequences it has on the market, and specifically in a domain in which I have at least a bit of expertise: Big Data.

Big Data is driven by the individual

In a recent report from Forrester (link), companies were asked: “Which use cases are driving the demand for continuous global data availability at your organization?”. The most common use case, representing 52% of the answers received, was a 360-degree view of the business or product. This means that more than half of the big data drivers come from the consolidation of data to represent an individual unit of business. Make no mistake: in many cases, the product is you. What drives big data is intimate knowledge of the individual. This makes perfect sense if you agree with the premise of my first paragraph: big data, and the market in general, wants to cater to the selfish generation, and is therefore implementing solutions to know each individual personally.

This report is only one of numerous examples corroborating what I’m trying to explain here. We see machine learning algorithms and data scientists arguing about which algorithm is best to target individuals with the right ad. IoT is tracking and personalizing every aspect of our lives. Anecdotally, I even witnessed the renaming of a data analytics team in a large company to “Your Data”.

What does this mean for your Big Data implementation

First, consider that in order to keep a relevant edge over your competition, you must have access to a solution that individualizes your data collection. I have expressed this opinion quite a bit, but I believe that ultimately the individualization of data is a use case that requires its own solution. There is no magic end-to-end consolidation platform that will do everything. You need to consider a big data individualization platform, as opposed to a generic big data platform that you then try to morph to cater to your individualization needs. Once implemented, this data individualization platform can be leveraged to implement further features like real-time provisioning, data virtualization, personalized analytics or truly customer-centric support, but your platform must be intimate with your unit of business first.