paul vidal - pragmatic big data nerd


Data As A Microservice: the future of data architecture


Let me preface this article with an understatement: sometimes, enterprise architecture can be complicated. Large companies run thousands of applications, multiplied by dozens of environments replicated for testing, user testing, and sandboxing, accumulated over years of acquisitions, re-architecturing (yes, it is a word I made up), and experiments, all with the purpose of driving business forward. Like any complex system, human beings have been trying to make sense of it by conceptualizing models and architectures aimed at simplifying the system, thus making it more efficient, robust, scalable, secure, and spiritually virtuous (OK, maybe not the last part, although can a piece of software be inherently virtuous? A question for another day). With all this in mind, I would like to take some time to reflect on one of these concepts, microservices, and how it can apply in the realm of data management.

Microservices vs. Enterprise Service Bus

First introduced during a workshop of software architects held near Venice in May 2011, microservice architecture is defined by James Lewis as follows:

The term “Microservice Architecture” has sprung up over the last few years to describe a particular way of designing software applications as suites of independently deployable services. While there is no precise definition of this architectural style, there are certain common characteristics around organization around business capability, automated deployment, intelligence in the endpoints, and decentralized control of languages and data.

Microservice architecture is a subset of service-oriented architecture (SOA), aiming to distribute microcomponents to deploy applications, as opposed to a centralized application integration layer, often called Enterprise Application Integration (EAI) or an Enterprise Service Bus (ESB). Leaving aside the obvious angry-developer argument that all of this is marketing jargon and a rebranding of the same products, it is interesting to note a fundamental trend I have covered before in this blog: enterprises are looking to implement agile environments with extremely granular elements in order to ensure business reactivity. The dream of the all-integrated, all-consolidated enterprise layer is fading.

Data As A Microservice

In a very similar manner, the idea of a single source of truth containing all the enterprise data is coming to an end. And, unlike what some data lake proponents would like to make you believe, it is not because of the pitfalls of traditional technologies that can't handle large volumes of data or distribute them efficiently. Building a single centralized source of data is a utopia. Instead, companies are now shifting their focus towards platforms enabling rapid agnostic data integration, agile data schema modification, and complete distribution. These platforms can then be used in a microservice architecture, making them Data As A Microservice platforms. I'll admit, I may have made that term up too because it sounds cool, but it is very important to note whether you are a data vendor, a data scientist, or a data consumer (CIO and CTO organizations). The future of data is microservice-like agility, not monolithic unification.
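To make the idea a little more concrete, here is a minimal, hypothetical sketch (Python, standard library only) of what a data microservice could look like: one small service that owns a single dataset and exposes it over HTTP, independently deployable from any central integration layer. The service name, route, and sample data are made up for illustration, not a reference to any real product.

```python
# A minimal sketch of "data as a microservice": one small service that owns a
# single dataset and exposes it over HTTP, independently of any central bus.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# The service owns its data and its schema; other teams consume the API only.
CUSTOMERS = {
    "42": {"id": "42", "name": "Acme Corp", "segment": "enterprise"},
}

class CustomerDataService(BaseHTTPRequestHandler):
    def do_GET(self):
        # Route: GET /customers/<id> returns one record as JSON.
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "customers" and parts[1] in CUSTOMERS:
            body = json.dumps(CUSTOMERS[parts[1]]).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # Each data microservice is deployed and scaled on its own.
    HTTPServer(("0.0.0.0", 8080), CustomerDataService).serve_forever()
```

Each such service can evolve its schema and deploy on its own cadence, which is exactly the agility the monolithic enterprise layer struggles to provide.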

References:

http://martinfowler.com/articles/microservices.html
http://stackoverflow.com/questions/25501098/difference-between-microservices-architecture-and-soa
https://www.voxxed.com/blog/2015/01/good-microservices-architectures-death-enterprise-service-bus-part-one/
https://en.wikipedia.org/wiki/Microservices

What is the most underrated aspect of software development and why is it measurability?


Designing and developing software is complicated. I have heard there might even be a full industry gathering experts in this domain, and that it could be doing well. Not sure if it will ever be a thing. All joking aside, theories about the optimum way to approach software development are numerous and constantly evolving, which is excellent. Today, however, I want to talk about an underrated concept, especially within the realm of software development: measurability. Despite what online dictionaries return, I'm pretty sure I just made up that word, or at least the concept attached to it vis-a-vis software development, so let me define it.

What do you mean by measurability and why should I care about it?

Within the realm of software development, measurability can be catalogued in the same category as other transversal, high-level concepts that must be considered at each and every step of the development process, such as user experience, performance, scalability, re-usability and security. Measurability in this sense is the idea that each and every feature you develop for your software can be measured for popularity and efficacy, in order to ultimately evaluate its necessity. That is a lot of y-ending words, which should have convinced you already. Hoping it didn't, let me explain why it is important to consider. First, I believe that the importance of these types of high-level concepts does not need further justification: we have all witnessed software failures when development ignored one of these key concepts, security being the one making the front page most often. The impact of measurability is more subtle but nonetheless crucial. Without measurability, the decisions you make about feature prioritization or design become irrational. For instance, if you are developing an API that offers multiple methods of access and you are unable to measure their popularity or efficacy, you will end up either with features that are costly to maintain for no benefit to your end users, or with features that are massively used by necessity but incrementally build your end users' frustration. This is a very simple example, but it illustrates an underlying notion that we rarely see in the world of zeros and ones: irrationality. Indeed, a piece of software is usually extremely rational and quantifiable, which makes evaluating performance, scalability, security or even re-usability a relatively easy mathematical problem. With the popularization of software, user experience has been at the forefront of Agile development, making customer feedback a key piece of feature releases. What I am proposing here is to go one step further. Whenever developing a feature for your software, ask yourself: how will I know whether this feature is necessary or not? How will I test for it?

Implementing measurability

Implementing measurability acknowledges the fact that you are operating in an uncertain environment, which inherently makes its implementation uncertain. That being said, a good starting point is to measure each feature's use and performance and then compare it to the other features you develop. This measurement and analysis can be done using trace or audit mechanisms which, bonus, you should implement anyway to cater to security. A more robust approach is to first select the metrics you want to measure for each software feature and have a dedicated module implement measurability over those metrics, as in the sketch below. You may think it's overkill, but with the advent of scalable and cheap storage, why not do it?
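Here is a toy sketch of what such a dedicated module could look like, assuming the two metrics you care about are call count and latency. The decorator and feature names are hypothetical, not part of any existing framework.

```python
# A toy "measurability" module: it records, per feature, how often it is
# called and how long it takes, so unused or painful features show up in
# the numbers. Names and metrics are illustrative only.
import time
from collections import defaultdict
from functools import wraps

_metrics = defaultdict(lambda: {"calls": 0, "total_seconds": 0.0})

def measured(feature_name):
    """Wrap any feature entry point to count usage and accumulate latency."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                m = _metrics[feature_name]
                m["calls"] += 1
                m["total_seconds"] += time.perf_counter() - start
        return wrapper
    return decorator

@measured("search_by_name")
def search_by_name(term):
    return [term]  # stand-in for the real feature

if __name__ == "__main__":
    search_by_name("foo")
    for feature, m in _metrics.items():
        avg = m["total_seconds"] / m["calls"]
        print(f"{feature}: {m['calls']} calls, {avg:.6f}s average")
```

Dumping the collected metrics to that cheap, scalable storage on a schedule then gives you the data to decide which features to keep, fix, or drop.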

Beyond software development

Big data, monitoring, analytics, data science: all of these concepts are designed to increase the world's measurability, and they are definitely what everyone talks about now. And while the idea of being data driven has taken hold in many aspects of our lives, from corporate management to personal fitness, it has yet to really make an impact within the realm of software development itself, or at least the tools dedicated solely to measurability are scarce. That being said, making rational decisions does not seem to be as appealing to the rest of the world as it is to me, which could explain this scarcity.

Who decided that stored procedures should not be commented?


I’ve been spending the past couple of weeks working on stored procedures. Looking back on my career so far, I realize how much stored procedures are the backbone of many organizations dealing with data. Stored procedures are something of a potpourri between magic behavior, bespoke black boxes, and the sedimentation of code layers accumulated over years of feature additions implemented by a battalion of sometimes well-intentioned PL/SQL programmers with tight deadlines. Furthermore, stored procedures, more than any other type of data manipulation, are what the actual live production systems rely upon. It is not uncommon for a piece of software to have hundreds of stored procedures essential for it to work, and for good reason. Indeed, stored procedures are extremely efficient. So much so that even unoptimized pieces of code harboring redundant tests and an unreasonable amount of nested outer joins still run in a few milliseconds. Efficient they are. But you know what they are not? Commented. Seriously, the packages I worked with recently contain tens of thousands of lines of code but never more than 10 lines of comments, mostly something along the lines of “-- 10/10/2014 added by Jay” or “-- requirement R3045”. And as far as I can remember, relying solely on my flawed memory and anecdotal evidence, this is the case with the vast majority of stored procs. Therefore, after spending some time curled into a ball crying, I asked myself: “why?”.

Common consensus about commenting code

Childishly, I first assumed that every piece of code should be commented, and that the only reason for not commenting code would be laziness, lack of time, lack of understanding, or hatred for whoever would read your code in the future. I was obviously misguided, as one often is when assuming anything to be simple. Indeed, there are many times when commenting renders your code less readable, or is an excuse for bad coding. This article in particular, Common Excuses Used To Comment Code and What To Do About Them, does an excellent job of highlighting when commenting is sub-optimal:

    The code is not readable without comments. Or, when someone (possibly myself) revisits the code, the comments will make it clear as to what the code does. The code makes it clear what the code does. In almost all cases, you can choose better variable names and keep all code in a method at the same level of abstraction to make it easy to read without comments.
    We want to keep track of who changed what and when it was changed. Version control does this quite well (along with a ton of other benefits), and it only takes a few minutes to set up. Besides, does this ever work? (And how would you know?)
    I wanted to keep a commented-out section of code there in case I need it again. Again, version control systems will keep the code in a prior revision for you – just go back and find it if you ever need it again. Unless you’re commenting out the code temporarily to verify some behavior (or debug), I don’t buy into this either. If it stays commented out, just remove it.
    The code is too complex to understand without comments. I used to think this case was a lot more common than it really is. But truthfully, it is extremely rare. Your code is probably just bad, and hard to understand. Re-write it so that’s no longer the case.
    Markers to easily find sections of code. I’ll admit that sometimes I still do this. But I’m not proud of it. What’s keeping us from making our files, classes, and functions more cohesive (and thus, likely to be smaller)? IDEs normally provide easy navigation to classes and methods, so there’s really no need to scan for comments to identify an area you want to work in. Just keep the logical sections of your code small and cohesive, and you won’t need these clutterful comments.
    Natural language is easier to read than code. But it’s not as precise. Besides, you’re a programmer, you ought not have trouble reading programs. If you do, it’s likely you haven’t made it simple enough, and what you really think is that the code is too complex to understand without comments.

Why this consensus does not apply to stored procedures

As much as these arguments make sense, I don’t think they apply to stored procedures:

    “you can choose better variable names and keep all code in a method at the same level of abstraction”: You can’t easily change table fields and names, nor can you cut a big nested SQL statement gracefully.
    “Version control does this quite well”: Version control is almost never implemented for stored procedures.
    “I wanted to keep a commented-out section of code there in case I need it again.”: OK, that’s just BS.
    “[complex code] is extremely rare.”: Nested SQL queries are inherently complex and MUCH less readable than traditional code.
    “Markers to easily find sections of code.”: I never saw a problem with that.
    “you ought not have trouble reading programs”: Except queries are the opposite of natural language. Please, please, please, SQL developer, let me know why you are doing this particular join.

To summarize, I still don’t understand why stored procedures are generally not commented, when it would seem they are the type of code that could benefit the most from comments. Maybe NoSQL will change this, but in the meantime, I will start this crusade and make sure people explain their code, yo!
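If you want to see how bad the situation is in your own codebase, here is a rough, hypothetical Python sketch that scans stored procedure source files and reports how many of their lines are comments. The `.sql` extension and directory layout are assumptions about how your procedures happen to be stored, and the counting is deliberately approximate.

```python
# A rough sketch of how one might quantify the complaint above: scan stored
# procedure source files and report their comment-to-code ratio.
import sys
from pathlib import Path

def comment_ratio(sql_text):
    """Return (comment_lines, code_lines) for a blob of SQL/PL-SQL source."""
    comment_lines = 0
    code_lines = 0
    in_block_comment = False
    for raw in sql_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_block_comment:
            comment_lines += 1
            if "*/" in line:
                in_block_comment = False
        elif line.startswith("--"):
            comment_lines += 1
        elif line.startswith("/*"):
            comment_lines += 1
            in_block_comment = "*/" not in line
        else:
            code_lines += 1
    return comment_lines, code_lines

if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for path in sorted(root.rglob("*.sql")):
        comments, code = comment_ratio(path.read_text(errors="ignore"))
        total = comments + code or 1
        print(f"{path}: {comments}/{total} non-blank lines are comments")
```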

You should not try to be more productive

Coffee, a gift from the gods


Sometimes, my job requires long hours to meet impossible deadlines. I think this is a situation with which many people can empathize. So much so that the self-help industry is riddled with recipes to optimize your work habits and become more productive. Leaving aside the overwhelmingly pseudoscientific nature of most of the self-help industry, productivity has become a market. Apps, websites, gurus, all working to make you work better and more efficiently. Here is what I think: trying to be productive is counterproductive.

Where the productivity industry fails

If you start diving into productivity advice, you will start to have the feeling that productivity is like a cauldron. A cauldron that accumulates anyone’s latest advice concoction, no matter its origin. Want to be more productive? Eat kale. Listen to music with a tempo accelerated by 5%. Take notes on paper because it sticks in your memory better. Do adult coloring. Play brain-training apps that have no scientific backing whatsoever. Always reach inbox zero, or always make sure that you have at least 10,000 unread emails. Anything goes. Anything will make you more productive. In hindsight, fantasizing about being more productive is a great way to procrastinate. To be perfectly transparent, I’m guilty of having read a lot of these articles in my day. OK, literally (in its proper sense) a few months ago. I could argue that I was just reading them for entertainment, but I’m more honest than that. I get fooled often, which is why I have a lot to write about. Take a step back from this productivity advice, and I think you will reach the same conclusion I do: most of it is a mix of unproven facts, common sense and inapplicable pieces of advice that take away actual productive time.

What I do instead

But Paul, does that mean that we are doomed to be as efficient as we are today, with no improvement route whatsoever? Yes. Also, life has no meaning. But that’s beside the point. Of course, there is always room for improvement. So here is what I suggest doing instead of trying to be productive: 1. Prioritize. 2. Execute. That’s it. Prioritizing is a necessary evil but should not take up too much time. The best way to do it is in small chunks, like an everyday to-do list. Most of the time, however, should be spent on executing. Once I’m executing something, I never think twice about the priority. This is why I run every day without ever thinking of what I could do instead. It is on my list, I have to do it, so I do it. I do not think that going any deeper than this has any value, really. Just list what you need to do, and do the things. It’s important to note that sometimes, due to lack of time or the ever unpredictable facts of life, execution fails. This just means that another prioritizing cycle needs to happen.

Afterthoughts

Like many of my less technical posts, this article is obviously an opinion piece. The reason I started this blog in the first place is to share things I learned throughout my career without anyone spelling them out simply for me. It is not meant to be definitive, since my opinions are prone to change, but instead is aimed at provoking conversation and perhaps triggering more thorough scrutiny, and even scientific evaluation, of productivity methods. It would be interesting to dive into this subject, but unfortunately for me, my tasks are prioritized, and this is not one of them, so… back to work!

On the simplicity of expert systems


Today I want to share with you an unfinished thought. Perhaps because I have been working non-stop all week, perhaps because people keep complaining about Apple’s correct strategy of paring down their devices’ design (remember how everyone complained when they removed the CD player?), or perhaps because I’m not sure I will ever reach a satisfying, definitive answer on the matter. At any rate, here is the question: what is the optimal simplicity-to-features ratio for expert software systems?

The state of the art

If you’ve ever been in contact with someone who has to use a computer system beyond browsing the web or desperately trying to string 1,000 words together the night the essay is due, i.e. someone who has had to work with any software at all, from point-of-sale retail to graphic design by way of cloud CRMs, you will notice a common thread: that piece of software is awful. These complaints usually boil down to one of three categories: the software is either buggy, too complicated to use, or missing functionality. Leaving aside the instability inherent in the current process of software development, the other two categories seem to be two ends of the same spectrum. And this is what keeps me up at night. Probably for 30 seconds, then it’s my kids crying.

Developing software in a customer-is-king world

Having worked for years for B2B software providers, where sales cycles are long, requirements are complex and clients must be satisfied, I can tell you that most of these pieces of software require training in order to be used to their full potential (which in part drives sales through maintenance and support costs). In this world, roadmaps always add functionalities, new modules and new ways to customize a product. Ultimately, this renders sales, delivery and support extremely complex (and is why I have a job).

A new era of software delivery

However, in recent years we have seen a new development with Software as a Service. Companies are limited to the functionality of the SaaS product’s current version. Because of the effectiveness of the cloud sales model and the centralized maintenance of these pieces of software, it is much easier for software publishers to prioritize, consolidate and even drop features according to what clients are actually using. With SaaS, customization is often out of the question. Yet more and more companies are moving to Software as a Service and have clear initiatives to simplify their current ecosystem.

My dilemma

First, since when did dilemma lose its ‘n’? I digress. In all seriousness, and as I have argued before, I do believe that simplicity is key to software success. So when do you decide to add functionality to your software to respond to market demand? How do you ensure that the addition is not only relevant but also does not undermine your user experience, brand and sales process? Ultimately, I think that finding this balance is really more art than science. And this is always going to bug my extremely rational mind.

Choosing the correct DBMS – Gartner Reprint review

What do you mean my speed isn't linearly scalable?

If you haven’t had the chance to look at it yet, I encourage you to read this Gartner reprint: Critical Capabilities for Operational Database Management Systems. This report is extremely interesting if you’re into data at all. I’m in too deep, so I’m going to talk about this report for the whole length of this marvelous post.

The importance of specific use-cases

The first thing that jumped out at me was that Gartner uses the word DBMS, thus highlighting the fact that the dichotomy between traditional relational database management systems and what has been labelled “NoSQL” is fading out. Instead, Gartner advises to “Classify the use cases under consideration and map them to the costs, deployment options and skills requirements of the products evaluated here.” This is extremely important and a departure from some of the preconceptions I witness amongst my fellow professionals. Often enough, I am confronted with consultants trying to categorize DBMS by capabilities (distribution capabilities, support of languages, etc.). More importantly, these platforms are marketed through those capabilities. As I argued before, and as this report confirms, end users, consultants and software providers alike should select, recommend and market according to the use cases at which these platforms excel instead of the capabilities inherent to a specific platform (see previous post: The importance of specialization in software sales).

Evaluation criteria

But enough surrendering to my own confirmation biases in order to pat myself on the back and delude myself into thinking that my observations may be going in the same direction as Gartner’s. The report evaluates different vendors (themselves selected based on a set of inclusion criteria), using the following criteria:

  • High-Speed Ingest and Processing
  • ACID Support
  • Tunable Consistency
  • Multimodel Support
  • Automated Data Distribution
  • Cloud/Hybrid Deployment
  • Programmability for HTAP
  • Administration and Management
  • Security

These criteria are then weighted according to four different use cases:

  • Traditional Transactions
  • Distributed Variable Data
  • Lightweight Events and Observations
  • Hybrid Transactional/Analytical Processing (HTAP)

I’m not going to spend time describing the criteria; Gartner put up very readable charts to compare the different vendors (a toy sketch of this kind of use-case weighting follows below). In short, it seems that Oracle is leading the traditional transaction world while DataStax is leading the distributed one. On a personal note, I’m super excited for DataStax: I get to work with many members of their team, and the company I work for leverages their solution, so it’s excellent recognition.
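To make the weighting idea concrete, here is a toy Python sketch with entirely made-up products, capabilities, scores and weights (none of this is Gartner’s data or methodology), showing how the same capability scores rank differently once they are weighted per use case.

```python
# A toy illustration of per-use-case weighting of capability scores.
# All products, scores and weights below are invented for the example.
CAPABILITY_SCORES = {
    "Product A": {"acid": 4.5, "distribution": 2.5, "ingest": 3.0},
    "Product B": {"acid": 3.0, "distribution": 4.5, "ingest": 4.0},
}

USE_CASE_WEIGHTS = {
    "traditional_transactions": {"acid": 0.6, "distribution": 0.1, "ingest": 0.3},
    "distributed_variable_data": {"acid": 0.1, "distribution": 0.6, "ingest": 0.3},
}

def weighted_score(scores, weights):
    """Weighted sum of capability scores for one product and one use case."""
    return sum(scores[cap] * w for cap, w in weights.items())

if __name__ == "__main__":
    for use_case, weights in USE_CASE_WEIGHTS.items():
        # Rank products for this use case by their weighted score.
        ranking = sorted(
            CAPABILITY_SCORES.items(),
            key=lambda item: weighted_score(item[1], weights),
            reverse=True,
        )
        best, scores = ranking[0]
        print(f"{use_case}: {best} ({weighted_score(scores, weights):.2f})")
```

The point is simply that the "best" DBMS is a function of the weights, i.e. of your use case, not of the raw capability list.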

I would perhaps have added another two criteria: integration ecosystem and cost. Regarding the latter, I would have created two sets of charts: one considering cost and one not considering it. Of course, I understand cost is a delicate and fluctuating subject, and I understand Gartner’s decision. Integration ecosystem, however, is very important. Being able to evaluate how easy it is to integrate and use data once it is in these DBMS is extremely important when making an architecture choice.

Personal Conclusion

I’m always impressed by the conciseness of Gartner reports. This one does not fail in that regard, and it gives a very good basis to anyone evaluating data management systems. That being said, and to make sure that horse is dead: think of your use case before going for an RFP. Many DBMS can do many things, but few excel at all use cases.

3 essential features of the perfect software

Hi Mark. John left the company 3 months ago. Can you help us find a bug in his code? It is somewhere in this file.

I have recently been thinking quite a bit about what makes a piece of software successful. Very early in my career, I got to temper my idealistic view of computer science and the world in general, quite well captured by the saying “you don’t have to be the best to be the first”. As I progress in my professional journey, I have been trying to identify the key aspects that make a piece of software stand head and shoulders above its competitors, or see exponential growth in a niche of the market not exploited at the time. While it is most likely impossible to find the 3 magic words you have to pronounce to make the perfect software appear, I am still going to pretend I did, for web traffic purposes, because I have no integrity. All bad humor aside, and for the sake of readability, I did try to funnel my thinking into 3 major aspects which, combined, make a successful piece of software. Quick aside: the purpose of this piece is not to dive into which technical aspects of a piece of software are valuable, but rather to present the software features to which the market responds positively. With that out of the way, let me present the current state of my cogitation: the perfect software is:


SIMPLE

Simplicity is essential for the end user to open their eyes. While the algorithms, architecture and other under-the-hood building blocks can and most likely will be complex, the idea here is to present something that is simple to understand. Simplicity can be driven by multiple factors. It could be, for instance, the front end of your application. This is why UI/UX is such a sought-after skill, and while it is predominant in the B2C industry, it is severely underused in the B2B world. It could also be driven by the product packaging: if you are building a platform with many potential uses, packaging them into specific solutions recognized by the industry is a fair way to achieve simplicity. It could also come from targeting, for instance focusing on solving the problems of one vertical.

NON-INTRUSIVE

This characteristic is epitomized by the success of cloud computing. Software as a Service in particular is the perfect example of non-intrusiveness being a successful business model: end users do not want to have to install and maintain software. It is an obvious cost reduction for big enterprises, and it is just as much a reality for consumers: no one wants to have to install software on their computer, and the ones we do install are the ones we hate the most (let he who has never complained about Microsoft Word cast the first stone). Even more interesting, web software like appointment booking, ticket sales and so on are considered websites and not pieces of software, but I digress. That being said, and as I argued as recently as last week, SaaS isn’t the only model, and non-intrusiveness can be characterized by other traits. Backward compatibility, or preservation of the existing set of skills and applications, is a great way to ensure a non-intrusive model. This is one of the reasons why I think disruption is nonsense (see previous rant).

ACTIONABLE

Being simple and non-intrusive are essential qualities, but your software must actually do something in order to be valuable. The important question here is: what’s in it for your user? What is the value? And I don’t think that a value proposition such as “we are doing it better than the others” is enough, nor is “imagine what you could do with that”. You need to be able to show tangible results right off the bat, and drive your customer through a story of what they will be able to do now that they weren’t able to do before. This is, in my opinion, one of the hardest and most often forgotten features for a piece of software to possess, especially when put in relation to the other two. Indeed, innovation while maintaining non-intrusiveness could be seen as an oxymoron. In reality, the perfect, unique, innovative, actionable value that no one has thought of before does not exist. That being said, many tech companies today start by building technical prowess instead of focusing on creating value. My recommendation is to think about the value first, then focus on making the solution simple and non-intrusive, which is ironically why this article is written in the opposite order.

Conclusion

Can a piece of software truly possess all these qualities fully? Probably not, but I think it is at least an ideal to strive for. As mentioned awkwardly at the beginning of this article, this is also a very preliminary assessment of my thought process. I do believe that if a piece of software possesses a good balance of these 3 features, it is set for success. More importantly, I think that these features should drive the development of new software. I know that I have a few ideas about what to develop, and I’m going to make sure to keep that in mind.

Should all your data move to the cloud?


I have recently, on multiple occasions, engaged in conversations about whether or not Fortune 500 organizations are ready to move all their data to the cloud. While I’m not disputing the benefits of distributed systems, I did encounter a significant number of organizations that are not ready to move to a SaaS model. Beyond the obvious security reasons, I think that maintaining control over the core of your business to drive innovation is crucial (see the Tesla example). Furthermore, many organizations’ strategies seem to be moving towards building IaaS/PaaS, and eventually SaaS, within their own IT. These tendencies lead me to believe the dichotomy between SaaS and traditional in-house implementation isn’t absolute. Therefore, the market will see the advent of solutions enabling control over internal data while leveraging SaaS functionality.

Since I work for a company offering one of these solutions, I wrote a white paper about it, so here it is. Enjoy the read!


You will never reach your full potential… and that’s OK.

I don't care, I'm digging a hole to the water.

I am an ambitious person. In my mind, whenever I do something (I’m refraining from using the word accomplish because I’m never truly satisfied), I always have this Batman Begins scene in my head: Rachel sees Bruce Wayne running out with two models after buying out a hotel; Bruce Wayne poses and says: “Rachel, all- all this, it- it’s not me, inside, I am, I am more.” For some reason, mostly because I am a Batman nerd, this scene resonates with me so much. The funny thing is that I sometimes forget Rachel’s response: “Bruce, deep down you may still be that great kid you used to be, but it’s not who you are underneath, it’s what you *do* that defines you.” Here it is: if you think you are more, then show that you can do more. I was actually reminded of Rachel’s wisdom this week by a friend/mentor of mine who lit a path for me to do more and get better at my job. Looking back on it, I realized that I had held an unconscious belief that I had reached a finite knowledge/expertise in my job, which could not be further from the truth. That’s why in this blog post, I’d like to give you what really keeps me going: you are never done. You don’t get to finish the game. And that’s pretty awesome.

The hindering nature of achievement

The first thing to acknowledge is that the belief that you have reached a certain potential is crippling. As mentioned in my very recent life experience, my unconscious belief of having reached a certain expertise in my job prevented me from getting to the next steps of my career. But this is true for a lot of other things. To give you another personal example, as a fairly dedicated runner you come to assume a few paces at which you run certain types of races, e.g. “this pace is my 5K pace,” and it’s very hard to re-teach your brain to think that you can go faster than your “5K pace” when racing. It’s when you have no preconceived notion of what you are capable of that you can improve. But here is the secret: when you do the best race of your life, you did the best race of your life… so far! And just as the best way to combat cognitive biases is to deliberately scrutinize them when trying to form a fallacy-free thought, you should look for your unconscious beliefs about your potential and strip them of their crippling nature.

The value of consistency

The fun thing about realizing that all of your preconceived beliefs about your current potential are ill-informed is that you get to contemplate the abyss of the work that needs to be done in every aspect of your life if you do not want to be shackled by them. If you ask me, staring into the abyss is always fun. All kidding aside, it raises a very difficult question: if I can always improve, how do I get better? I think this plays very nicely with one of the most important core beliefs I and many others share: consistency. Let me take an example from the fitness realm again, in this case evaluating the benefits of muscle confusion versus progressive overload (spoiler, the title of this article: ‘Muscle Confusion’ Is Mostly a Myth). Too often we are confronted with miracle fitness solutions, founded on the idea that dramatically shaking things up will enable you to unlock your maximum potential. As debunked there, the only method with tangible results is consistent incremental improvement. I think we can draw a fairly straightforward corollary for selecting our method of improvement. A muscle-confusion-like approach does not work for the skills we are trying to improve here: you are trying to improve at something at which you are already proficient, which implies that you have already done a lot of work figuring out what works best and what doesn’t. The only way to get better is to slowly increase the resistance: look at what you have done so far, and if you’re comfortable with it, add more until it becomes comfortable again. Repeat.

Coping with never being done

Now, if you agree with me that you will never reach a finite potential and that the only way to improve is consistent, slow, incremental change, this can be a little overwhelming. Since I agree with myself, at least for the next 10 minutes, I am a little overwhelmed. The way I found to cope with this incessant work that will eventually lead to my death is threefold. First, I plan things out. I set actionable, trackable short-term goals. For instance, I wanted to get better at writing and communicating, so I set myself a goal of writing a blog post every week. I have done that so far, even if I missed one week over the past few months. Secondly, I prioritize. I acknowledge, for instance, that I don’t want to sacrifice some of the time I spend working or running in order to play Magic, and that therefore I will not get to play the Pro Tour any time soon, or ever for that matter. Understanding what you decide not to improve is crucial. Finally, I allow myself to enjoy the present. Granted, I’m not super good at it as of today, but I haven’t reached my full potential yet!

The future of Data is Augmentation, not Disruption

I'm disrupting the light bulb market by enabling wireless. I call it "Photoshop"

I spent last week enjoying the Cassandra Summit, so much so that I did not take the time to write a blog post. I had a few ideas, but I chose quality over quantity. That being said, something interesting happened at the summit: we coined the term “augmentation” for one of my company’s key go-to-market use cases, instead of data layer modernization or digitalization. I even got the opportunity to try both terms on the different people visiting our booth. In this extremely small sample, people really tended to have a much better degree of understanding when I used the word augmentation, which got me thinking. I even read a very interesting article from Tim O’Reilly called Don’t Replace People. Augment Them., in which he argues against technology fully replacing people. Could this concept of augmentation be applied on a broader scale to understand our data technology trends? Maybe; at least that’s what I’m going to try to lay out in this article.

Technological progress relies on augmentation

That’s the first thing that struck me when I pondered augmentation in our world, and more specifically when it comes to software. With the exception of very few, the platforms, apps and tools that we use are all based on the augmentation of existing basic functions. Amazon? Augmentation of the store using technology. Uber? Augmentation of taxis. Chatbots? Augmentation of chat clients. Slack? Augmentation of email + chat. Distributed/cloud applications? Augmentation of legacy applications. To some extent, even Google is an augmentation of a manual filing system. I will admit that listing examples that confirm an idea I already had is close to a logical fallacy, so I tried to find counter-examples, i.e. software solutions that try to introduce completely new concepts, but could not think of any. Of course, we could argue over semantics in defining what constitutes true innovation versus augmentation of an existing technology, but ultimately I think it is fair to say that the most successful technologies are augmenting our experience rather than being completely disruptive, despite what most of my field would argue. Therefore, augmentation must at least be considered as part of the future of any software industry, such as the Big Data industry.

Augmentation is better than transformation

Human nature needs comfort; that’s why most of us prefer augmentation over disruption. By disruption, I’m talking about transforming or replacing the existing systems, not adding features: selling unpaired socks over the internet is not disrupting the sock industry, despite what the TED talks would like me to believe. Seriously, when you have existing technologies, as every company does, a replacement/transformation is a hard pill to swallow: loss of investment, knowledge, process, etc. It is especially risky and complex when talking about data layer transformation, as I have argued before in this very blog. So when given a choice, augmenting existing data layers is an obvious choice for risk-averse IT organizations.

Augmentation drives innovation

Perhaps the most convincing argument towards acknowledging that augmentation is the future of data is an analysis of the most innovative big data software solutions: machine learning, neural networks and all of these extremely complex systems whose behaviors are almost impossible to predict, even for experts. These systems are designed to augment their own capabilities, instead of following a set of deterministic rules. Indeed, these systems are designed to approach the capabilities of complex biological systems and therefore incorporate their “messiness”. We can’t think of big data systems using physics thinking (i.e. here is an algorithm, here is a set of parameters, this is the expected result); we should rather rely on biology thinking (i.e. what is the result I get if I input this parameter?). A great example of this type of thinking is Netflix’s Chaos Monkey, a service running on AWS to simulate failures and understand the behavior of their architecture (a toy sketch of that failure-injection idea follows below). Self-augmentation is the principle upon which the technologies of the future are built. We understand the algorithms we input but not necessarily the outcome, which can sometimes have unintended consequences (see: Microsoft Tay), but it is ultimately a better pathway to intelligent technologies. I’m a control freak, and not being able to understand a system end to end drives me nuts, but I’m willing to relinquish my sanity for the good of Artificial Intelligence.
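For illustration only, here is a toy, hypothetical Python sketch of the failure-injection idea behind tools like Chaos Monkey: every so often, pick a random instance from a pool and pretend to terminate it, then watch how the rest of the system copes. This is not Netflix’s implementation; the instance names and the terminate stub are made up.

```python
# A toy sketch of chaos-style failure injection: randomly "kill" instances
# from a pool and observe how the overall system behaves afterwards.
import random
import time

def terminate(instance):
    # Stand-in for a real call to a cloud API that stops the instance.
    print(f"Terminating {instance} and watching how the system reacts...")

def chaos_loop(instances, rounds=3, interval_seconds=1.0, probability=0.5):
    """Every interval, maybe terminate one randomly chosen instance."""
    for _ in range(rounds):
        time.sleep(interval_seconds)
        if instances and random.random() < probability:
            terminate(random.choice(instances))

if __name__ == "__main__":
    chaos_loop(["app-server-1", "app-server-2", "cassandra-node-3"])
```

The value is not in the termination itself but in what it reveals: you learn how the system actually behaves under failure, rather than how you assumed it would.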

Conclusion

With software augmentation being part of our everyday life, a safer and easier way to add features to an existing data layer, and the core concept behind machine learning, I think it is fair to say that augmentation is the future of data. Did I convince myself? Yes, which is good, because my own opinion is usually my first go-to when it comes to figuring out what I think. Seriously though, what do you think? As always, I long to learn more and listen to everyone’s opinions!