paul vidal - pragmatic big data nerd


What is the most underrated aspect of software development and why is it measurability?

by paul

Designing and developing software is complicated. I have heard there might even be a full industry gathering experts in this domain, and that it could be doing well. Not sure if it will ever be a thing. All joking aside, theories about the optimum way to approach software development are numerous and constantly evolving, which is excellent. Today, however, I want to talk about an underrated concept, especially within the realm of software development: measurability. Despite what online dictionaries say, I’m pretty sure I just made up that word, or at least the concept attached to it vis-a-vis software development, so let me define it.

What do you mean by measurability and why should I care about it?

Within the realm of software development, measurability belongs in the same category as other transversal, high-level concepts that must be considered at each and every step of the development process, such as user experience, performance, scalability, re-usability and security. Measurability in this sense is the idea that each and every feature you develop for your software can be measured for popularity and efficacy in order to ultimately evaluate its necessity. That is a lot of y-ending words, which should have convinced you already. In case it didn’t, let me explain why it is important to consider.

First, I believe that the importance of these types of high-level concepts does not need further justification: we have all witnessed software failures when development ignored one of these key concepts, security being the one making the front page most often. The impact of measurability is more subtle but nonetheless crucial. Without measurability, the decisions you make about feature prioritization or design become irrational. For instance, if you are developing an API that offers multiple methods of access and you are unable to measure their popularity or efficacy, you will end up with either features that are maintained at great cost for no benefit to your end user, or features that are massively used out of necessity but incrementally build your end user’s frustration.

This is a very simple example, but it illustrates an underlying notion that we rarely see in the world of zeros and ones: irrationality. Indeed, a piece of software is usually extremely rational and quantifiable, which makes evaluating performance, scalability, security or even re-usability a relatively easy mathematical problem. With the popularization of software, user experience has moved to the forefront of Agile development, making customer feedback a key part of feature releases. What I am proposing here is to go one step further.

Whenever you develop a feature for your software, you should ask yourself: how will I know if this feature is necessary or not? How will I test for it?

Implementing measurability

Implementing measurability acknowledges the fact that you are operating in an uncertain environment, which inherently makes its implementation uncertain. That being said, a good starting point is to measure each feature’s use and performance and then compare it to the other features you develop. This measurement and analysis can be done using trace or audit mechanisms, which, bonus, you should implement anyway to cater to security. A more robust approach would be to first select the metrics you want to measure for each software feature and have a dedicated module to implement measurability over those metrics. You may think it’s overkill, but with the advent of scalable and cheap storage, why not do it?
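To make the "dedicated module" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `FeatureMetrics` class, the `measured` decorator and the two export features are hypothetical names, and the metrics live in memory where a real system would ship them to durable storage.

```python
import time
from collections import defaultdict
from functools import wraps


class FeatureMetrics:
    """Hypothetical in-memory registry of per-feature usage and latency."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.errors = defaultdict(int)
        self.total_seconds = defaultdict(float)

    def measured(self, feature):
        """Decorator that records every call made to a named feature."""
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Count the failure, then let the caller handle it.
                    self.errors[feature] += 1
                    raise
                finally:
                    # Runs on success and on failure alike.
                    self.calls[feature] += 1
                    self.total_seconds[feature] += time.perf_counter() - start
            return wrapper
        return decorator

    def report(self):
        """Return one row per feature, most used first."""
        rows = [
            {
                "feature": feature,
                "calls": count,
                "errors": self.errors[feature],
                "avg_seconds": self.total_seconds[feature] / count,
            }
            for feature, count in self.calls.items()
        ]
        return sorted(rows, key=lambda row: row["calls"], reverse=True)


metrics = FeatureMetrics()


@metrics.measured("export_csv")
def export_csv(rows):
    return "\n".join(",".join(map(str, row)) for row in rows)


@metrics.measured("export_xml")
def export_xml(rows):
    return "".join(f"<row>{'|'.join(map(str, row))}</row>" for row in rows)


# Simulated traffic: two CSV exports, one XML export.
export_csv([(1, "a"), (2, "b")])
export_csv([(3, "c")])
export_xml([(1, "a")])
usage = metrics.report()
```

With numbers like these in hand, the question "is the XML export worth maintaining?" becomes a comparison of rows in `usage` rather than a matter of opinion.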

Beyond software development

Big Data, monitoring, analysis, data science: all of these concepts are designed to increase the world’s measurability, and they are definitely what everyone talks about now. And while the idea of being data driven has reached every aspect of our lives, from corporate management to personal fitness, it has yet to really make an impact within the realm of software development, or at least tools dedicated solely to measurability are scarce. That being said, making rational decisions does not seem to be as appealing to the rest of the world as it is to me, which could explain this scarcity.

Who decided that stored procedures should not be commented?

by paul

I’ve been spending the past couple of weeks working on stored procedures. Glimpsing into my career so far, I realize how much stored procedures are the backbone of many organizations dealing with data. Stored procedures are something of a potpourri of magic behavior, bespoke black boxes, and the sedimentation of code layers accumulated over years of feature additions implemented by a battalion of sometimes well-intentioned PL/SQL programmers with tight deadlines. Furthermore, stored procedures, more than any other type of data manipulation, are what the actual live production systems rely upon. It is not uncommon for a piece of software to have hundreds of stored procedures essential for it to work, and for good reason. Indeed, stored procedures are extremely efficient. So much so that even unoptimized pieces of code harboring redundant tests and an unreasonable number of nested outer joins still run in a few milliseconds. Efficient they are. But you know what they are not? Commented. Seriously, the packages I worked with recently contain tens of thousands of lines of code but never contain more than 10 lines of comments, mostly something along the lines of “-- 10/10/2014 added by Jay” or “-- requirement R3045”. And as far as I can remember, relying solely on my flawed memory and anecdotal evidence, this is the case with the vast majority of stored procs. Therefore, after spending some time curled into a ball crying, I asked myself: “why?”.

Common consensus about commenting code

Childishly, I first assumed that every piece of code should be commented, and the only reason for not commenting code would be laziness/lack of time/lack of understanding/hatred for whoever would read your code in the future. I was obviously misguided, as one often is when assuming anything to be simple. Indeed, there are many times when commenting renders your code less readable, or is an excuse for bad coding. This article in particular, Common Excuses Used To Comment Code and What To Do About Them, does an excellent job of highlighting when commenting is sub-optimal:

    The code is not readable without comments. Or, when someone (possibly myself) revisits the code, the comments will make it clear as to what the code does. The code makes it clear what the code does. In almost all cases, you can choose better variable names and keep all code in a method at the same level of abstraction to make is easy to read without comments.
    We want to keep track of who changed what and when it was changed. Version control does this quite well (along with a ton of other benefits), and it only takes a few minutes to set up. Besides, does this ever work? (And how would you know?)
    I wanted to keep a commented-out section of code there in case I need it again. Again, version control systems will keep the code in a prior revision for you – just go back and find it if you ever need it again. Unless you’re commenting out the code temporarily to verify some behavior (or debug), I don’t buy into this either. If it stays commented out, just remove it.
    The code too complex to understand without comments. I used to think this case was a lot more common than it really is. But truthfully, it is extremely rare. Your code is probably just bad, and hard to understand. Re-write it so that’s no longer the case.
    Markers to easily find sections of code. I’ll admit that sometimes I still do this. But I’m not proud of it. What’s keeping us from making our files, classes, and functions more cohesive (and thus, likely to be smaller)? IDEs normally provide easy navigation to classes and methods, so there’s really no need to scan for comments to identify an area you want to work in. Just keep the logical sections of your code small and cohesive, and you won’t need these clutterful comments.
    Natural language is easier to read than code. But it’s not as precise. Besides, you’re a programmer, you ought not have trouble reading programs. If you do, it’s likely you haven’t made it simple enough, and what you really think is that the code is too complex to understand without comments.

Why this consensus does not apply to stored procedures

As much as these arguments make sense, I don’t think they apply to stored procedures:

    “you can choose better variable names and keep all code in a method at the same level of abstraction”: You can’t easily change table fields and names, nor can you cut a big nested SQL statement gracefully.
    “Version control does this quite well”: Version control is almost never implemented for stored procedures.
    “I wanted to keep a commented-out section of code there in case I need it again.”: OK, that’s just BS.
    “[complex code] is extremely rare.”: Nested SQL queries are inherently complex and MUCH less readable than traditional code.
    “Markers to easily find sections of code.”: I never saw a problem with that.
    “you ought not have trouble reading programs”: Except queries are the opposite of natural language. Please, please, please, SQL developer, let me know why you are doing this particular join.
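As an illustration of the kind of comment I am asking for, here is a minimal sketch. It is written in Python against an in-memory SQLite database, which has no stored procedures, so the query merely stands in for the body of one; the `customers` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

# Hypothetical schema and data, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 17.5);
""")

ORDER_TOTALS = """
    -- Purpose: total spend per customer, including customers with no orders.
    SELECT c.name,
           COALESCE(SUM(o.total), 0) AS total_spent
    FROM customers c
    -- LEFT JOIN rather than INNER JOIN: the report must list customers who
    -- have never ordered, so rows without a match in orders have to survive.
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
"""

rows = conn.execute(ORDER_TOTALS).fetchall()
```

Two short comments are enough here: one says what the query is for, the other says why this particular join was chosen, which is precisely the information the SQL alone cannot convey.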

To summarize, I still don’t understand why stored procedures are generally not commented, while it would seem they are the type of code that could benefit the most from comments. Maybe NoSQL will change this, but in the meantime, I will start this crusade, and make sure people explain their code, yo!

3 essential features of the perfect software

Hi Mark. John left the company 3 months ago. Can you help us find a bug in his code? It is somewhere in this file.

I have recently been thinking quite a bit about what makes a piece of software successful. Very early in my career, I got to temper my idealistic view of computer science and the world in general, quite well captured by the saying “you don’t have to be the best to be the first”. As I progress in my professional journey, I have been trying to identify the key aspects that make a piece of software stand head and shoulders above its competitors or see exponential growth in a niche of the market not exploited at the time. While it is most likely impossible to find the 3 magic words you have to pronounce to make the perfect software appear, I am still going to pretend I did this, for web traffic purposes, because I have no integrity. All bad humor aside, and for the sake of readability, I did try to funnel my thinking into 3 major aspects which, combined, make for a successful piece of software. Quick aside: the purpose of this piece is not to dive into which technical aspects of a piece of software are valuable, but rather to present the software features to which the market responds positively. With that out of the way, let me present the current state of my cogitation: the perfect software is:

SIMPLE

Simplicity is essential for the end-user to open their eyes. While the algorithms, architecture and other under-the-hood building blocks can and will most likely be complex, the idea here is to present something that is simple to understand. Simplicity can be driven by multiple factors. It could be for instance the front end of your application. This is why UI/UX is such a sought-after skill, and while it is predominant in the B2C industry it is severely underused in the B2B world. It could also be driven by the product packaging. If building a platform with many potential uses, packaging them into specific solutions recognized by the industry is a fair way to achieve simplicity. It could also be targeted: focusing on solving the problems of one vertical for instance.

NON-INTRUSIVE

This characteristic is epitomized by the success of cloud computing. Software as a Service in particular is the perfect example of non-intrusiveness being a successful business model: end users do not want to have to install and maintain software. It is an obvious cost reduction for big enterprises, but it is just as much a reality for consumers: no one wants to have to install software on their computer, and the programs we do install are the ones we hate the most (let he who has no complaint about Microsoft Word cast the first stone). Even more interesting, web software like appointment booking, ticket sales and so on is considered a website and not a piece of software, but I digress. That being said, and as I argued as early as last week, SaaS isn’t the only model, and non-intrusiveness can be characterized by other traits. Backward compatibility, or preserving the existing set of skills and applications, is a great way to ensure a non-intrusive model. This is one of the reasons why I think disruption is nonsense; see previous rant.

ACTIONABLE

Being simple and non-intrusive are essential qualities, but your software must actually do something in order to be valuable. The important question here is: what’s in it for your user? What is the value? And I don’t think that a value proposition such as “we are doing it better than the others” is enough, nor is “imagine what you could do with that”. You need to be able to show tangible results right off the bat, and walk your customer through a story of what they will be able to do now that they weren’t able to do before. This feature is in my opinion one of the hardest and most often forgotten features for a piece of software to possess, especially in relation to the other two. Indeed, innovation while maintaining non-intrusiveness could be seen as an oxymoron. In reality, the perfect, unique, innovative, actionable value that no one thought of before does not exist. That being said, many tech companies today start by building technical prowess instead of focusing on creating value. My recommendation is to think about the value first, then focus on making the solution simple and non-intrusive, which is ironically why this article is written in the opposite order.

Conclusion

Can a piece of software truly possess all these qualities fully? Probably not, but I think it is at least an ideal to strive for. As mentioned awkwardly at the beginning of this article, this is also a very preliminary assessment of my thought process. I do believe that if a piece of software possesses a good balance of these 3 features, it is set for success. More importantly, I think that these features should drive the development of new software. I know that I have a few ideas about what to develop, and I’m going to make sure to keep that in mind.

The future of Data is Augmentation, not Disruption

by paul
I'm disrupting the light bulb market by enabling wireless. I call it "Photoshop"

I spent last week enjoying the Cassandra Summit, so much that I did not take the time to write a blog post. I had a few ideas but I chose quality over quantity. That being said, something interesting happened at the summit: we coined the term “augmentation” for one of my company’s key go-to-market use cases, instead of data layer modernization or digitalization. I even got the opportunity to try both terms on the different people visiting our booth. In this extremely small sample, people really tended to have a much better degree of understanding when I used the word augmentation, which got me thinking. I even read a very interesting article from Tim O’Reilly called Don’t Replace People. Augment Them., in which he argues against technology fully replacing people. Could this concept of augmentation be applied on a broader scale to understand our data technology trends? Maybe; at least that’s what I’m going to try to lay out in this article.

Technological progress relies on augmentation.

That’s the first thing that struck me when I pondered augmentation in our world, and more specifically when it comes to software. With very few exceptions, the platforms, apps and tools that we use are all based on augmentation of existing basic functions: Amazon? Augmentation of stores using technology. Uber? Augmentation of taxis. Chatbots? Augmentation of chat clients. Slack? Augmentation of email + chats. Distributed/Cloud applications? Augmentation of legacy applications. To some extent even Google is an augmentation of a manual filing system. I admit that listing examples that confirm an idea I already had is close to a logical fallacy, so I tried to find counterexamples, i.e. software solutions that try to introduce completely new concepts, but could not think of any. Of course we could argue over semantics in defining what constitutes true innovation versus augmentation of an existing technology, but ultimately I think it is fair to say that the most successful technologies are augmenting our experience rather than being completely disruptive, despite what most of my field would argue. Therefore, augmentation must at least be considered as part of the future of any software industry, such as the Big Data industry.

Augmentation is better than transformation

Human nature needs comfort; that’s why most of us prefer augmentation over disruption. By disruption, I’m talking about transforming or replacing the existing systems, not adding features: selling unpaired socks over the internet is not disrupting the sock industry, despite what the TED talks would like me to believe. Seriously, when you have existing technologies, as every company does, a replacement/transformation is a hard pill to swallow. Loss of investment, knowledge, process, etc. It is especially risky and complex when talking about data layer transformation, as I argued before in this very blog. So when given a choice, augmenting existing data layers is an obvious choice for risk-averse IT organizations.

Augmentation drives innovation

Perhaps the most convincing argument for acknowledging that augmentation is the future of data is the analysis of the most innovative big data software solutions: machine learning, neural networks and all of these extremely complex systems whose behaviors are almost impossible to predict, even for experts. These systems are designed to augment their own capabilities, instead of having a set of deterministic rules to follow. Indeed, these systems are designed to approach the capabilities of complex biological systems and therefore incorporate their “messiness”. We can’t think of big data systems using physics thinking (i.e. here is an algorithm, here is a set of parameters, this is the result expected); we should rather rely on biology thinking (i.e. what result do I get if I input this parameter?). A great example of this type of thinking is Netflix’s Chaos Monkey, a service running on AWS to simulate failures and understand the behavior of their architecture. Self-augmentation is the principle upon which the technologies of the future are built. We understand the algorithms we input but not necessarily the outcome, which can sometimes have unintended consequences (see: Microsoft Tay), but ultimately is a better pathway to intelligent technologies. I’m a control freak, and not being able to understand a system end to end drives me nuts, but I’m willing to relinquish my sanity for the good of Artificial Intelligence.

Conclusion

With software augmentation being part of our everyday life, a safer and easier way to add features to existing data layers, and the core concept of machine learning, I think it is fair to say that it is the future of Data. Did I convince myself? Yes, which is good because my opinion is usually my first go-to when it comes to figuring out what I think. Seriously though, what do you think? As always, I long to learn more and listen to everyone’s opinion!

The importance of specialization in software sales

by paul
“Bust of Adam Smith” by Patric Park, 1845. (Wikipedia)

After spending some time reflecting on whether or not Data Scientist was a useful role within any organization churning a big amount of data, I stumbled upon this post on LinkedIn: There is Only One Type of Software Engineer.

In short, this post calls for a de-specialization of the role of engineers in order to avoid siloed professionals refusing to take responsibility of a task if it does not exactly match their job description.

While I agree with some of this argument, especially in big organizations where unfortunately the lack of ownership of a task and fear of risk taking can be quite flagrant (which I will try to tackle in a future post), I think that small organizations are seriously lacking in specialization, the effects of which are particularly visible in the sales process.

Establishing the premise: specialization scarcity versus tangible gains.

Quick disclaimer: as with every post I write, I am not trying to establish and write the ultimate truth but rather to engage in a conversation, so I’ll be happy to see my premise challenged. It also applies mostly to my domain of expertise: large software sales for big organizations.

In this context, here is what I observed and gathered from the market. The era of all-in-one platforms is over. I debated this in my first article speaking about consolidation, but it is still true. And while some giant companies manage to, and should, attack different segments of the market, they still have clear marketing messages and product names for these different segments.

The problem is for companies that have an extremely innovative product that could tackle a lot of use-cases. I know we have been struggling with this in my company, although it is getting fixed now, but I also observed it at many other companies I read about or got in direct contact with. Without clear messaging about which narrow use-case your platform solves, sales struggle to happen.

On the flip side, I hear the opposite statement as soon as the product or marketing message gets focused. It becomes the single biggest growth factor and even drives people to your product instead of you having to chase opportunities.

Rationale: why and when I think specialization is working.

“It is the great multiplication of the productions of all the different arts, in consequence of the division of labour, which occasions, in a well-governed society, that universal opulence which extends itself to the lowest ranks of the people” -Adam Smith

Look, the concept is not new. The argument for specialization is at the core of our modern society, and many philosophers, economists and sales gurus addressed it before me. The goal of this post, and this blog in general, is not to debate whether or not capitalism is the most suitable model for our society but rather to give down-to-earth testimonies based on factual experiences.

With this in mind, here is ultimately why I think specialization enables sales: people hate complication. Despite what my inner nerd would like to think, everyone suffers from decision fatigue. We want to have one solution for one problem. This is why you use WhatsApp to text your Facebook friends instead of using Messenger in most cases. And I think it is particularly relevant for my generation, which is driven by immediate selfish satisfaction (yes, I include myself in this) and wants a quick response to a problem it has.

Take away: what you and I should reflect upon.

First and foremost, you need to make sure that your sales and marketing message is clear. You should be able to say what your product does, what it solves, what’s the market and who are your competitors. Then you need to be able to specialize your message even more, and drill down to what the person in front of you is looking for. When you’re playing tic-tac-toe against someone, you’re not thinking about every move that could have happened prior to the current move. You’re focusing on what move will give you the best chance to win now considering the current state of the game. It’s the same thing with sales: you’re not trying to sell your product to a range of hypothetical buyers, you’re trying to sell it to a specific person to solve a specific problem. Personalization is the ultimate specialization, thus the ultimate growth factor.

Now comes the hard question: what should I focus on? What is my product’s area of specialization? This is an extremely complicated question, because while people want reality to be simple, it isn’t. One current tendency, established by Eric Ries in The Lean Startup, is to use customer feedback and adapt your product to your customers’ needs: be data driven. While I adhere to this approach, especially when set against visionary decision making from leaders (which often equates to magical thinking), I think it needs to be adjusted to account for lack of specialization. Yes, your product/company can pivot in any direction, but it needs to settle. I haven’t found the formula to determine when to settle and what the best specialization is, nor do I think anyone has. But the thrill of uncertainty is what drives me every day.

5 reasons why software consolidation always fails

by paul
INSTRUCTIONS WERE UNCLEAR

Let’s start with a dare: I dare you to go to any large corporation, find an IT architect and ask them to give you a diagram of their complete architecture. I honestly think that they will politely ignore you, but for the sake of argument, let’s assume they are able to have access to this end-to-end architecture and that this architecture is accurate (and that you can find a screen or a piece of paper that is big enough to fit all of it in one page); by looking at this diagram, you will quickly understand why software consolidation is a very appealing proposition: multiple pieces of software serving the same purpose, duplicated teams, disparate processes… Think of all the money you can save if you buy this giant universal platform that everyone will use and will give you complete control over your IT!

Except that never happens. This giant convergent platform never gets implemented, even if it is restricted to a certain functional vertical (e.g. billing, ERP, etc.). So why can’t we consolidate pieces of software into one? Let me give you my two cents.

Note: Hopefully the example I gave speaks for itself, but let me clarify the context of this article: I am specifically addressing software consolidation for very large organizations; of course if your organization employs 10 people and you’re all using google apps then this does not apply to you.

1. Large systems are complicated

This goes without saying but it’s better to say it: the answer to the ultimate question of life, the universe, and everything is fictional. Seriously though, imagining a solution that would cater to the needs of every company and every use case is ludicrous.

2. Enterprise software is outdated

While we can all agree that a universal solution is a utopia, this does not mean that you can’t create a solution that covers a large percentage of the need; or so the smart guys at big enterprise software companies must have thought. To cater to the remaining few percent, customization can be added (for a fee, charged by the software provider itself). And they have. These large enterprise software implementations have become colossi (at least I think that’s the plural of colossus) that are really hard to move: they are gigantic, expensive, slow to respond and use backend technologies from the 70s.

As a result, these platforms become engorged and most of the innovation around them is about managing them more efficiently rather than offering a competitive advantage against the rest of the market. Let’s be clear, I’m not saying big enterprise software is dead; it is necessary.

But in an established competitive environment, you distinguish yourself by fighting for the edges, which means fast reactivity, which is incompatible with these outdated massive implementations.

3. Companies need solutions not platforms

How does a company find its competitive edge? By implementing efficient targeted solutions. And as far as I could witness, this trend does not seem to be slowing down, quite the contrary (which I believe is a very healthy response). However, the multiplication of targeted solutions contributes to rendering the consolidation problem even more complicated and necessary.

4. Budget and learning curves are real constraints

Again, this might seem banal but is worth saying. An enterprise is driving a team of people, with their own expertise, and responding to the demands of the market. Any change has a cost upfront and downstream, especially when replacing a well-known piece of software as part of a consolidation effort.

5. Consolidation software isn’t business driven

In this realm where a single solution does not exist and businesses tend to purchase more and more specific solutions, data consolidation platforms flourish. Unfortunately, in order to cater to the complexity of the systems we’re dealing with, they are often driven by the underlying technology and not the business requirements.

This sounds a lot like business jargon, so let me explain this with an example: your software relies on its data back-end, and if you have tried to consolidate multiple back-end systems together, whether you use a traditional or distributed data platform, the first thing you end up doing is designing the data schema of the platform, then implementing a way for the data to move from multiple backends to this system.

This is not the way your business wants to see consolidation. Your business has a clear idea of what is the most important entity from which it can gain insight (for example analyzing user or customer behavior). This means that your consolidation platform schema needs to always be able to adapt to your business, rather than your business trying to fit into a schema.
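To illustrate the difference, here is a minimal Python sketch of what starting from the business entity could look like. It is an assumption-laden toy, not a description of any real product: the two backends are plain dictionaries, and `CUSTOMER_VIEW` and `fetch_entity` are hypothetical names.

```python
# Hypothetical backends: in practice these would be queries against real
# systems; here they are plain dictionaries keyed by customer id.
BILLING = {42: {"cust_name": "Ada Lovelace", "balance_due": 120.0}}
CRM = {42: {"full_name": "Ada Lovelace", "last_contact": "2016-09-30"}}
BACKENDS = {"billing": BILLING, "crm": CRM}

# The business entity is declared first; each field points at the backend
# and source column that currently supplies it.
CUSTOMER_VIEW = {
    "name": ("crm", "full_name"),
    "balance_due": ("billing", "balance_due"),
    "last_contact": ("crm", "last_contact"),
}


def fetch_entity(view, entity_id, backends=BACKENDS):
    """Assemble one business entity on demand from the mapped backends."""
    entity = {}
    for field, (backend, column) in view.items():
        record = backends[backend].get(entity_id, {})
        # Missing records or columns yield None instead of failing.
        entity[field] = record.get(column)
    return entity


customer = fetch_entity(CUSTOMER_VIEW, 42)
```

When the business decides that a customer also needs, say, a support ticket count, the change is one more entry in `CUSTOMER_VIEW` pointing at whichever system holds it; no data has to be migrated into a new schema first.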

So what’s next?

Software consolidation has tremendous application in giving insight to any business owner. But it needs to be a solution, not a generalized overhaul of the IT eco-system. Therefore I think it requires a good data virtualization solution. This solution must have at least the following qualities:

  1. Be business oriented
  2. Be able to publish fresh data on demand
  3. Be flexible enough to interface with any new element of the IT eco-system
  4. Be able to handle any amount of data
  5. Be able to publish results using known methods (using standard connectors/languages)

Of course, I work for a company that provides all these capabilities, but that does not make my analysis unfounded. I would not work for a company if I didn’t believe it provided something truly unique and needed by the market. I genuinely believe that this type of solution will be the cement of future IT eco-systems.