In a world where every company bills itself as data-centric, the vast majority are still falling well short, says Delphix’s VP of data transformation, Sanjeev Sharma
We live in a data-driven economy, where it seems every company wants to position itself as a leading data business, leveraging its data as a strategic asset to enable innovation and build a reputation as a market leader.
But the data explosion seen in recent years leaves many companies in a difficult position, unable to deliver the necessary datasets or volumes to satisfy the industry’s real innovators.
To truly lead, companies must balance the need to manage and secure a wide array of data with the need to make that data readily available to those who use it. Most companies realise that they need to make their data a strategic asset, but the majority are struggling to see that effort through to execution.
Delphix’s VP of data transformation, Sanjeev Sharma, is all too aware of this issue. The company prides itself on its ability to deliver data quickly and securely to fuel critical digital transformation initiatives for many of the world’s leading companies, and is acutely aware of the pitfalls and errors businesses are making when it comes to data.
“It is clear that applications are becoming increasingly data intensive; IoT, data science, AI and machine learning are all making data agility a major problem,” says Sharma. “More than half of AI and data science projects fail because of data issues: either the data is not available, the data is not cleansed, or the data is not agile. What we are trying to do is help address all of these challenges.”
It was this desire to adapt and improve how companies collect and use their data that initially led Sharma to Delphix. Internationally renowned in the cloud and DevOps community, Sharma spent 15 years with IBM, helping clients with their transformations. He says that clients were increasingly saying their data was a block to adopting DevOps and pushing forward transformation efforts.
“Clients kept asking me what they should be doing with their data; they were finding that their data was the slowest part of the life cycle. I was coming up against the same problems, where it was taking five days to get refreshed data or I was working with synthetic data, which doesn’t always behave in the right way,” Sharma comments.
“It made me want to go on a journey to find companies that were addressing these issues, and that is the reason I discovered Delphix. Some friends of mine pointed me in this direction and I quickly saw that it was working to resolve the problems clients were regularly speaking to me about.”
Sharma cites computer code as an element of technology that the enterprise has effectively democratised, making it easy for stakeholders to share, collaborate and test in an agile manner. It is a benchmark against which to measure how well businesses are working, en masse, on their datasets.
“What Delphix is doing is providing the tools, as a platform, to enable engineers and practitioners – from developers to QA practitioners to E1s – and data scientists to access, manage, govern, share and collaborate around data in the very same way they can around code. Code has a functionality we should be striving for, in that it can be shared, copied, rolled back to previous versions and cloned in a collaborative environment,” he says.
“These are capabilities that haven’t existed for data, so that’s really key to what we are trying to provide. We are helping companies to ingest production datasets, continuously keep them updated and make that virtualised data available whenever it is needed to whoever wants it.
“They can also get self-service capabilities via APIs and command-line tools to integrate the data into their delivery pipeline, manipulate it, share it and collaborate on it, as well as carry out all kinds of testing that can be saved or rolled back.
“All these kinds of manipulations are done using virtual instances, as doing so with physical data is both expensive and technically difficult. At the same time, we can make the data secure and compliant with our masking and policy-based controls, reducing exposure in non-production environments.”
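For readers who want a concrete picture of the pattern Sharma describes, the minimal sketch below shows what requesting a masked, virtual copy of a production dataset through a self-service API might look like. It is purely illustrative: the base URL, endpoints, payload fields and masking policy name are assumptions for the sake of the example and are not Delphix’s actual API.

```python
# Illustrative sketch only: the endpoints and payloads below are hypothetical,
# not Delphix's actual API. They show the general pattern of requesting a
# masked, virtual copy of a production dataset through a self-service API.
import requests

BASE_URL = "https://data-platform.example.com/api"  # hypothetical endpoint
TOKEN = "REPLACE_WITH_API_TOKEN"


def provision_masked_copy(source_dataset: str, environment: str) -> str:
    """Request a virtual, masked copy of a production dataset for a
    non-production environment and return its identifier."""
    response = requests.post(
        f"{BASE_URL}/virtual-copies",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "source": source_dataset,        # e.g. the production database
            "target_env": environment,       # e.g. "qa" or "dev"
            "masking_policy": "pii-default", # apply policy-based masking on provision
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["copy_id"]


def rollback_copy(copy_id: str, snapshot_id: str) -> None:
    """Roll a virtual copy back to an earlier snapshot after destructive tests."""
    requests.post(
        f"{BASE_URL}/virtual-copies/{copy_id}/rollback",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"snapshot": snapshot_id},
        timeout=30,
    ).raise_for_status()


if __name__ == "__main__":
    copy_id = provision_masked_copy("orders-prod", "qa")
    print(f"Provisioned masked virtual copy: {copy_id}")
```

The point of the sketch is the workflow, not the specific calls: provisioning, masking and rollback become lightweight operations on virtual copies rather than tickets raised against physical databases.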
Companies failing to properly manage and treat their data will find themselves falling behind those with leading data processes and procedures, says Sharma, potentially leading to a two-tier system made up of data haves and data have-nots.
And, in an industry that is as competitive as ever, using highly skilled database administrators to shore up a failing data approach is an expensive and likely wasteful course of action.
“The biggest pitfall for companies not making their data agile is that they can only move as fast as the slowest layer in the technology stack. Even those who address all the other layers in the stack cannot go faster than their data. If, as a business, you’re doing two-week sprints but it’s taking five days to test the data, that’s a lot of waiting around,” Sharma states.
“I was talking to an organisation very recently and it has a particular project that is suffering from a major backlog of requests from engineers, developers, testers and data scientists. All of those requests are essentially menial labour: they are asking for database clones or refreshes of copies from a few months ago. These are not tasks data administrators should be hired for. They’re an expensive commodity, so why would you use them for low-level tasks like pruning databases?
“The answer for those organisations is that if they don’t want to be left behind in this innovation race, they need to provide self-service access to non-production data. My advice would be to give engineers the ability to provision, manage and refresh non-production data themselves, in the same way they’re being given the ability to do that with continuous integration and continuous delivery.”
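To make that advice concrete, the short sketch below imagines test data being refreshed as part of the pipeline itself, much as code is built and deployed by CI/CD. Everything here is an assumption for illustration: the fixture name, the endpoint, the snapshot label and the returned connection string are hypothetical, not a real product integration.

```python
# Illustrative sketch, not a real integration: a hypothetical pytest fixture
# that refreshes a virtual test dataset before the suite runs, so test data is
# handled in the pipeline the same way code is handled by CI/CD.
import os

import pytest
import requests

DATA_API = os.environ.get("DATA_API", "https://data-platform.example.com/api")  # hypothetical
TOKEN = os.environ.get("DATA_API_TOKEN", "")


@pytest.fixture(scope="session")
def refreshed_test_db():
    """Refresh the team's virtual test copy from the latest masked production
    snapshot before tests run, instead of filing a ticket with a DBA."""
    resp = requests.post(
        f"{DATA_API}/virtual-copies/orders-qa/refresh",  # hypothetical endpoint
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"snapshot": "latest-masked"},
        timeout=60,
    )
    resp.raise_for_status()
    yield resp.json()["connection_string"]  # hand tests a ready-to-use connection


def test_orders_schema_present(refreshed_test_db):
    # Placeholder assertion: real tests would connect using the returned string.
    assert refreshed_test_db.startswith("postgresql://")
```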
However, in reality many companies are struggling to put such processes in place, says Sharma. In an era where companies like to define themselves as data companies first and foremost, actually being one remains a mere aspiration for the vast majority.
“I often ask companies how clean their data is, and usually I see smirking and laughing,” he says. “I recently had one tell me that a company they had divested three years prior was still showing up in data reports. Similarly, I was talking to an insurance company yesterday and they discovered a customer who, according to the system, is over a thousand years old.
“That’s the challenge with cleansing data: it can be really messy. Having a well-defined data governance and data management approach is what organisations need to focus on. And as an industry we are not doing a good job.”
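A minimal sketch of the kind of data-quality rule these anecdotes point at is shown below: flagging records that are structurally valid but semantically impossible, such as a customer whose recorded age exceeds any plausible maximum. The field names and threshold are assumptions chosen purely for illustration.

```python
# Hypothetical data-quality check: flag customer records whose date of birth
# implies an impossible age (field names and threshold are illustrative).
from datetime import date

MAX_PLAUSIBLE_AGE = 120


def find_implausible_ages(customers):
    """Return customers whose date_of_birth implies an impossible age."""
    today = date.today()
    flagged = []
    for customer in customers:
        dob = customer["date_of_birth"]
        age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
        if age > MAX_PLAUSIBLE_AGE or age < 0:
            flagged.append({**customer, "computed_age": age})
    return flagged


if __name__ == "__main__":
    sample = [
        {"id": 1, "name": "A. Customer", "date_of_birth": date(1985, 6, 1)},
        {"id": 2, "name": "B. Customer", "date_of_birth": date(985, 6, 1)},  # data-entry error
    ]
    for record in find_implausible_ages(sample):
        print(f"Flag for review: customer {record['id']} (computed age {record['computed_age']})")
```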
Against the backdrop of GDPR fines and probes into how the likes of Facebook and Amazon are using data, that could spell trouble for a number of businesses. Companies should view this new era of regulation as a jumping-off point for vast improvement and lasting change when it comes to data management and processes, says Sharma.
“There is certainly a balance to be struck between innovation and regulatory pressures, and we know a number of startups simply folded because there was no way their model could comply with GDPR,” he observes.
“At the same time, I can walk into any compliance team and see fixes that were made to issues years ago and are still in use, even though there has been a sea change in the years since; those policies need to be revisited. We are creating technical debt because nobody is revisiting policies or carrying out governance.
“All that happens is new policies and new governance rules get created, which are layered on top of what is already there. I was talking to a company just last month where every change made to the data model has to be approved by a change control board. That is ridiculous, and stifles innovation.
“Data modellers end up trying to game the system, bundling up changes or hiding them from the board, which is clearly wrong too. That’s the kind of compliance problem organisations need to focus on aggressively.”