Excel for #BigData Analysis | @CloudExpo #BI #Analytics #DigitalTransformation
How propelling instant results to the Excel edge democratizes advanced analytics
By: Dana Gardner
Feb. 3, 2017 06:45 PM
The next BriefingsDirect Voice of the Customer digital transformation case study explores how powerful and diverse financial information is newly and uniquely delivered to the ubiquitous Excel spreadsheet edge.
We'll explore how HTI Labs in London provides the means and governance with its Schematiq tool to bring critical data services to the interface users want. By leveraging the best of instant cloud-delivered data with spreadsheets, Schematiq democratizes end-user empowerment while providing powerful new ways to harness and access complex information.
To learn how complex cloud core-to-edge processes and benefits can be better managed and exploited we're joined by Darren Harris, CEO and Co-Founder of HTI Labs, and Jonathan Glass, CTO and Co-Founder of HTI Labs, based in London. The discussion is moderated by BriefingsDirect's Dana Gardner, Principal Analyst at Interarbor Solutions.
Here are some excerpts:
Gardner: Let's put some context around this first. What major trends in the financial sector led you to create HTI Labs, and what are the problems you're seeking to solve?
Harris: Obviously, in finance, spreadsheets are widespread and are being used for a number of varying problems. A real issue started a number of years ago, where spreadsheets got out of control. People were using them everywhere, causing lots of operational risk. Firms wanted to get their hands around it for governance, and there were loads of Excel-type issues that needed to be eradicated.
That led to the creation of centralized teams that locked down rigid processes and effectively took away a lot of the innovation and discovery process that traders are using to spot opportunities and explore data.
Through this process, we're trying to help with governance to understand the tools to explore, and [deliver] the ability to put the data in the hands of people ... [with] the right balance.
So by taking the best of regulatory scrutiny around what a person needs, and some innovation that we put into Schematiq, we see an opportunity to take Excel to another level -- but not sacrifice the control that’s needed.
Gardner: Jonathan, are there technology trends that allowed you to be able to do this, whereas it may not have been feasible economically or technically before?
Glass: There are a lot of really great back-end technologies that are available now, along with the ability to either internally or externally scale compute resources. Essentially, the desktop remains quite similar. Excel has remained quite the same, but the upstream capabilities have really grown.
So there's a challenge. Data that people feel they should have access to is getting bigger, more complex, and less structured. So Excel, which is this great front-end to come to grips with data, is becoming a bit of a bottleneck in terms of actually keeping up with the data that's out there that people want.
Gardner: So, we're going to keep Excel. We're not going to throw the baby out with the bathwater, so to speak, but we are going to do something a little bit different and interesting. What is it that we're now putting into Excel and how is that different from what was available in the past?
Harris: Schematiq extends Excel and allows it to access unstructured data. It also reduces the complexity and technical limitations that Excel has as an out-of-the-box product.
We have the notion of a data link that's effectively in a single cell that allows you to reference data that’s held externally on a back-end site. So, where people used to ingest data from another system directly into Excel, and effectively divorce it from the source, we can leave that data where it is.
It's a paradigm of take a question to the data; don’t pull the data to the question. That means we can leverage the power of the big-data platforms and how they process an analytic database on the back-end, but where you can effectively use Excel as the front screen. Ask questions from Excel, but push that query to the back-end. That's very different in terms of the model that most people are used to working with in Excel.
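The "take the question to the data" model Harris describes can be sketched in a few lines. This is a minimal illustration only, not Schematiq's implementation: the `DataLink` class, the `trades` table, and its columns are hypothetical names, and an in-memory SQLite database stands in for the back-end platform. The point is that the aggregation runs where the data lives and only the small result comes back.

```python
import sqlite3


class DataLink:
    """Hypothetical stand-in for a single-cell reference to back-end data."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table  # treated as a trusted identifier in this sketch

    def total_by(self, group_col, value_col):
        # The GROUP BY runs inside the database; only the summary rows
        # travel back to the caller, never the full table.
        sql = (
            f"SELECT {group_col}, SUM({value_col}) "
            f"FROM {self.table} GROUP BY {group_col} ORDER BY {group_col}"
        )
        return self.conn.execute(sql).fetchall()


def demo():
    # In-memory SQLite plays the role of the back-end store (an assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (desk TEXT, notional REAL)")
    conn.executemany(
        "INSERT INTO trades VALUES (?, ?)",
        [("power", 100.0), ("gas", 250.0), ("power", 50.0)],
    )
    link = DataLink(conn, "trades")
    return link.total_by("desk", "notional")
```

Calling `demo()` returns one row per desk rather than the three underlying trades. With a real back end the same shape would return a handful of rows from a table of any size.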
Gardner: This is a two-way street. It's a bit different. And you're also looking at the quality, compliance, and regulatory concerns over that data.
Harris: Absolutely. An end-user is able to break down or decompose any workflow process with data and debug it the same way they can in a spreadsheet. The transparency that we add on top of Excel’s use with Schematiq allows us to monitor what everybody is doing and the function they're using. So, you can give them agility, but still maintain the governance and the control.
In organizations, lots of teams have become disengaged. IT has tried to create some central core platform that’s quite restrictive, and it's not really serving the users. They have gotten disengaged and they've created what Gartner referred to as the Shadow BI Team, with databases under their desk, and stuff like that.
By bringing in Schematiq we add that transparency back, and we allow IT and the users to have an informed discussion -- a very analytic conversation -- around what they're using, how they are using it, where the bottlenecks are. And then, they can work out where the best value is. It's all about agility and control. You just can't give the self-service tools to an organization and not have the transparency for any oversight or governance.
To the edge
Gardner: So we have, in a sense, brought this core to the edge. We've managed it in terms of compliance and security. Now, we can start to think about how creative we can get with what's on that back-end that we deliver. Tell us a little bit about what you go after, what your users want to experiment with, and then how you enable that.
Glass: We try to be as agnostic to that as we can, because it's the creativity of the end-user that really drives value.
We have a variety of different data sources, traditional relational databases, object stores, OLAP cubes, APIs, web queries, and flat files. People want to bring that stuff together. They want some way that they can pull this stuff in from different sources and create something that's unique. This concept of putting together data that hasn't been put together before is where the sparks start to fly and where the value really comes from.
Gardner: And with Schematiq you're enabling that aggregation and cleansing ability to combine, as well as delivering it. Is that right?
Harris: Absolutely. It's that discovery process. It may be very early on in a long chain. This thing may progress to be something more classic, operational, and structured business intelligence (BI), but it starts with allowing end-users to cleanse and explore data, and then hand over an artifact that someone in the core team can work with or use as an asset. The iteration curve is so much tighter and the cost of doing that is so much less. Users are able to innovate and put together the scenario of the business case for why this is a good idea.
The only thing I would add to the sources that Jon has just mentioned is with HPE Haven OnDemand, [you gain access to] the unstructured analytics, giving the users the ability to access and leverage all of the HPE IDOL capabilities. That capability is a really powerful and transformational thing for businesses.
They have such a set of unstructured data [services] available in voice and text, and when you allow business users access to that data, the things they come up with, their ideas, are just quite amazing.
Technologists always try to put themselves in the minds of the users, and we've all historically done a bad job of making the data more accessible for them. When you allow them the ability to analyze PDFs without structure, to share that, to analyze sentiment, to include concepts and entities, or even enrich a core proposition, you're really starting to create innovation. You've raised the awareness of all of these analytics that exist in the world today in the back-end, shown end-users what they can do, and then put their brains to work discovering and inventing.
Gardner: Many of these financial organizations are well-established, many of them for hundreds of years perhaps. All are thinking about digital transformation, the journey, and are looking to become more data-driven and to empower more people to take advantage of that. So, it seems to me you're almost an agent of digital transformation, even in a very technical and sophisticated sector like finance.
Making data accessible
Glass: There are a lot of stereotypes in terms of who the business analysts are and who the people are that come up with ideas and intervention. The true power of democratization is making data more accessible, lowering the technical barrier, and allowing people to explore and innovate. Things always come from where you least expect them.
Gardner: I imagine that Microsoft is pleased with this, because there are some people who are a bit down on Excel. They think that it's manual, that it's by rote, and that it's not the way to go. So, you, in a sense, are helping Excel get a new lease on life.
Glass: I don’t think we're the whole story in that space, but I love Excel. I've used it for years and years at work. I've seen the power of what it can do and what it can deliver, and I have a bit of an understanding of why that is. It’s the live nature of it, the fact that people can look at data in a spreadsheet, see where it’s come from, see where it’s going, they can trust it, and they can believe in it.
That’s why what we're trying to do is create these live connections to these upstream data sources. There are manual steps, download, copy/paste, move around the sheet, which is where errors creep in. It’s where the bloat, the slowness, and the unreliability can happen, but by changing that into a live connection to the data source, it becomes instant and it goes back to being trusted, reliable, and actionable.
Harris: There's something in the DNA, as well, of how people interact with data and so we can lay out effectively the algorithm or the process of understanding a calculation or a data flow. That’s why you see a lot of other systems that are more web-based or web-centric and replicate an Excel-type experience.
The user starts to use it and starts to think, "Wow, it’s just like Excel," and it isn’t. They hit a barrier, they hit a wall, and then they hit the "export" button. Then, they put it back [into Excel] and create their own way to work with it. So, there's something in the DNA of Excel and the way people lay things out. I think of [Excel] almost like a programming environment for non-programmers. Some people describe it as a functional language very much like Haskell, and the Excel functions they write were effectively then working and navigating through the data.
Gardner: No need to worry that if you build it, will they come; they're already there.
Gardner: Tell us a bit about HTI Labs and how your company came about, and where you are on your evolution.
Harris: HTI Labs was founded in 2012. The core backbone of the team actually worked for the same tier 1 investment bank, and we were building risk and trading systems for front-office teams. We were really, I suppose, the cutting edge of all the big data technologies that were being used at the time -- real-time, distributed graphs and cubes, and everything.
As a core team, it was about taking that expertise and bringing it to other industries. Using Monte Carlo farms in risk calculations, the ability to export data at speed and real-time risk. These things were becoming more central to other organizations, which was an opportunity.
At the moment, we're focusing predominately on energy trading. Our software is being used across a number of other sectors and our largest client has installed Schematiq on 120 desktops, which is great. That’s a great validation of what we're doing. We're also a member of the London Stock Exchange Elite Program, based in London for high-growth companies.
Glass: Darren and I met when we were working for the same company. I started out as a quant doing the modeling, the math behind pricing, but I found that my interest lay more in the engineering. Rather than doing it once, can I do it a million times, can I do these things reliably and scale them?
Because I started in a front-office environment, it was very spreadsheet-dominated, it was very VBA-dominated. There's good and bad in that. A lot of lessons were learned, and Darren and I met up. We crossed the divide together between the top-down, big IT systems and the bottom-up, end-user-developed spreadsheets, and so on. We found a middle ground together, which we feel is quite a powerful combination.
Gardner: Back to where this leads. We're seeing more-and-more companies using data services like Haven OnDemand and starting to employ machine learning, artificial intelligence (AI), and bots to augment what the humans do so well. Is there an opportunity for that to play here, or maybe it already is? The question basically is, how does AI come to bear on what you can deliver out to the Excel edge?
Harris: I think what you see is that out of the box, you have a base unit of capability. The algorithms are built, but the key to making them so much more improved is the feedback loop between your domain users, your business users, and how they can enrich and train effectively these algorithms.
So, we see a future where the self-service BI tools that they use to interact with data and explore would almost become the same mechanism where people will see the results from the algorithms and give feedback to send back to the underlying algorithm.
Gardner: And Jonathan, where do you see the use of bots, particularly perhaps with an API model like Haven OnDemand?
The role of bots
Glass: The concept for bots is replicating an insight or a process that somebody might already be doing manually. People create these data flows and analyses that they maybe run once, because it’s quite time-consuming to run them. The real exciting possibility is that you make these things run 24×7. So, you start receiving notifications, rather than having to pull from the data source. You start receiving notifications from your own mailbox that you have created. You look at those and you decide whether that's a good insight or a bad insight, and you can then start to train it and refine it.
The training and refining is that loop that potentially goes back to IT, gets back through a development loop, and it’s about closing that loop and tightening that loop. That's the thing that really adds value to those opportunities.
Gardner: Perhaps we should unpack Schematiq a bit to understand how one might go back and do that within the context of your tool. Are there several components of the tool, one of which might lend itself to going back and automating?
Glass: Absolutely. You can imagine the spreadsheet has some inputs and some outputs. One of the components within the Schematiq architecture is the ability to take a spreadsheet, to take the logic and the process that’s embedded in our spreadsheet, and turn it into an executable module of code, which you can host on your server, you can schedule, you can run as often as you like, and you can trigger based on events.
It’s a way of emitting code from a spreadsheet. You take some of the insight, without going through a business analysis loop and a development loop, and you take the exact thing that the user, the analyst, has programmed. You make it into something that you can run, commoditize, and scale. That’s quite an important way in which we reduce that development loop. We create that cycle that’s tight and rapid.
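To make the idea of turning spreadsheet logic into an executable module concrete, here is a small sketch. It is not how Schematiq emits code. The `compile_sheet` helper, the cell names, and the representation of formulas as Python callables are all assumptions chosen for illustration. It shows inputs, dependent formulas, and outputs being wrapped into one function that a server could schedule or trigger.

```python
def compile_sheet(formulas, outputs):
    """Turn a mapping of cell name -> formula into a reusable function.

    Each formula is a callable that receives a `get(cell)` lookup.
    This representation is hypothetical, chosen to keep the sketch short.
    """

    def run(**inputs):
        cache = dict(inputs)   # input cells supplied by the caller
        visiting = set()       # cells currently being evaluated

        def get(cell):
            if cell in cache:
                return cache[cell]
            if cell in visiting:
                raise ValueError(f"circular reference at {cell}")
            if cell not in formulas:
                raise KeyError(f"no input or formula for {cell}")
            visiting.add(cell)
            cache[cell] = formulas[cell](get)  # evaluate dependencies on demand
            visiting.discard(cell)
            return cache[cell]

        return {cell: get(cell) for cell in outputs}

    return run


# A two-formula "sheet": C1 is =A1*B1 and C2 is =C1+100.
sheet = {
    "C1": lambda get: get("A1") * get("B1"),
    "C2": lambda get: get("C1") + 100,
}
priced = compile_sheet(sheet, outputs=["C2"])
```

Calling `priced(A1=10, B1=3)` evaluates C1 and then C2 and returns `{"C2": 130}`. Because the result of `compile_sheet` is an ordinary function, it can be run on a schedule or in response to an event without the analyst's logic being rewritten by hand.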
Gardner: Darren, would you like to explain the other components that make-up Schematiq?
Harris: There are four components of Schematiq architecture. There's the workbench that extends Excel and allows the ability to have large structured data analytics. We have the asset manager, which is really all about governance. So, you can think of it like source control for Excel, but with a lot more around metadata control, transparency, and analytics on what people are using and how they are using it.
There's a server component that allows you to off-load and scale analytics horizontally, if needed, and build repeatable or overnight processes. The last part is the portal. This is really about allowing end-users to instantly share their insights with other people. Picking up on Jon’s point about the executable module, it’s defined in Schematiq. That can be off-loaded to a server and exposed as another API to a computer, a mobile device, or even a function.
So, it’s very much all about empowering the end-user to connect, create, govern, share instantly and then allow consumption from anybody on any device.
Market for data services
Gardner: I imagine, given the sensitive nature of the financial markets and activities, that you have some boundaries that you can’t cross when it comes to examining what’s going on in between the core and the edge.
Tell me about how you, as an organization, can look at what’s going on with the Schematiq and the democratization, and whether that creates another market for data services when you see what the demand entails.
Harris: It’s definitely the case that people have internal datasets they create and that they look after. People are very precious about them because they are hugely valuable, and one of the things that we strive to help people do is to share those things.
Across the trading floor, you might effectively have a dozen or more different IT infrastructures, if you think of what’s existing on the desk as being a miniature infrastructure that’s been created. So, it's about making it easy for people to share these things, to create master datasets that they gain value from, and to see that they gain mutual value from that, rather than feeling closed in and not wanting to share this with their neighbors.
If we work together and if we have the tools that enable us to collaborate effectively, then we can all get more done and we can all add more value.
Gardner: It's interesting to me that the more we look at the use of data, the more it opens up new markets and innovation capabilities that we hadn’t even considered before. And, as an analyst, I expect to see more of a marketplace of data services. You strike me as an accelerant to that.
Harris: Absolutely. As the analytics are coming online and exposed by APIs, the underlying store that’s used is becoming a bit irrelevant. If you look at what the analytics can do for you, that’s how you consume the insight and you can connect to other sources. You can connect from Twitter, you connect from Facebook, you can connect PDFs, whether it’s NoSQL, structured, columnar, rows, it doesn’t really matter. You don’t see that complexity. The fact that you can just create an API key, access it as a consumer, and can start to work with it is really powerful.
There was the recent example in the UK of a report on the Iraq War. It’s 2.2 million words, it took seven years to write, and it’s available online, but there's no way any normal person could consume or analyze that. That’s three times the complete works of Shakespeare.
Using these APIs, you can start to pull out mentions, you can pull out countries, locations and really start to get into the data and provide anybody with Excel at home, in our case, or any other tool, the ability to analyze and get in there and share those insights. We're very used to media where we get just the headline, and that spin comes into play. People turn things on their head, and you really never get to delve into the underlying detail.
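As a rough illustration of pulling mentions of countries out of a long document, the sketch below counts whole-word matches against a short hand-written list. It is deliberately naive and does not use the Haven OnDemand API. The `PLACES` list and the `count_mentions` helper are hypothetical, and a real entity-extraction service would recognize names without being given a list.

```python
import re
from collections import Counter

# Hypothetical gazetteer for this sketch; an entity-extraction service
# would not need a hand-written list like this.
PLACES = ["Iraq", "Kuwait", "United Kingdom", "United States"]


def count_mentions(text, names=PLACES):
    """Count case-insensitive, whole-word mentions of each name in text."""
    counts = Counter()
    for name in names:
        pattern = r"\b" + re.escape(name) + r"\b"
        hits = re.findall(pattern, text, flags=re.IGNORECASE)
        if hits:
            counts[name] = len(hits)
    return counts


sample = "Iraq borders Kuwait. The United Kingdom debated Iraq."
mentions = count_mentions(sample)
```

For the sample sentence, `mentions` records two mentions of Iraq and one each of Kuwait and the United Kingdom. The resulting table of counts is small enough to drop straight into a spreadsheet, even when the source document runs to millions of words.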
What’s really interesting is when democratization and sharing of insights and collaboration comes, we can all be informed. We can all really dig deep, and all these people that work there, the great analysts, could start to collaborate and delve and find things and find new discoveries and share that insight.
Gardner: All right, a little light bulb just went off in my head whereas we would go to a headline and a new story and we might have a hyperlink to a source. I could get a headline and a news story, open up my Excel spreadsheet, get to the actual data source behind the entire story and then probe and plumb and analyze that any which way I wanted to.
Harris: Yes, exactly. I think the most savvy consumer now, the analyst, is starting to demand that transparency. We've seen in the UK, words, election messages and quotes and even financial stats where people just don’t believe the headlines. They're demanding transparency in that process, and so governance can only be really a good thing.