Ep. 88 | Next Generation Data Warehouse (Part 1) with Claudia Imhoff

This week Claudia Imhoff, President of Intelligent Solutions, co-author of five books on business intelligence and analytics including Building the Customer-Centric Enterprise, and founder of the Boulder BI Brain Trust, joins Allison Hartsoe in the Accelerator. Claudia explains a next-generation data framework companies can use when thinking about how to create the underlying technology architecture that enables customer analytics and fast decision making.

Please help us spread the word about building your business’ customer equity through effective customer analytics. Rate and review the podcast on Apple Podcasts, Stitcher, Google Play, Alexa’s TuneIn, iHeartRadio, or Spotify. And do tell us what you think by writing Allison at info@ambitiondata.com or ambitiondata.com. Thanks for listening! Tell a friend!

Podcast Links:

Extended Data Warehouse – A New Data Architecture for Modern BI with Claudia Imhoff

Read Full Transcript

Allison Hartsoe: 00:01 This is the Customer Equity Accelerator. If you are a marketing executive who wants to deliver bottom-line impact by identifying and connecting with revenue-generating customers, then this is the show for you. I’m your host, Allison Hartsoe, CEO of Ambition Data. Each week I bring you the leaders behind the customer-centric revolution who share their expert advice. Are you ready to accelerate? Then let’s go. Welcome everyone. Today’s show is about the next-generation data architecture, and to help me discuss this topic is technology expert Claudia Imhoff. Claudia is the president of Intelligent Solutions, co-author of five books on business intelligence and analytics, including Building the Customer-Centric Enterprise, and founder of the Boulder BI Brain Trust, or the BBBT. Yes, folks, she is a legend and one of the brightest thinkers out there. Claudia, I’m so happy to have you on the show. Welcome.

Claudia Imhoff: 01:04 Oh, well. With that kind of introduction. Wow. Thanks so much, Allison. I appreciate it.

Allison Hartsoe: 01:10 You have such an incredible background. Of all the things you could focus on, how did you come to focus on this particular topic?

Claudia Imhoff: 01:16 Analytics has always been something that interests me. I actually started out as a biochemist. Yeah, I know, a bit out of left field, but in that field you have to use a lot of statistics, and I was always fascinated with the math, with the statistics: how can I twist the data, how do I look at it this way and that way. So I was naturally drawn to the business intelligence and analytics area. I actually did leave biochemistry, and I got heavily into building data warehouses and that sort of thing about 20 years ago. It was a really interesting time. There weren’t a lot of people in my field, certainly not a lot of women in my field, so that was kind of interesting in its own right. We could talk about that at another time, but it was always analytics. It was always the data, and how do I look at it?

Claudia Imhoff: 02:02 How do I understand it, how do I build an architecture to use this data? And the architectural side was actually the most fascinating part of it, for me at least. I started out just building architectures, trying to figure out how do you get all these moving parts to work in a cohesive fashion, and that led me to Bill Inmon, who was my mentor for many years. Bill and I wrote a book on the architecture called the Corporate Information Factory. That was almost 20 years ago, and believe it or not, I never looked back. I have been in this field now for over 20 years focusing on the designs of the environments, the architectures. Yes, I did write five books all about this area, and it’s just been one heck of a ride for 20 years.

Allison Hartsoe: 02:46 Well, that’s fantastic. Now I’m going to ask the obvious question here. Why should I care about next-generation architecture? There are all these great new tools out there. Can’t I just bolt one of those hundreds of new tools onto my technology stack, have my team learn a little Python, a little R, and hey, I’m good to go?

Claudia Imhoff: 03:06 Yeah, that usually doesn’t work, and it’s because people do exactly what you just said, but it’s not just one person. It’s 10 or 20 or even a hundred in an organization, and therefore you end up not with a cohesive unit but with total chaos. You’ve got many architectures, therefore you have no architecture. And that’s probably the biggest problem that I see today, especially with the ease with which people can spin up an application or a cloud instance or whatever it is. So now they have hundreds of little instances of data and hundreds of little areas where analytics are being performed. It’s a natural evolution. It’s not a good one. You can imagine that the many different groups, when they run even similar analyses, never get the same results. So it really is something that has caused all kinds of problems in the last five years in particular.

Claudia Imhoff: 04:05 The last five years have been terribly innovative. There are really fascinating technologies today that we can use for our analytics, that we can use for our infrastructure and so forth. And of course, we have multiple deployment methods. We can deploy on premises. We can deploy in the cloud. And most people have a hybrid environment, a little here, a little there, which just adds to the chaos, because if you want to analyze the customer, you’ve got to analyze the customer in its entirety, and that means every piece of data, wherever it is, is critical to the analysis of that customer. If you’ve got their demographic data on premises, but you’ve got all kinds of analytics about their purchasing power or their churn or whatever it is in the cloud, you’ve somehow got to mash those two together, bring those two together, and that’s where the architecture comes in.

Claudia Imhoff: 04:58 That’s where the need to have a solid, cohesive architecture comes from. What I’ve done over the years is evolve the Corporate Information Factory into something that I call the extended data warehouse architecture. About ten years ago we started hearing rumbles of, well, not everything can be done in the data warehouse; I’ve got some analytics that just don’t fit. About five years ago, we started seeing things like data science, which didn’t naturally fit into a data warehouse environment. The data warehouse is a very structured environment. It’s really good at production analytics: things like creating KPIs, key performance indicators, or creating fraud models from known fraudulent transactions. It’s a heavy-duty production kind of environment, but because it’s production, that means it’s set up to answer known questions. Well, data science is just the opposite of that. Data scientists don’t even know what questions they want to ask until they sit down and start talking about it and start thinking about it.

Claudia Imhoff: 06:02 So it’s a very free-wheeling kind of exploration, an experimental type of environment. And that is not the data warehouse. The data warehouse is a good solid production database, and you don’t monkey around with it. So that meant we had two environments for analytics: the production environment, and that wonderful exploratory, experimental, “I ask it once and never ask it again, I’m just curious about what’s going on” kind of environment. But then we also started hearing about a third analytic environment. Yeah, three of them. That was real-time analytics: real-time analysis on real-time data. Well, that’s a whole different architecture to begin with. So for the ExDW, the extended data warehouse, I thought long and hard: how do I create an environment and architecture, a logical architecture, that explains these three different environments and allows someone to take this logical architecture, use it in their environment, and build their technical architecture based off of it? It gives them a roadmap.

Claudia Imhoff: 07:02 Here’s what you need, here’s how all the pieces have to fit together. Now go figure out what technology best fits your business and your needs, but build it to this roadmap. And that is what the ExDW, the extended data warehouse, was all about. The data warehouse doesn’t go away. It’s that production environment. However, there is this need for data scientists to have their playground, if you will. And there’s also the need for the real-time business environment where we have to analyze data before we even store it. And that’s the biggest difference between the three environments. Both the data warehouse and what I call the investigative computing platform, that playground for data scientists, store the data first, and then you analyze it. But in the operational world, for that streaming analytics or that real-time analysis, you analyze the data, and then you may or may not store it, and that’s a whole different ballgame right there. So it’s a whole new world with all kinds of new technologies. So that’s the architecture and how I got into this business and why I care about it so deeply. Does that make sense?

Allison Hartsoe: 08:06 It does make sense. In fact, I don’t think I’ve ever heard anyone explain the data architectures in such a clear way, because we oftentimes say, oh, there’s all this BI legacy stuff over here, and then usually what I see is this kind of blend between the real-time and the investigative side, just kind of sitting on top of each other, and that’s strikingly different. The store-then-analyze versus analyzing in real time and maybe not storing it at all: those are two radically different functions, and I can see how they call for different technologies. Now you have a really great picture that we’re going to link to that lays out this structure, the next-generation extended data warehouse architecture. Can you walk us through that picture a little bit?

Claudia Imhoff: 08:53 Sure. If we think about the architecture, at the bottom is the operational environment, and we’re going to go up from there to the very top, which is all the analytics and the applications that we use. At the bottom you have the operational systems, and if we walk the left-hand side, if you will, up the architecture, you’ll see the next box up above is the data integration, and that’s your traditional ETL, the heavy lifting. How do I integrate the data physically? And then I pop it into, yes, my traditional enterprise data warehouse for production analytics, and on top of that are the analytics and the technologies there, the applications and so forth. That’s a pretty standard architecture. We’ve had that for 20 years. What’s different is the right-hand side of the architecture. At the bottom, and I’ll return to the bottom again for the third architecture, is the data, and today, especially with data science, we’re starting to see all kinds of weird stuff.

Claudia Imhoff: 09:49 It’s not that nice neat operational data anymore. We have to go outside the four walls of our organization, for example, and look at all kinds of outside data that heavily impact our business: weather data, Twitter data, anything you can think of. And it’s not the lovely formatted data that our operational systems produce. So there’s a different kind of data collection point. I call it the data refinery. It’s basically taking this oddball data, the Internet of Things, streaming data, whatever it is, all kinds of data, and making sense out of it. Not all of the data that comes in from sensors, for example, is useful. A lot of it is needed for the sensors themselves, but it’s not needed for an analysis.
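The refinery step described here, keeping the analysis-relevant measurements and discarding sensor housekeeping data, can be sketched in a few lines. This is an illustrative sketch only; the field names (`temperature_c`, `battery_mv`, and so on) are invented for the example and not taken from any particular product.

```python
from datetime import datetime, timezone

# Fields that matter for analysis; everything else (battery level,
# checksums) is housekeeping the sensor needs but the analyst doesn't.
ANALYSIS_FIELDS = {"device_id", "temperature_c", "humidity_pct", "ts"}

def refine(raw_records):
    """Keep only analysis-relevant fields and normalize types."""
    refined = []
    for rec in raw_records:
        # Separate the wheat from the chaff: drop housekeeping fields.
        row = {k: v for k, v in rec.items() if k in ANALYSIS_FIELDS}
        # Normalize the epoch timestamp into a proper datetime.
        row["ts"] = datetime.fromtimestamp(rec["ts"], tz=timezone.utc)
        # Coerce the measurement to a float; skip unparseable readings.
        try:
            row["temperature_c"] = float(row["temperature_c"])
        except (KeyError, ValueError):
            continue
        refined.append(row)
    return refined

raw = [
    {"device_id": "s-1", "temperature_c": "21.5", "humidity_pct": 40,
     "ts": 1700000000, "battery_mv": 2990, "crc": "a1f3"},
    {"device_id": "s-2", "temperature_c": "bad", "ts": 1700000060,
     "battery_mv": 3010, "crc": "77c2"},
]
clean = refine(raw)  # only the first, usable reading survives
```

The output records are now in a uniform, analysis-ready shape that the investigative computing platform can store and query.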

Allison Hartsoe: 10:33 Now you have the data refinery sitting right next to the data integration platform in the chart. So these are two different pieces but at the same level.

Claudia Imhoff: 10:41 They are. They both are sort of the entry point into the store-and-analyze environment, and believe it or not, they are different technologies. Now, the same technology could be used in both boxes, but there are technologies that specialize in extract, transform, and load, the ETL side, and other technologies that specialize in just prepping the data for the data scientist, and that’s what the data refinery is all about. It just sort of manipulates the data, gets it into a format that’s usable, separates the wheat from the chaff, and then it pops it into the investigative computing platform area. That’s the area that has been most innovative lately. It started out as Hadoop. It’s now Spark. It’s Snowflake. It’s any new database you can think of. These are incredibly fast, high-performance databases. They store a massive amount of data, and they perform like a dream.

Claudia Imhoff: 11:35 That’s the world that kicked off this investigative, exploratory or experimental area, and it is the playground of the data scientist. And then on top of that, again, are many different innovative areas. The tools on top of these databases, these storage areas, the analytic tools, have certainly not been sitting still either. We now have incredible capabilities for our data scientists as well as our business analysts using the EDW, but the data science world seems to have just exploded. It’s not just R and Python. We have DataRobot, we have Outlier, and yes, we have traditional data visualization tools like Tableau and Qlik and Spotfire and so forth, but the area is just rife with innovation and creativity. I’m very excited about that environment. So that’s the right-hand side of the architecture. I said there was a third area of analysis, and I’m going to go back to the operational environment, like I said, and talk about a real-time analysis engine: the ability to analyze data as it’s streaming into my organization. Whole different architecture, different technologies, companies like Vitria and StreamBase and so forth. Let me just give you an example of what goes on there.

Claudia Imhoff: 12:48 That’s probably the easiest way to explain it. Companies, especially financial companies and insurance companies and that sort of thing, really want to detect fraud early. If they can stop it before it even gets into the environment, gets into the systems, hallelujah. That is a major boost for them and their customers as well. So let’s talk about fraud a little bit. We can develop a fraud model, a model of fraudulent behavior, and those models are pretty darn good for the time being. We take a whole bunch of known fraudulent transactions, whether they’re credit card transactions or ATM withdrawals or insurance claims or what have you. There are a whole bunch of them out there. If I can take those fraudulent transactions, then I can create a model of their behaviors: what constitutes the fraud, and how would I know it? As you can imagine, it involves a massive amount of data, because I have to understand where they came from, who was doing it, what was fraudulent about them, and so forth.

Claudia Imhoff: 13:49 But I can take that fraud model now, and I can bring it into my operational world, into these streaming analytic capabilities. And then for every transaction that comes into my organization, the second I can recognize it, I throw it against the fraud model, and the system determines, not me, whether or not that transaction has the earmarks of a fraudulent transaction, within milliseconds. Now, I’m not saying, and the system is not declaring, that it is absolutely fraudulent, but it’s saying it sure looks and quacks like a duck, right? So I’m going to shuttle that off in a different direction. It is not going to go down the normal process that a regular non-fraudulent transaction would go down. That’s an example of analyzing something as it’s streaming in and then actually redirecting it, once I’ve identified what kind of transaction it is, to a different pathway. Incredibly good for these organizations to detect that fraud so quickly.

Claudia Imhoff: 14:48 And that’s what we call streaming analytics. I analyze it before I store it, and I may or may not choose to store it, or I may send it down a different pathway. So those are the three areas, and that’s what the extended data warehouse is all about. The picture shows it probably better than I can describe it, but you get the idea. That’s a logical architecture. Now, the technologies that you’re going to use for these different areas are really up to the individual organization. However, the one caveat that I place on this is that they have to remember that all of these components have to come together. For example, that fraud model: I got that probably from my EDW. That’s where I do that production model work, and I brought it into my operational world. That means these things all have to talk to each other.

Claudia Imhoff: 15:36 All that information that I’m producing in my data science environment, those types of analytics, need to feed into the other areas as well so that I can do something brilliant. If I’m a customer service rep, for example, I can immediately identify a customer that’s about to leave the company, they’re about to churn, and I can then invoke my intervention script and try to save that customer. Oh, I understand you’re having problems with your fill-in-the-blank. Your cell phone dropped a few times. I’m so sorry. You’ve already been credited for those calls, but I’d like to do something else for you to try to keep you as a customer. And so forth. Or a location-based offer, for example, very popular today. We all know where our customers are. If they give us permission, we can tell that they’re in a store, that they’re online, wherever they are, and if I want to be able to offer them something based on where they physically are, the location where they are right now, then I have to bring together information from my investigative computing platform, from my good old enterprise data warehouse, and yes, my real-time environment. Bring it all together so that I know exactly where they are, and I know what to offer them.

Claudia Imhoff: 16:46 That’s why we need an architecture. All of these moving parts need to move together at some point so that we can actually act on the intelligence. Otherwise, why are we doing it? Does that make sense?

Allison Hartsoe: 16:59 Exactly. Now, I don’t know if you saw this, but there’s a company called Segment out there, and they recently put out a piece where they tried to pin together different parts of the architecture, more for analytics and, obviously, segmentation. And they were looking at the companies that were using the most types of any given technology, so they created this list of what was popular amongst their clients. So if you were to apply different types of technology to your stack, and of course we will link to that picture in the show notes, are there certain recipes that you tend to see certain types of companies using or going back to again and again? Is there a good starting place or a best practice yet?

Claudia Imhoff: 17:44 Well, yeah, there actually is, to a certain extent. And again, it depends on the company and what kinds of data they have. Are they on premises, are they in the cloud, and so forth? But if I’ve got these three massive analytic environments, the last thing I want to do is forklift the analytics from one environment into another. You know what I mean? I don’t want to replicate them all over the place. So the best way to start is to bring these things together so I can make that next best offer or that location-based offer and so forth. I would recommend taking a good hard look at data virtualization. Data virtualization has matured over the last ten years to the point where it is remarkable. I can bring together all of these little parts: the fraud model, the fraud analysis capabilities, the customer demographics, where they are.

Claudia Imhoff: 18:31 I know they didn’t just buy gas in Kansas because they just bought something in San Francisco. I can bring all of that together in a virtual fashion and display it on someone’s desktop as if it were physically all together. So instead of replicating these things into different environments, which gets messy and gets out of sync really fast, how about I just take the live analytics, the live results of these things, and virtualize them together and show them as if they were physically together? So yes, there are technologies today that can help ease the integration of these different analytics. Now, we have to be a little careful with the virtualization, especially if we start virtualizing some of our operational systems, because it is a bit invasive to them. It can sometimes affect performance. So there’s a little warning there: you need to monitor the systems that you’re virtualizing so that you don’t impact their performance. But boy, oh boy, I would start with virtualization.
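The virtual join described here, presenting data from separate stores as one record without physically replicating it, might be sketched like this. The two dicts stand in for live connections to a warehouse and a real-time location feed; a product like Denodo does this at scale, and all names and values below are invented for the example.

```python
import functools

# Stand-ins for two live sources: demographics in the warehouse,
# current location from a real-time feed. In a real deployment these
# would be remote connections, not in-memory dicts.
WAREHOUSE = {"cust-42": {"name": "Pat", "segment": "high-value"}}
REALTIME = {"cust-42": {"store": "San Francisco #12"}}

@functools.lru_cache(maxsize=1024)
def virtual_customer_view(customer_id):
    """Join the sources on demand and present them as one record,
    without replicating data into a new store. The lru_cache plays
    the role of the virtualization layer's query cache, so commonly
    asked queries avoid going back to the sources."""
    demo = WAREHOUSE.get(customer_id, {})
    loc = REALTIME.get(customer_id, {})
    return {**demo, **loc, "customer_id": customer_id}

view = virtual_customer_view("cust-42")
```

Because nothing is copied into a new store, the joined view cannot drift out of sync with its sources; the cache, not replication, is what keeps repeated queries fast.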

Allison Hartsoe: 19:31 That always reminds me of Looker, and in other ways it reminds me of Hadoop. Am I in the right zone when I’m thinking about the tools there?

Claudia Imhoff: 19:37 A little bit different. For the standalone virtualization tools, a good example would be Denodo, but even tools like Tableau and Qlik and Spotfire also have their own internal data virtualization capabilities, so they can pull data from the cloud, from on premises, wherever it is, and be able to analyze it themselves. So it actually is a class of technology all unto itself, the standalone data virtualization tools like Denodo. And there are others I can’t think of off the top of my head right now, but there are many others, so please don’t think Denodo is the only one. But they’ve matured to the point where they can actually cache the data, so they get tremendous performance; if it’s a commonly asked query, they’ll cache the data so that you don’t have to keep going back to the sources, and that, of course, helps with performance and so forth. So it’s a whole new world, and that’s something we need to pay attention to.

Allison Hartsoe: 20:31 This concludes the first part of my interview. In the second part, we’ll cover how to apply this wisdom. Join us for part two. Thank you for joining today’s show. This is your host, Allison Hartsoe, and I have two gifts for you. First, I’ve written a guide for customer-centric CMOs which contains some of the best ideas from this podcast, and you can receive it right now. Simply text ambitiondata, one word, to 31996, and after you get that white paper, you’ll have the option for the second gift, which is to receive The Signal. Once a month I put together a list of three to five things I’ve seen that represent customer equity signal, not noise, and believe me, there’s a lot of noise out there. Things I include could be smart tools I’ve run across, articles I’ve shared, cool statistics, or people and companies I think are making amazing progress as they build customer equity. I hope you enjoy the CMO guide and The Signal. See you next week on the Customer Equity Accelerator.

Previous

Ep. 89 | Next Generation Data Warehouse with Claudia Imhoff (Part 2)

Next

Ep. 87 | Customer Targeting Gone Wrong: The Big Fish News Story