Ep. 89 | Next Generation Data Warehouse with Claudia Imhoff (Part 2)
This week Claudia Imhoff, President of Intelligent Solutions, co-author of five books on business intelligence and analytics including Building the Customer-Centric Enterprise, and founder of the Boulder BI Brain Trust, joins Allison Hartsoe in the Accelerator. Claudia explains a next-generation data framework companies can use when thinking about how to create the underlying technology architecture that enables customer analytics and fast decision making.
Please help us spread the word about building your business’ customer equity through effective customer analytics. Rate and review the podcast on Apple Podcast, Stitcher, Google Play, Alexa’s TuneIn, iHeartRadio or Spotify. And do tell us what you think by writing Allison at info@ambitiondata.com or ambitiondata.com. Thanks for listening! Tell a friend!
Allison Hartsoe: 00:01 This is the Customer Equity Accelerator. If you are a marketing executive who wants to deliver bottom-line impact by identifying and connecting with revenue-generating customers, then this is the show for you. I’m your host Allison Hartsoe, CEO of Ambition Data. Each week I bring you the leaders behind the customer-centric revolution who share their expert advice. Are you ready to accelerate? Then let’s go. Welcome everyone. Today I have the second half of my discussion with technology expert Claudia Imhoff. Claudia is the president of Intelligent Solutions, and she’s also the co-author of five books on business intelligence and analytics, including Building the Customer-Centric Enterprise. In our last episode, we were talking about next-generation data architecture. Let’s jump back into the discussion.
Allison Hartsoe: 01:02 Now, I don’t know if you’ve looked at companies like Databricks, but I see this company making a lot of movement into the new data structures, being able to move data through and write some governance to it, and also allowing you to hit it for analysis purposes with a variety of different coding languages. Is that something you would also put in this space, or is that more...?
Claudia Imhoff: 01:26 Oh, absolutely. Databricks has been around for now what, five years or so, and they’ve certainly done very well. They’ve been very focused on the experimental areas. The whole idea of the investigative computing platform started with Hadoop, and Databricks jumped into that. Spark is in there. Snowflake is now in there. The idea is: can I store the data in such a fashion that it hasn’t been pre-formatted, if you will, for a known set of questions? That’s where we get the idea of a schemaless, not really the right term, but a schema-on-read kind of environment. When I ask the query, the database or the data storage mechanism can then put together the data on read as if it were physically done on write, and it’s not. As I ask the question, it scrambles to put the data together, and it does it in an incredibly fast fashion, but I don’t have to pre-format the data for that question.
Claudia Imhoff: 02:24 That is the bottom line to this whole thing. It’s schema on read, and boy, that has just revolutionized the entire industry. Believe me. The relational guys have not been standing still either. What we have today is massive amounts of memory that we can now use, which is a wonderful environment, very fluid, very changeable. We also have data compression. We have columnar storage. We have all kinds of things that help us with that idea of, I really don’t know what I want to ask, or I don’t want to be limited to a set number. And going back to the architecture, the one thing I should’ve mentioned and probably didn’t is that if we look at the traditional EDW, and we look at the investigative computing platform, they have different functions, but they may reside on the same technology. It doesn’t mean I have to have two separate technologies. Most times I do. Most organizations do, but they don’t have to. But it is almost partitioned. If you’re on the same technology, then you’re going to partition it, and you’re going to make one half of it a more formatted, more formal, more production-ready environment, whereas the other one is more the schema on read, a little less formally governed perhaps as well. So even if it’s in the same technology, in some respects it isn’t, if that makes sense.
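The schema-on-read idea described above can be illustrated with a minimal sketch. It is not how Databricks, Spark, Snowflake, or any other product implements it; the sample records and the names `RAW_EVENTS`, `read_with_schema`, `total_by_customer`, and `coupons_used` are hypothetical. Raw records are stored untouched, and each question supplies its own fields and types at read time.

```python
import json

# Raw events stored exactly as they arrived; no table layout was fixed on write.
# (Hypothetical sample data for illustration only.)
RAW_EVENTS = [
    '{"customer": "c1", "amount": "19.99", "channel": "web"}',
    '{"customer": "c2", "amount": "5.00"}',
    '{"customer": "c1", "amount": "30.01", "coupon": "SPRING"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at query time: select fields and cast them as each row is read."""
    for line in raw_lines:
        record = json.loads(line)
        yield {
            name: cast(record[name]) if name in record else None
            for name, cast in schema.items()
        }


def total_by_customer(raw_lines):
    """One question: spend per customer. Its schema is chosen here, not at load time."""
    schema = {"customer": str, "amount": float}
    totals = {}
    for row in read_with_schema(raw_lines, schema):
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


def coupons_used(raw_lines):
    """A different question over the same raw data, with a different schema."""
    schema = {"coupon": str}
    rows = read_with_schema(raw_lines, schema)
    return sorted({row["coupon"] for row in rows if row["coupon"] is not None})
```

Both questions run against the same stored records, and the second one needed no reload or reformat, which is the point of not pre-formatting the data for a known set of questions.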
Allison Hartsoe: 03:39 So on one half you’ve got structured traditional production databases, and on the other half you have the playground. Is that fair?
Claudia Imhoff: 03:46 Yup got it!
Allison Hartsoe: 03:47 And then that allows you to maybe run your data science questions and then as you form a hypothesis and decide this is something that we are going to productionalize, you push it down into the production database and then as you play with it more or expand your hypothesis, then maybe you update it later on.
Claudia Imhoff: 04:06 Exactly! You’re brilliant. Yes, thank you.
Allison Hartsoe: 04:06 It’s something we’ve always been thinking about. But a lot of times for analysts and in the data science space, we deal in our own little navel-gazing zone. And it’s very complicated when you get out into all of these other areas that are traditional BI and new data warehouse. It’s a huge space, just getting on top of the terminology alone. And that’s why I love your chart: it gives you buckets to put things in so you can understand what the heck this is.
Claudia Imhoff: 04:37 Well that’s my job.
Allison Hartsoe: 04:41 Got it. So with this new system, we have an example of fraud and fraud modeling. Do you have an example of something that’s not so great? Like, here’s something that somebody put together and it’s just an absolute train wreck.
Claudia Imhoff: 04:54 Oh gosh, there have been a number of those. You mentioned one in our preview, and I want to talk about that in a little bit because I think I’d like to put a little bit of a different spin on it, and that was the horror story of poor Target. Boy, the last thing you want is to end up on the cover of the New York Times or Wall Street Journal. And boy, they managed to do all of that. They did a very fascinating thing, and here’s the short end of the story. The marketing group went to the data scientist, a fascinating and brilliant man, and said, look, we want people to understand that Target is also a place to buy baby stuff. For whatever reason we’re not in that baby field. And yet we sell strollers and rockers and bottles and all the glut that goes along with a kid, and they don’t recognize that.
Claudia Imhoff: 05:37 So we want to catch them before they have their babies. We’d like to advertise, market to them while they’re pregnant so that they understand, oh, I can go to Target for that. And is there any way, Mr. Data Scientist, is there any way that you could figure out from what they buy, what they put in their shopping baskets? Could you figure out whether or not there’s a high likelihood that they might be pregnant? Well, he kind of went back to his desk and worked with it and monkeyed around with it. And by golly, what’s missing in this horror story to me is, by golly, he figured it out. He actually did figure it out with a pretty high likelihood of getting it right. So he said, yeah, look at this. I figured it out. I’ve got an algorithm, and here’s the list of women that bought these things, whatever.
Claudia Imhoff: 06:20 Here are the women that are highly likely to be pregnant. Now, where the wheels fell off the wagon was the next step. Like I said, acting on the intelligence. The marketing team said, great, give me that list of potential customers and we’ll send them all kinds of flyers or marketing materials about all the pregnancy stuff and all the baby stuff that we’ve got, and that’s where they made their mistake because unfortunately, they sent at least one of the flyers to a 16-year-old girl who was pregnant, but her parents didn’t know it. She was in her early days and probably scared out of her mind, but that’s what hit the fan, right? Because the dad marched in and he accused Target: why are you sending this stuff to my daughter? She’s only 16. How dare you? And so forth and so on. And it turned out, well, see, Dad, guess what?
Claudia Imhoff: 07:06 So what went wrong was not the analytics side of it. The guy was brilliant. He did it. He figured it out. What went wrong was that just one little thought toward who are we sending this to? The ethics behind it, the ethical stance, what is the policy? Should we send this to a 16-year-old girl or should we not? And the answer, if they had thought about it would have been a resounding no, we probably shouldn’t send it to a 16-year-old girl. So I think that’s probably the best example I’ve got of if you’re going to do highly sophisticated and highly accurate analytics, you need to have the ethics police, if you will, or an ethics policy at least that says, look, before you leap, let’s think about this for just a second. What kinds of ethics should we be thinking about before we actually do act on these beautiful analytics?
Claudia Imhoff: 07:57 So yeah, that’s one example of a little cautionary note. The other one though that is perhaps also annoying, at least it is to me, and I’m in the business, and if I get annoyed, I can only imagine what other people feel. I’ll give you my own personal example. If I go to a website and I look at a sweater or a dress or a pair of shoes or whatever it is, and I go, oh, that’s interesting. And I’ve looked at it. Obviously, I’m interested, but yeah, I don’t think I’m going to buy it today, and I blow out of that website. Right? If I then go to Facebook, I get very upset when I see that exact same pair of shoes show up in the right-hand column. As in, you were interested in this, do you still want to buy it? That bothers me. That is something that I do see as a total invasion of my privacy, and yes, I have turned everything off on Facebook, and I no longer get those. But the first time I did, I went, huh, you sorry things, you are tracking me, tracking what I’m looking at, and I don’t like it. So there’s that sort of creep factor.
Allison Hartsoe: 09:00 Yeah.
Claudia Imhoff: 09:00 That gets in there of you’re watching me, and I don’t like it, and this has nothing to do with Facebook. Why is it showing up on my Facebook page? Yeah. There are some negatives that we need to be very careful of, is this right, and how do I get out of it if I’m in it?
Allison Hartsoe: 09:14 Yes. And I’m sure we could do an entire episode on just ethics, and I think we talked about maybe having one of your friends on the show who could talk about ethics and protected classes, perhaps to a greater degree.
Claudia Imhoff: 09:27 Yeah, and to tell you the truth, it’s becoming even more critical as the United States begins to wake up to privacy. Europe obviously has the GDPR, with the “right to be forgotten” tagline for it. And I like that. I like the idea that I could tell someone like Facebook, I have the right to be forgotten, and make it easy. I want one button. I don’t want to have to go through layer after layer after layer of clicks and buttons and all that kind of thing. I just want to tell you, I want to be forgotten. You can still serve up ads, but they have nothing to do with what I’ve looked at in the past.
Allison Hartsoe: 10:01 So let’s take that idea for a minute though. If I’ve got this massive data architecture and I’ve got external data coming in, I’ve got my own internal data, and I’ve got a customer who says, I want to be forgotten. Is there any technology that helps me purge that holistically and maintain that?
Claudia Imhoff: 10:20 Oh, you’re singing my song. Yes, as it happens. In fact, I have developed a class that I teach at the Data Warehousing Institute, if anyone is interested. The technology is a data catalog, and it is exactly what it sounds like. It’s a catalog of everything to do with the data that you’ve got in this analytic environment. Now I say it’s a data catalog. It’s kind of a misnomer in some ways because if you think about a catalog, let’s look at the biggest catalog in the world, and that’s Amazon. Amazon started out as a book catalog, right? They sell books, and you could find any book on God’s green earth that was available in their catalog. Now, it didn’t take them long before they figured out, huh, we can do more than just books, and now they are the biggest catalog of everything. Anything you can think of is in their catalog.
Claudia Imhoff: 11:14 Well, that’s kind of what a data catalog is. It’s not just the data, which of course is incredibly important. Let’s start with the data. It’s all the lineage. Where did I get it? What did I do with it? Where is it stored? Who’s using it, or what reports use it? But it’s also so much more than that. It’s the algorithms behind the analytics. It’s the comments. It’s who’s using it, when they use it, how often do they use it? Do they have permission to do what they’re doing? So it’s this massive catalog of information about the environment itself, called a data catalog.
Allison Hartsoe: 11:48 Information about information.
Claudia Imhoff: 11:50 So yes, it’s absolutely the metadata about the metadata in some respects. A lot of metadata is stored in there. So if you, for example, want to be forgotten, then that person’s information would have to be looked up in the data catalog. Where is it being used? What reports, what analyses, what dashboards, who’s got access to it? And that helps you at least identify what data needs to be either blocked or even purged in some cases. The other thing that they are starting to do with the data catalogs is that some, not all of them, can now search for sensitive data, and that gets right into HIPAA. It gets right into privacy, sensitive data being your Social Security number, your patient ID code, whatever it is, something that is considered personally identifiable information. And if you really want to follow that “I have the right to be forgotten,” then that personally identifiable information has got to be registered. So yeah, they’re doing some really good things with data catalogs. If people are interested, let me rattle off some names for you. There’s Alation. There’s Collibra, Waterline, Io-Tahoe. There is an optics. There’s Octopai. There’s a bunch of them. There are many; just type in data catalog in your favorite search engine and you’ll find them.
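As a rough illustration of the two catalog uses described above (finding where personal data is used, and scanning for sensitive values), here is a small sketch. It does not reflect the API of any of the vendors named; `CatalogEntry`, `erasure_plan`, `looks_sensitive`, and the sample datasets are all hypothetical, and the scan only checks for a Social Security number style pattern.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One hypothetical catalog record: a dataset, its PII columns, and its consumers."""
    dataset: str
    pii_columns: list
    used_by: list = field(default_factory=list)


# Hypothetical catalog contents for illustration only.
CATALOG = [
    CatalogEntry("crm.customers", ["email", "ssn"], ["weekly_sales_report", "churn_dashboard"]),
    CatalogEntry("web.clickstream", ["cookie_id"], ["ad_retargeting_model"]),
    CatalogEntry("ref.store_locations", [], ["store_map"]),
]

# Assumed pattern: values shaped like NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def looks_sensitive(sample_values):
    """Flag a column when every sampled value matches the SSN-like pattern."""
    return bool(sample_values) and all(SSN_PATTERN.match(v) for v in sample_values)


def erasure_plan(catalog):
    """Map each dataset holding registered PII to the reports and models that read it."""
    return {entry.dataset: sorted(entry.used_by) for entry in catalog if entry.pii_columns}
```

Calling `erasure_plan(CATALOG)` returns the datasets that would need blocking or purging along with their downstream consumers. It only works for personal data that was registered in the catalog in the first place, which is why the registration step matters.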
Allison Hartsoe: 13:09 We’ll also link to your class just so people can find it very easily in case they want to really dig into it.
Claudia Imhoff: 13:15 Oh, thank you.
Allison Hartsoe: 13:16 I imagine that a data catalog, if it kind of operates like a fraud model, where it hits the model and it makes a decision and routes things in the right way, does that have to be a very fast piece, or is that something that you more or less batch once in a while?
Claudia Imhoff: 13:31 Well, yeah. The difficulty with a static data catalog, one that isn’t live with the sources, is that it’ll get out of sync really fast. So that is a characteristic or a capability that you want to look into. Does the data catalog have live contact or a live connection to the actual data? Because it does change. A source may change. Something may change that report. I think a lot of it though can be static. For example, the data lineage, where I got it, what I did to it, where it went, that is also very useful. It may have to be updated every time you change an ETL code or a data prep piece or something like that. But we can do change data capture, if you will, on the metadata itself so that we get that kind of real-time, or as close to real-time as we can get, ability to change the data as we need it.
Allison Hartsoe: 14:17 Got it. Okay, so as we’re moving into the model, we’re seeing everything evolve. We’re walking into the future. Do you think data science itself is going away and becoming part of the front line?
Claudia Imhoff: 14:29 Ooh, good question. No, I don’t. Not becoming part of the front line. Does it feed the front line? Absolutely. That customer service rep is no longer just somebody that answers the phone. They’re actually an analyst. Now, they’re not a data scientist because that does take very specialized skills, but they need to be taught to think critically, to think analytically, to not just be a mouthpiece on a phone, but to actually think about who am I talking to? What am I going to do at this point? Here’s an analytic that I need. I need a dashboard that I can configure based on who I’m talking to. Think about that for a moment, and be able to actually analyze the situation and make the right decision. So everybody is an analyst. Whether you’re a salesperson or a customer service rep or a marketing person, everybody every day makes decisions, and therefore we all need at least some kind of analytical capability or analytical thinking.
Allison Hartsoe: 15:24 I love that.
Claudia Imhoff: 15:25 Yeah. One of the things that I highly recommend for every company on earth is to bring in education, and the first course that I would bring in is critical thinking or analytical thinking, whatever you want to call it. But don’t just sit there and blindly do the same thing you’ve done every day. Think. In fact, that ought to be the motto. It used to be IBM’s. I think they’ve gone back to it, but I’m going to steal it and say maybe that ought to be everybody’s.
Allison Hartsoe: 15:51 Got it. Okay, so let’s say I’m convinced that I really want to take a hard look at my data structures. I want to think about how I’m going to create a faster, maybe a data playground in a real-time analytics zone, and I really want to look at this next-generation data warehouse. What should I do first?
Claudia Imhoff: 16:09 Well, yeah, another good question. Okay. The first thing you should do is put the technology to one side. That is not the first thing you look at. Please do not go out and buy the technology before you even know what you want. My advice is kind of a boring first step, but boy, it is critically important, and that is: assess what you’ve already got. Look at what you already have. Look at your current capabilities, your current technology, your current skillsets, and document. If you have a data warehouse, for heaven’s sake, don’t throw it out. It’s still functioning. Make sure that it is a production environment and not trying to be an experimental one because that won’t work, but at least assess your environment. Where are your strengths? Where are your weaknesses? What are the things that you would like to be able to do, and what are the things that you simply cannot do
Claudia Imhoff: 16:54 because of the technology or the need or the money or the skills that you have right now? So that’s the first step: just figuring out what you’ve got, what kinds of technologies are at your fingertips. Okay. Second step, and I’ve been saying this for 20 years: understand the business problem. Yeah, data science is sexy. It sounds fabulous. I want one of those. I’ll tell you a funny story that happened. I was brought into a company by the CTO, and he said, look, I just want you to talk to our CEO. I’m not going to tell you what it’s about. I want you to go in cold. And I thought, great. You know, put on the asbestos suit ’cause here we go.
Claudia Imhoff: 17:31 I sat down, and I said, hi, I’m Claudia Imhoff, I’ve been brought in to help your analytics team, and that’s about as far as I got ’cause he lit up like a Christmas tree. And he said, oh, that’s fabulous. That’s great. This is the CEO, right? This is good news. You’ve got good support from the CEO. And he said, I was on an airplane, and I thought, oh God, here we go. I was on an airplane, and I was reading, fill in the blank, Forbes or Money or Fortune or something. I was reading one of those magazines, and it was all about my competitor, and they’re killing us. They are absolutely killing us, and they have one of these Hadoop thingys, and I swear that is what he said. They have a Hadoop thing, and I want one.
Claudia Imhoff: 18:12 And I thought, okay, now I know why the CTO wants me to talk to this guy, because you don’t just wheel in the Hadoop thing, plug it in, and let her rip. So he and I had a really interesting hour-and-a-half talk about what exactly we were talking about here. And I said, okay, look, you’re talking about a technology. What are the problems that you are facing as the CEO? Well, we’re getting killed with our campaigns. Every time we launch, they counter immediately, and they take all the air out of our sails. And he went on and on and on. They had all these wonderful business problems, some of which were traditional analytics. We don’t know who our best customers are. We don’t know who’s profitable. We don’t know what products are or what stores are. And I thought, okay, good, good. So I said, all right, you have a lot of different analytical needs, and each of those is a wonderful project. But what you have to understand is that it is a bunch of projects, and you need to string them all together into a program, an analytics program, because you can’t willy-nilly go out there and start firing at all of these things without some coordination, or else you will have chaos. And he looks at me. He goes, how long is this gonna take?
Claudia Imhoff: 19:18 That was the hard question for me to answer for him, but he got the idea that it wasn’t just something you wheel in. So yes, first of all, understand the business problem. He told me so many wonderful business problems, and I said, great, we can work on these. We can work on more than one at a time, but we’ve got to organize it.
Allison Hartsoe: 19:34 I’ve heard that from other people who say, if you just talk to people, like, they’ll come in with a request, show me all of X. And you’re like, well, what is the problem we’re trying to solve? And that’s rampant in the business. And I think it’s not just the CEO.
Claudia Imhoff: 19:47 Yeah, no, it’s not just the CEO. He happened to be a high-profile one. Absolutely correct. All right, so then once you have this list of business problems and they’re prioritized, the top one, two, three, whatever they are, then turn your eyes to the technology because then you can say, all right, I’ve got, fill in the blank. I have a traditional production data warehousey kind of thing. How can I do it better? Do I need new technology? Can I use my existing environment? What is it that I’ve got right now? I’ve done the assessment, and where am I lacking? Right? So now you can start to look at the technology. I have a data science problem. Wow. The world is open to you. All right. What kind of data science is it? Are we talking about regression analysis? Are we talking about correlation?
Claudia Imhoff: 20:31 Who knows, whatever it is that they’re dealing with. Then go look at the technologies, and you start to evaluate which technologies fit your existing environment nicely but also expand it. And then at that point, you get a shortlist. I would suggest no more than three, maybe five if you’re really ambitious, but make a shortlist of the technologies that look like they have all of your mandatory requirements in them and then build that proof of concept. That’s the most critical piece of choosing the technology. And then of course once you’ve picked the winner, whoever it is, then you start off on that top prioritized project, or maybe two if you can do two, but just start building out the environment. So it’s a very logical process, but it all starts with figuring out what you’ve got right now and figuring out what the business problems are.
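The shortlisting step described above (keep only the technologies that meet every mandatory requirement, and cap the list at about three) can be sketched as a simple filter. The tool names, feature labels, and the `shortlist` function are hypothetical placeholders, and the alphabetical ordering is an arbitrary choice for the sketch; a real evaluation would rank candidates on its own criteria before the proof of concept.

```python
# Hypothetical candidates and the capabilities each one claims to have.
CANDIDATES = {
    "tool_a": {"sql", "schema_on_read", "governance"},
    "tool_b": {"sql", "governance"},
    "tool_c": {"sql", "schema_on_read", "governance", "streaming"},
    "tool_d": {"sql", "schema_on_read", "governance", "in_memory"},
    "tool_e": {"sql", "schema_on_read", "governance", "columnar"},
}

# Hypothetical mandatory requirements taken from the prioritized business problems.
MANDATORY = {"sql", "schema_on_read", "governance"}


def shortlist(candidates, mandatory, limit=3):
    """Keep tools that cover every mandatory requirement, capped at `limit` names."""
    qualified = [name for name, features in candidates.items() if mandatory <= features]
    return sorted(qualified)[:limit]
```

With the sample data, `shortlist(CANDIDATES, MANDATORY)` drops `tool_b` for missing a mandatory capability and caps the rest at three names. The proof of concept then tests whether the survivors can actually do what their feature lists claim.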
Allison Hartsoe: 21:17 Yeah, those are words of wisdom that I wish we would all take to heart a little bit more. And it’s so tempting, with 7,000-plus martech tools alone on the market and God knows how many other technologies on top of that. It’s very easy, especially with the marketing hype, to feel like, oh, if I just plug in the Salesforce platform, if I just plug in X, all my problems will be solved. And the tools have an advantage in that they want you to think that. But it’s our duty to build that POC and really see if they can do what they say they can do. Well, this is a great process. So Claudia, if people want to reach you or they want to catch you at the Data Warehousing Institute in your classes, how can they get in touch with you?
Claudia Imhoff: 21:59 Oh well, the email is easy. It’s claudia@bbbt.us, that’s boy, boy, boy, tango, dot U-S, so that’s pretty easy. And then on Twitter, my handle is claudia_imhoff. So that’s easy too. Those are the fastest ways to reach me, I think.
Allison Hartsoe: 22:15 Great. And we’ll link to the class that you mentioned and any others you’d like us to link to at the institute.
Claudia Imhoff: 22:20 One other I would suggest, and it is a class on the data interpreter, and you ask, what is that? That is the person, or the role I guess, between the data scientist and the CEO. Believe me, there is a big gap between the two, and the data interpreter is the person that can translate these beautiful but very complex and almost unintelligible data visualizations, their artwork, that our data scientists create into terms that the business person, the CEO in particular, can understand. So that’s another class they might be interested in.
Allison Hartsoe: 22:57 Absolutely. As always, links to everything we discuss are going to be at ambitiondata.com/podcast. Claudia, thank you for joining us today. This has been such a rich and wonderful discussion.
Claudia Imhoff: 23:08 Oh, it was my pleasure, and you’re easy to talk to and thank you for asking such good questions. You’ve made it easy for me.
Allison Hartsoe: 23:14 Excellent. Remember everyone, when you use your data effectively, you can build customer equity. It is not magic. It’s just a very specific journey that you can follow to get results. Thank you for joining today’s show. This is your host, Allison Hartsoe, and I have two gifts for you. First, I’ve written a guide for the customer-centric CMO, which contains some of the best ideas from this podcast, and you can receive it right now. Simply text DATA, one word, to 31996, and after you get that white paper, you’ll have the option for the second gift, which is to receive the Signal. Once a month, I put together a list of three to five things I’ve seen that represent customer equity signal, not noise, and believe me, there’s a lot of noise out there. Things I include could be smart tools I’ve run across, articles I’ve shared, cool statistics, or people and companies I think are making amazing progress as they build customer equity. I hope you enjoy the CMO guide and the Signal. See you next week on the Customer Equity Accelerator.