Ep. 7 | The ROI of Analytics – Interview Part 1
We take for granted that we can capture, track, and measure every piece of data in our ecosystem. But legendary big data guru Bob Page takes us back to a time before enterprise-class web analytics was a thing. Host Allison Hartsoe asks Bob about his experience founding analytics technology firm Accrue Software, owning analytics and reporting at Yahoo, running analytics at eBay, and building the operational model for Hortonworks.
In part one of this interview, Bob talks about working with large data sets and how Yahoo developed Hadoop, which cracked the code for managing data volume, velocity, and variety. He shares how he has used data to support decision-making at Yahoo and eBay, including developing an ROI model to calculate the results of investments in the company’s data program.
Allison Hartsoe: This is the Customer Equity Accelerator, a weekly show for marketing executives who need to accelerate customer-centric thinking and digital maturity. I’m your host Allison Hartsoe of Ambition Data. This show features innovative guests who share quick wins on how to improve your bottom line while creating happier, more valuable customers.
Allison Hartsoe: Ready to accelerate? Let’s go!
Allison Hartsoe: Welcome everyone! Today’s show is about the pit of technology despair that must be avoided or overcome to gain access to the higher levels of customer equity acceleration.
Allison Hartsoe: Now remember, if you’re not familiar with the maturity curve, you can find a summary of it in episode number two, and more detail on each stage in the following episodes, four, five, and six.
Allison Hartsoe: So today to help me discuss a bit about what the pit of technology despair is and why you should care, I’ve invited a very special guest and friend, Bob Page.
Bob Page: Hey Allison.
Allison Hartsoe: Hi Bob. Now, Bob, I think you are a bit of a Silicon Valley legend because you actually founded Accrue Software back in 1996, which was one of the very first analytics companies, and you took it public. So that was just fantastic. Maybe you can start by telling us a little bit more about where that experience led you.
Bob Page: Madness. No, actually we used to talk at Accrue about the fact that we were driving this train and we were gonna be getting to the station, and we hoped there would be people waiting for us when we got there.
Bob Page: We were just making stuff up because no one really understood what enterprise-class web analytics could be. But be that as it may, as a vendor, what it showed me was that we were just barely scratching the surface of what we could be doing with data in the marketing context.
Bob Page: And I took some time off after Accrue to think about what was next, and I felt like one of the things that I really needed to understand was how Enterprise [inaudible 00:02:04]
Bob Page: And so I went to work for Yahoo, and initially my charter was to go build their experimentation platform. And one thing led to another, and I ended up taking on all the analyst tools and then all of the BI systems, and so I owned quite a bit of the sort of internal analytics and reporting within Yahoo for close to five years.
Allison Hartsoe: Well that was an impressive scale. Good lord, that’s a lot.
Bob Page: It was a lot of data, and it was an interesting data set in that the business model primarily was making people go away, right? Yahoo was an ad-driven site, and so the job was to create the inventory for the ads. And then we wanted to see how effective the ads were. But all we had was click data on the ads, so we knew that people saw them and left, but we didn’t necessarily know what they did once they were on the other side.
Bob Page: So trying to connect the whole customer experience was difficult. But fast-forward, I got an opportunity to run data and analytics for eBay, and one of the things that was quite attractive there was not only did they have all the click data, et cetera, because customers were coming to their site, or so I thought, and I’ll get back to that in a second. But they also had all the purchase data, because it’s an e-commerce site, and even had support data.
Bob Page: So because I could see the whole customer life-cycle, I thought, “This is gonna be great!” And so I spent a fair amount of time at eBay, and then after that decided, “You know, I think what would be useful would be to jump back into the vendor side of things.”
Bob Page: So by then Hadoop had become a thing, and maybe not quite enterprise class yet, but I wanted to see if I could help in that regard. So I went to go run the product organization at Hortonworks, a commercial Hadoop vendor. And I spent a couple of years doing that.
Bob Page: So it’s been kind of a straight line, but kind of a curvy line at the same time, trying to go from my roots 20 years ago building systems for large-scale data analysis to doing the same thing at Hortonworks.
Bob Page: And then I should just point out the last few years I’ve been mostly on the consulting side. Trying to look at what organizations are doing.
Allison Hartsoe: You know, I don’t know that everybody would have such a logical flow to their career. That’s really cool. I really like the way you talked about it, from the analytics start, to Yahoo where you got part of the information, to eBay where you could see both sides of the information, and then because you’re dealing with such scale, then you’ve got systems like Hortonworks and always thinking about what’s on the cutting edge. That’s very entrepreneurial of you. Very cool.
Allison Hartsoe: So you were privileged to work with some of the largest data sets out there at the time, but not every company is used to processing zettabytes, exabytes, you know. Massive quantities of data. But I think that most people would agree that there’s way too much data out there.
Allison Hartsoe: So why should people with smaller data take a little time to learn from the big guys?
Bob Page: Well, you mention big guys, and that leads me to what’s commonly called big data. And so let me just define that. Most people think about big data, especially in the context of large companies like Yahoo or eBay, as huge volume.
Bob Page: And while it’s true that those companies did have huge volume, if you think about big data, if you sort of visualize it as a cube or a 3D space, the volume is just one axis. You also have velocity, or how fast that data is coming in. And also variety. If you just have one type of data, your needs are very different than if you have dozens or even hundreds of different kinds of data, different systems that you need to integrate to make sense out of.
Bob Page: And that doesn’t mean just structured data either, it could be image data, video data, it could be text from folks from social media, it could be anything-
Allison Hartsoe: I’m so glad you mention that because I think people, when they hear data, they automatically think text and yet video and images and the whole additional variety is really quite the way things are now.
Bob Page: Right. And my claim is that everyone can plot all three of those. The bigger you are as a company, it’s probably the case that the further out you are on each of these axes. But-
Allison Hartsoe: And, sorry, what do you mean by further out? Do you mean like you have more volume, more velocity, and more variety?
Bob Page: Yes.
Allison Hartsoe: Yeah, okay.
Bob Page: I think so. When you’re one of the big companies you have all those things in abundance, and so trying to marshal all that and get it under control and making it useful for the business is something that takes a lot of effort. It takes a lot of money, it takes a lot of time, and it takes a lot of engineers, IT staff, support teams, et cetera, and that’s not something that a lot of medium to small companies can do or can deal with.
Bob Page: The thing is that just like we’ve done with things like when I did Accrue, and we didn’t really know how to answer the questions, so we were just making stuff up. The big boys, the big companies, the big girls, when they’re out on the leading edge, and you’re sort of taking the arrows; you’re just making stuff up. You’re not really sure if what you’re doing is gonna lead to nirvana, but you’re trying a whole lot of different things.
Bob Page: And often times when it works, you’re excited obviously, but you also wanna share that. And in the old days, it was like, “Well, this is a competitive advantage, why would I wanna share this?” But the reality is, if you sort of think about what your crown jewels are, a lot of times it’s A, your product offering, and B, your data. Your understanding of the customer and how to service them.
Bob Page: It’s not the processing infrastructure. Because the processing infrastructure means nothing without the data and without the great products and service that you provide.
Bob Page: So a lot of companies, when they solve these really hairy problems have said, “Why don’t we take some of this, package it up and make it available as open source that others can take advantage of?”
Allison Hartsoe: Yeah, but you know, when you talk about open source, what I first heard was, customer data that is known by a company should be open-sourced so that everyone can share and benefit from it. And obviously, there are ethics and privacy issues there. Is that what you meant?
Bob Page: Absolutely not. No, no, no.
Allison Hartsoe: Okay.
Bob Page: No, no, no, no, no. The data about the customer that they provided you, or that you communicated or that your systems have been able to produce or whatever, those are the things that are important to you, not the systems that house that and process that data.
Allison Hartsoe: Got it.
Bob Page: Does that make sense?
Allison Hartsoe: Thanks for clarifying that. Okay, so given that, what would be some examples?
Bob Page: Well, I guess the one that comes to mind first is at Yahoo. We didn’t have any commercial systems that we could buy that would allow us to handle our data volume. Never mind variety and velocity. Volume was the problem for us.
Bob Page: And so we had to handle all these technologies, including storage and query and everything else. We didn’t even have SQL. We would invent SQL-like processing engines.
Allison Hartsoe: Wow.
Bob Page: But they were fragile, they were temperamental, they were expensive, and they were fairly slow. And so the business often couldn’t see data until the next day just because of a lot of the processing resources that are required.
Bob Page: But in the meantime, the search team was trying to figure out, “How do we index the web?” And they came up with a technology they ended up calling Hadoop.
Allison Hartsoe: Imagine that!
Bob Page: Yeah and Hadoop didn’t solve our needs in the data team initially, but it cracked a huge problem, and that is “How do we do distributed data processing?” Concurrency and parallelization are some of the hardest problems in computer science, and Hadoop figured out how to do that at scale.
Allison Hartsoe: So, you know, sorry. Distributed data processing, why is that important? You’ve got a lot of information, why is it important to distribute it?
Bob Page: What I mean by distributing, I don’t mean distribute it across the world or across departments, I mean to distribute it across processing engines. Because the engines only go so fast. An individual machine, for example, will only go so fast.
Bob Page: Think about it like a freeway. If you have a one-lane freeway, you can continue to make changes to that one lane to take it from, say, a dirt road, where you can only go about 10, 20 miles an hour, to this super-slick autobahn type thing where you could be going 80, 90, 100 miles an hour.
Bob Page: But there’s a top end to that. At some point, you get diminishing returns, and you’re just not gonna be able to build a single lane that’s gonna be able to let you go 400, 500 miles an hour.
Allison Hartsoe: It’s like Page’s law for data processing instead of Moore’s law for chips.
Bob Page: Well, I don’t know, it’s kinda the same. I mean, yes, you could say, “Well, a car’s not the right thing. Instead, you should be building a jet engine, and you should be building airports …” yes, yes, yes. And there’s rockets and everything else, but let’s just stick with the analogy here because everyone’s got cars. And it’s like everyone’s got SQL for example, or whatever.
Bob Page: So if I can build this super fast, one-lane autobahn, but I have too many cars to fit, what should I do?
Bob Page: Well let’s build two lanes. Or let’s build 20 lanes, or let’s build 1,000 lanes. Now there’s a cost involved in that, and once you start getting to dozens and two dozen and 50, 100 lanes all running in parallel, now you’ve got some issues that you have to deal with that you didn’t have to deal with in the single lane freeway.
Bob Page: But, if everyone’s going the same way, at the same speed, for the same amount of time, you could fit a whole lot of cars on that freeway. So when I say distributed data processing, what I mean is spreading the load of the huge amount of data across multiple machines. Hundreds of machines, thousands of machines, to do the work sort of simultaneously, as opposed to having one machine trying to take on the entire load and become a bottleneck. That make sense?
Allison Hartsoe: That makes sense, yeah. That does make sense.
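The freeway analogy can be sketched in a few lines of code. This is only an illustration of the idea of splitting one job across parallel "lanes", not Hadoop's or Yahoo's implementation. The function names, the sample click records, and the use of threads as stand-ins for separate machines are all assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_lanes(records, lanes):
    """Deal records round-robin into `lanes` partitions, one per worker."""
    return [records[i::lanes] for i in range(lanes)]


def count_clicks(partition):
    """Work done independently on one partition: count clicks per page."""
    return Counter(page for page, _user in partition)


def distributed_click_counts(records, lanes=4):
    """Count clicks per page by spreading the records across parallel workers."""
    partitions = split_into_lanes(records, lanes)
    # Threads stand in for separate machines in this sketch (an assumption).
    with ThreadPoolExecutor(max_workers=lanes) as pool:
        partials = list(pool.map(count_clicks, partitions))
    # Merge the per-lane results into one answer.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical (page, user) click records.
clicks = [
    ("home", "u1"), ("search", "u2"), ("home", "u3"),
    ("item", "u1"), ("home", "u2"),
]
totals = distributed_click_counts(clicks, lanes=3)
```

The answer is the same whether one lane or many lanes are used. What changes is that no single worker has to carry the whole load, at the price of the extra splitting and merging steps Bob alludes to.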
Bob Page: So Hadoop, which Yahoo developed and the data team ended up sort of re-basing a lot of their technology stack on, really kinda formed the basis for a lot of enterprise data lakes today.
Allison Hartsoe: The backbone.
Bob Page: But let me talk about another application of data analytics. We often think about the technology that supports data and analytics as one that supports decision making, and when we talk about making decisions, we generally talk about analysts making recommendations that are presented in some kind of report form or recommendation form to some decision maker. It could be the business or an executive or whatever, who then says, “Okay, go flip the switch,” or, “Go make this change,” or whatever.
Bob Page: But what if that was closed-loop, and rather than using just reports and analysis, what if you could take relevant insights that had been computed and feed that back into your operational machinery, like a recommendation engine?
Bob Page: So many years ago at eBay, Mara Post and I observed that one of the hardest problems that we had was getting the data to where it needed to be at the right time. So, yeah, we had customer profiles and all that good stuff, but when you completed a purchase, and you checked out, we would feed you some suggestions about what you might like to consider buying next.
Bob Page: The problem was, our data wasn’t fast enough to let the recommendation engine know, “Don’t recommend the thing you just bought.”
Allison Hartsoe: Oh!
Bob Page: Yeah, so that was a little weird. So we invested in some custom and off-the-shelf technologies to stream the data to the right engines as it was being computed. And now today, years later, organizations can choose from a whole bunch of different open source technologies that allow for this streaming analytics. Essentially live insights that feed other systems.
Bob Page: So again, you’ll find dashboards and whatever.
Allison Hartsoe: But that’s not exactly machine learning, it’s machines more or less educating machines like parts of a car working together, or parts of an engine working together.
Bob Page: Yeah, very, very simple communication of insights or profiles or whatever from one component of your machinery to another. We’re not talking about artificial intelligence or machine learning or any of that stuff. This is very simple, I just purchased this widget, and so put that in my profile so that when you recommend to me that I might wanna consider purchasing things, you don’t recommend the same widget that I just bought.
Bob Page: Very, very simple stuff.
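A minimal sketch of the "very simple" closed loop Bob describes might look like the following. It assumes a purchase event is written to a shared profile as soon as it happens, and that the recommender consults the profile before answering. `ProfileStore`, `recommend`, and the item names are hypothetical and are not eBay's actual systems.

```python
from collections import defaultdict


class ProfileStore:
    """Hypothetical in-memory customer profile, updated as purchase events stream in."""

    def __init__(self):
        self._purchased = defaultdict(set)

    def record_purchase(self, user_id, item_id):
        # Called for each purchase event as it arrives.
        self._purchased[user_id].add(item_id)

    def purchased(self, user_id):
        # Return a copy so callers cannot modify the stored profile.
        return set(self._purchased.get(user_id, set()))


def recommend(candidates, user_id, profiles, limit=3):
    """Return up to `limit` candidates, skipping anything the user already bought."""
    already_bought = profiles.purchased(user_id)
    return [item for item in candidates if item not in already_bought][:limit]


profiles = ProfileStore()
profiles.record_purchase("u1", "widget")
suggestions = recommend(["widget", "gadget", "gizmo"], "u1", profiles)
```

The filter itself is trivial. The hard part in the story is timing: the purchase has to reach the profile before the checkout page asks for recommendations.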
Bob Page: So, I’m gonna give you one more if we have time.
Allison Hartsoe: Yeah.
Bob Page: We’ve got [inaudible 00:14:11] systems, and now we have distributed data managing, and now we’ve got streaming and real-time analytics, and now more and more people are looking at the Cloud and do we do things in the Cloud or on-premises or some kind of hybrid. Now, the big guys who are doing all this stuff are like, “Hm, how do we manage all this stuff as kind of one thing?”
Bob Page: Instead of all these new fun tools that are kind of connected with becoming more business critical. Now we’re talking about operational excellence, we’ve got security issues and how they operate across different systems, obviously then we’re talking about governance and management technologies.
Bob Page: And so a lot of these things are, let’s say that the big boys have kind of taken the arrows now and are saying, “How do we solve these problems,” and then contributing what they’re coming back with, back into the open source world. And as other companies are adopting them, they’re kinda shaving off some of the rough edges.
Bob Page: I know that when we were building stuff at eBay for example, we had a very top-quality, great big operations team that could handle some of the weirdness. And a lot of times our developers would be on the front lines so when something broke, “Oh yeah, [inaudible 00:15:25] we can fix that.”
Bob Page: But once it goes out into the world and it’s not code that you wrote, you don’t have the development staff or the IT staff to, [inaudible 00:15:34] Silicon Valley Internet Company, you don’t wanna take on any more headache. So a lot of what’s been driving development lately in the open source world is, how do we, as I said, sort of shave off some of these rough edges so that you don’t need an enormous IT staff to be able to bring these technologies in-house and manage them.
Allison Hartsoe: You know, that brings up a really good point, because when you say IT staff, I think overhead, I think people, I think cost, and I think one of my fundamental premises, especially in the foundational stages of the maturity curve, is that there’s no or very little ROI on the foundation on technology platforms.
Allison Hartsoe: Do you think there’s any ROI impact on doing big data well other than just avoiding falling into the pit of technology despair?
Bob Page: Yeah, absolutely. I mean there’s absolutely an ROI impact.
Allison Hartsoe: How?
Bob Page: Well, I knew you were gonna ask me that. You know, some of it you can directly measure, and others it’s probably not so direct. I’ll give you examples of both.
Bob Page: Well, okay, let’s go back to eBay, it was just, I guess I call it customer happiness, right? If you’re not getting a recommendation for something you just bought, then- well, you don’t know that you’re happy, but you know you’d be a little bit turned off if you were getting recommendations for things that you just bought, right? On the checkout screen.
Allison Hartsoe: Okay, if you didn’t do it well, then you would reverse customer happiness.
Bob Page: Yeah, and so that’s kinda table stakes. Now, something that you didn’t do, there would be a negative impact, I guess.
Bob Page: I’m gonna give you a more subtle one. You’re trying to check out, and something isn’t working, like maybe you’re trying to type in your country, and there’s something wrong. It’s just not working. So you call the call center, and you say, “I’ve got this issue, and here’s what the issue is and here’s how I replicate it, whatever.”
Bob Page: And the agent says, “Hm, I can’t see that here. I can’t replicate that here.” But then you find out later after this happens dozens or hundreds of times, and let’s just say this is hypothetical, we won’t say that any company that I’ve ever been associated with ever had these problems. You realize that the customer was in an A/B test and the agent had no way to know that, or even that the test was running, so couldn’t put themselves in the same test and so couldn’t address the problem.
Allison Hartsoe: Confounding factors.
Bob Page: Yeah. Because you know what, your testing system and support system weren’t connected. So you don’t have your sort of complete customer knowledge so you cannot make decisions quickly because your technology is not connected.
Bob Page: Now, should the two systems themselves be connected, because then you have peer-to-peer issues, blah, blah, blah. I won’t get into that, but this is the point of if you’re going down this path of building up this landing place for all your customer data, think about using it as a way to connect all your systems without having to connect your systems themselves.
Bob Page: So you’re connecting them through the data. And I’ll give you an example. When I went to eBay, they were not using their clickstream data. They were throwing it away.
Allison Hartsoe: That’s kinda hard to believe.
Bob Page: Well it is, isn’t it? Now, to be fair, they weren’t throwing it all away. They were keeping a few months, and they were keeping like a 2% sample for like a year or something like that. But they didn’t have everything. And it was simply an ROI calculation. They could tell you COV for customers that had purchased something or had sold something, but what about visitors that actually tried to buy, or were doing the searching and then ended up not being able to buy something?
Bob Page: Well, it didn’t end up in a purchase, so they didn’t capture it. But what happened was Hadoop had become something that was worth putting in place and was sufficiently hardened that we were able to get it done. And at a cost of less than 10% of our existing analytics systems.
Bob Page: So we built clickstream data systems, and they had enough capacity that we were able to take the transactions that we had and put those into Hadoop as well. And so we were building this large data set that combined the transactional data and the behavioral data, paving the way for a lot of other kinds of data that would be added as well.
Bob Page: But I’ll give you a couple of things that came out of this. Almost right away, one is in the past you’d go to eBay, and you’d type in the search term that you- maybe you wanted something to buy. And you would see hundreds and hundreds and hundreds of listings, and you’d be like, “I don’t even know which one of these I should be paying attention to.”
Bob Page: And then you realize that they’re all from the same seller. And you go, “Well, it’s just polluting my view into what’s available.” And so- many people call it listing spam. And so the team responsible for looking at a lot of this stuff on the buyer behavior side determined that there would probably be a pretty significant uplift if we stopped the practice of listing spam because the whole marketplace would benefit. Sellers would benefit because they’d have more exposure to their products. And buyers would benefit because you’re lowering the friction to buying because you’re able to make decisions faster.
Allison Hartsoe: You know, that’s really tricky though because if I remember correctly, eBay gets paid for every listing that goes up. So in a sense, it’s an economic incentive for eBay to allow all that listing spam. But it’s also an economic dis-incentive if sales can’t be completed.
Allison Hartsoe: That’s a tough place to be.
Bob Page: Yes, and it’s easy to model. Also, when you have high-volume sellers, things get a little trickier, but for the low-volume sellers, you’ve seen that eBay now commonly runs specials where your first hundred listings a month are free, for example. You don’t have to pay an insertion fee.
Bob Page: And part of that was based on modeling that we did. The numbers were so compelling that the team went to the business with it, and the business said, “Yes, we wanna make this happen.” And they made an announcement to the selling community that this was gonna be new. Basically, you can’t spam.
Bob Page: If you’ve got ten of one item, then say you have ten, instead of listing one ten times. And some people weren’t really happy about it. However, there was an increase in sales through the marketplace. And so everyone benefited.
Bob Page: So that was done based on a model of what the transactions would look like, what the marketplace would look like based on behavior that we were seeing.
Allison Hartsoe: Nice.
Bob Page: And it worked. Yeah.
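The listing rule Bob describes, saying you have ten of an item rather than listing it ten times, can be illustrated with a small sketch. The data layout and the `collapse_listings` name are assumptions made for the example; nothing here reflects how eBay actually stores or merges listings.

```python
def collapse_listings(listings):
    """Merge listings that share (seller, item) into one listing with a summed quantity.

    `listings` is a sequence of (seller, item, quantity) tuples.
    Output order follows the first appearance of each (seller, item) pair.
    """
    merged = {}
    for seller, item, quantity in listings:
        key = (seller, item)
        merged[key] = merged.get(key, 0) + quantity
    return [(seller, item, qty) for (seller, item), qty in merged.items()]


# Hypothetical search results: one seller lists the same widget ten times.
search_results = [("seller_a", "widget", 1)] * 10 + [("seller_b", "widget", 1)]
collapsed = collapse_listings(search_results)
```

In this made-up example, eleven rows collapse to two, so the second seller is no longer buried under the first seller's duplicates.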
Bob Page: Another one of the research teams kinda scratched their head and said, “We’re seeing,” well, I guess the question ended up being, if you type in eBay into your favorite search engine, you’ll see a paid listing for eBay at the top, and an organic listing right under it.
Bob Page: And so the question was, why pay for that when we get it for free, basically. It’s still the first listing, because who else is gonna list eBay, you know?
Bob Page: And so we modeled it again and said what the impact would be, and the business, it was a much tougher sell for the business.
Allison Hartsoe: Really?
Bob Page: Multiple rounds. Yeah. I mean, go back and check again because if there’s gonna be any kind of drop in revenue, then someone’s head’s gonna roll. I mean, these things matter, right? Especially at scale. And so after a lot of tweaking, a lot of internal discussions, a lot of models, validation, they agreed, “Why don’t we roll out a test for a small amount of time in a small geographic market and see if in fact what the data tells us is actually true?”
Bob Page: And it turns out that it was and so they were able to get a very definitive ROI calculation because of how much they were able to not spend on-
Allison Hartsoe: Nice.
Bob Page: Paid search advertising.
Allison Hartsoe: Cost optimization is a clear win.
Bob Page: I mean it’s all fun and games until somebody pokes them in the eye. It’s not always cut and dried, but at Yahoo, the data team spent months with the finance team at the direction of the CFO to try to come up with a model for ROI of the Yahoo data program.
Allison Hartsoe: You mean, like what was the data program worth to the company?
Bob Page: Yeah. The CFO is probably like, “Every time I turn around, you’re adding more people to your organization. Am I getting value from this?” Yeah, and more machines, more whatever. We could be acquiring companies for that; we could be funding other parts of the business, what’s the ROI of this effort?
Bob Page: And so, we spent a long time really trying to wrestle that to the ground. And we found out a couple of things, one is, it’s a pretty quickly moving target because new technologies and capabilities kept becoming available to the business. And it’s like, “Well, how do you factor that in with the existing ROI models that you’re trying to build?”
Bob Page: And the second was, a bigger one, which is this huge problem of attribution.
Allison Hartsoe: Oh yeah.
Bob Page: And attribution’s a problem everywhere because it’s not a technology problem, it’s not an analytics problem, it’s a political problem. And-
Allison Hartsoe: That’s a great quote. Attribution is a political problem.
Bob Page: So when I came into Yahoo, I was responsible for building the experimentation platform. The front page used it for every change that they made. So suppose they went to the CFO and they said, “We were responsible for 4% lift in conversions,” you know, whatever conversion meant for the front page, “Okay, great. Well, what part of that 4% is attributed to me? How much can I take credit for that, the experimentation platform?”
Bob Page: They would not have been able to roll that stuff out if they hadn’t tested it. What about the underlying data systems that the experimentation platform used? How much do you attribute to that? This is the issue with determining ROI of a platform, right? This is always an issue because the platform itself can make things faster, it can add new capabilities, can lower operational costs, or whatever, but somebody has to utilize those platforms.
Bob Page: In the end, we ended up expanding the effort and saying, “We know that the ROI is positive. Keep on going.”
Allison Hartsoe: This concludes the first part of my interview with Silicon Valley big data technology legend, Bob Page. In the second part, we’ll cover how to apply this wisdom. Join us for part two.
Allison Hartsoe: Thank you for joining today’s show. This is Allison, just a few things before you head out. Every Friday I put together a short, bulleted list of three to five things I’ve seen that represent customer equity signal, not noise. And believe me, there’s a lot of noise out there.
Allison Hartsoe: I actually call this email The Signal. Things I include could be smart tools I’ve run across, articles I’ve shared, cool statistics, or people and companies I think are doing amazing work building customer equity.
Allison Hartsoe: If you’d like to receive this nugget of goodness each week, you can sign up at AmbitionData.com, and you’ll get the very next one.
Allison Hartsoe: I hope you enjoy The Signal. See you next week on the Customer Equity Accelerator.