analytics | Geographical Perspectives

Eight (No, Nine!) Reasons Big Data Might Surpass Expectations Despite the Hype

I recently read an OpEd piece in the New York Times, Eight (No, Nine!) Problems with Big Data. The article was written by two NYU professors, one a psychologist and one a computer scientist. I was looking forward to some interesting new arguments and points of view. I was in for disappointment.

Here is an abridged version of the 9 problems they see with Big Data.

1. Big data is good at detecting correlations but doesn’t tell us which correlations are meaningful.
2. Big data is helpful as an “adjunct to scientific inquiry” but doesn’t replace domain expertise.
3. Tools produced with big data can be easily “gamed” reducing long-term utility.
4. Google’s Flu Trends app doesn’t work as well as it once did.
5. If big data techniques are used for both data collection and analysis there may be pitfalls.
6. Big data finds “too many correlations” because it explores too much data.
7. Big data is “prone to giving scientific-sounding solutions to hopelessly imprecise questions” like ranking Francis Scott Key as history’s 19th best poet.
8. Big data is “best when analyzing things that are extremely common”. Huh?
9. There is too much hype surrounding Big Data.

Let’s take a look at these one at a time.

Problem #1 is straight from the Statistics 101 textbook: correlation does not imply causation. So big data in the hands of someone who never studied statistics could be a problem. Of course this has always been an important axiom for data analysts. Nothing new or significant about this problem with the advent of big data.

Problem #2 reminds us that we will still need biologists and other scientists with specific domain expertise because big data can’t do the job on its own. Yes, despite coordinated efforts, statisticians and computer scientists have failed to supplant all other scientific disciplines and domain expertise will continue to be valuable. Again, this issue predates use of the term big data.

Problem #3 warns us that if someone builds a tool with a simple algorithm (e.g., grading papers by looking for use of sophisticated words) then people will be able to figure it out. Darn it. You mean that feeble attempts at laziness might not work…even with big data? I’m starting to see a pattern here.

Problem #4 is clearly alarming. Google’s Flu Trends doesn’t seem to work well any longer. It was so cool at first. And, now, the whole internet has changed. Jeez. Why would the internet change? It’s sort of like analyzing people and finding a way to predict behavior and then those people change their behavior. Why do they do that? It’s a horrible dilemma but we seem to be stuck re-analyzing data over and over to keep track of changes. Brutal. Why doesn’t big data fix that?

Problem #5 implies that before big data no one ever created a model with data they collected on their own and then failed to validate said model with a 3rd party independent source. In other words, model validation is still important. Yes, again, even with big data. Damn it!! I still have to pay attention!

Problem #6 is a major issue if you don’t know anything at all about statistics or data analysis – just like problem #1. The issue here though is that there are correlations everywhere because there is simply too much data analysis going on. Were we better off when we only analyzed a few data sets? I don’t really get this one and I’m surprised the computer science professor allowed this to go to print. Doing more data analysis doesn’t prevent bad interpretation but it doesn’t hurt anything either.
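To make the Stats 101 point concrete, here is a minimal sketch (an illustration only, not something from the Times piece) of why the count of chance correlations grows with the number of variables you examine. It assumes purely random, unrelated series and a cutoff of |r| > 0.36, which is roughly the 5% significance level for 30 observations. The function names and parameters are hypothetical.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def count_chance_correlations(n_vars, n_obs=30, threshold=0.36, seed=0):
    """Count pairs of unrelated random series whose |r| exceeds the threshold.

    Returns (number of "hits", number of pairs examined).
    """
    rng = random.Random(seed)
    # Every series is independent noise, so any "hit" is spurious by construction.
    series = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    pairs = list(itertools.combinations(range(n_vars), 2))
    hits = sum(
        1 for i, j in pairs if abs(pearson(series[i], series[j])) > threshold
    )
    return hits, len(pairs)


for n_vars in (10, 100):
    hits, total = count_chance_correlations(n_vars)
    print(f"{n_vars} variables: {hits} of {total} pairs look 'significant'")
```

The remedy is the standard one: demand a stricter threshold as the number of comparisons grows, or check the surviving correlations against fresh data.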

Problem #7 is the anti-positivist angle. Talk to any social scientist who hated math or statistics. They refer to any effort to quantify as “positivism” and lump this sort of research into a bucket full of other horrible practices like voter discrimination and other parts of the GOP platform. (Aside: if you didn’t go to grad school imagine a chain-smoking, hand-waving intellectual wannabe who uses big words but on the inside is terrified they’ll be discovered as not terribly insightful.) I’ll bet the computer scientist secretly liked seeing Francis Scott Key as #19 on the poet list. Who says he’s not a poet? But, the psychologist doesn’t want us to forget that humans have a right to make arbitrary distinctions between those who rhyme with spoken word versus those who rhyme with lyrics to music. To me, these efforts can provide uniquely useful insights. Sometimes they must be disregarded as whimsical but not always. And what’s wrong with a whimsical perspective from time to time?

Problem #8 is sort of like … well, we really want to get to 8 or 9 problems. That’s a big number and that way no one will really want to read the entire article because there are so many problems they will simply assume we’re correct and move on to the next article. Big data is good at analyzing “common” things? What does that mean? So big data is good for analyzing baseball and apple pie but it’s bad for analyzing tennis and zucchini bread? There are rules about how much significance can be attributed to inferential findings (again this is Stats 101... okay, maybe Stats 102) but there’s nothing problematic about looking for a needle in a haystack. The example they give, something to do with translating a book review, has nothing to do with big data and everything to do with the thorny task of language translation. This may come as a shock but “big data” is better with numbers than it is with text.

Problem #9 – too much big data hype? Perhaps. To me, it’s very exciting that advances in computational power allow us to explore possible solutions to problems that were intractable just a few years ago. Maybe Big Data today is like disco in the 1970s. The Bee Gees were hot but popularity faded a few years later. Or maybe it’s more like the internet in the 1990s. There was way too much hoopla. Remember pets.com? What a joke. After 20 years the internet hasn’t really lived up to the hype…well, except now I work from a home office using a Google Chromebook purchased on Amazon. And you’re reading this on my blog.

The Geography of Big Data: How to Get Started

If you’re a business executive you’ve no doubt heard a lot about “Big Data” and the promise of analytics. You may have even read a recent post in the Harvard Business Review about how to Get Started with Big Data. It’s a great article and offers good advice but probably should be retitled: “How to Get Started with Big Data if you Can Afford to Pay McKinsey & Co Consulting Rates”. It sort of skips past the Big Data 101 issues that I see as first steps and moves directly to what I would consider more advanced uses of Big Data.

Instead, let’s assume that your company isn’t a Fortune 500 company and maybe you’ve struggled a bit with technology strategy and operations. Maybe you’re still struggling but you can’t wait another year for IT to complete the decade-long SAP/Oracle/Cognos/Any ERP implementation that cost millions and has yet to show any benefit. Perhaps you’re not a math genius, not a finance person and not even really what some might consider tech-savvy. But, you know your business, you know your customers and you know your products. You also know you have to keep up with changes in your industry and you don’t want to be left behind if Big Data is the next big thing. [And, I definitely think it will be, at least one of them.] You may be asking: where should I start?

My advice: make a map.

Huh? Why would I start by making a map? Our company manufactures sophisticated engine components. I need performance metrics, fancy algorithms and cutting-edge insights to drive strategy and profit. How is a simple map going to help me improve the bottom line? Sounds like a silly kindergarten activity with no possible ROI.

Well, give me a chance to explain. Before you can turn some Nate Silver-like econometrics modeling guru or Physics PhD genius loose you need good data and some ideas about what specific problems you want to try to address with analytics. Producing a map can be an excellent process for moving toward a more sophisticated Big Data program. So, how can making a map help start this process?

Here are 6 benefits of making a map:

1. Your company will be forced to take inventory of key data elements.
2. Your IT team will be required to deliver data in a usable format.
3. Any problems with customer data will become readily apparent in the geocoding process.
4. Geographic representations of company data will reveal new patterns that spreadsheets may be disguising.
5. Producing a map will allow everyone to get involved, not just the same old digit heads.
6. Seeing your company’s data on a map will generate new ideas.
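As a rough illustration of benefits 1 and 3, a first pass over a customer file can flag records that are likely to fail geocoding before any map gets made. This is only a sketch: the field names (`street`, `city`, `state`, `zip`) and the sample records are hypothetical, and it assumes US-style ZIP codes.

```python
import re

# Five digits, optionally followed by a ZIP+4 suffix.
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")

REQUIRED_FIELDS = ("street", "city", "state", "zip")


def find_address_problems(record):
    """Return a list of reasons this customer record may not geocode cleanly."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field) or "").strip():
            problems.append(f"missing {field}")
    zip_code = str(record.get("zip") or "").strip()
    if zip_code and not ZIP_PATTERN.match(zip_code):
        problems.append("malformed zip")
    return problems


# Hypothetical customer records, as they might arrive from a spreadsheet export.
customers = [
    {"id": 1, "street": "12 Main St", "city": "Ann Arbor", "state": "MI", "zip": "48104"},
    {"id": 2, "street": "77 Mass Ave", "city": "", "state": "MA", "zip": "2139"},
]

for customer in customers:
    problems = find_address_problems(customer)
    if problems:
        print(customer["id"], problems)
```

The second record shows a common failure: spreadsheet tools often strip the leading zero from ZIP codes, which only becomes obvious when the record fails to land on the map.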

In the coming days and weeks I will elaborate on each of these points. Stay tuned!

Miles Driven Forecasting: Not So Fast My Friend

When it comes to forecasting the future I like to think of two quotes from one of the smartest people I’ve ever worked with (the quotes may not be precise but hopefully you’ll get the idea).

1. Forecasts are always wrong.
2. Forecasts with longer time horizons are always worse.

David Simchi-Levi, brilliant MIT Professor and my former boss at LogicTools (now part of IBM), told me this in person and I’m pretty sure he’s expressed the same idea in one or more of his many now-famous supply chain related publications.

I thought of these quotes immediately when I read a special report published in November by the Automotive Aftermarket Suppliers Association titled, “Don’t Discount Miles Driven in Long Term Forecasts”.

In the article author Paul McCarthy argues that miles driven is a critical driver of demand for parts in the automotive aftermarket. And, while he acknowledges that miles driven has been flat or declining for the past several years, he points to the US Energy Information Administration (EIA) forecast for increased miles traveled as reason to look forward to positive future growth in the aftermarket.

Well, I hate to burst anyone’s bubble but I have to point out that no one should hang their hat on this growth projection. If I were running a manufacturing or distribution company supplying parts to the automotive aftermarket I certainly wouldn’t put any stock in this forecast and I definitely wouldn’t make any capital investments based on these numbers. Let’s take a closer look at the EIA projections.

According to the first chart (above) in the AASA report, miles driven peaked in 2006-2007 at around 2,700 billion miles (which makes sense) and has been more or less flat since (also makes sense). But the “good news” in the second chart is that miles driven will increase sharply, adding about 1 trillion miles annually in the coming years. Well, when exactly will those additional miles start hitting the pavement? According to the second chart in the report (below) it looks like it will be real soon, like next year or the year after.

Great news! Let’s get ready for big sales numbers! Better ramp up production and stock more inventory!

Uhhh…in the immortal words of Lee Corso, “not so fast, my friend”.

When I looked closely at the EIA numbers I noticed a few things that might be a problem if you’re banking on total miles driven to be a growth driver for the aftermarket. If you look at the graphic below (click on it for a larger, easier to read version) you’ll see the EIA’s 2012 forecast on top and their 2010 forecast on bottom. I’ve shaded the forecast “Total VMT” for the next 5 years (2013-2017) in both charts for easier comparison. You’ll notice that in the 2012 numbers we aren’t expected to return to the peak VMT levels reached in 2007 (orange highlight) until 2017. You’ll also notice that between 2010 and 2012 the EIA moved their forecast date for reaching 3,000 billion miles from 2017 all the way back to 2023 (red highlight). No big deal – just an extra 6 years! That’s a lifetime in business.
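To put that six-year slip in perspective, here is a back-of-the-envelope sketch of the annual growth each forecast implies. It rests on a simplifying assumption that is mine, not the EIA’s: both forecasts are treated as starting from roughly 2,700 billion miles in 2012, the approximate plateau mentioned above. The function name is hypothetical.

```python
def implied_annual_growth(start_level, target_level, years):
    """Compound annual growth rate needed to go from start_level to target_level."""
    return (target_level / start_level) ** (1 / years) - 1


START_YEAR = 2012      # assumed common starting point for both forecasts
START_LEVEL = 2700     # billion miles, the approximate plateau (assumption)
TARGET_LEVEL = 3000    # billion miles

for label, target_year in (("2010 forecast", 2017), ("2012 forecast", 2023)):
    rate = implied_annual_growth(START_LEVEL, TARGET_LEVEL, target_year - START_YEAR)
    print(f"{label}: reach {TARGET_LEVEL} by {target_year} -> {rate:.2%} per year")
```

Under those assumptions the required growth rate drops by more than half between the two forecasts, which is a big change for a number someone might use to justify capital spending.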

The Federal Highway Administration also publishes a traffic volume report. Here’s a link to the September 2012 report. The chart below is on page 9 of the report and shows that miles driven has decreased since 2007 and is still in a downward trend. If you were a stock-trading chart reader I think you’d say that the upward trend that started as late as 1987 has clearly been broken. Miles driven could certainly go up from here but, as they say, “the trend is your friend” and it appears just as likely to me that they may be headed further south.

I don’t know why anyone would want to predict something like total miles traveled in 2035. What if we aren’t even driving cars in 20 years? Think about all the change we’ve seen just in the past 5 years. Do you think that Research in Motion may have been predicting growth in smart phones but failed to foresee the emergence of the iPhone and iPad? I’ve read that Google is working on self-driving cars. How will that change the way we transport ourselves? Will improvements in navigational efficiency thanks to ubiquitous mobile devices with GPS technology lead to a large reduction in miles driven? Will personal airplanes become economically viable in the next 15 years? Will communication technology continue to advance at such an amazing pace that virtual meetings become a far more reliable means of interaction allowing far more people to work from a home office? Or might it allow more people to shop or visit service providers (e.g., doctors, lawyers, psychologists, teachers, etc.) in a virtual environment?

It’s way too difficult to predict that far out into the future. That’s why I prefer to look no further than 1-2 years out for business forecasting. Will miles driven increase in the next year or two? Perhaps but probably not by much. Will miles driven decrease in the next year or two? Perhaps but probably not by much. Will cars still be the primary mode of transport in 2 years? Yes. There you go. Three forecasts you can hang your hat on. Obviously those forecasts aren’t worth much. But, I would consider paying good money for a forecast of miles driven in Q1 and Q2 2013, especially if it were available by region.

So don’t worry about how many miles will be driven in 2015, let alone 2025 or 2035. No one really knows for sure. I can only safely guarantee two things:

1. The forecasts will be wrong.
2. The forecasts with the longer time horizons will be worse.
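For anyone who wants to see the second guarantee in action, here is a small simulation sketch. It assumes, purely for illustration, that the quantity being forecast behaves like a random walk and that the forecast is the naive “it stays where it is today”. Real miles-driven data is not a pure random walk, and the function name and parameters are hypothetical.

```python
import random


def mean_abs_error_by_horizon(horizons, n_paths=2000, step_sd=1.0, seed=42):
    """Average absolute error of a 'no change' forecast at each horizon.

    Each simulated path starts at 0, so the naive forecast for every
    future period is 0 and the error is simply the absolute level reached.
    """
    rng = random.Random(seed)
    totals = {h: 0.0 for h in horizons}
    for _ in range(n_paths):
        level = 0.0
        for step in range(1, max(horizons) + 1):
            level += rng.gauss(0, step_sd)  # one period of random change
            if step in totals:
                totals[step] += abs(level)
    return {h: totals[h] / n_paths for h in horizons}


errors = mean_abs_error_by_horizon([1, 4, 16])
for horizon, error in errors.items():
    print(f"{horizon:>2} periods ahead: average miss of {error:.2f}")
```

Even in this idealized setting, where nothing about the world changes except random noise, the average miss keeps growing as the horizon lengthens.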

Blogging Elsewhere — Estimating Category Market Demand in the Automotive Aftermarket

Recently I published an article on the Aftermarket Analytics company blog, on how we estimate market demand. Check out the excerpt below, and click here to read the full post!

“Until recently the Automotive Aftermarket was provided data from key channel distributors indicating monthly sales activity and market share for various vehicle part categories. At the beginning of 2012 the consortium of companies that provided these data collapsed. Since then, parts suppliers and others in the Aftermarket have been searching for a new source of data to fill the void and this very issue is being discussed by industry representatives at the AAIA Fall Leadership Days conference in San Francisco this week.

In this post I propose a methodology for estimating market size and discuss how this estimate of total demand can form the basis for replacing, and perhaps improving upon, the market data previously provided by NPD.

The two key elements in producing category market size estimates are (1) vehicle registration data (referred to typically as VIO, i.e., vehicles in operation) and (2) Replacement Rates.

VIO data is available at various levels of geography (US, State, County, ZIP, Census tract, and block group) and provided by Experian and Polk. This data is expensive but easy to acquire and utilize.”
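The excerpt names the two inputs but not how they are combined, so here is one minimal way to sketch it, assuming demand is simply VIO multiplied by the replacement rate and summed across vehicle segments. The segments, counts, and rates below are invented for illustration and the function name is hypothetical; the full post may use a different formulation.

```python
def estimate_category_demand(vio_by_segment, replacement_rate_by_segment):
    """Estimated annual unit demand: sum of VIO x replacement rate per segment."""
    return sum(
        vio * replacement_rate_by_segment[segment]
        for segment, vio in vio_by_segment.items()
    )


# Hypothetical vehicles in operation for one county, grouped by vehicle age.
vio = {"0-5 years": 10_000, "6-10 years": 8_000, "11+ years": 5_000}

# Hypothetical annual replacement rates for one part category.
rates = {"0-5 years": 0.02, "6-10 years": 0.08, "11+ years": 0.12}

demand = estimate_category_demand(vio, rates)
print(f"Estimated annual demand: {demand:,.0f} units")
```

Running the same calculation for every county or ZIP code gives a geographic picture of demand rather than a single national number.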

Speaking Tomorrow at the AASA Marketing Executives Council

I’ll be giving a talk tomorrow to the Automotive Aftermarket Suppliers Association’s (AASA) Marketing Executives Council (MEC) June meeting. The talk is entitled, “Inventory Optimization with Experian VIO, Mosaic and Simmons”.

It will be very similar to the talk I presented to the AASA Technology Executives Council in March, describing how demographic data can supplement vehicle registration and replacement rate data to enhance demand forecasting and inventory optimization. You can download the presentation here.

If you’re near the Detroit Metro Airport Sheraton tomorrow stop by and say hello!

GAAS Presentation Available

As I mentioned in a previous blog post I was in Chicago last week attending the Global Automotive Aftermarket Symposium and had the opportunity to deliver a talk on the Inventory Optimization Process in the Aftermarket.

It was my first presentation to a large audience and I think it went pretty well. It was certainly a great opportunity for me and I learned a lot more about the Aftermarket during the 2-day symposium. Hopefully I’ll have more chances to speak at these types of events.

If you’re interested, you can listen to an audio recording of my presentation.

Before you begin listening you might want to download my presentation so you can follow along with the graphics that I refer to during the talk.

I’d be grateful for any feedback on my presentation, the content, my delivery, etc. I know there’s plenty of room for improvement so don’t be shy about sharing constructive criticism.

Thanks!

Speaking this week at the Global Automotive Aftermarket Symposium

I’ll be speaking at the Global Automotive Aftermarket Symposium (GAAS) in Chicago this Thursday. It’s an exciting opportunity to share some of the geographic data analysis approaches we at TerraSeer have been using to help manufacturers and retailers forecast demand and make inventory assortment decisions. Here’s a downloadable brochure with more information about the conference.

Hopefully this speaking engagement will lead to more opportunities to introduce geospatial analytics to the manufacturing sector and the automotive aftermarket in particular. If you happen to be in Chicago-land stop by the Hyatt Regency near O’Hare and check it out!

I will try to share some thoughts during the conference so follow me on Twitter if you’re interested.

Replacement Rate Models in the Automotive Aftermarket

Recently I posted an article to the Aftermarket Analytics blog about replacement rate modeling in the automotive aftermarket. It’s a sizable post, worthwhile for anyone interested in demand modeling in any industry. I’ll post an excerpt here, but head over to the post to get the full article and images.

“An important piece of the automotive aftermarket category management puzzle involves an understanding of your category’s replacement rates. Replacement rates, which are also referred to as repair rates or failure rates, are essentially an estimate of the likelihood that a vehicle will need a replacement part due to failure or normal wear and tear.

So, how should replacement rates be calculated?

Well, it starts with determining an appropriate numerator and denominator. The denominator should represent an estimate of the total population of vehicles. The numerator should represent an estimate of the total number of vehicles that required a particular part replacement.

As I understand it currently the two most common ways of calculating replacement rates go something like this: (1) replacement rates are simply calculated using a consumer survey where the total number of a particular vehicle in the survey is used as the denominator and the number of repairs/replacements reported is used as the numerator; or (2) some data/technology providers generate replacement rates based on repair shop part “look-ups” – meaning how frequently a part is queried in an online database of parts. So the number of look-ups is used as the denominator and the number of reported repairs/replacements is used as the numerator.

I don’t like either approach.”
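For reference, the survey-based calculation in approach (1) boils down to a ratio per vehicle. The sketch below is only an illustration of that numerator and denominator logic: the response format (`vehicle`, `replaced_part`) and the sample data are hypothetical.

```python
from collections import defaultdict


def survey_replacement_rates(responses):
    """Replacement rate per vehicle: reported replacements / vehicles surveyed."""
    surveyed = defaultdict(int)   # denominator: vehicles of each type in the survey
    replaced = defaultdict(int)   # numerator: those reporting a replacement
    for response in responses:
        surveyed[response["vehicle"]] += 1
        if response["replaced_part"]:
            replaced[response["vehicle"]] += 1
    return {vehicle: replaced[vehicle] / surveyed[vehicle] for vehicle in surveyed}


# Hypothetical survey responses for a single part category.
responses = [
    {"vehicle": "Sedan A", "replaced_part": True},
    {"vehicle": "Sedan A", "replaced_part": False},
    {"vehicle": "Sedan A", "replaced_part": False},
    {"vehicle": "Sedan A", "replaced_part": False},
    {"vehicle": "Truck B", "replaced_part": True},
    {"vehicle": "Truck B", "replaced_part": False},
]

print(survey_replacement_rates(responses))
```

With only a handful of responses per vehicle, each rate swings wildly on a single answer, which hints at why survey size matters so much for this approach.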

Geospatial Visualization in Business: Bringing John Snow into the Boardroom

John Snow was a physician in London in the 19th Century and he is famous for having used maps to identify the source of the Broad Street cholera outbreak in 1854. Dr. Snow’s work is often cited as the founding event in epidemiology. For me, it represents a key event in applied geography, demonstrating the power of maps and their ability to illuminate patterns in data.

Original map made by John Snow in 1854. Cholera cases are highlighted in black.

Here’s another map of the same data using modern cartography.

Just like the citizens of London who were trying to figure out what was causing the cholera outbreak, business analysts throughout the world are trying to diagnose pain points in their business.

Moneyball and the Automotive Aftermarket on AftermarketAnalytics.com

I recently made a post on the aftermarketanalytics.com blog on Moneyball & the Automotive Aftermarket. Please check it out and, as always, leave a comment and let me know what you think! Just to give you an idea of what the article is all about, here are the first couple paragraphs:

“I recently watched the movie Moneyball starring Brad Pitt and Jonah Hill. It was a good movie and it made me think about parallels with the Automotive Aftermarket.

In Moneyball, a bold GM for the cash-strapped Oakland A’s begins using a flavor of analytics called Sabermetrics to identify undervalued baseball players who can help the team win without breaking the bank. Despite resistance from baseball’s old-school veterans, the approach is successful and the A’s overachieve by posting a winning season despite a payroll that is dwarfed in comparison to the New York Yankees and most of the rest of the league.

The use of analytics now seems to be firmly established as an important component in professional baseball management. Sabermetrics doesn’t replace the need for veterans of the game who can identify talent and the intangibles that make great baseball players and winning baseball teams, but it does add a critical and previously missing element to the business of baseball. I see the Automotive Aftermarket embarking on a similar journey in the coming years, although I’m not expecting Brad Pitt to star as the CEO of NAPA in a blockbuster movie anytime soon.”