Tuesday, September 28, 2004

User Generated Content: MP3.com data mining

MP3.com was always focused on collecting data at every point where we could collect it. Our 800,000 unique visitors generated 4,000,000 page views and 4,000,000 downloads and streams a day. We had an extensive data collection, warehousing and analytics team that could slice it many different ways.

One of the most ironic data points to me was the list of most-searched-for artists. Generally speaking, from the beginning until the end, our most-searched-for artists closely mirrored the top-selling Billboard artists. This was a great testament to the power of marketing dollars, MTV, radio and retail distribution. These resources really did create demand. Wow.

Looking at this data at the macro level didn’t really provide any insight that I thought was worth anything. During his tenure at MP3.com, Joe Fleischer (now at Big Champagne) had come up with a product called Single Serving, an attempt to create an online product that supported the efforts of the labels’ radio departments to promote singles online into specific geographies.

This was a really cool product in that it forced us to require zip code information, which we had always treated as an optional field. Consumers, enticed by the opportunity to get material before anyone else, decided that this was a worthwhile endeavor. Sometime around the Vivendi acquisition, the US database identifiable by zip code held somewhere in the range of 7-8 million consumers. (That figure is approximate, and I might add that of course there were the various anomalies you would expect: 90210, 11111, 22222, 54321, 12345, etc.)
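As a rough illustration of how junk entries like those could be screened out before any analysis, here is a minimal Python sketch. The function names, the rules and the deny list are all my own assumptions for illustration, not what the MP3.com analytics team actually ran. Note that 90210 and 12345 are real zip codes, so a filter like this deliberately throws away a few honest users along with the throwaway entries.

```python
# Hypothetical heuristic for flagging zip codes that look like throwaway entries.
# The rules and the deny list are illustrative assumptions, not MP3.com's logic.

KNOWN_THROWAWAYS = {"90210"}  # famous from TV; a real zip code, so this over-filters


def is_suspect_zip(zip_code: str) -> bool:
    """Return True if the zip code looks like a placeholder rather than real data."""
    z = zip_code.strip()
    # Must be exactly five ASCII digits.
    if len(z) != 5 or not all(c in "0123456789" for c in z):
        return True
    # All one digit, e.g. 11111 or 22222.
    if len(set(z)) == 1:
        return True
    # Strictly counting up or down by one, e.g. 12345 or 54321.
    steps = {int(b) - int(a) for a, b in zip(z, z[1:])}
    if steps == {1} or steps == {-1}:
        return True
    return z in KNOWN_THROWAWAYS


def clean_zips(zips):
    """Keep only the zip codes that pass the heuristic, with whitespace trimmed."""
    return [z.strip() for z in zips if not is_suspect_zip(z)]
```

A gentler alternative would be to keep the flagged records and simply exclude them from the geographic roll-ups.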

In an effort to derive greater meaning from this data, Rick Walker, who headed label sales for me after Joe’s departure, asked the analytics team to begin slicing it into most requested songs by DMA (designated market area). This data was fascinating. Although the roll-up of the data reflected the Billboard charts, the specific markets were populated with a number of bands that were not on any charts, were unsigned, and were not obviously on the radar of any of the major labels. Although the numbers were fairly small in absolute terms, within their specific markets these bands ranked alongside bands that had major marketing support.
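The idea of that slice can be sketched in a few lines of Python: count requests nationally and per market, then list the artists who make a market's local top list without making the national one. This is only a sketch under my own assumptions. The function name `local_standouts`, the `(dma, artist)` record shape and the sample data are all hypothetical, and the real analysis ran against a data warehouse, not an in-memory list.

```python
from collections import Counter, defaultdict


def local_standouts(requests, top_n=10):
    """Find artists in a market's top N who are absent from the national top N.

    requests: iterable of (dma, artist) pairs, one pair per song request.
    Returns a dict mapping each DMA to its list of local-only artists.
    """
    national = Counter()
    by_dma = defaultdict(Counter)
    for dma, artist in requests:
        national[artist] += 1
        by_dma[dma][artist] += 1

    national_top = {artist for artist, _ in national.most_common(top_n)}

    standouts = {}
    for dma, counts in by_dma.items():
        local_only = [
            artist
            for artist, _ in counts.most_common(top_n)
            if artist not in national_top
        ]
        if local_only:
            standouts[dma] = local_only
    return standouts


# Made-up sample data, purely for illustration.
sample = (
    [("Miami", "Chart Act A")] * 5
    + [("Miami", "Local Band X")] * 3
    + [("New York", "Chart Act A")] * 6
    + [("New York", "Chart Act B")] * 4
    + [("Chicago", "Chart Act A")] * 4
    + [("Chicago", "Chart Act B")] * 5
)
print(local_standouts(sample, top_n=2))  # {'Miami': ['Local Band X']}
```

In the sample, the national roll-up is dominated by the two chart acts, yet the Miami list surfaces a band with only three requests, which is the kind of small-in-absolute-terms signal described above.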

We spent a fair amount of time analyzing this data, which included actually contacting the bands to find out why they were in our charts. After interacting with somewhere around 20-30 bands, we came up with a series of characteristics they shared, which included:

  1. These bands were generally pre-SoundScan (they didn’t show up in local retail sales figures because they sold their CDs only at shows).
  2. They were organized online using a combination of IM, blogs, and street team tools to get the word out.
  3. A majority of them were playing all-ages venues, which didn’t normally pop up on the radar of clubgoers. (Who wants to hang out with 15 year olds ;-) )
  4. Their genres weren’t typically represented by MTV, radio and retail, and were clustered around emo/pop punk and grindcore.
  5. These bands generally played around 50-100 shows a year.

There was a lot more fascinating data, but once we had this much we decided to take a look a level deeper.

Most content businesses are driven by people with a subjective understanding of content whose taste can discern whether or not something can be a hit. My hypothesis was that when you have a large number of people, quantitative data can be used as a proxy for subjective or qualitative measures that typically come from A&R etc.

We decided to spend a week showcasing some of these bands, including Coheed and Cambria, Madison (New Jersey), Locale AM (San Francisco), and All That’s Left (Miami). I should add that some of these bands did have label interest, but the vast majority of the bands in the survey didn’t at the time, and I think that what we found was interesting either way.

We placed each band in the lead slot on the homepage of MP3.com for a 24-hour period, on equivalent terms. The bands generally got the typical response you would see on the homepage of MP3.com, with one notable exception. The band All That’s Left saw its featured song rise to number one on the charts, and it also had an amazing pickup on a second track, not featured on the homepage, which landed in the Top 10. To me this was an interesting gauge of quality borne out by numbers.

Based on this finding, we decided to continue down this path. We had determined that the search results by zip code could identify bands that were below the radar but had a strong following in a local market. We had taken these initial results and tested them on an audience of approximately 800,000 people. These pieces allowed us to find meaning in a sea of data and then run a small-scale test to find quality in quantity.

The band All That’s Left was interesting for a couple of reasons. I had met them on a trip to Miami, and at the time they were generally playing Miami-area clubs with occasional forays into other parts of the Southeast. They had no manager, no lawyer, etc., and no real label interest to speak of. We decided to continue the experiment by sending an email to 11 million people highlighting the band. This had a very interesting effect in that it obviously got the band a lot of notice and arguably fast-forwarded their career to some extent. They recently played some second stage dates on the Vans Warped Tour and are finishing up their next album sometime in Q4.

There are a lot more details and probably some things I forgot to mention but to me the key takeaways were/are:

  1. If you can create a platform to distribute content that has detailed data, you can identify trends in large populations.
  2. Content that has promise can then be test-marketed to groups of people within that network to determine if the interest is specialized or broad.
  3. Depending on whether the interest is broad or specialized, one can devise a marketing program whose cost fits the expected return suggested by the preliminary findings.
  4. Although data is an indicator of a potential audience for content, there are intangibles that cannot be captured in the data, e.g. whether a band will do well on TV, and whether they can maintain creative output or even manage enough output to warrant additional investment.

I think that this sort of path of analysis and experimentation is the place where the development of new, lower-cost content can and should occur. I am not saying that the traditional method of content development and marketing goes away, but I do think that a more low-to-the-ground approach has a lot of promise for a new economic model that is more sustainable in the long term.
