Archive for ‘Content categorizations’

May 26, 2011

The humanity of web analytics

This snippet from the Economist’s review of Brian Christian’s book about artificial intelligence – The Most Human Human:  What Talking With Computers Teaches Us About What It Means To Be Alive – reminds me that web analytics is about people, not data:

“People produce timely answers, correctly if possible, whereas computers produce correct answers, quickly if possible.  Chatbots are also extraordinarily tenacious:  such a machine has nothing better to do and it never gets bored.”

This spring the nine hardy students who took my USC Annenberg web analytics class came up with wonderful insights that could never have come just from reading a report straight from Google Analytics, Omniture or any other chatbot.  Part of their grade was based on whether the (equally hardy) participating news and nonprofit organizations were actually going to use their analyses for decision-making.  This meant each student had to really understand the organization’s strategies, goals and personalities before digging into the data. Here are some of the things we learned.

Content is indeed king, but only if it’s coded
None of the organizations coded site content with enough detail to make decisions about what to do with their sites.  Data coming straight out of Google Analytics or Omniture was coded only by date published and by broad categories such as “News.”  This is the equivalent of marking a box of books “MISC” – or typing “stuff” into a search engine.

Larissa Puro

For example, let’s say an organization believes it can build and engage audiences by adding “more local politics and government coverage.”  To track whether it did indeed produce “more,” and which coverage resulted in increased visits and engagement, the org needs to track how many politics stories it has, by topic and local geographic area, and how much traffic each topic and/or area gets.

Rebecca Schoenkopf

Each student developed a taxonomy of codes the organization could use to classify its content, and then manually (I told you they were hardy!) coded sample data pulled out from chatbots, er, Google Analytics or Omniture.
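A coding pass like the students’ manual one can be sketched in a few lines of Python. Everything here is invented for illustration: the URL paths, the topic and geography codes, and the “uncoded” fallback label.

```python
# Hypothetical taxonomy: URL-path prefixes mapped to (topic, geography)
# codes. The paths and codes below are invented for illustration.
TAXONOMY = {
    "/news/city-council": ("local-politics", "downtown"),
    "/news/school-board": ("education", "northside"),
    "/sports/preps": ("prep-sports", "northside"),
}

def code_page(path):
    """Return (topic, geography) for a page path, else an 'uncoded' flag."""
    for prefix, codes in TAXONOMY.items():
        if path.startswith(prefix):
            return codes
    return ("uncoded", "uncoded")
```

The “uncoded” bucket matters as much as the real codes: it shows how much of the site is still a box marked “MISC.”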

Track traffic by day, by week and by topic, to determine whether the site is getting the traffic it should

Courteney Kay

Many organizations look at monthly data, and celebrate traffic spikes.  Hidden in monthly data, however, are clues to where to build targeted audiences and advertisers.  Health/fitness section traffic, for example, should increase the second week in January, perhaps fall off after Valentine’s Day (!), and increase before swimsuit season.
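To surface those weekly patterns, traffic can be rolled up by topic and ISO week instead of by month. This is a minimal sketch, assuming rows of (date, topic, visits) exported from an analytics tool; the sample numbers are made up.

```python
from collections import defaultdict
from datetime import date

def visits_by_week(rows):
    """Sum visits per (topic, ISO week number) so seasonal patterns stand out."""
    totals = defaultdict(int)
    for day, topic, visits in rows:
        totals[(topic, day.isocalendar()[1])] += visits
    return dict(totals)

# Made-up sample: health/fitness traffic rising in early January,
# falling off after Valentine's Day.
sample = [
    (date(2011, 1, 10), "health", 120),
    (date(2011, 1, 12), "health", 140),
    (date(2011, 2, 16), "health", 60),
]
```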

Kevin Grant

More content = more traffic
Looking at visits by day of week, we saw radical drops in visits on the weekends.  This seemed to be due to little unique local content being posted on Saturdays or Sundays.  In this age of the 24/7 newsroom and increased Internet access through mobile, can news orgs afford to make resource decisions based on non-audience-based, chicken-and-egg logic (“We don’t get much traffic on the weekends so we can’t justify adding weekend staff.”)?

Josh Podell

Sometimes you should have separate sites for each audience segment….

Josh Podell, an MBA student, focused on analyzing the e-commerce donation functions on the nonprofit sites. He observed that it’s hard to understand what works and what doesn’t when donors are coming to the site to find out more about the organization but residents are coming to find out about programs and services.  Josh’s suggestion:  Have a completely separate site – and Google Analytics account – for donors.  An org could have much more focused content for each audience, and metrics such as visits per unique visitor, page views per visit and the percent of people who left the site after looking at just the home page (home page bounce rate) would give much clearer indicators for both sites.
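Those three metrics are straightforward to compute once each site has its own clean visit data. Here is a minimal sketch, assuming hypothetical visit records that carry a visitor id, a page count, and a flag for whether the visit entered on the home page; the field names are invented, not a real Google Analytics export.

```python
def engagement_metrics(visits):
    """Compute visits per UV, page views per visit and home page bounce rate.

    Each record is a dict with 'visitor_id', 'pages' (pages viewed in the
    visit) and 'entered_home' (True if the visit landed on the home page).
    """
    uvs = len({v["visitor_id"] for v in visits})
    home_entries = [v for v in visits if v["entered_home"]]
    home_bounces = [v for v in home_entries if v["pages"] == 1]
    return {
        "visits_per_uv": len(visits) / uvs,
        "pages_per_visit": sum(v["pages"] for v in visits) / len(visits),
        "home_bounce_rate": (len(home_bounces) / len(home_entries)
                             if home_entries else 0.0),
    }
```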

Dan Lee

….but sometimes you shouldn’t.

One of the organizations had its main site on one Google Analytics account, and its blog on another.  Dan Lee, a graduate Strategic Public Relations student, noticed extremely high home page bounce rates from returning visitors compared to new visitors.

With the question of why burning in his head, Dan looked in detail at the site content and structure, and hypothesized that returning visitors were most likely to go to the home page, see the teaser about the latest blog entry, and immediately “leave” the site to go to the blog.  The Google Analytics account for the blog did indeed show that its top referring site was the main site.

Meredith Deane

Google Analytics was “correct,” but only a human could have produced the right answer for the organization.

Alexander Davies

So here’s to Brian Christian for explaining how to be “the most convincingly human human,” and how the competition for the Loebner Prize continues to result in what the Economist calls “a resounding victory for humanity.”  And here’s to the most adventurous and tenacious students at USC Annenberg – the journalism, public relations and business majors who took an experimental elective class with “web analytics” in the title!

Todd Benshoof

October 20, 2010

The content’s there but the data often isn’t

Neil Heyside’s Oct. 18 story on PBS MediaShift about how newspapers should analyze their content by source type – staff-produced, wire/syndicated or free from citizen journalists – got me thinking about other ways content should be analyzed to craft audience-driven hyper-local and paid-content strategies.

Most news sites have navigation that mimics traditional media products – News, Sports, Business, Opinion, and Entertainment.   However, those types of broad titles don’t work well with digital content because people consume news and information in bits and pieces rather than in nicely packaged 56-page print products and 30-minute TV programs.

Each chunk, each piece of content – story, photo, video, audio, whatever – should be tagged or classified with a geographic area and a niche topic so a news org can determine how much content it has for each of its highest priority audience segments – and how much traffic each type of content is getting.

By geographic area I mean hyper-local.   East Cyberton, not Cyberton.  Maybe even more hyper – East Cyberton north of Main Street, for example, or wherever there’s a distinct audience segment that has different characteristics and thus different news needs and news consumption patterns.

Similarly, news orgs need hyper-topic codes, especially for hyper-local topics.  The Cyberton community orchestra – not Classical Music, Music, or Entertainment.  If a news org is looking at web traffic data for “Music” it should know whether that traffic is for rock music or classical, and whether the content was about a local, regional, national or international group.

Oh, and there’s one more aspect to this hyper-coding.  Content should be coded across the site.  Ruthlessly.  For example, to really understand whether it needs to add or cut coverage in East Cyberton, a news org needs to add up those East Cyberton stories in Local News, plus those East Cyberton Main St. retail development stories in Business, and those editorials and op-eds in Opinion about how ineffective the East Cyberton neighborhood council is, and….
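The cross-section rollup is simple arithmetic once the codes exist. In this sketch the records are invented: each story carries its section plus hyper-local geography and topic codes, and the count sums East Cyberton stories across Local News, Business and Opinion alike.

```python
from collections import Counter

# Invented records: every piece of content carries its section plus
# hyper-local geography and topic codes.
stories = [
    {"section": "Local News", "geo": "east-cyberton", "topic": "crime"},
    {"section": "Business",   "geo": "east-cyberton", "topic": "main-st-retail"},
    {"section": "Opinion",    "geo": "east-cyberton", "topic": "neighborhood-council"},
    {"section": "Local News", "geo": "west-cyberton", "topic": "schools"},
]

def stories_by_geo(stories):
    """Count stories per geography across every section of the site."""
    return Counter(s["geo"] for s in stories)
```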

Sometimes these hyper-codes are in content management systems but not in web analytics systems like Omniture or Google Analytics.  Knowing what you’ve got is great – but knowing how much traffic each hyper-coded chunk of content gets is equally if not more important.

Whether the hyper-codes and thus the data are there only makes a difference if a news org is willing to take a hard, nontraditional look at itself.   The data may suggest it needs to radically change what it covers and the way it allocates its news resources so it can produce “relevant, non-commodity local news that differentiates” it, as Neil Heyside’s PBS MediaShift story points out.

Heyside’s study of four U.K. newspaper chains has some interesting ideas about how a news org can cut costs but still maintain quality by changing the ways it uses staff-produced, wire, and free, citizen journalist content.  The news orgs in the study “had already undergone extensive staff reductions.  In the conventional sense, all the costs had been wrung out.  But newspapers have to change the way they think in order to survive.  If you’ve wrung out all the costs you can from the existing content creation model, then it’s time to change the model itself.”

Decision-making
If a news org doesn’t know, in incredibly painful detail, what type of content it has and how much traffic each type is getting, then it’s not arming itself with everything it can to mitigate the risks of making radical changes such as investing what it takes to succeed in hyperlocal news and in setting up pay walls.  Both are pretty scary, and it’s going to take a lot of bold experimentation – and data – to get it right.

July 31, 2009

The essential categories for “news”

I really like the new simplified structure of The Washington Post’s mobile site, as reported by Online Media Daily.

Mobile’s limited screen size forces publishers to parse news into the most essential categories, the fewer the better.  For the Post, those categories are:  top stories; politics; business; metro; arts & living; sports; and a going out guide.

More importantly, these categories are simple words.  Unlike  newspaper section names, they are instantly understood.

May 28, 2009

Comparing unique visitors in political blog sites

David Kaplan of PaidContent.org compared the number of unique visitors in April in political blog sites such as Huffington Post and The Drudge Report and found that “left-leaning” sites had 6.4 million; “right-leaning,” 4.8 million; and “neutral/non-partisan,” 1.3 million.

This is a fun comparison, but here are a few web-analytics-nerd thoughts for newsrooms that are competing for these audiences.

  • The left didn’t necessarily “win.” To really gauge the relative strength or engagement of the audiences, you should look at ratios like number of visits per UV, number of page views per visit, and bounce rate.
  • The left’s 6.4 million UVs is dominated by HuffPo’s 5.6 million.  The right’s 4.8 million was more distributed among The Drudge Report, Free Republic, World Net Daily and others.  I’d like to know how many UVs the sites shared – and how many went to only left sites, only right sites, and only neutral or nonpartisan sites.
  • Also, how many went to both left and right, or to all three?  How many who categorized themselves as left-leaning went to right sites?  Right-leaning to left, and so on?  (Note:  A lot of this data will send you into analysis paralysis, but there could be some actionable info here.)
  • In the minds of your audiences, is your site categorized as  conservative/right, liberal/left or neutral/nonpartisan?   Ideally, you should measure the differences in perception between news stories and editorials.
  • Are your pages coded and/or is your site set up to track all “political” content, whether it’s on the home page or the officially named “Politics” section?
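The overlap questions in the bullets above reduce to set arithmetic, assuming you could get visitor-level panel data rather than aggregate UV counts. The visitor ids here are, of course, invented.

```python
# Invented visitor ids standing in for panel data; each set holds the
# unique visitors to one cluster of sites.
left = {"u1", "u2", "u3", "u4"}
right = {"u3", "u4", "u5"}
neutral = {"u4", "u6"}

only_left = left - right - neutral   # visited left-leaning sites only
crossed_over = left & right          # visited both left and right sites
all_three = left & right & neutral   # visited all three clusters
```

Aggregate UV totals can’t answer these questions, which is exactly why the “left won” framing is shaky: the same visitor can appear in every cluster.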