Is More Data Always Better?

There has been a discovery in the online marketing and data/statistics world in the last few years: more websites, products and tools have been created online than we can possibly keep track of. The terms we hear most often to describe this deluge of activity, from both companies and consumers, are “data overload” and “information overload.” This Google magazine uses the term Data Obesity to describe this phenomenon.

They ask the question: is more data always better?

I think the idea that “more data is better” is common among people who lived before the Internet was prevalent. We had to work hard to find data. Researching something meant going to a library, looking in a card catalog (or maybe something called Gopher) and then finding your way around the Dewey decimal system to find that book. And then sometimes they didn’t even have the book, because it was checked out or possibly just filed wrong because nobody understood the Dewey decimal system.

On a related note, we recently got invited to my cousin’s wedding in Santa Fe, New Mexico. My dad promptly went to the library and checked out 3 books on Santa Fe and New Mexico. I cringed. He asked how to look up flights and book something without a travel agent. I realized I have been traveling this way since 2000, and he stopped traveling about that time, so he never has. I introduced him to Travelocity. It was mind blowing and a bit of data overload compared with the OAG book he used to use in the ’80s.

The point here is that finding data used to be really difficult. People had control over its distribution because it was in print. When it became more freely accessible due to the efforts of Google and other companies, we assumed this would be good, because we could remember where to find it and use it whenever we wanted. We never thought it would get this big so fast. Now travel sites are overwhelming: they have too many choices, and there are too many of them trying to get you to opt into something you don’t want while overcharging you for bringing a suitcase on a flight. This is just one example of how data has gone exponential so quickly.

Others of us come to the data overload conclusion when we have 200 emails in several inboxes, 1000+ posts waiting in an RSS reader, several work projects, 500+ Facebook wall posts in our feed and hundreds of tweets that have gone unread. This is in a climate where you have to follow up on projects 5-10 times to get things done, post blogs/tweets/FB status updates daily to stay on people’s radar, empty the DVR so it doesn’t get overloaded and auto-delete something you really wanted, listen to the radio on the way to work just in case something big happens and still find time to scoop the litter box before it gets full and the cats poop on the floor.

And the real purpose in all those tweets/FB posts and feeds is that your business (in digital marketing) changes yearly, and if you don’t know about the latest trend and some real insights about it before your boss asks about it, you won’t have a job for all that long.

Data overload is a “good” problem to have from some people’s perspective (in that it is growth oriented). The democratization of publishing, tracking methodology and databases have all contributed to this problem, giving everyone a voice, a potential following of readers, a data trail to analyze and a way to say something important online 24/7/365. And then we have an even bigger problem: processing what is being said, figuring out if it is important or not and sharing/processing/saving it in some way if it is. Acting on that data is way down the line and many of us don’t even get there.

And this isn’t even the big problem with data overload. Where will we store it all? Why do tweets disappear from search so quickly? Because there are millions of them and the failwhale is full. According to the ThinkQuarterly UK, there are 800 Exabytes of data/information created every two days. It took humans from the beginning of civilization until 2003 to create the first 800 Exabytes, and we’re on a roll now.

Where does all this seemingly random data go? How will we know what it says without having to go into a database table and read specific field information? Where are the software tools to manage all this and still give humans the ability to customize the output in ways that match the behavior or business purposes that we really need? Does any of this stuff ever get deleted?

These are all huge questions we have to answer as more people publish, share, create, track and do business online. We also have to weigh sharing data openly against locking it behind walls, and figure out how people will comprehensively find what they need when they want it and gauge the validity/accuracy of the information presented.

I’m betting on paid services for personal and business data management/archiving and analysis tools. We will pay for good analysis, good data access and processing and good reliability/backups when we feel the pain of missing good insight, losing good data and just too much happening, both personally and professionally. But unless you know how to work with SAP, SPSS, SQL, Oracle or a bunch of other systems, data management is largely out of your control at this point. They are the librarians of our digital data and they need to find a workable way to Dewey decimal system it back into order and allow us to use it as humans need to.

WebTrends won’t export data, what’s wrong?

I thought I would share another WebTrends Analytics morsel of information since it is so hard to find information about this analytics software that is quick and concise.

Today I was working from home and our self hosted software would not let me download the WebTrends reports that a client requested. It would abort the process and return to the login landing page each time.

We’re using WebTrends self-hosted 8.1 software (not the most recent, yet still solid) and we found that WebTrends 8.1 does not play well with Internet Explorer 8.

When I ran my browser with standard IE8 settings it would not export data: the popup data processing screen was suppressed and the system couldn’t complete the request.

There is a setting in the IE8 browser though that is called “compatibility view” that fixes this problem. Just set it to run in compatibility mode for the data site or all sites if you want and your analytics data is available for download again.

On another note, our recently updated McAfee security and virus protection software is also blocking the JavaScript-prompted popup window where the data is usually exported, so I can’t download any data at work anymore either. We haven’t found a solution to this problem because the security settings get reset to the standard again when the system finds that the administrator has changed them, usually within half an hour. No word on whether the powers that be would make an exception for my computer somehow or whether our maintenance systems even allow that.

It’s going to be a fun month.

WebTrends Email Stats Reports How To Setup

I love that WebTrends is a good solid web analytics reporting solution, but I really find the setup process for just about anything with this system to be very confusing and lengthy. I’m sure there is a reason for this (could be data integrity processes or cost savings) but I really just need a step by step list when I need to get something done quickly and someone to tell me where these oddball parts of the process exist. Therefore I’m writing a list to explain this process so I have it written down and other people can find this info too.

(Technical note: I’m using WebTrends software 8.1, not the WebTrends hosted solution.)

Today the task at hand is setting up automated reports of WebTrends data to be sent monthly by email. The duration of the data collected and the frequency of the reporting schedule are both flexible: it can be daily, weekly, monthly, quarterly or yearly.

The first step is to go to the administration menu from your login. From there go to Report Designer and then Templates. You can select one of their templates, but I needed to create a new one.

Then name your template. Go to next and select the content by adding (and naming) a new chapter, then adding content to that chapter from the add report link on the menu above. Select the “built in report” list from the drop down to get the standard metrics available in WebTrends. Check the boxes of the metrics you want included. I would say 4-8 per report is enough before you have too much data for someone to really use. You can make changes to the layout, although I was not looking for that level of detail now.

Click next at the bottom of the page. Then you have some configuration settings, like for wrapping text lines on long urls (ok) and how many rows of data in the reports (20-50 max for readability, top 5 is good).

Click next again and give profiles access to this report. I noticed mine are already given universal access and greyed out, so there is nothing much to do with this screen. Then click save.

Next, go back to the profiles list (admin menu, then web analysis, then reports & profiles) and edit the profile you want to receive this report, so you can add the report to the profile. This is one of those steps I think is redundant and should be automated or brought into the setup process before this, because it’s confusing. You hover over the profile in the list (don’t click it) and get a menu with “actions”; edit is one of them.

From there go to reports in the top menu and on the drop down go to report templates. Click the box by your report to select it, ignore the second checkbox that is labeled default because it will change the default reporting style in the profile to this new report, and that isn’t the intention here.

Then go back to the admin menu a third time, go to the scheduler menu (bottom), then scheduled jobs, and click the button for a new job. This is the email setup part. Under job type select scheduled report and follow through the pieces of the menu from left to right as you fill out each section. First select the profile you want reported on, then give the report a name and assign it to a user (yourself). (Note that the check box below is also how you disable the emails. No idea why this is hidden here.) Report type: general report. Output type can be a database, PDF or Excel/CSV. I chose PDF because it looks professional and we don’t have to install Microsoft Office/Word on the server in order to export it. It’s the only option that does not require that, except the database. The number of data rows to report is up to you. I usually do the top 20.

Next add the report destinations. This is where you need the email info. Add your email as the from address and theirs as the to address. Also, cc yourself on these reports so you get them too. Add the SMTP server address so the software knows where to connect to send it from. (If you don’t have the SMTP address it will hold up all of your other scheduled jobs, so don’t set this up without it. Contact IT about this if you don’t have it.) You can also FTP it if you like your data that way, or save it to a folder on the server (not as exec friendly though).
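For anyone who wants to see what those destination fields add up to outside of WebTrends, here is a minimal Python sketch of the same idea using only the standard library. This is not how WebTrends sends its reports. The function names, the subject line, the file name and the example addresses are all made up for illustration.

```python
import smtplib
from email.message import EmailMessage


def build_report_email(sender, recipient, pdf_bytes, filename="monthly-report.pdf"):
    """Build a report email: from you, to the client, cc'd back to you."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Cc"] = sender  # cc yourself so you get the report too
    msg["Subject"] = "Monthly web analytics report"  # hypothetical subject
    msg.set_content("The monthly report is attached as a PDF.")
    msg.add_attachment(
        pdf_bytes, maintype="application", subtype="pdf", filename=filename
    )
    return msg


def send_report(msg, smtp_host, port=25):
    """Hand the message to the SMTP server; without a host nothing goes out."""
    with smtplib.SMTP(smtp_host, port) as server:
        server.send_message(msg)
```

You would call `build_report_email("me@example.com", "client@example.com", pdf_bytes)` and pass the result to `send_report` along with whatever SMTP host IT gives you.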

Under templates, complete view is ok. Under reports, select the reports you want to include. These are a duplicate of the ones you selected above, which may be redundant, but this is literally the process we took on the phone with the WebTrends helpdesk people. Report type: standard again. Date range: it’s up to you. Scheduling is next on the menu. You can’t run it on the 1st of the month because data may not have compiled yet in all time zones, so the 2nd of the month is the earliest you can run a monthly report with the most recent previous month’s data. Ditto lag time for dailies, weeklies etc. Run once or run weekly/monthly/daily, as you choose.
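To make the date logic concrete, here is a small Python sketch of the “run on the 2nd, report on the previous month” rule. It is only an illustration of the arithmetic, not anything WebTrends exposes, and the function names are my own.

```python
from datetime import date, timedelta
from typing import Tuple


def previous_month_range(run_date: date) -> Tuple[date, date]:
    """Return the first and last day of the month before run_date."""
    first_of_this_month = run_date.replace(day=1)
    last_of_previous = first_of_this_month - timedelta(days=1)
    return last_of_previous.replace(day=1), last_of_previous


def next_monthly_run(today: date, run_day: int = 2) -> date:
    """Return the next date on or after today that falls on run_day.

    run_day defaults to the 2nd so every time zone has had a day
    to finish compiling the previous month's data.
    """
    if today.day <= run_day:
        return today.replace(day=run_day)
    # Roll over to the following month (and year, in December).
    year = today.year + (1 if today.month == 12 else 0)
    month = 1 if today.month == 12 else today.month + 1
    return date(year, month, run_day)
```

A job that runs on 2 March would report on 1 February through the last day of February, and a job checked on 15 December would next run on 2 January.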

The helpdesk person literally told me to ignore the host binding section, so I have no idea what that means. Then you get a summary page at the end and click save.

Now you just wait and see if everything gets delivered correctly. The good part is that the report is generated only once per month, as a job that processes on the date you specify. That means it can run on data in the past (custom reports only run from the point you created them forward, because they create their own database table), and it won’t clog up your processing queue with a lot of memory/processing because it’s just once.

I wish there were short, concise directions for setting up WebTrends email reports like this on the web already, but I realize that nothing is easy or self explanatory with database systems or WebTrends. It’s just part of the territory until next generation tools come around, and no, I don’t mean Google Analytics (which is almost as confusing now to beginners). Someday this process has to get simpler so more people can use it.

Top 5 Web Analytics Metrics

I’ve been working with (Google/WebTrends/Omniture) Analytics data for 4 years now and the requests for Analytics data usually come in 2 styles.

1. The super basic: just tell me my site is still running, whatever everyone else looks at.

2. The super detailed auditors: tell me the data for each of 180 customer segments, sliced and diced 10 ways, plus the month to month change, YOY change and a dozen other things the software doesn’t calculate for you. This could take months to implement and most of the time they have lost interest in it by the time you get it working properly.

I get frustrated with both. The super simple manager needs to look at more than just visits from month to month. The uber detailed guy needs to hire a developer to implement all that, and to stop changing how they want data tracked and processed each month, because all the time gets spent on implementation and none on analysis. Most of the time nobody even looks at all 180 segments of reports.

They also need to realize that all the systems take the data and summarize it, or cut off tracking at a max number of log files, web pages or analysis processes, to maintain the integrity and size of the database tables. Try to do a full audit of every page view and click and you will crash WebTrends, and re-processing it can take months. Google Analytics doesn’t even give you options to do more than what they summarize. Omniture really tries, but it’s a slow, slow process.

Instead I am supporting the idea that web analytics data is really about trends and not audits. These numbers will never match your server logs perfectly nor your clicks from campaigns and that is OK. I also have listed here 5 metrics to look at and why they are important for your online site. One caveat is that I do not work for an e-commerce website so that has not been our focus. The focus is on conversion to application for recruitment purposes for companies. 

1. Visits – yes, month over month traffic is important. What is more important is to look at the difference in traffic and drill down into which pages/content on the site gained or lost traffic and which sources changed in their contribution to the total traffic. This is actionable, whereas just visits aren’t. Also check back with the costs for each of these budget areas and compare the cost per visit provided by each.

2. Referrers – in a nutshell, you should know how much traffic is coming from search, direct and your advertising/marketing plans online and offline. Within those groups you can drill down further, but the direct category is always problematic because many analytics packages count page pop-up forms as new visits, as well as returns to the site after a conversion process. Also remember that a session is usually 30 min; after that it’s reset as new.

3. Implementation – no, this isn’t a metric, but it is a focus you should have on a monthly basis to make sure new sites and pages get tracking added, new campaigns get tracked and that you keep researching new technology developments with your analytics package that may change everything. If you’re not a developer, having a good web developer along who has access to the servers and can make these changes is key. (And no, developers don’t make good analysts. A best case scenario is a dynamic duo where they are paired up, work on projects together and learn from each other.) The helpdesk type services available through Google are non-existent, so good luck interpreting the overly simplified online tutorials that don’t match what your clients want or answer their specific needs/questions. WebTrends and Omniture are slightly better with web support but they expect you to pay a lot for it. A good independent consultant may be the fastest, most reliable way to go here.

4. What people search for on your site. This can be tricky to implement, but if you get this data it can be very telling. If people can’t find something on your site and search for it, you get a window into what they were thinking. This may tell you that the content you have isn’t what they want or that it isn’t as navigable as you thought. New product ideas also come from this data.

5. Where people exit from your site. This is classic application drop off analysis within any online linear process. But guess what? People don’t always think linearly. Expect some of this data to drop off in chunks but a small amount to drop off at all points for unknown reasons. It’s more actionable to focus on the large chunks and look at each page and the click maps for them, but sometimes only so much optimization is possible here without doing real life usability testing with 5-10 people.
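A few of the metrics above boil down to simple arithmetic, so here is a rough Python sketch of three of them: the change in visits by source and cost per visit (metric 1), the 30 minute session rule (metric 2) and drop off between steps of a linear process (metric 5). The function names and the data shapes are my own assumptions, not how any of the analytics packages compute these, and 30 minutes is just the usual default.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # usual default; check your package


def visit_changes(previous, current):
    """Metric 1: change in visits per source, biggest swing first."""
    sources = set(previous) | set(current)
    deltas = {s: current.get(s, 0) - previous.get(s, 0) for s in sources}
    return sorted(deltas.items(), key=lambda item: (-abs(item[1]), item[0]))


def cost_per_visit(spend, visits):
    """Metric 1: spend divided by visits for each paid source with traffic."""
    return {s: spend[s] / visits[s] for s in spend if visits.get(s)}


def count_sessions(hit_times, timeout=SESSION_TIMEOUT):
    """Metric 2: count one visitor's sessions.

    A gap longer than the timeout between two hits starts a new session.
    """
    sessions = 0
    last_hit = None
    for hit in sorted(hit_times):
        if last_hit is None or hit - last_hit > timeout:
            sessions += 1
        last_hit = hit
    return sessions


def drop_off(step_counts):
    """Metric 5: share of visitors lost after each step of a linear process."""
    losses = []
    for (name, count), (_, next_count) in zip(step_counts, step_counts[1:]):
        lost = (count - next_count) / count if count else 0.0
        losses.append((name, lost))
    return losses
```

For example, `drop_off([("start", 1000), ("form", 400), ("submit", 300)])` shows the big chunk leaving after the first step, which is the kind of drop worth looking at page by page before worrying about the small losses everywhere else.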

I’m sure there are more things that people can look into with geographic data and time on site, but sometimes I think those are less actionable. You have little control over where your ads run because geo-targeting doesn’t always work well (excluding more than it includes), and time on site can be good or bad at both short and long times. The content/pages that are popular on your site are also important, but this is one of those custom setups where each division will need tracking by its geo-location, and they never admit that so much traffic cross pollinated from each other’s campaigns. You have to read into the specific needs of your client to see if these apply and how to evaluate them without over complicating the reports. I really believe you should look at 5 key metrics or fewer in a report. More than that is not actionable and distracts from your purpose/process of improvement.

There is also a difference between researching a question/metric once and doing it monthly when it never changes. I don’t believe it’s a good use of time to report on time on site if it’s been consistent month over month for the last 2 years. Check in once a year and leave the other data to be reported monthly. Save the analyst’s energy for the new questions that need answering and trust your site.

What else do you think is applicable? Any feedback?

The Negatives of Social Networking Media

All the world is a Buzz about Facebook & Twitter these days. It’s almost like MySpace circa 2007, Google circa 2003 or Microsoft circa 1998. I don’t doubt the success, innovation or long-term viability of these social networking sites, but I have seen flaws in the system which mean that things won’t be perfect with the business along the way and we’re in for a bumpy road. Basically my point is that for everything these sites give us in entertainment, social connections and opportunity, they also have some negatives that are almost the equal and opposite pendulum action.

1. Time Suck – all social networking sites are using time that you used to devote to other things. Maybe in some cases this is actually a better use of your time (instead of TV), but in most cases it’s time that you used to spend researching new information for work projects, actually talking with people in person (family/friends) or doing things that really need to be done at work or home. Once the brain gets trained that you can go socialize instead of work at those times of day, it’s a habit that is extremely hard to break. For all of us procrastinators looking for instant gratification, it’s a real problem keeping up with work, and it affects the overall productivity of companies and the country as a whole. Internet access is much more prevalent and has far more users during the business day than it does at night, so there’s the proof. Unless your job is trolling these sites for sales prospects by “connecting” and making “relationships” with your customers, it’s a waste of time to spend more than 15 min a day.

2. Privacy – Of all the details analyzed about consumer privacy online (on Facebook) in the last few weeks, the most surprising thing I’ve seen is that people really don’t care about their information online. Sure, nobody is going to post a Social Security number or credit card number on their profile (duh), but they don’t really seem to realize the power of logging all their social interactions in one database and selling access to retailers and CPG companies who have even larger databases of information to analyze and strategize with. Is it really as fun when most of your friends are companies selling you things all the time? Twitter has already morphed into the largest opt-in direct marketing platform I’ve ever seen. If people keep using it at this rate it will surpass email. The other obvious issues come with the work life balance thing, when people friend work mates and think nobody will see them rant about work or post drunk pictures on a sick day. Then again, I’ve heard that it’s just people naturally selecting themselves out of the working pool.

3. Logic – the other issues I’ve seen coming for a while have to do with how everything that is built from large databases online with lots of consumer data seems to not work properly. There is always some algorithm developed by a science tech guy based on some theoretical calculus and it doesn’t provide relevant results. Which brings me to a repeating theme of data right now: we don’t really know what to do with it yet. Nobody knows enough real info about their customers to target them. (who has a budget for that?) And the database people just like to say they improved things a statistically insignificant amount with an algorithm tweak. The marketing strategy/process should always start with offline real life information about people and products and then develop an algorithm to show you information in that way. I don’t know why it’s always done backwards but it will keep our results irrelevant and marketing dollars wasted for a long time to come.

Ways Google Has Changed Media Consumption Behaviors

I was glancing at Google Fast Flip today and it struck me that they have been successful not only in providing what people want but in some ways changing human media consumption behavior.

We all know that Google has turned the media world upside down with the humble text ad because of its ad matching relevance and pay-per-click business model.

They have up-ended the rest of the media world because they have influenced people to stop using it. This may be completely unintentional, but I think it has happened.

The obvious way is that Google has gained brand preference as a reference tool and an information source on limitless topics. But there is another behavior they have changed that is not usually talked about.

This change in how people consume information is that they can now scan headlines (via RSS, email, search engine, aggregator or Google News) and glean what has happened in the world without visiting the content site or viewing the ads around the content. This has been bad for online ad inventory (although some may say we need less inventory to drive up prices, not more) and worse for recouping the cost of producing the content.

I don’t think that Google is stealing anything like copyrighted material by linking headlines from Google News or the search engine, or by showing screenshots in Google Fast Flip. That would be like saying you are stealing copyrighted material by cutting out an article about a local festival coming up and posting it on the break room bulletin board for your coworkers to see.

I do think there needs to be revenue sharing for content sharing on some level though. How this should come about, I haven’t the slightest clue yet. And it can’t happen in the search engine because that seems too vast to fully comprehend, let alone orchestrate.

I do think Google wants to be in the media business without actually producing any content, and they don’t usually ask for exclusivity with that content. Google wants to provide more products for consumer use and consumption of information branded offline. If they offer basic content for free on these product/services and upgraded content for a fee they should share the fee with the content providers. The rates may depend on usage and of course demand, and they will probably always be in flux. (no more rate card anything)

Yet I think it’s important that these shared fees (content payments) should not be as low as the AdSense revenue share, since AdSense revenue is largely regarded as welfare for website owners. It needs to be enough to make content providers really feel like Google is a partner in their business and devoted to a positive business relationship.

The alternative may be that someday you have to pay a large content creator to crawl its site and republish parts of the content. Yes, sharing is good, but if the content borrower doesn’t bring in enough revenue (analytics can tell you if your Google News readers view, click or buy things), then is it profitable to be hosting the traffic from that source? (Yes, hosting costs a ton of money for large content sites.) I guess everyone thought they could replace millions of dollars in branding with a simple search engine relevance project and all their traffic generation problems would be solved. It’s never that easy. You have to own the relationship with your customer. You can’t outsource that to Google or anyone else.

Trust is also one of the BIG hurdles Google has to overcome to really be a star in the B2B space. Google has always believed that any process can be automated by a computer and nobody needs to talk to a human, because humans are either too expensive or busy engineering things. This seems to enrage some humans, mostly the ones that run large companies. Also, no customer service and no sales people that can actually answer your questions, along with ridiculously inflated PPC rates, have actually eroded their text ad client base in the last 2-3 years. (And that whole display thing isn’t really looking great for ROI either when you consider that people under 30 don’t respond to display ads at all.)

So, in order for Google to really keep that growth going, they need to compensate content creators when re-publishing their content on/in their branded products in the future or the content creators with the greatest authority won’t be there for very long. Yes, some laid-off journalists are blogging but in 20 years how many will be left doing any journalism at all if it doesn’t pay and very few newspapers exist?

I also think all businesses need to stop every few months and think about the future. We’re too busy and overloaded with tasks from laid off coworkers to really do this, but in a profitable world we would make time to consider where things are going 3, 6, 12 and 24 months out (not a SWOT analysis, those take too long and are somewhat cumbersome) and really think about what the business should be doing to compete, win and innovate.