Is More Data Always Better?

There has been a discovery in the online marketing and data/statistics world in the last few years: more websites, products and tools have been created online than we can possibly keep track of. The terms we hear most often to describe this deluge of activity, from both companies and consumers, are “data overload” and “information overload.” This Google magazine (Think Quarterly) uses the term Data Obesity to describe the phenomenon.

They ask the question: is more data always better?

I think the idea that “more data is better” is common among people who lived before the Internet was prevalent. We had to work hard to find data. Researching something meant going to a library, looking in a card catalog (or maybe something called Gopher) and then finding your way around the Dewey Decimal System to find that book. And then sometimes they didn’t even have the book, because it was checked out or possibly just filed wrong because nobody understood the Dewey Decimal System.

On a related note, we recently got invited to my cousin’s wedding in Santa Fe, New Mexico. My dad promptly went to the library and checked out 3 books on Santa Fe and New Mexico. I cringed. He asked how to look up flights and book one without a travel agent. I realized I have been booking travel this way since 2000, and he stopped traveling about that time, so he never has. I introduced him to Travelocity. It was mind-blowing and a bit of data overload compared with the OAG book he used to use in the ’80s.

The point here is that finding data used to be really difficult. People had control over its distribution because it was in print. When it became more freely accessible due to the efforts of Google and other companies, we assumed this would be good, because we could remember where to find it and use it whenever we wanted. We never thought it would get this big so fast. Now travel sites are overwhelming: they have too many choices, and there are too many of them trying to get you to opt into something you don’t want while you are being overcharged for bringing a suitcase on a flight. This is just one example of how data has grown exponentially so quickly.

Others of us come to the data overload conclusion when we have 200 emails across several inboxes, 1000+ unread posts in an RSS reader, several work projects, 500+ Facebook wall posts in our feed and hundreds of tweets that have gone unread. This is in a climate where you have to follow up on projects 5-10 times to get things done, post blogs/tweets/FB status updates daily to stay on people’s radar, empty the DVR so it doesn’t get overloaded and auto-delete something you really wanted, listen to the radio on the way to work just in case something big happens, and still find time to scoop the litter box before it gets full and the cats poop on the floor.

And the real purpose of all those tweets, FB posts and feeds is that your business (digital marketing, in my case) changes yearly, and if you don’t know about the latest trend and have some real insights about it before your boss asks, you won’t have a job for all that long.

Data overload is a “good” problem to have from some people’s perspective (in that it is growth oriented). The democratization of publishing, combined with tracking methodology and databases, has contributed to this problem by giving everyone a voice, a potential following of readers, a data trail to analyze and a way to say something important online 24/7/365. And then we have an even bigger problem: processing what is being said, figuring out if it is important or not, and sharing/processing/saving it in some way if it is. Acting on that data is way down the line, and many of us don’t even get there.

And this isn’t even the big problem with data overload. Where will we store it all? Why do tweets disappear from search so quickly? Because there are millions of them and the Fail Whale is full. According to Think Quarterly UK, there are 800 exabytes of data/information created every two days. It took humans from the beginning of civilization until 2003 to create the first 800 exabytes, and we’re on a roll now.

Where does all this seemingly random data go? How will we know what it says without having to go into a database table and read specific field information? Where are the software tools to manage all this and still give humans the ability to customize the output in ways that match the behavior or business purposes we really need? Does any of this stuff ever get deleted?

These are all huge questions we have to answer as more people publish, share, create, track and do business online. We also have to weigh sharing data openly against locking it behind walls. And how will people find what they need when they want it, and gauge the validity and accuracy of the information presented?

I’m betting on paid services for personal and business data management, archiving and analysis tools. We will pay for good analysis, good data access and processing, and good reliability/backups once we feel the pain of missing good insight, losing good data and just having too much happening, both personally and professionally. But unless you know how to work with SAP, SPSS, SQL, Oracle or a bunch of other systems, data management is largely out of your control at this point. Those systems are the librarians of our digital data, and they need to find a workable way to Dewey Decimal System it back into order and allow us to use it as humans need to.


WebTrends Email Stats Reports How To Setup

I love that WebTrends is a good, solid web analytics reporting solution, but I find the setup process for just about anything in this system to be very confusing and lengthy. I’m sure there is a reason for this (it could be data integrity processes or cost savings), but when I need to get something done quickly I really just need a step-by-step list and someone to tell me where the oddball parts of the process live. So I’m writing this list to explain the process, both so I have it written down and so other people can find this info too.

(Technical note: I’m using WebTrends software 8.1, not the WebTrends hosted solution.)

Today the task at hand is setting up automated reports of WebTrends data to be sent monthly by email. The duration of the data collected and the frequency of the reporting schedule are both flexible: it can be daily, weekly, monthly, quarterly or yearly.

The first step is to go to the administration menu from your login. From there, go to Report Designer and then Templates. You can select one of their templates; I needed to create a new one.

Then name your template. Click next, then select the content by adding (and naming) a new chapter and adding content to that chapter from the add report link on the menu above. Select the “built in report” list from the drop down to get the standard metrics available in WebTrends. Check the boxes of the metrics you want included. I would say 4-8 per report is enough before you have too much data for someone to really use. You can also make changes to the layout, although I was not looking for that level of detail now.

Click next at the bottom of the page. Then you have some configuration settings, like wrapping text lines on long URLs (ok) and how many rows of data to show in the reports (20-50 max for readability; top 5 is good).

Click next again and give profiles access to this report. I noticed mine were already given universal access and greyed out, so there was nothing much to do on this screen. Then click save.

Next, go back to the profiles list (admin menu, then web analysis, then reports & profiles) and edit the profile that should get this report, so you can add the report to it. This is one of those steps I think is redundant and should be automated or brought into the setup process before this, because it’s confusing. To edit, hover over the profile in the list (don’t click it) and you get a menu with “actions”; edit is one of them.

From there, go to reports in the top menu, and in the drop down go to report templates. Click the box by your report to select it. Ignore the second checkbox, labeled default, because it will change the default reporting style in the profile to this new report, and that isn’t the intention here.

Then go back to the admin menu a third time, go to the scheduler menu (at the bottom), then scheduled jobs, and click the button for a new job. This is the email setup part. Under job type, select scheduled report and work through the pieces of the menu from left to right, filling out each section. First select the profile you want reported on. Next give the report a name and assign it to a user (yourself). (Note that the check box below is also how you disable the emails; I have no idea why this is hidden here.) Report type: general report. Output type can be a database, PDF or Excel/CSV. I chose PDF because it looks professional and we don’t have to install Microsoft Office/Word on the server in order to export it. It’s the only option besides the database that does not require that. Number of data rows to report is up to you; I usually do top 20.

Next add the report destinations. This is where you need the email info. Add your email as the from address and theirs as the to address. Also, cc yourself on these reports so you get them too. Add the SMTP server address so the software knows where to connect to send the email from. (If you don’t have the SMTP address, the job will hold up all of your other scheduled jobs, so don’t set this up without it. Contact IT if you don’t have it.) You can also FTP the report if you like your data that way, or save it to a folder on the server (not as exec friendly, though).
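Since a missing or wrong SMTP address can hold up your other scheduled jobs, it may be worth checking the mail settings outside of WebTrends first. Here is a minimal Python sketch of that idea. It is not part of WebTrends, and the function names, the example addresses and the smtp.example.com host are all made up for illustration. It also assumes your server accepts unauthenticated mail on port 25, which IT can confirm.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient, cc=None):
    """Build a small test email with the same from/to/cc you plan to use."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    if cc:
        msg["Cc"] = cc
    msg["Subject"] = "SMTP test before scheduling a report"
    msg.set_content("If you can read this, the SMTP server accepted the message.")
    return msg


def send_test_message(smtp_host, msg, port=25, timeout=10):
    """Return True if the server accepted the message, False otherwise."""
    try:
        with smtplib.SMTP(smtp_host, port, timeout=timeout) as server:
            server.send_message(msg)
        return True
    except (OSError, smtplib.SMTPException):
        # Connection refused, DNS failure, timeout or a rejected message.
        return False


# Example usage (hypothetical host and addresses, left commented out):
# msg = build_test_message("me@example.com", "boss@example.com", cc="me@example.com")
# print(send_test_message("smtp.example.com", msg))
```

If the test message never arrives, sort that out with IT before saving the scheduled job.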

Under templates, complete view is ok. Under reports, select the reports you want to include. These are a duplicate of the ones you selected above. It may be redundant, but this is literally the process we went through on the phone with the WebTrends helpdesk people. Report type: standard again. Date range: it’s up to you. Scheduling is next on the menu. You can’t run a monthly report on the 1st of the month, because data may not have compiled yet in all time zones, so the 2nd of the month is the earliest you can run it with the previous month’s data. The same lag applies to dailies, weeklies and so on. Run once or run daily/weekly/monthly, as you choose.
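To make the date logic concrete, here is a small Python sketch of the “run on the 2nd, report on the previous month” rule. It only illustrates the arithmetic. WebTrends does this for you in the scheduler, and the function names here are my own.

```python
from datetime import date, timedelta


def previous_month_range(run_date):
    """Return (first_day, last_day) of the month before run_date."""
    first_of_this_month = run_date.replace(day=1)
    last_of_prev = first_of_this_month - timedelta(days=1)
    first_of_prev = last_of_prev.replace(day=1)
    return first_of_prev, last_of_prev


def next_monthly_run(today, run_day=2):
    """Return the next date on or after today that falls on run_day."""
    if today.day <= run_day:
        return today.replace(day=run_day)
    # Already past run_day this month, so roll over to next month.
    if today.month == 12:
        return date(today.year + 1, 1, run_day)
    return date(today.year, today.month + 1, run_day)


# A job that runs on March 2 reports on all of February.
start, end = previous_month_range(date(2011, 3, 2))
print(start, end)
```

The default run day of 2 matches the one-day lag described above. Any day up to the 28th works the same way.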

The helpdesk person literally told me to ignore the host binding section, so I have no idea what it means. Then you get a summary page at the end and click save.

Now you just wait and see if everything gets delivered correctly. The good part is that the report is generated only once per month, as a job that processes on the date you specify. That means it can run against past data (unlike custom reports, which create their own database table and so only cover data from the point you created them forward), and it won’t clog up your processing queue with a lot of memory/processing because it runs just once.

I wish there were short, concise directions like this for setting up WebTrends email reports on the web already, but I realize that nothing is easy or self-explanatory with database systems or WebTrends. It’s just part of the territory until next generation tools come around, and no, I don’t mean Google Analytics (which is now almost as confusing to beginners). Someday this process has to get simpler so more people can use it.