Google Analytics: How to segment and filter out robot traffic
Posted on March 2nd, 2011 by cweekly

Google Analytics (“GA”) is the most popular web analytics tool, largely because it is both free and excellent. In the past, we have blogged about direct UX measurement in GA, and we are able to provide our beta customers with reports and data visualizations that combine Yottaa performance metrics with business metrics from GA. It’s exciting stuff if you share our passion for web performance and analytics.

Because performance monitoring systems like ours can impact GA reports, it feels right to spend a little time helping our users and the larger GA community with explicit instructions for filtering and/or segmenting traffic coming to your websites. These directions are specific to GA, but the principles are easily applied to other web analytics systems (such as Coremetrics Analytics or Omniture SiteCatalyst).

THE PROBLEM:

First, a description of the problem: some types of traffic to your website should simply never be counted in your reports. Internal traffic (from developers and testers) is one such category. Traffic generated by search engine crawlers like Googlebot is another. Similarly, traffic from automated solutions for testing or monitoring your site, such as Yottaa Insight, Keynote, Gomez and BrowserMob, should not appear in your metrics.

Because GA is implemented using JavaScript, simple crawlers that just follow links and cannot execute JavaScript are automatically ignored by GA. However, there are increasingly sophisticated bots out there, including our own Yottaa Website Performance Monitoring robots, which can’t win at Jeopardy just yet but do know how to run JavaScript and accept cookies. These smart bots are tougher to distinguish from normal human users, and by default GA will include the traffic they generate. Hence the need to create custom segments and/or filters in GA. In both cases, you simply need to teach GA how to identify the bots by looking at their browser type (which comes from the “User-Agent” header). Yottaa Website Performance Monitoring bots always identify themselves with “YottaaMonitor” in the User-Agent header.
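To make the identification rule concrete, here is a minimal Python sketch of the same check done by hand: look for the “YottaaMonitor” token in a User-Agent string, ignoring case. The function name and the sample User-Agent strings are made up for illustration; they are not the exact strings our bots or any real browser send.

```python
def is_yottaa_monitor(user_agent: str) -> bool:
    """Return True if the User-Agent string contains the YottaaMonitor token."""
    # Case-insensitive substring check, like a "contains" rule
    # with the "case sensitive" box left unchecked.
    return "yottaamonitor" in user_agent.lower()


# Hypothetical User-Agent strings, invented for this example only.
samples = [
    "Mozilla/5.0 (Windows NT 6.1) ExampleBrowser/1.0",
    "Mozilla/5.0 (compatible; YottaaMonitor)",
]

flags = [is_yottaa_monitor(ua) for ua in samples]
```

The segment and filter described in the rest of this post apply this same kind of test inside GA.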

GOOGLE ANALYTICS SEGMENTS AND FILTERS DEMYSTIFIED:

Before we dive into the step-by-step instructions, a quick overview of GA segments and filters may be helpful. Segments are merely ways of grouping users in your reports. Using segments will alter your view into your data, but will not change what’s actually being collected. Segments apply retroactively, which is to say that when you define a segment, you can then view all your historical data through the lens of the segment. So if you see traffic from Yottaa or other bots “polluting” your reports, have no fear, you can easily make it go away.

GA filters are more invasive than segments, in that they don’t just alter your view, they actually impact what is stored by GA and available for reporting. Filtering cannot be applied retroactively and only affects data collected after the filter has been created. Some GA users feel more comfortable first testing and refining custom segments to be sure they’ve got it right, then creating a filter (using the same rules or logic) once they’re sure. Alternatively, you can create a duplicate profile for the same domain and only apply your filter to one of them, thus preserving collection of all “raw” data off to the side, while leveraging the power of filters in your main reporting profile. (See https://www.google.com/support/analytics/bin/answer.py?answer=55494 for more detail on this approach.)
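If the difference between the two is still fuzzy, a toy model may help. The Python sketch below is not how GA works internally; it is only an illustration, with invented names (`Hit`, `collect`, `segment_humans`) and invented data, of the two behaviors described above: a filter drops hits at collection time from the day it is created, while a segment is a view over whatever has already been stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    day: int      # day on which the hit was collected
    browser: str  # value reported as the browser (hypothetical values below)


def is_bot(hit: Hit) -> bool:
    return "yottaamonitor" in hit.browser.lower()


def collect(hits, filter_active_from=None):
    """Store hits; an exclude filter drops bot hits from the day it is created."""
    stored = []
    for hit in hits:
        if (filter_active_from is not None
                and hit.day >= filter_active_from
                and is_bot(hit)):
            continue  # never stored, so it cannot be recovered later
        stored.append(hit)
    return stored


def segment_humans(stored):
    """A segment is only a view: it hides bot hits without deleting them."""
    return [hit for hit in stored if not is_bot(hit)]


hits = [
    Hit(1, "Firefox"), Hit(1, "YottaaMonitor"),
    Hit(2, "Chrome"), Hit(2, "YottaaMonitor"),
]

raw = collect(hits)                             # no filter: everything is stored
filtered = collect(hits, filter_active_from=2)  # filter created on day 2
humans_view = segment_humans(raw)               # segment applied to all history
```

In this model the day 1 bot hit survives in `filtered` because the filter did not exist yet, whereas `humans_view` hides bot hits from both days and leaves `raw` untouched.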

OK, without further ado, here’s what to do.

STEP-BY-STEP INSTRUCTIONS FOR CREATING A CUSTOM SEGMENT TO HIDE YOTTAA BOT TRAFFIC FROM YOUR GA REPORTS:

  1. Log in to GA
  2. Click “View Reports” for your site / profile
  3. In the left column under “My Customizations”, choose “Advanced Segments”:
    Advanced Segments link (screenshot)
  4. Choose “Create new custom segment”
  5. Under Dimensions > Systems, choose “Browser”:
    Dimensions > Systems > Browser (screenshot)
    … and drag it to the “dimension or metric” area:
    Segment creation - dimension drag target area (screenshot)
  6. Set the Condition to “Does not contain” (leave the “case sensitive” checkbox unchecked)
  7. Under “Value”, type “YottaaMonitor” (without the quotes):
    Dragging "browser" dimension, creating condition (screenshot)
  8. Name the segment something like “Humans (no bots)”
  9. Click “Test Segment” (if you’ve been getting traffic from Yottaa monitoring bots, it should match fewer visits than your total)
  10. Click “Create Segment”, and you’re done.

Note: if you find unwanted traffic from other bots too, you can either (a) create an additional “and” condition and define a second rule, or (b) change the Condition to “Does not match regular expression” and define a regex value that matches multiple bot names, e.g. “.*(YottaaMonitor|OtherBotNameHere).*” (without the quotes).
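Before pasting a regex into GA, it can be worth trying it against a few sample browser values. The Python sketch below uses the pattern from option (b); “OtherBotNameHere” is still just a placeholder, the sample values are invented, and Python’s `re` module is only a stand-in for GA’s own regex engine, which may differ in details.

```python
import re

# Pattern from the text; "OtherBotNameHere" is a placeholder, not a real bot.
BOT_PATTERN = re.compile(r".*(YottaaMonitor|OtherBotNameHere).*", re.IGNORECASE)


def in_humans_segment(browser: str) -> bool:
    """True when the browser value does NOT match the bot pattern."""
    return BOT_PATTERN.search(browser) is None


# Hypothetical browser values for illustration.
browsers = ["Firefox", "YottaaMonitor", "otherbotnamehere 1.0", "Safari"]
humans = [b for b in browsers if in_humans_segment(b)]
```

Adding another bot is a matter of appending one more name inside the parentheses, separated by “|”.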

That’s it for segments. Now on to filters:

STEP-BY-STEP INSTRUCTIONS FOR CREATING A FILTER TO PREVENT YOTTAA BOT TRAFFIC FROM BEING COLLECTED IN GA:

  1. Log in to GA
  2. Click “Analytics Settings”
  3. Click “Filter Manager>>” (in the bottom-right corner of the page)
  4. Click “+ Add Filter”
    Add filter (screenshot)
  5. Name your filter (e.g. “Exclude YottaaMonitor”)
  6. Choose “Custom filter”
  7. Filter Type: [Exclude]
  8. Filter Field: select “Visitor Browser Program”
  9. Filter Pattern: “.*YottaaMonitor.*” (without the quotes)
  10. Case Sensitive: No
    Add filter details (screenshot)
  11. Select your relevant website profile(s) from “Available Website Profiles” on the left, and choose the “Add” button to move them to the “Selected Website Profiles” area on the right
  12. Click “Save Changes”

… and that’s it, you’re done.

ROBOTS.TXT AND BLOCKING MONITORING:

Finally, a note about the “robots.txt” Robots Exclusion Standard. Yottaa bots respect the rules of the road and will obey instructions found in robots.txt files. However, we strongly recommend against outright blocking of our bots, as doing so will prevent highly useful, free performance metrics from being collected. Filtering bot traffic out of your analytics tool of choice is simple, and allows you to continue monitoring your site in yottaa.com while keeping your analytics clean.
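To see what a given robots.txt would mean for a well-behaved bot, you can check it with Python’s standard `urllib.robotparser`. This is only a sketch under assumptions: the two robots.txt bodies and the example.com URL are invented, and using “YottaaMonitor” as the robots.txt user-agent token is an assumption based on the User-Agent string mentioned earlier, not something this post confirms.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse a robots.txt body and ask whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks the monitoring bot outright
# (the approach this post recommends against).
BLOCKING = """\
User-agent: YottaaMonitor
Disallow: /
"""

# Hypothetical robots.txt that only keeps all bots out of one directory.
PERMISSIVE = """\
User-agent: *
Disallow: /private/
"""

url = "http://www.example.com/index.html"
allowed_under_blocking = can_fetch(BLOCKING, "YottaaMonitor", url)
allowed_under_permissive = can_fetch(PERMISSIVE, "YottaaMonitor", url)
```

Under the first file a compliant bot would fetch nothing, so no performance data would be collected; under the second, monitoring continues and the cleanup is left to your analytics segment or filter.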

WHAT ABOUT YOU?

Do you have experience in adding segments and filters in GA? Do you have experience with segments and filters in other web analytics systems? Will you implement these as described above? Were we clear in our instructions? Feedback is always welcomed.

Posted in performance monitoring, Yottaa, Yottaa Insight™
cweekly

About cweekly

Chris is a happily married father of two young girls. Besides his family and day job at Yottaa in the area of product management (it is really more like a day job + evening job), he enjoys playing guitar, reading, playing soccer, and seeing live music. Chris has a passion for web technology and is a former developer and web architect. His favorite quote is "Life is to be enjoyed."
12 Comments
  • http://chris.weekly.org/blog/2011/03/23/blogging-for-real-over-at-yottaa Chris.Weekly.org – A Web Space » Blogging for real over at Yottaa

    [...] started doing is writing at http://blog.yottaa.com. My first post earlier this month was a how-to guide for creating filters and segments in Google Analytics, to prevent smart bots like ours from polluting web analytics. It’s not anything [...]

  • http://twitter.com/jvz Matt

    Awesome, thanks for the info! I was wondering how to check my realistic stats now that I have Yottaa making so many daily requests.

  • Jun

    Great! Thanks for the very detailed instruction!

  • http://www.CaseyCheshire.com/ Casey Cheshire

    Filter works perfectly- thanks!

  • Jonathan

    It’s sad but true that filtering out traffic by agent type may only remove about 20-30% of bots. The vast majority simply spoof the agent type to avoid being blocked. For example, take a look at the page views report in GA: even with agent types filtered, you will see hundreds of sessions that are traversing 100+ pages. Clearly not human.

  • Anonymous

    Thanks for article got my GA sorted. How about Get Clicky?

  • http://www.pegox.com/ Ravi

    This works wonderfully
    A point to note- activate the old version of GA while following the instructions.
    Thanks Chris