There are many ways of filtering out bots and spam traffic in Google Analytics, the most common and easiest of which is to enable Google’s built-in spider and bot filter.
But what if we want to see the traffic from bots, split out into a separate view? Understandably, there is no “include only traffic from known bots and spiders” setting in Google Analytics, as few people would use it.
However, bot traffic can do more than just skew website metrics: it can also increase server load, drive ad fraud and be responsible for content scraping. We therefore think it is worth isolating and examining.
This blog post shows you how to identify this traffic in Google Tag Manager and track it in Google Analytics.
What is headless browser traffic?
Headless browser traffic simply refers to traffic that is using a browser without the graphical user interface, the section highlighted in red:
This is important to us because headless traffic is one of the best indicators of bots, for two reasons. Firstly, headless browsing is a very popular method for crawling websites, as it renders JavaScript (unlike some other methods) and can be controlled programmatically. This means a large number of bots use it to crawl sites.
Secondly, it is uncommon for humans to use a headless browser, as it makes navigating the web far more complicated.
As a result, it is a great way of differentiating bots from humans. So how do we track this?
Setting up the headless variable in GTM
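First we need to set up a variable in GTM to decide whether the visit is coming from a headless browser. The check compares the screen height with the height of the browser viewport: a regular browser gives up some height to tabs, toolbars and the operating system's own bars, so a difference of zero suggests there is no graphical interface.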
- Navigate to variables and click “Add a new variable”
- Choose “Custom JavaScript” as the variable type
- Paste the following code
function() {
  // True when the viewport is as tall as the screen (requires jQuery on the page).
  return window.screen.height - jQuery(window).height() === 0;
}
- Hit save
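The variable above relies on jQuery being available on the page. If that is not guaranteed on your site, here is a minimal jQuery-free sketch of the same comparison. It assumes `window.innerHeight` is an acceptable stand-in for `jQuery(window).height()` (the two can differ by the height of a horizontal scrollbar). The helper name `isLikelyHeadless` and its `win` parameter are hypothetical, and are only there so the logic can be exercised outside a browser.

```javascript
// Hypothetical helper: takes a window-like object so it can be tested outside GTM.
function isLikelyHeadless(win) {
  // Assumption: innerHeight approximates jQuery(window).height().
  var viewportHeight = win.innerHeight;
  var screenHeight = win.screen.height;
  // No height lost to tabs, toolbars or OS bars suggests there is no GUI.
  return screenHeight - viewportHeight === 0;
}

// In GTM the same comparison would sit directly inside the anonymous function:
// function() { return window.screen.height - window.innerHeight === 0; }

// Example with stand-in objects instead of a real browser window:
var desktopLike = { innerHeight: 937, screen: { height: 1080 } };
var headlessLike = { innerHeight: 600, screen: { height: 600 } };
console.log(isLikelyHeadless(desktopLike));  // false
console.log(isLikelyHeadless(headlessLike)); // true
```

A browser running in full-screen mode can also report a difference of zero, so it is safer to treat this value as a strong signal than as proof.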
Your variable should now look like this:
Setting up the headless custom dimension in Google Analytics
The next step is to set up a custom dimension in Google Analytics that we can populate with the value stored in our new variable.
- Switch over to Google Analytics, navigate to the admin section of your property, click on “Custom Definitions” and then “Custom Dimensions”
- Under custom dimensions, click on “+New Custom Dimension”
- Call the dimension “Is Headless Session” or something along those lines.
- Change the scope to “session”
- Create the dimension
- Once the dimension is created, take note of the dimension index – you can find this if you go back to your list of custom dimensions
This is what your Custom Dimension should look like:
Populating the custom dimension in GTM
We now need to populate the custom dimension that we just created. To do this, we will switch back to Google Tag Manager.
- In GTM, find your Google Analytics Settings Variable – if you are not using a GA Settings variable, find your standard GA PageView Tag
- Under “More Settings” go to “Custom Dimensions” and click “Add Custom Dimension”
- Under “Index” add the index of the custom dimension you created in the previous step
- Under “Dimension Value” add the variable you created in the first step (to add a variable, click on the icon with the “+”)
- Save your changes
- Publish the changes to your GTM container
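For readers who find code easier to follow than GTM screens, the steps above amount to setting one extra field on the tracker. In analytics.js, custom dimensions are addressed by field names of the form `dimensionN`, and the sketch below builds that field object. The helper name `buildDimensionFields` is hypothetical, and both the index 7 and the value passed in are only examples; use your own index and the value of your headless variable.

```javascript
// Hypothetical helper: builds the field object for one custom dimension.
function buildDimensionFields(index, value) {
  if (!Number.isInteger(index) || index < 1) {
    throw new Error('Custom dimension index must be a positive integer');
  }
  var fields = {};
  // analytics.js names custom dimension fields "dimension" + index.
  fields['dimension' + index] = value;
  return fields;
}

// Example value only; in GTM this comes from the headless variable.
var fields = buildDimensionFields(7, '1');
console.log(fields); // { dimension7: '1' }

// On a page that uses analytics.js directly (no GTM), this would be applied with:
// ga('set', fields);
```

GTM does this for you through the Custom Dimensions table, so the sketch is only meant to show what ends up on the hit.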
We are setting a number of custom dimensions, but you can see that we are now populating the new custom dimension (with an index of 7) with the value stored in our Headless Session variable:
Building the filters in Google Analytics
At this point, every hit tells Google Analytics whether or not the session is coming from a headless browser. All that is left to do is to create a new Google Analytics view that captures only these sessions.
- Create a new Google Analytics view (“Create View” in your admin screen)
- In your new view, navigate to filters and click “Add Filter”
- Name your filter something appropriate, such as “Only Headless Traffic”
- Select the “Include” radio button
- Under filter field, select our new custom dimension – “Is Headless Session”
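- Enter the value “1” into the filter pattern field (the pattern has to match the value the dimension actually receives, so if your reports show a different value for headless sessions, such as “true”, use that instead)
- Save your filter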
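An include filter keeps only the hits whose chosen field matches the filter pattern, and the pattern is treated as a regular expression. The sketch below imitates that behaviour on made-up hit objects. `applyIncludeFilter` and the `isHeadless` property are hypothetical stand-ins and not part of Google Analytics; the values follow the “1” pattern used in the steps above.

```javascript
// Hypothetical stand-in for an include filter: keep hits whose field matches the pattern.
function applyIncludeFilter(hits, field, pattern) {
  var regex = new RegExp(pattern);
  return hits.filter(function (hit) {
    // Hits without the field never match an include filter in this sketch.
    return hit[field] !== undefined && regex.test(String(hit[field]));
  });
}

// Made-up hits for illustration only.
var hits = [
  { page: '/pricing', isHeadless: '1' },
  { page: '/blog', isHeadless: '0' },
  { page: '/', isHeadless: '1' },
  { page: '/about' }
];

var kept = applyIncludeFilter(hits, 'isHeadless', '^1$');
console.log(kept.map(function (h) { return h.page; })); // [ '/pricing', '/' ]
```

The anchored pattern `^1$` is used here so that a value such as “10” would not match. With only two possible values, the unanchored “1” behaves the same way.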
Here is what our filter looks like:
This view will now contain only headless traffic, which in all likelihood comes from bots and spiders. We suggest unchecking the “Exclude all hits from known bots and spiders” checkbox in your view settings to make sure that all bot traffic is captured.
If you have any questions regarding this, you can send me an email.