If you regularly log into your Google Analytics account, you’ve probably run into some strange-looking websites in your Referrals report at some point:
What are these sites and why are they linking to your site? It's more than likely that these sites aren't linking to your pages at all but instead represent fake referrals. They are created by bots sending false info to your Google Analytics account for a number of spammy reasons. If you open one of these URLs in your browser, you will likely be redirected to an online store, marketing scam or malware site. In this post, we'll show you how to remove these fake referrals from your Google Analytics data.
Is Google Analytics Spam Messing Up Your Metrics?
What’s the impact of this spam data on your Google Analytics metrics? On UND.com or ND.edu, sites that receive tens or hundreds of thousands of visits per day, spam data is likely unnoticeable. However, if your site caters to a smaller audience, it's likely your metrics are being skewed by spam data.
Below is the Acquisition > All Traffic > Referrals report from this site, which typically gets only a few dozen visits per day, especially if no one has written a new blog post in a few days.
In the table above, you can see that spam accounts for 8 of the top 10 referrals. Not only is this annoying, but it messes up the metrics pretty badly. For example, once all spam referrals are excluded, the bounce rate for referral traffic is actually around 35%, as opposed to the almost 95% shown in the current report.
You could export this data to an Excel or Google Sheets spreadsheet and recalculate the numbers to generate an accurate report, but there's also a way to prevent this type of spam from corrupting your Google Analytics data in the first place.
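If you do go the export route, the recalculation might look something like the Python sketch below. The row layout, source names and numbers are all hypothetical and only illustrate the arithmetic; they are not taken from the report above.

```python
def bounce_rate(rows):
    """Return total bounces divided by total sessions for the given rows."""
    sessions = sum(r["sessions"] for r in rows)
    bounces = sum(r["bounces"] for r in rows)
    return bounces / sessions if sessions else 0.0

# Hypothetical export rows; real numbers come from your own Referrals report.
rows = [
    {"source": "spam-buttons.example", "sessions": 40, "bounces": 40},
    {"source": "cheap-seo.example", "sessions": 20, "bounces": 19},
    {"source": "news.example", "sessions": 10, "bounces": 3},
]
# Sources you have flagged as spam by hand (also hypothetical).
spam_sources = {"spam-buttons.example", "cheap-seo.example"}

# Drop the flagged sources, then recompute the rate from the remaining totals.
clean_rows = [r for r in rows if r["source"] not in spam_sources]
print(f"with spam: {bounce_rate(rows):.0%}")
print(f"without spam: {bounce_rate(clean_rows):.0%}")
```

The rate is recomputed from summed sessions and bounces rather than by averaging each row's percentage, since a plain average would give a source with one visit the same weight as a source with hundreds.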
Filtering Out Google Analytics Spam
The techniques for removing spam rely on using Google Analytics View Filters.
Filtering Ghost Referrals
The first group is often referred to as “Ghost Referrals.” These are referrals generated in your reports by fake visits. In this scenario, the spammers don’t even visit your website. Instead, they just send the data directly to Google Analytics and it gets added to your reports. This is typically why you'll see a chunk of "visits" that don't even last a second.
To start cleaning this up, we create a new view and then add some filters. As shown below, you can create a view in the Admin section of Google Analytics. Pick the Account and Property where you want to create a spam-free view. Views do not contain historical data older than the date on which they are created. If you create a view on Jan 2nd, there will be no data in that view prior to Jan 2nd. So, this new spam-free view will not help clean up the historical data – only the new data coming in.
Next, we are going to create a list of the valid hostnames that should be showing up in your Google Analytics reports. Essentially, a ghost referral will come in from network hostnames that are not your own.
Notre Dame has sites using roughly 500 unique subdomains plus UND.com so we can use a regular expression that will filter out all but Notre Dame hostnames:
That final hostname is passed to your data if a user views your website using Google Translate. It's important not to filter this data out, as the University gets a fair amount of international traffic across several sites. If you think your site might be getting valid traffic from additional hostnames, you can look at the Audience > Technology > Network report and select Hostname as the primary dimension. Set a long time range, at least a year if possible, to ensure that you capture all the valid hostnames. Once you have the list of additional hostnames, separate each one with a "|" and put a backslash in front of any period characters.
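As a rough illustration of that last step, here is a small Python sketch that joins a list of hostnames with "|" and escapes the periods. The helper name and the hostname list are assumptions made for this example (in particular, the Google Translate hostname shown is assumed), so replace them with whatever appears in your own Hostname report.

```python
import re

def build_hostname_pattern(hostnames):
    """Join hostnames with "|" and put a backslash before each period."""
    escaped = [name.replace(".", r"\.") for name in hostnames]
    return "|".join(escaped)

# Assumed list for illustration; use the hostnames from your own report.
valid_hostnames = ["nd.edu", "und.com", "translate.googleusercontent.com"]
pattern = build_hostname_pattern(valid_hostnames)
print(pattern)
# nd\.edu|und\.com|translate\.googleusercontent\.com

# Try the pattern against a few sample hostnames before using it in a filter.
hostname_re = re.compile(pattern)
for hostname in ["admissions.nd.edu", "und.com", "free-traffic.example"]:
    verdict = "keep" if hostname_re.search(hostname) else "drop"
    print(hostname, verdict)
```

The pattern is left unanchored on purpose: it matches any hostname that contains one of the listed strings, which is what lets a single short expression cover hundreds of subdomains.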
To test a filter expression, you can always create a new custom segment first. Testing expressions is always a good idea since filters will alter your data permanently, and there's no way to get the data back into the view once the filter is turned on. You'll also be able to use this segment to view historical data, as filters do not retroactively delete any data. Once you're confident in your expression, you can add a filter to your new view.
Filtering Non-Ghost Spam Referrals
Some spam bots do actually visit your website. These can't be removed using the hostname filter and need a separate filter targeting specific spam domains. This filter will EXCLUDE known spam domains while the previously described filter will only INCLUDE your host domains. To find the non-ghost spammers visiting your website, open Acquisition > All Traffic > Referrals and add Hostname as a secondary dimension. Spam sources where the Hostname is valid (in our case, anything including nd.edu) are the spam domains we need to exclude.
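The same check can be sketched in Python against an exported copy of that report. Everything here is an assumption for illustration: the include pattern, the function name, the sample rows and the list of suspected spam sources are hypothetical, and deciding which sources are spam is still a judgment call you make by hand.

```python
import re

# Assumed include pattern for valid hostnames, left unanchored.
VALID_HOSTNAME = re.compile(r"nd\.edu|und\.com")

def non_ghost_spam_sources(rows, suspected_spam):
    """Return suspected spam sources that were recorded with a valid hostname."""
    found = set()
    for source, hostname in rows:
        if source in suspected_spam and VALID_HOSTNAME.search(hostname):
            found.add(source)
    return sorted(found)

# Hypothetical (source, hostname) pairs from the Referrals report with
# Hostname added as a secondary dimension.
rows = [
    ("spam-buttons.example", "www.nd.edu"),   # bot that reached the real site
    ("ghost-seo.example", "(not set)"),       # ghost referral, no valid hostname
    ("news.example", "admissions.nd.edu"),    # legitimate referral
]
suspected = {"spam-buttons.example", "ghost-seo.example"}

print(non_ghost_spam_sources(rows, suspected))
# ['spam-buttons.example']
```

Only the first source needs an entry in the exclude filter; the ghost referral is already dropped by the hostname include filter.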
Similar to the previous filter, we need to use a regular expression, but this time use Referral as the Filter Field. The result should look like this:
It's best to keep checking your Referrals report on a regular basis for additional spam domains and to add any you find to the filter.
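One way to keep that list manageable is to store the spam domains in a plain list and regenerate the exclude expression whenever you add one. The sketch below assumes hypothetical domain names and helper names, and it only mimics the filter locally so you can test the expression; it does not talk to Google Analytics.

```python
import re

def build_exclude_pattern(spam_domains):
    """Join spam domains with "|" and put a backslash before each period."""
    return "|".join(domain.replace(".", r"\.") for domain in spam_domains)

# Hypothetical spam domains; append new ones as they show up in your reports.
spam_domains = ["spam-buttons.example", "cheap-seo.example"]
spam_domains.append("free-share.example")

exclude_pattern = build_exclude_pattern(spam_domains)
exclude_re = re.compile(exclude_pattern)

def keep_hit(referral):
    """Return True when the referral does not match any listed spam domain."""
    return exclude_re.search(referral) is None

print(exclude_pattern)
print(keep_hit("http://spam-buttons.example/offer"))
print(keep_hit("https://news.example/story"))
```

Printing the expression after each change gives you a string you can paste into the filter and test in a custom segment first.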