Using Google Correlate to Predict Success

Author: Data Scientist

In 2008, data scientists at Google discovered that search activity was a good indicator of actual flu activity. In response, they launched Google Flu Trends to provide users with estimates of flu activity all over the world. Over the next three years, researchers at Google and elsewhere continued to use search data to estimate real-world activity. However, Google Trends, the main tool for accessing this data, only allows users to enter a search term and see its trend over time, not to define a pattern and see which search terms match it. Researchers actively petitioned Google to create a tool to reverse the process, and in May 2011 Google Labs launched Google Correlate, allowing researchers to upload a data series and see a list of searches that correlate with that pattern.
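To make the "reverse" lookup concrete, here is a minimal sketch of the general idea: score each candidate search term's time series against a target series and list the best matches first. It assumes Pearson correlation as the similarity measure, which is our choice for illustration and not a description of how Google Correlate is implemented. The function names (`pearson`, `rank_terms`) and all of the data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a completely flat series has no trend to match
    return cov / math.sqrt(var_x * var_y)


def rank_terms(target, term_series, top_n=3):
    """Return the top_n (term, score) pairs, highest correlation first."""
    scored = [(term, pearson(target, series))
              for term, series in term_series.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical weekly values, invented purely for illustration.
target = [10, 12, 18, 30, 42, 35, 20, 11]
term_series = {
    "term a": [5, 6, 9, 15, 21, 18, 10, 6],      # rises and falls with the target
    "term b": [40, 38, 30, 20, 10, 15, 28, 39],  # moves opposite to the target
    "term c": [7, 7, 8, 7, 8, 7, 8, 7],          # nearly flat
}

ranked = rank_terms(target, term_series)
for term, score in ranked:
    print(f"{term}: {score:.2f}")
```

With this made-up data, the term whose series tracks the target is ranked first and the term that moves in the opposite direction is ranked last. A real analysis would also need to handle differences in scale, seasonality, and spurious matches, which this sketch ignores.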

Example of Google Correlate output. Photo courtesy of Google.

At Notre Dame, we collect data provided by Google Correlate in addition to Twitter trends and patterns gleaned from text analysis on a predetermined selection of media publications and blog outlets. We correlate this data with trend models of campus data to determine content and marketing strategy for various periods throughout the academic year. We combine this with a machine learning algorithm that lets us include variables not readily available from the data-mining procedures above. That way, we can attempt to account for events that happen over longer periods of time or involve multiple variables (like presidential elections), as well as those that don't occur on an easily perceived time cycle.

In this way, we can predict the best news stories to pitch to media outlets and which stories are going to be worth telling in depth and in-house. As a result, we were able to increase the number of stories featured or referenced in the media, as well as increase traffic to news pieces that were planned using predictions from these data sets. In fact, pieces that were planned based on this data received approximately 30% more traffic than the average news story, with notable peaks at ~1:00 pm and ~6:00 pm on the day of publication, corresponding with targeted social media efforts.
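For readers who want to run the same kind of comparison on their own analytics, the following is a minimal sketch of the arithmetic behind a figure like "30% more traffic" and of finding the busiest hours of the day. The hourly numbers are hypothetical, invented so that the result mirrors the figures quoted above; they are not our actual pageview data. The function names are likewise illustrative.

```python
def relative_uplift(baseline, planned):
    """Fractional change in total pageviews of `planned` over `baseline`."""
    total_baseline = sum(baseline)
    if total_baseline == 0:
        raise ValueError("baseline total must be non-zero")
    return (sum(planned) - total_baseline) / total_baseline


def peak_hours(hourly, top_n=2):
    """Return the top_n hours (0-23) with the most pageviews, in clock order."""
    busiest = sorted(range(len(hourly)), key=lambda h: hourly[h], reverse=True)
    return sorted(busiest[:top_n])


# Hypothetical pageviews per hour (index 0 = midnight) on the day of
# publication. These values are made up for illustration only.
baseline = [10, 8, 6, 5, 5, 8, 15, 30, 50, 60, 65, 70,
            75, 80, 70, 65, 60, 60, 65, 55, 45, 35, 30, 28]
planned = [12, 9, 7, 6, 6, 10, 18, 36, 60, 72, 80, 90,
           100, 130, 95, 80, 75, 85, 120, 70, 50, 38, 30, 21]

uplift = relative_uplift(baseline, planned)
peaks = peak_hours(planned)
print(f"uplift: {uplift:.0%}, peak hours: {peaks}")
```

Comparing daily totals keeps the uplift figure simple, while the hourly breakdown is what reveals whether peaks line up with scheduled social media posts.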

Average pageviews on news articles (blue) vs. pageviews on articles based on data recommendations (orange) on the day of publication.

In the future, we're hoping to apply these efforts to a number of departments across campus to determine strategies for a variety of marketing efforts, including the best seasons to advertise open faculty or staff positions, the most efficient time to begin promoting the sale of event tickets, and the most likely times and methods for engaging our alumni base. With enough data and the right tools, correlating real-world patterns with historical and current trends opens up a number of exciting possibilities for marketing in higher education.