Google Stats Technology

New Google Tool To Find Trend Correlations 76

Posted by samzenpus
from the not-causation-tag-in-3-2-1 dept.
Kilrah_il writes "In 2008 Google found a correlation between seasonal flu activity and certain search terms, a finding that allowed it to track flu activity better and more rapidly than previous methods. Now, Google is offering a new tool, Google Correlate, that allows researchers to do the same for other trends. 'Using Correlate, you can upload your own data series and see a list of search terms whose popularity best corresponds with that real world trend.' Of course, Google reminds us that correlation does not imply causation."
This discussion has been archived. No new comments can be posted.

  • by NReitzel (77941) on Thursday May 26, 2011 @09:50AM (#36250162) Homepage

    This is a wonderful tool. In the short term, it should allow a lot of people to track interesting trends.

    In the long term, though, Heisenberg Rules. If I may paraphrase, "Knowledge of the model, invalidates the model."

    Want a real world example today? Stock market. This is why automated make-money tools don't work nearly as well as they should.

    • by Anonymous Coward

      This is a wonderful tool. In the short term, it should allow a lot of people to track interesting trends.

      In the long term, though, Heisenberg Rules. If I may paraphrase, "Knowledge of the model, invalidates the model."

      Want a real world example today? Stock market. This is why automated make-money tools don't work nearly as well as they should.

      They work precisely as well as they should. It's just that because they're not adding any information, they will do about as well as the market.

      If you use automated tools and are satisfied with relatively low growth, you can split your money into a portfolio that will make a tidy return over time. That's what mutual funds and so forth basically do. You can even steadily shift your money to lower risk funds as you approach retirement. You can make plenty of money this way for the effort you put in, it just t

    • by maxume (22995)

      Don't drop a bomb like that and then leave us in suspense!

      How well should they work?

  • by Albanach (527650) on Thursday May 26, 2011 @09:58AM (#36250244) Homepage

    Unfortunately the service appears to be limited to US search data. Hopefully this will be extended in the future.

  • by cpu6502 (1960974) on Thursday May 26, 2011 @09:59AM (#36250258)

    I'm really starting to like this company. Free web browser, free word processor (and spreadsheet?), free language translation, free nudie pics, free scanned books, free email, free Usenet reader, and now this cool Dataset research tool.

    Still not sure I want to store my documents on the internet though. (1) Not secure. (2) Government can review the documents without having to ask a judge for a warrant. But overall I guess Google is a decent company. Why pay for stuff you can get for free and legal?

    • Re: (Score:2, Informative)

      by Anonymous Coward

      “If you are not paying for it, you’re not the customer; you’re the product being sold.”

      • That's odd, I don't feel sold. I don't owe Google or its advertisers anything. No slave traders are knocking on my door.

        Perhaps you were searching for a less alarmist term?

    • I'm really starting to like this company. Free web browser, free word processor (and spreadsheet?), free language translation, free nudie pics, free scanned books, free email, free Usenet reader, and now this cool Dataset research tool.

      Free in terms of cash, yes, but cash isn't the only form of payment possible. With Google you barter your personal information and habits for all those 'free' things.

    • by razorh (853659)

      Blah blah blah blah blah blah blah. Blah blah blah, blah blah blah (blah blah?), blah blah blah, free nudie pics, blah blah blah, blah blah, blah blah blah, blah blah blah blah blah blah blah.

      what?!? where?!?

  • So when do they release the next product: Google Causation?

    • by Chrisq (894406)

      So when do they release the next product: Google Causation?

      There is actually a very strong possibility of this. Because sites are ranked by popularity of selection Google itself could well amplify trends. If you do a search for "family health" and the top results are news reports on increased rates, what would your next search be? If you get sick what are you likely to put it down to?

  • by Anonymous Coward

    Just think of all the things I'll be able to prove with this!

  • From TFA: "like Google Trends but in reverse."

  • This tool finds an association between categorical data, namely a search word and counts for searches of that word. "Correlation" refers to a special type of association, i.e. between two quantitative variables, which, correct me if I'm wrong, this tool does not measure. Am I being pedantic here? Or should we take a stand for correct and precise usage of statistical terms?

    • No, it's correlation. You have one data set (numbers of searches through time for the inputted term) and it compares with other data sets (the number of searches through time for each term available).

      If you click on "Search By Drawing" you can see the two lines - data sets - in the graph: the one you draw and the one with the best correlation from their search terms.
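A minimal sketch of the comparison described above, assuming the tool ranks candidate terms by Pearson's r against the input series (the thread only mentions an "R value", so the exact measure is an assumption). The names `pearson_r` and `best_match` and the weekly counts are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

def best_match(target, candidates):
    """Return (name, r) for the candidate series most correlated with target."""
    scored = ((name, pearson_r(target, series)) for name, series in candidates.items())
    return max(scored, key=lambda pair: pair[1])

# Hypothetical weekly search counts, not real Google data.
target = [1, 2, 3, 4, 5]
candidates = {
    "term_a": [2, 4, 6, 8, 10],  # rises in step with the target
    "term_b": [5, 3, 4, 1, 2],   # mostly falls as the target rises
}
name, r = best_match(target, candidates)
print(name, round(r, 3))
```

Both inputs are quantitative time series, which is why the word "correlation" applies here.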

    • Correlation does not imply association; but correlation is interesting nonetheless. If you can predict one thing via another, more predictable or measurable thing, then you have a way to track elusive data. That it isn't a cause or an associated thing is immaterial.

      Think about paid time off. Sick leave is associated with the flu; but paid time off is a condensation of sick leave, vacation time, etc. So people take a vacation day for a vacation, or a vacation day for being sick... now it's no longer as

  • by SPBesui (687868)

    Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'.

    http://xkcd.com/552/ [xkcd.com]

    • by ArcherB (796902)

      Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'.

      http://xkcd.com/552/ [xkcd.com]

      Actually, correlation does IMPLY causation. However, correlation does not EQUAL causation.

      Take the following as an example:
      I mow my yard without my shirt on. The neighbor lady sees me and pukes. We can IMPLY from the given data that seeing my giant uni-moob (gut), my white, saggy, hairless chest, and my man-scaped back hair landing strip caused my neighbor to lose her lunch.

      As it turns out, she had a stomach bug and had been bio-matter from both ends all day, but without all of the data, we were left wi

  • by Palmsie (1550787) on Thursday May 26, 2011 @10:27AM (#36250504)

    Correlation is one of those simple statistical terms that lots of non-technical people like to throw around without actually knowing what it means. It's a wonderful tool that Google has provided for everyone, but people need to remember the basic assumptions behind correlations, namely a relatively normal distribution of scores and independence of observations. Independence is especially important if you're tracking search engine results: if you were to look at how many times people Googled Randy Savage's name the day he died, that spike would influence the count for the subsequent day, ultimately biasing whatever other variable you decided to correlate it with.
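One rough way to check the independence point is to correlate a series with a copy of itself shifted by one step (lag-1 autocorrelation). This is only a sketch: the function names and the daily counts are hypothetical, and a value far from zero is a hint of dependence, not a formal test.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def lag1_autocorrelation(series):
    """Correlation between each observation and the one that follows it."""
    return pearson_r(series[:-1], series[1:])

# Hypothetical daily search counts: a news spike that decays over several days.
spike = [10, 10, 10, 100, 60, 35, 20, 12, 10, 10]
print(round(lag1_autocorrelation(spike), 3))
```

A clearly positive value means each day's count carries information about the next, so the days cannot be treated as independent observations.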

  • "Microsoft" corresponds heavily with "Windows", "software", "updates", and the like, while Apple corresponds with "Apple Store", "large dog", "extra large dog", and ... "muleys"... wtf?
  • Hmmm. The trends for both Correlation and Causation have a graph similar to the ribosome example given by google. With peaks and troughs of interest that seem to match the semesters of the school year. With less interest in summer, though smaller schools in the southern hemisphere would still be running. And a massive drop off around Christmas when schools world wide would be on holidays. But the two terms have an R value less than 0.91 (I haven't bothered to work out how much less though). So I guess there
    • So I guess there is some truth in the age old saying, Correlation != Causation.

      That's true as far as it goes, but in this case it's because they're selecting from a very large ensemble of data series to find the one with the highest R. If you account for the number (and sizes) of data series they evaluated to do that, you can estimate the "true" significance level in a realistic way. Statistics leads us astray only when we fail to apply it properly.

      Correlation of a preliminary kind may not "imply" causation, but it can certainly suggest it, sometimes very strongly. A repeatable correl

  • Weird science (Score:4, Insightful)

    by cultiv8 (1660093) on Thursday May 26, 2011 @10:52AM (#36250802) Homepage
    From the Google Correlate Whitepaper:

    Trends in online web search query data have been shown useful in providing models of real world phenomena. However, many of these results rely on the careful choice of queries that prior knowledge suggests should correspond with the phenomenon.

    Yes, that is how science is done; hypothesis, predict, test, evaluate.

    Here, we present an online, automated method for query selection that does not require such prior knowledge. Instead, given a temporal or spatial pattern of interest, we determine which queries best mimic the data. These search queries can then serve to build an estimate of the true value of the phenomenon.

    So we have a backwards type of science: Evaluate, test, predict, hypothesis. Cuz hey, if there's a correlation, there must be a relation, and if there's a relation, we can build an estimate of the value of the relation, right? The marketing manager is gonna LOVE this....

    • Re:Weird science (Score:5, Insightful)

      by Hatta (162192) on Thursday May 26, 2011 @11:13AM (#36251060) Journal

      Absolutely correct, this is going to swamp us in false positives. Remember, in order for science to work the way it's supposed to we have to report the negative results as well as the positive results. If 20 groups do the same experiment and only one gets a result significant at p=.05, that "positive" result doesn't mean anything. p=.05 means that even with no real effect, about one experiment in 20 will show a correlation this strong by chance.

      It's the same thing here. If Google goes out looking for positive results, and ignores all the negative results this is going to be so skewed as to be worthless.
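The arithmetic behind this worry can be sketched directly. The snippet below assumes independent tests and pure-noise data, which real search series are not, so treat it as an illustration only; `family_wise_error` and `max_chance_correlation` are made-up names.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def family_wise_error(alpha, n_tests):
    """Chance that at least one of n independent null tests passes at level alpha."""
    return 1 - (1 - alpha) ** n_tests

def max_chance_correlation(n_points, n_candidates, rng):
    """Best r found when a random target is compared against pure-noise candidates."""
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = -1.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, pearson_r(target, candidate))
    return best

# 20 groups, each testing at the 5% level, with no real effect anywhere.
print(round(family_wise_error(0.05, 20), 3))

# Searching 2000 unrelated noise series for the best match to a 10-point target.
rng = random.Random(0)
print(round(max_chance_correlation(10, 2000, rng), 3))
```

The more candidate series are searched, the higher the best correlation found by chance, which is why the top hit from a large pool needs independent confirmation.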

      • by Anonymous Coward

        It's not designed to answer questions. It's designed to help you form hypotheses. Correlation is not causation but it *might* imply a relation. That's why a human being with a brain looks at the results and goes "hmm, that's interesting".

        As an example, "google" correlates well with "kratom", a plant used for herbal remedies. The correlation is very high, too, 0.98! I don't blindly assume they are related, though. But I could parse down to see if any other connections make sense and could then test tha

      • Or they could approach it as an engineering problem, where the key is to find signals of interest that have some predictive power. If they add (or have) a feature to partition the data into independent training and test sets either across time or across sites, preferably permuting over some subset of combinations, they can help mitigate these issues. Without doing things like that, they are (as you point out) failing to address a brutal multiple comparisons problem.
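A minimal sketch of the time-based split suggested above, assuming a single cut point rather than the permuted partitions the comment recommends. The function `holdout_check` and both candidate series are hypothetical; nothing here reflects how Google Correlate actually works.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def holdout_check(target, candidates, split):
    """Score each candidate before `split` (training) and from `split` on (test)."""
    results = {}
    for name, series in candidates.items():
        train_r = pearson_r(target[:split], series[:split])
        test_r = pearson_r(target[split:], series[split:])
        results[name] = (train_r, test_r)
    return results

# Made-up data: "real" tracks the target throughout, "fluke" only in the first half.
target = list(range(20))
candidates = {
    "real": [2 * t + 1 for t in target],
    "fluke": list(range(10)) + list(range(9, -1, -1)),
}
scores = holdout_check(target, candidates, split=10)
for name, (train_r, test_r) in scores.items():
    print(f"{name}: train r = {train_r:.2f}, test r = {test_r:.2f}")
```

Both candidates look perfect on the training half; only the held-out half separates the signal that persists from the one that does not.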
      • by Geminii (954348)
        However, it will present many false potential correlations up front and from a single source. That way, researchers can check them all out before some smartass replies to their paper with "Oh yeah? Did you check XYZ, which has a 0.8 correlation? Huh?"
    • Re:Weird science (Score:5, Insightful)

      by i kan reed (749298) on Thursday May 26, 2011 @11:29AM (#36251290) Homepage Journal

      Yes, that's true, except there's a step before hypothesis. Observe. You're not allowed to use data from your observations that generated the hypothesis to support it, but you are allowed to use data to build a hypothesis in the first place.

      As their comic points out, property values correlate to liposuction searches. That's an interesting fact that you might base a socioeconomic hypothesis on. You could then turn to other avenues of research to validate your hypothesis.

      Not everything in science is a race to conclusions.

  • This is great! Now we can finally analyse what people are correlating in Google Trends that tells us what people are searching, then we can use this correlation search data to build Google Correlate Correlate, then we can use this to analyse what people are correlating on things that other people are correlating, then.. then the thing goes on and on and on..

    1) Google Search
    2) Google Trends
    3) Google Flu Trends
    4) Google Correlate
    5) Google Correlate Correlate
    6) Google Correlate Correlate Correlate
    7) ???
    8) Prof

  • How much time until they launch Google Causation?
  • If you type "recession" into Google Correlate, it tells you there is a correlation factor of 0.9059 with "microsoft word 2008".
  • Here's a quick game. Try and find a term with the highest weekly search volume when normalized against the usual search volume for that term.

    Here are a few that I tried:

    http://correlate.googlelabs.com/search?e=inauguration&t=weekly# [googlelabs.com] - 19.637
    http://correlate.googlelabs.com/search?e=Michael+Jackson&t=weekly# [googlelabs.com] - 14.537
    http://correlate.googlelabs.com/search?e=Olympics&t=weekly# [googlelabs.com] - 11.656
    http://correlate.googlelabs.com/search?e=new+year's+eve&t=weekly# [googlelabs.com] - 8.355

    Also, check out the "Search by Drawing"

  • A proof that correlation is evidence of causation,
    even though correlation does not imply causation:

    http://kim.oyhus.no/CorrelationAndCausation.html [oyhus.no]

  • So, why do people stop caring about autism at Christmas? http://correlate.googlelabs.com/search?e=autism&e=christmas&t=weekly# [googlelabs.com]
    • Try searching "cellulose". You'll see that Christmas is also a nadir, extending up to New Year. In fact, most of the correlations Google gives you have the same exact pattern: the major peak is every September, and it then falls to the baseline around November. A second peak is found in January. I think what we're seeing is students searching for homework answers from the Internet. The correlated words are all those that students are likely to search, most of them being unrelated to cellulose. In effect, we're
  • I uploaded the closing stock prices of GOOG for the last two years. It showed fairly poor correlations with several random phrases. "Eye won't stop twitching" was my favorite.

