Problems with Google Ngrams

Posted: 26 Sep 2016 07:30
by Richard
We've used Google Ngrams several times as a quick way of gaining insights into language use. While it's fun and easy to use, it has several problems (see https://www.wired.com/2015/10/pitfalls- ... gle-ngram/). Do you think these problems are serious enough to make Google Ngrams so unreliable that it shouldn't be used?
I like the point about the confusion between the traditional long 's' and 'f' (very well illustrated in the example).

Re: Problems with Google Ngrams

Posted: 28 Sep 2016 16:13
by stevelouw
I've just written a blog post for language teachers about how they can use Google Ngram in the classroom. I feel that it's a really useful tool for teachers, and that teachers can easily show students how to use it so they can experiment with it too. Some common classroom questions can be answered quite quickly with it, such as whether to use 'toward' or 'towards'.

The writer of this article outlines some of the potential problems with the Ngram Viewer, but ends by saying that, problems aside, the tool is powerful and useful. What I take from the article is that Ngram doesn't give us the ultimate answer to our queries, just some additional data.

I also liked the 'f'/'s' discussion - that makes sense now, but I hadn't thought of it before.

Re: Problems with Google Ngrams

Posted: 30 Sep 2016 09:50
by punjaporn
Any piece of data is useful in some way, but it also has its limitations. The more important point is that we need to know and be aware of what it can and cannot be used for. It is a useful tool for learning, and it's a good idea to have something "fun and easy to use" in the classroom. But if you want to use it, read Steve's blog first! For more serious research, I think researchers can use it carefully as a starting point for more detailed analysis. Its data should be used with caution.

Re: Problems with Google Ngrams

Posted: 02 Oct 2016 12:00
by sgtowns
I agree with Aum that Google Ngrams could be a "fun and easy to use" tool in the classroom. Perhaps it can give students some insight into how language changes over time. But I don't think it is a very appropriate tool for research, unless it's just a starting point, as Aum said. Has anyone found a paper that uses Google Ngrams as its primary data source? I'd be interested to read it.

Another example, not mentioned in the linked article, is the chart you currently see when you go to https://books.google.com/ngrams. It shows that "Frankenstein" has gained in popularity since 1960, easily outpacing "Albert Einstein" and "Sherlock Holmes". But chances are the books are not referring to the famous Frankenstein story, but to the newer sense of the word: "a monstrous creation; especially: a work or agency that ruins its originator" (from the Merriam-Webster Dictionary). Nothing in the chart would show this; you would only find it by looking at concordance lines.

And by the way, I would also be interested in reading Steve's blog. What is the URL? :)

Re: Problems with Google Ngrams

Posted: 05 Oct 2016 18:15
by stevelouw
Here you go:

http://www.ajarn.com/blogs/stephen-louw ... y-part-two

Enjoy, and don't laugh too much. It's written for the run-of-the-mill teacher, so there isn't much meat to it. As for research, I agree with Aum that the Ngram Viewer gives a good starting point and maybe some initial insights, but its output is one piece of data that has to be combined with other data before drawing meaningful conclusions.