Google Ngrams

For sharing and chatting about any interesting information related to Linguistics.

Postby Richard » 26 Jan 2016 13:10

In the linguistics class today, we looked at Google Ngram Viewer to check historical and dialectal differences in the frequencies of different ways of saying the same thing (e.g. 'stupider' vs. 'more stupid'). What is the most creative use of Google Ngrams you can think of for linguistics research?
Posts: 111
Joined: 28 Dec 2015 08:22

Re: Google Ngrams

Postby angvarrah » 28 Jan 2016 14:23

I think Google Ngram Viewer is not only useful for linguistic research but also fruitful for language teaching. For example, it allows users to check differences in frequency between words in different parts of speech and between clusters of words. The POS tags available in the new Google Books Ngram Corpus include verb (VERB), noun (NOUN), pronoun (PRON), conjunction (CONJ), and adjective (ADJ). To find the frequencies of a particular word in different parts of speech, you type, for example, 'work_NOUN, work_VERB'.
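
For anyone who wants to set up such comparisons without retyping them in the search box, here is a minimal Python sketch that builds a viewer link for 'work_NOUN, work_VERB'. The base address, the parameter names and their default values are assumptions based on what the viewer shows in the browser's address bar, and `build_ngram_url` is a made-up helper name, so treat it as an illustration rather than a documented interface.

```python
from urllib.parse import urlencode

# Assumed address of the Ngram Viewer graph page (not stated in this thread).
NGRAM_VIEWER_URL = "https://books.google.com/ngrams/graph"


def build_ngram_url(terms, year_start=1800, year_end=2000, corpus=15, smoothing=3):
    """Return a viewer link comparing the given terms (hypothetical helper)."""
    params = {
        "content": ",".join(terms),  # comma-separated search terms
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,            # assumed numeric corpus identifier
        "smoothing": smoothing,
    }
    # urlencode percent-encodes commas, spaces and other reserved characters.
    return NGRAM_VIEWER_URL + "?" + urlencode(params)


print(build_ngram_url(["work_NOUN", "work_VERB"]))
```

The same helper could be given terms such as 'work_ADJ' or a two-word cluster to compare other tags.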
Posts: 2
Joined: 06 Jan 2016 08:20

Re: Google Ngrams

Postby Woravut » 29 Jan 2016 09:31

I tried downloading the raw data, but it is indecipherable.

Has anyone tried?
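
In case it helps, here is a minimal Python sketch for reading the raw files. It assumes the tab-separated layout of the 2012 release (ngram, year, match count, volume count, one row per line); other releases may be laid out differently, and `parse_ngram_lines` is just an illustrative name. The sample rows are invented, not real corpus counts.

```python
from collections import defaultdict


def parse_ngram_lines(lines):
    """Collect {ngram: {year: match_count}} from tab-separated rows."""
    counts = defaultdict(dict)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip rows that do not match the assumed layout
        ngram, year, match_count, _volume_count = fields
        counts[ngram][int(year)] = int(match_count)
    return dict(counts)


# Invented rows, for illustration only.
sample = [
    "colour\t1850\t120\t30\n",
    "colour\t1851\t130\t32\n",
    "color\t1850\t90\t25\n",
]
print(parse_ngram_lines(sample))
```

The downloads are gzip-compressed, so for a real file the lines could come from `gzip.open(path, "rt", encoding="utf-8")` instead of a list.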
Posts: 55
Joined: 05 Jan 2016 09:15

Re: Google Ngrams

Postby Thiwa » 29 Jan 2016 10:27

Google Ngrams might be useful if you want to look at the dependencies between words. It could be applied to the analysis of spoken language, in order to compare the perception of word dependencies in conversation with those found in written texts.
Posts: 3
Joined: 26 Jan 2016 12:24

Re: Google Ngrams

Postby naratip » 29 Jan 2016 10:50

I think it can be useful for studying and predicting language change and evolution. For example, we can study lexico-grammatical innovations that started life as erroneous forms but later gained acceptance as standard language.

Posts: 6
Joined: 13 Jan 2016 06:22

Re: Google Ngrams

Postby Richard » 01 Feb 2016 09:09

As an example, I thought I might use Ngrams to find out about the data set being analysed, but I ended up with a conclusion that I think is more interesting.

Since the application allows you to choose British or American English, I decided to check whether the proportions of these two sources had changed over the years. To do this, I looked at 'color' (for US English) and 'colour' (for British English). The general pattern follows expectations, and the graph for the English source suggests that British English sources dominated in the early years. However, looking at the US English graphs, it's interesting that 'colour' predominated up to about 1850. The 'color' spelling was promoted through Noah Webster's Dictionary (see etymonline page for '-or'), which was published in 1828, although the 1844 edition may have been more influential (especially for Emily Dickinson). This suggests that the graphs for 'color' and 'colour' for American English may show the influence of Webster's dictionary.

This hypothesis seems to be confirmed for 'favor'/'favour' and 'flavor'/'flavour', both of which follow similar patterns to 'color'/'colour'. However, another of Webster's spelling innovations, '-er' for '-re', doesn't follow this pattern. 'Center' only becomes dominant around 1900, while 'theater' only dominates in the 1970s. Not sure what to make of this. Anyone got any thoughts on why the two differ in the amount of time the change took?
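
If anyone wants to put a firmer date on "becomes dominant" than reading it off the graph, one possible approach is to compute the share of the newer spelling year by year and report the first year from which it stays above 50%. The Python sketch below assumes you already have per-year counts for both spellings; `dominance_year` is a made-up name, and the numbers in the example are invented toy values, not real Ngram data.

```python
def dominance_year(new_counts, old_counts, threshold=0.5):
    """Return the first year from which the new spelling's share stays
    above the threshold through the last year in the data, or None."""
    years = sorted(set(new_counts) & set(old_counts))
    candidate = None
    for year in years:
        total = new_counts[year] + old_counts[year]
        share = new_counts[year] / total if total else 0.0
        if share > threshold:
            if candidate is None:
                candidate = year  # start of a run above the threshold
        else:
            candidate = None      # run broken, start looking again
    return candidate


# Invented toy counts, for illustration only.
toy_new = {1900: 10, 1910: 60, 1920: 40, 1930: 70, 1940: 90}
toy_old = {1900: 90, 1910: 40, 1920: 60, 1930: 30, 1940: 10}
print(dominance_year(toy_new, toy_old))  # prints 1930
```

Requiring the share to stay above the threshold avoids counting a brief spike (1910 in the toy data) as the point of change.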
Posts: 111
Joined: 28 Dec 2015 08:22

Re: Google Ngrams

Postby stevelouw » 02 Feb 2016 10:26

One possibility is that the 're' and 'er' spellings had different denotations, which is what this article implies about the difference between theatre and theater in American English. A 'theater' referred to the location or venue, but 'theatre' to the art form. Similarly, 'centre' would denote the middle, while 'center' a location.

The Google Ngram for 'theatre art' and 'theater art' seems to support this proposition: 'theatre art' was more widely used in American English until fairly recently, and perhaps the distinction is now losing out to a drive for consistency in spelling. However, I expected the results to be confirmed by Ngrams for 'theatre critic', or even 'theater seats', but these seem to have been used fairly interchangeably until around the 1970s. Similarly, the Ngrams for 'centre point'/'center point' and 'dead centre'/'dead center' don't indicate that these two spellings carried different meanings. It seems, then, that the use of two spellings for different meanings is unlikely to be a reason for the delay in the adoption of the American spelling.
Posts: 49
Joined: 05 Jan 2016 12:55

Re: Google Ngrams

Postby sgtowns » 04 Feb 2016 14:07

The thread on sexism and dictionaries gave me the idea to test out "rabid". Sure enough, use of the word jumped after 1800.

Doing a search for "rabid *_NOUN" (rabid followed by a noun) gives the following order of frequency:

rabid dog
rabid animal
rabid dogs
rabid animals
rabid fans
rabid wolf
rabid nationalism
rabid segregationist
rabid nationalist
rabid anti

[Ngram Viewer link: search for "rabid *_NOUN", 1960 to 2000, corpus=15, smoothing=3]

Doing a search for nouns that are modified by rabid (*_NOUN=>rabid), a potential NEW definition of the word shows up -- rabid hatred and rabid fury. So it's not "support or belief". But it's more of a redundant phrase if you use the original meaning of the word. Or maybe you could define it as "extreme emotion".

[Ngram Viewer link: search for "*_NOUN=>rabid", 1800 to 2000, corpus=15, smoothing=3]

(This last link doesn't seem to be working for some reason, but you can click on it then click on "Search" to get the graph.)
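
For the curious, the links the viewer generates for wildcard searches carry a percent-encoded `direct_url` value that seems to list the phrases the wildcard expanded to. Here is a minimal Python sketch for pulling those phrases back out. The excerpt is a shortened piece of the "rabid *_NOUN" link; the way it is split (on ';;' and ';,') is only my reading of how the value looks, not a documented format, and I don't know what the 't2', 'c0' and 's0' codes mean. `expanded_ngrams` is a made-up name.

```python
from urllib.parse import unquote


def expanded_ngrams(direct_url_value):
    """Return the phrases listed after the query in a direct_url value."""
    decoded = unquote(direct_url_value)  # e.g. %3B -> ';', %20 -> ' '
    segments = decoded.split(";;")
    # The first segment appears to hold the original query and its settings,
    # so only the later segments are treated as expanded phrases.
    return [segment.split(";,")[0] for segment in segments[1:] if segment]


# Shortened excerpt of a direct_url value from a "rabid *_NOUN" search.
EXCERPT = (
    "t2%3B%2Crabid%20*_NOUN%3B%2Cc0%3B%2Cs0"
    "%3B%3Brabid%20dog_NOUN%3B%2Cc0"
    "%3B%3Brabid%20animal_NOUN%3B%2Cc0"
)
print(expanded_ngrams(EXCERPT))  # prints ['rabid dog_NOUN', 'rabid animal_NOUN']
```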
Posts: 67
Joined: 27 Dec 2015 13:55
