By Suman Gupta
Catchwords/phrases are, by definition, words/phrases that catch on: their uptake is therefore the starting point for thinking about them.
A few notes follow on the frequencies of usage that characterise catchwords/phrases as such (i.e., their usage-frequencies). (The assumption that words and phrases are units in the same way calls for some unpacking in due course.)
With catchiness in mind, I take the following as the defining features of catchwords/phrases:
Simple recognition appears to be an obvious basis for discerning what is or is not a catchword/phrase. If I ask my colleagues to name five current catchwords/phrases given one or two examples of what I have in mind, they readily come up with a list. (They must be getting fed up with my asking this.) Generally, they do this confidently, unhesitatingly, and without asking searching questions about what catchwords/phrases mean.
Such immediate recognition might be based on rule-of-thumb frequency estimations. If I notice that I am encountering a word/phrase often nowadays in the news media, official documents, popular publications, everyday conversations, etc., I think it is a current catchword/phrase. That is not just because I note this and start using it within my remit, but because I know that each of those occurrences in news media, official documents, etc. is also within the remits of many others like me. Simple recognition is a reasonably robust indication of what we may call perceptual frequency estimation.
Reasonably robust as it is, perceptual frequency estimation is loose enough to make for significant divergences. When requested, the lists of current catchwords/phrases my colleagues come up with have some overlaps and quite a lot of differences. Moreover, if I ask them to put their five choices in order (say, from more to less popular), the orderings are often different. We may say that perceptual frequency estimation is robust to the extent that there are overlaps and some similarity in orderings. It may be possible to enhance that robustness by asking my colleagues to have a look at each other’s lists and agree upon one, that is, to come up with a consensus listing.
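To make ‘overlaps and some similarity in orderings’ a little more concrete, here is a minimal sketch of how such lists might be compared. The lists are wholly invented, and the measures (set overlap and pairwise rank agreement) are my own illustrative choices rather than anything this project prescribes.

```python
# Illustrative sketch: comparing colleagues' (invented) lists of five current
# catchwords/phrases by set overlap and by agreement in their orderings.
from itertools import combinations

lists = {
    "colleague_a": ["resilience", "woke", "levelling up", "net zero", "gaslighting"],
    "colleague_b": ["net zero", "resilience", "culture war", "woke", "nudge"],
    "colleague_c": ["gaslighting", "woke", "resilience", "metaverse", "net zero"],
}

def overlap(a, b):
    """Share of items common to both lists, relative to all items named (Jaccard)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def rank_agreement(a, b):
    """Of the words both lists contain, the share of pairs ordered the same way."""
    common = [w for w in a if w in b]
    if len(common) < 2:
        return None
    pairs = list(combinations(common, 2))
    same = sum((a.index(x) < a.index(y)) == (b.index(x) < b.index(y)) for x, y in pairs)
    return round(same / len(pairs), 2)

for (name1, l1), (name2, l2) in combinations(lists.items(), 2):
    print(name1, "vs", name2,
          "| overlap:", round(overlap(l1, l2), 2),
          "| rank agreement:", rank_agreement(l1, l2))
```

The higher and the more uniform these figures are across pairs, the more robust the perceptual estimation could be said to be; a consensus listing would be one with which all the individual lists agree reasonably well.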
I have a strong interest in perceptual frequency estimations and consensus building. They are methodologically very useful, for reasons that I will come to in a later essay.
Among my colleagues, those who are accustomed to methods for interpreting specific integral texts are inclined to be content with perceptual frequency estimations. However, those who are accustomed to methods for analysing collections of texts as word/discourse corpora are generally dismissive of such estimations. They also tend to be contemptuous about those who take them seriously. They prefer what we may dub empirical frequency measurement.
There are now various norm-laden and rather spurious binaries with slightly different nuances to express this dismissive attitude, such as subjective knowledge/objective knowledge, qualitative method/quantitative method, soft science/hard science. In diplomatic mode, one may occasionally be told that both sides are of equal methodological value. In all honesty, one is constantly given to understand that perceptual estimation is subjective/qualitative/soft science, and of a lower order of methodological value than the objective/quantitative/hard science approach to empirical measurement. The McNamara fallacy is rife among researchers and academics.
Under the pressure of this normative attitude, various routinised modes have developed for being quantitative with qualitative materials, for deriving an objective quotient from subjective observations, for being hard with soft data. In my view, these modes are usually not happy syntheses. They are adopted to give the subjective/qualitative/soft a misleading appearance of being objective/quantitative/hard. Various problems reside in those routinised developments, which I will address in a separate essay.
I have a keen interest in empirical frequency measurement.
The obvious way to go about this is to: take a corpus of texts covering a suitable time-period; count the occurrences of the word/phrase in each sub-period (each year, say), normalised against the total number of words in the corpus for that sub-period; and plot the resulting usage-frequencies over time as a graph.
If the graph shows a notable and steady upward curve over a significant time-period for a word/phrase, we may say that we have a measurement for that word/phrase ‘catching on in a significant manner’ – so, that is a catchword/phrase for that time-period.
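As a minimal sketch of what such a measurement involves (the tiny corpus, the yearly grouping and the per-million normalisation below are illustrative assumptions, not a prescribed procedure):

```python
# Illustrative sketch: usage-frequency of a word over time, per million words,
# computed from a small invented corpus of dated texts.
import re
from collections import defaultdict

# Invented (year, text) pairs standing in for news items, official documents, etc.
corpus = [
    (1994, "The internet is mentioned here and there."),
    (1996, "More and more reports discuss the internet and the web."),
    (1998, "The internet, the internet, everywhere the internet."),
]

def yearly_frequency(corpus, word):
    """Occurrences of `word` per million words, for each year in the corpus."""
    counts, totals = defaultdict(int), defaultdict(int)
    for year, text in corpus:
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[year] += len(tokens)
        counts[year] += tokens.count(word.lower())
    return {year: 1_000_000 * counts[year] / totals[year] for year in sorted(totals)}

print(yearly_frequency(corpus, "internet"))  # plot these figures against the years
```

Plotting the returned figures against the years gives the kind of curve discussed below.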
A popular catchword/phrase (in the terms given earlier) might be trackable in a large language corpus for a given language (e.g., a corpus for general English usage).
The most accessible facility based on a large corpus for obtaining such graphs is the Google Ngram Viewer. It draws upon the Google Books corpus of around 200 billion American and British English words. The various grey areas of the facility notwithstanding (see e.g. Pechenick et al. 2015, Younes and Reips 2019), it offers immediately processed graphic representations of patterns of usage-frequency over time for words/phrases. For ease of discussion, let me confine the rest of this section to words that catch, and put catchphrases aside for the time being – while noting that, usefully, the Google Ngram Viewer can produce graphs for words as well as (with some limitations) for phrases.
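For those who would rather have the Viewer’s figures programmatically than as a rendered graph, something like the following is often done. A caveat: the JSON endpoint and the parameter names below are an unofficial, undocumented convenience inferred from the Viewer’s own query strings; they may change without notice, so treat this as an assumption-laden sketch rather than a supported API.

```python
# Assumption-laden sketch: fetching Ngram Viewer data via its unofficial JSON
# endpoint. URL, parameter names and response keys are inferred from the
# Viewer's public query strings and may change without notice.
import requests

params = {
    "content": "internet+Internet",  # '+' combines the two spellings, as in the Viewer
    "year_start": 1990,
    "year_end": 1999,
    "corpus": "en-2019",             # assumed corpus label; mirror whatever the Viewer's URL shows
    "smoothing": 3,
}
resp = requests.get("https://books.google.com/ngrams/json", params=params, timeout=30)
resp.raise_for_status()
for series in resp.json():
    print(series["ngram"], series["timeseries"][:3], "...")
```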
To read graphs like Google Ngrams meaningfully, some orientation with word usage-frequency measurements is needed. A listing of a small selection of common words across a range of usage-frequencies could be used as an orientation tool. Since this project blog concerns political catchwords/phrases, I have made such an Orientation Table for my own use – you will find it linked here as an Orientation Table of Usage-Frequencies for Selected Words (mostly with political associations). This may help you to get a sense of what usage-frequency measurements look like for various more or less common words – at any rate, I have found it helpful. I will devote a separate essay to what political refers to in this project.
The Orientation Table gives two kinds of usage-frequency measurements for the chosen words in the left-hand column: a 2018-snapshot figure and an all-time average.
Contemplating the relationship between the 2018-snapshot figures and the all-time averages could help in building a kind of mental map with which to read usage-frequency graphs.
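A trivial sketch of the kind of comparison intended, with invented figures rather than figures from the Orientation Table:

```python
# Trivial sketch: relating an invented 2018-snapshot usage-frequency to an
# invented all-time average for the same word (both taken here as per-million figures).
orientation = {
    "austerity":  {"snapshot_2018": 1.8, "all_time_average": 0.9},
    "solidarity": {"snapshot_2018": 2.1, "all_time_average": 2.0},
}
for word, figures in orientation.items():
    ratio = figures["snapshot_2018"] / figures["all_time_average"]
    print(f"{word}: the 2018 figure is {ratio:.1f}x its all-time average")
```

A word whose snapshot figure sits well above its all-time average is, in that crude sense, currently riding high relative to its own history.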
In the OED explanation it is observed that the higher usage-frequencies (i.e. the higher Bands) are found for a relatively small number of words. Only 5.20% of all words in the OED have a usage-frequency of 1 or more (≥ 1) occurrences per million, in Bands 5-8. The catchwords that come up in our time are very likely to be in the area of Bands 1-3 (<0.099 occurrences per million).
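For readers who would like to place a per-million figure in an OED-style Band mechanically, here is a small sketch. The cut-offs are paraphrased from the OED’s published explanation of its frequency Bands and should be checked against the OED’s own documentation.

```python
# Illustrative sketch: mapping occurrences-per-million onto OED-style frequency
# Bands. Cut-offs paraphrase the OED's published explanation and are approximate.
OED_BANDS = [
    (1000, 8), (100, 7), (10, 6), (1, 5),
    (0.1, 4), (0.01, 3), (0.001, 2),
]

def oed_band(per_million):
    """Approximate OED frequency Band for a given occurrences-per-million figure."""
    for threshold, band in OED_BANDS:
        if per_million >= threshold:
            return band
    return 1  # Band 1: fewer than 0.001 occurrences per million

print(oed_band(5.3))   # Band 5 (roughly 1-10 per million)
print(oed_band(0.04))  # Band 3 (roughly 0.01-0.1 per million)
```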
However, it does not really matter which Band the word in question is in. The point of catchiness is not how frequently the word is used in relation to the total corpus size, but how much and how quickly its usage increases. So, a catchword can appear as such at any level of usage-frequency, as much in the range of the OED’s low usage-frequency Band 2 as of the high usage-frequency Band 5. It is the significant increase in usage-frequency that is relevant, not the OED Band of usage-frequency.
The increase in usage-frequency over a time-period appears in a Google Ngram graph as follows. To exemplify, I track the word ‘internet’/‘Internet’ between 1990 and 1999. Note that the steady growth, characteristic of catchwords, appears (at smoothing 3) as a logistic curve and occasionally as an exponential curve.
Google Ngram for ‘internet+Internet’, 1990-1999
And here it is again, with the percentage usage-frequency at the beginnings of 1990 and 1999 marked:
This graph shows that the usage-frequency for the word ‘internet’/‘Internet’ increased over 15-fold in the course of the decade. ‘Internet’ could be regarded as a wildly popular catchword of the 1990s. In general, insofar as Google Ngram indications go, I have it as a rule of thumb that a popular catchword/phrase will usually show approximately a two-fold increase in usage-frequency within a 10-year time-period or less (≤ 10 years). This is a rule of thumb and not innately meaningful: the boundaries of doubling within a decade are based on convenience and my experience rather than an inherent rationale, and other boundaries may well work better.
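Here is a minimal sketch of that rule of thumb applied to a yearly frequency series. The figures are invented for illustration; in practice one would substitute Ngram-style measurements.

```python
# Illustrative sketch: does a usage-frequency series roughly double within any
# window of ten years or less? The figures below are invented.
def catches_on(series, max_years=10, factor=2.0):
    """True if the frequency increases at least `factor`-fold within `max_years`."""
    points = sorted(series.items())
    for i, (y0, f0) in enumerate(points):
        for y1, f1 in points[i + 1:]:
            if y1 - y0 <= max_years and f0 > 0 and f1 / f0 >= factor:
                return True
    return False

# Invented per-million figures, loosely shaped like the 1990s 'internet' curve,
# alongside a second series that stays roughly flat.
internet_like = {1990: 0.3, 1992: 0.6, 1994: 1.2, 1996: 2.6, 1998: 4.3, 1999: 4.8}
steady_word   = {1990: 2.0, 1993: 2.1, 1996: 1.9, 1999: 2.0}

print(catches_on(internet_like))  # True  -> behaves like a catchword by this rule of thumb
print(catches_on(steady_word))    # False -> does not
```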
A very large corpus like Google Books is unlikely to be helpful for measuring catchwords/phrases which are more extensive than intensive (like organizational catchwords/phrases) or more intensive than extensive (like in-group catchwords/phrases). For those, the corpus needs to be delimited by organizational or in-group domains. To track the significant catching on of words/phrases according to domains, the same procedure applies: assemble a corpus of texts from the relevant domain over a suitable time-period, and measure usage-frequencies within that delimited corpus alone.
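A minimal sketch of the domain-delimited version of the measurement (the domain tags, texts and target word are all invented):

```python
# Illustrative sketch: the same per-million measurement as before, but confined
# to texts carrying a particular domain tag (organisation, in-group, etc.).
import re
from collections import defaultdict

documents = [
    {"year": 2019, "domain": "hr_department", "text": "onboarding and wellbeing plans"},
    {"year": 2021, "domain": "hr_department", "text": "wellbeing, wellbeing and more wellbeing"},
    {"year": 2021, "domain": "newspaper",     "text": "the budget and the weather"},
]

def domain_frequency(documents, word, domain):
    """Per-million yearly frequency of `word`, counted within one domain only."""
    counts, totals = defaultdict(int), defaultdict(int)
    for doc in documents:
        if doc["domain"] != domain:
            continue
        tokens = re.findall(r"[a-z]+", doc["text"].lower())
        totals[doc["year"]] += len(tokens)
        counts[doc["year"]] += tokens.count(word.lower())
    return {year: 1_000_000 * counts[year] / totals[year] for year in sorted(totals)}

print(domain_frequency(documents, "wellbeing", "hr_department"))
```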
Perceptual frequency estimations and empirical frequency measurements could serve to identify catchwords/phrases and to put them in some scheme or order. They indicate what words/phrases have caught on and to what extent – i.e., what the catchwords/phrases are. But they say little about why they have caught on. Without methods for addressing the why-question, very little can be done by way of studying catchwords/phrases.
In this respect too, judicious attention to usage-frequency measurements could take one some of the way – but, it seems to me, not far enough. Perceptual estimations, and the process of arriving at consensuses, may well be able to take one further.
Methods for addressing the why-question call for several essays.
First Inset image: Wolfmann, CC BY-SA 4.0 via Wikimedia Commons