Glossary: corpus

Explanation

A collection of extracts of naturally occurring language, often in computerised form. It can include written language, transcribed spoken language, or both. See also http://www.englicious.org/lesson/corpora-and-web-tools/what-corpus.

Word frequency in speech and writing

Comparing word frequencies is an interesting way to think about some of the differences between speech and writing. Which are the most frequent words in speech, and how do they compare with the most frequent words in writing?

Word frequency in speech and writing: Activity

Spoken English

the

I

you

and

it

a

’s*

to

of

that

Written English

the

of

Word salads (secondary)

In this resource we’ll look at what grammar is and why we need it. First of all, take a look at the word salads. They can be found in the Activity pages within the menu entitled 'This Unit' in the upper right of this page. The slides show real spoken sentences drawn from our corpus, which have been jumbled up into the wrong order. The students' task is to rearrange the words into an order that makes sense.

Word clouds in action

Goals

  • Examine a poem as a corpus, that is, as a body of linguistic data.
  • Linguistically analyse the words used in a poem.
  • Create a word list based on a poem.
  • Present linguistic findings visually using Wordle.

Lesson Plan

Wordle is a simple corpus tool which allows you to paste in text and create a ‘word cloud’, in which the size of each word reflects how frequently it occurs in the text.
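The counting that sits behind a word cloud can be sketched in a few lines of Python. This is only an illustration of the general idea, not a description of how Wordle itself works: the function name word_frequencies and the sample sentence are invented for the example, and the sketch assumes that words are simply runs of letters and straight apostrophes, compared without regard to case.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each word occurs, ignoring case and punctuation."""
    # Assumption: a "word" is a run of letters or straight apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Hypothetical sample text, used only for illustration.
sample = "The cat sat on the mat. The dog sat too."
frequencies = word_frequencies(sample)

# List the words from most to least frequent; a word cloud would
# draw the words at the top of this list in the largest type.
for word, count in frequencies.most_common(3):
    print(word, count)
```

Pasting in a poem in place of the sample sentence gives a word list that could be compared with the word cloud produced for the same poem.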

Word frequency

What are the most frequently used words in English? And could we do without them?

Word frequency: Activity

The 10 most common English words are:

the

of

and

a

in

to

it

is/was

I

for

Can you answer the following questions without using these 10 words?
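For classes with access to Python, answers to this activity could be checked automatically. The sketch below is one possible approach and not part of the original activity: the names COMMON_WORDS and banned_words_used are invented for the example, and it assumes that ‘is/was’ counts as two separate words and that case and punctuation are ignored.

```python
import re

# The ten words listed above; "is/was" is treated here as two words.
COMMON_WORDS = {"the", "of", "and", "a", "in", "to", "it", "is", "was", "i", "for"}

def banned_words_used(answer):
    """Return, in alphabetical order, the common words found in an answer."""
    # Assumption: words are runs of letters, compared in lowercase.
    words = re.findall(r"[a-z]+", answer.lower())
    return sorted(set(words) & COMMON_WORDS)

# Hypothetical answers, used only for illustration.
print(banned_words_used("My name is Sam and I live in Leeds"))
print(banned_words_used("My home town? Leeds."))
```

An empty list means the answer avoids all ten words. Because the sketch splits on apostrophes, a contraction such as "it's" is still counted as a use of "it".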

Corpora

A corpus is ‘a collection of pieces of language, selected and ordered according to explicit linguistic criteria in order to be used as a sample of language’ (Sinclair, 1996).

Corpora: Useful web tools

The following are corpus-related websites which we think are helpful for investigating language.

Wordle

Wordle is a simple-to-use site that lets you paste in your own data and then creates an attractive ‘word cloud’ based on the frequency of the words you’ve used. You can use Wordle as a very simple corpus tool for something like a poem, a song lyric, a political speech or a soliloquy from a play and get a visual representation of the language within it. (See also the lesson entitled 'Word clouds in action', which uses Wordle as a way in to analysing a poem.)

Grammar

We define grammar as the study of the structure of words and sentences. As such it is an abstract system, worthy of study in its own right. However, we also see grammar as a system that is used in a range of contexts to unlock meaning. We want to look at grammar not only in written language, but also in spoken English, in a range of multimodal forms, and in all its rich variety.



Englicious (C) Survey of English Usage, UCL, 2012-15 | Supported by the AHRC and EPSRC.