Showing posts with label Corpus. Show all posts

Friday, November 26, 2010

Corpus Linguistics

Corpus linguistics is the study of language as expressed in samples (corpora) or "real world" text. This method represents a data-driven approach to deriving a set of abstract rules by which a natural language is governed or by which it relates to another language. Originally compiled by hand, corpora are now largely derived by an automated process.

The corpus approach runs counter to Noam Chomsky's view that real language is riddled with performance-related errors, thus requiring careful analysis of small speech samples obtained in a highly controlled laboratory setting.

The problem of laboratory-selected sentences is similar to that facing lab-based psychology: researchers have no measure of how ethnographically representative their data are.

Corpus linguistics does away with Chomsky's competence/performance split: adherents believe that reliable language analysis is best carried out on field-collected samples, in natural contexts and with minimal experimental interference. Within corpus linguistics there are divergent views on the value of corpus annotation, ranging from John Sinclair, who advocated minimal annotation and allowing texts to 'speak for themselves', to others, such as the Survey of English Usage team (based at University College London), who advocate annotation as a path to greater linguistic understanding and rigour.
Source: Wikipedia.org

CORPORA
AMERICAN NATIONAL CORPUS (ANC)
BERGEN CORPUS OF LONDON TEENAGER LANGUAGE (COLT)
BRITISH ACADEMIC SPOKEN ENGLISH CORPUS (BASE)
BRITISH NATIONAL CORPUS (BNC)
CAMBRIDGE AND NOTTINGHAM CORPUS OF DISCOURSE IN ENGLISH (CANCODE)
CAMBRIDGE INTERNATIONAL CORPUS (CIC)
COLLINS WORDBANKS ONLINE ENGLISH CORPUS

6 Online Corpus

What can we use corpora for with our students?

Mainly to keep them up to date with common colloquial language: idioms, collocations, slang and reduced forms, which they may need when they encounter real English in movies or songs. Corpora can also be the basis for a research project.

Sunday, August 23, 2009

Corpus

• CORPUS
From Corpus to Classroom

WHAT IS A CORPUS?

• What is a corpus?
A corpus is a collection of texts, written or spoken, which is stored on a computer.
A corpus is a principled collection of texts available for qualitative and quantitative analysis.
It must represent something and its merits will often be judged on how representative it is.

WHAT CAN WE USE FROM IT?

• COLLOCATIONS
Words that typically combine with one particular word rather than another:
Depend on
Look up
Wooden box (ADJECTIVE+NOUN)
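
To see how a corpus reveals pairs such as "depend on", here is a minimal Python sketch that counts which word comes right after a chosen word. The sample text and the function names (tokenize, right_collocates) are made up for illustration; a real corpus tool would work on millions of words and use more careful statistics.

```python
from collections import Counter
import re

# Toy corpus invented for illustration; a real corpus would be far larger.
SAMPLE_TEXT = (
    "You can depend on me. It will depend on the weather. "
    "Prices depend on demand. Look up the word. "
    "Look up the address and look at the map."
)

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def right_collocates(tokens, node):
    """Count the words that appear immediately after the node word."""
    counts = Counter()
    for current, following in zip(tokens, tokens[1:]):
        if current == node:
            counts[following] += 1
    return counts

tokens = tokenize(SAMPLE_TEXT)
depend_collocates = right_collocates(tokens, "depend")
look_collocates = right_collocates(tokens, "look")

print(depend_collocates.most_common(3))
print(look_collocates.most_common(3))
```

In this tiny sample "depend" is always followed by "on", while "look" is followed by "up" more often than by "at". With students, the same kind of count can be read off a concordance in any online corpus.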

• WORDS/CHUNKS
A SMALL COMPONENT OF LANGUAGE:
I
YOU
I DON’T KNOW
A LOT OF
ONE OF THE
I MEAN
THE
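
Chunks such as "I don't know" or "a lot of" show up in a corpus as word sequences that keep repeating. The sketch below, in Python, counts three-word sequences in a short invented stretch of speech; the sample and the helper name ngrams are assumptions for illustration only.

```python
from collections import Counter

# Invented snippet of spoken English, for illustration only.
SAMPLE_SPEECH = (
    "i don't know i mean a lot of people say that "
    "i don't know one of the reasons is a lot of noise"
)

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

speech_tokens = SAMPLE_SPEECH.split()
chunk_counts = Counter(ngrams(speech_tokens, 3))

# Keep only the three-word sequences that occur more than once.
repeated_chunks = {
    " ".join(chunk): count
    for chunk, count in chunk_counts.items()
    if count > 1
}
print(repeated_chunks)
```

Only the sequences that recur are kept, which is roughly how lists of common chunks are drawn up from much larger spoken corpora.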

• DISCOURSE MARKERS
OPENINGS AND CLOSINGS
YOU KNOW
I MEAN
ANYWAY
MIND YOU
WELL

• FREQUENCY
HOW OFTEN A WORD IS REPEATED IN A GIVEN TYPE OF DISCOURSE
S1—S2—S3
W1—W2—W3
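
Frequency is usually compared across kinds of discourse by normalising the counts, so that a short sample and a long one can be set side by side. The Python sketch below assumes that the S and W labels above refer to spoken and written language; the two samples and the function name per_thousand are invented for illustration.

```python
from collections import Counter

# Two tiny invented samples standing in for spoken and written sub-corpora.
SPOKEN = "well you know i mean it was well it was fine you know".split()
WRITTEN = (
    "the report shows that the results were fine "
    "and the data were clear"
).split()

def per_thousand(tokens, word):
    """Occurrences of word per 1,000 tokens, so samples of any size compare."""
    return 1000 * Counter(tokens)[word] / len(tokens)

for word in ("well", "the"):
    print(
        word,
        round(per_thousand(SPOKEN, word), 1),
        round(per_thousand(WRITTEN, word), 1),
    )
```

In these samples "well" appears only in the spoken text and "the" only in the written one, which is the kind of contrast that frequency bands for speech and writing are meant to capture.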

• REGISTER
FORMAL/INFORMAL/COLLOQUIAL
TECHNICAL