Corpus linguistics is the study of language as expressed in samples (corpora) of "real world" text. This method represents a data-driven approach to deriving a set of abstract rules by which a natural language is governed, or by which it relates to another language. Corpora were originally compiled by hand, but are now largely derived by automated processes.
The corpus approach runs counter to Noam Chomsky's view that real language is riddled with performance-related errors, and that linguistic analysis therefore requires careful study of small speech samples obtained in a highly controlled laboratory setting.
The problem with laboratory-selected sentences is similar to the one facing lab-based psychology: researchers have no measure of how ethnographically representative their data are.
Corpus linguistics does away with Chomsky's competence/performance split: adherents believe that reliable language analysis is best carried out on field-collected samples, in natural contexts and with minimal experimental interference. Within corpus linguistics there are divergent views on the value of corpus annotation. These range from John Sinclair, who advocated minimal annotation and allowing texts to 'speak for themselves', to others, such as the Survey of English Usage team (based at University College London), who advocate annotation as a path to greater linguistic understanding and rigour.
Source: Wikipedia.org
CORPORA
AMERICAN NATIONAL CORPUS (ANC)
BERGEN CORPUS OF LONDON TEENAGER LANGUAGE (COLT)
BRITISH ACADEMIC SPOKEN ENGLISH CORPUS (BASE)
BRITISH NATIONAL CORPUS (BNC)
CAMBRIDGE AND NOTTINGHAM CORPUS OF DISCOURSE IN ENGLISH (CANCODE)
CAMBRIDGE INTERNATIONAL CORPUS (CIC)
COLLINS WORDBANKS ONLINE ENGLISH CORPUS
6 Online Corpus
What can we use corpora for with our students?
Mainly to keep them up to date with common colloquial language: idioms, collocations, slang, and reduced forms, which they may need when they encounter real English in movies or songs. Corpora can also serve as the basis for a research project.