Using Corpus Analysis Software to Analyse Specialised Texts
1. What is a corpus?
In corpus linguistics, a corpus can be generally defined as ‘a collection of naturally-occurring texts in a computer-readable format which can be retrieved and analyzed using corpus analysis software’ (Kennedy, 1998; McEnery & Wilson, 2001; O’Keeffe, McCarthy, & Carter, 2007; Teubert & Cermakova, 2007).
2. Sources of language corpora
· ParaConc, a parallel concordancer (http://www.athel.com/para.html)
3. Designing a specialized corpus
Corpus size
· There are no fixed rules; size depends on research purposes, availability of data and time.
· Large, general corpora may be less useful than small, focused
corpora if searches are made on context-specific terms.
· ‘Too small’ corpora have limitations, e.g. they may not contain enough instances of the concepts, terms, or patterns under investigation.
· It is preferable to create a ‘monitor’ or ‘open’ corpus, i.e. one that is regularly updated with new texts, because specialized words and usage change over time.
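Because there is no fixed size, it helps to measure a draft corpus before deciding whether it is big enough. The Python sketch below is a minimal illustration, assuming plain-text input and a simple letters-only tokenizer (an assumption; corpus tools, including AntConc, let you define tokens differently). It counts tokens (running words) and types (distinct word forms) and flags target terms that fall below a hypothetical `min_hits` threshold, one sign that a corpus may be ‘too small’ for the terms under investigation. The function names are illustrative, not part of any tool.

```python
import re
from collections import Counter

# Assumed tokenizer: lower-cased runs of letters, allowing internal straight
# hyphens or apostrophes (e.g. "state-of-the-art", "patient's").
# AntConc's token definition is configurable and may differ.
TOKEN_RE = re.compile(r"[a-z]+(?:['-][a-z]+)*")


def tokenize(text: str) -> list[str]:
    """Return the word tokens of a text, lower-cased."""
    return TOKEN_RE.findall(text.lower())


def corpus_profile(texts: list[str], target_terms: list[str], min_hits: int = 10) -> dict:
    """Report corpus size and flag single-word target terms that occur too rarely.

    min_hits is a hypothetical cut-off chosen for illustration, not a standard.
    """
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    hits = {term: counts[term.lower()] for term in target_terms}
    return {
        "tokens": sum(counts.values()),  # running words
        "types": len(counts),            # distinct word forms
        "term_hits": hits,
        "too_rare": [term for term, n in hits.items() if n < min_hits],
    }
```

If a key term keeps appearing in `too_rare`, the corpus may be too small, or too general, for studying that term.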
Text extracts vs. full texts
· The choice depends on the aim of corpus compilation.
· Whole texts offer more coverage because the words or terms under study may occur anywhere in a text.
· Specific sections may be helpful if we are looking for words or phrases in particular content areas or want to create purposeful sub-corpora.
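The coverage argument can be checked directly: split a text into equal-length slices and count where a term occurs. This is roughly the idea behind AntConc's Concordance Plot, but the sketch below is an independent illustration, not AntConc's code; `segment_hits`, the five-slice default, and the letters-only tokenizer are assumptions.

```python
import re

# Assumed tokenizer: lower-cased runs of letters with internal hyphens/apostrophes.
TOKEN_RE = re.compile(r"[a-z]+(?:['-][a-z]+)*")


def segment_hits(text: str, term: str, n_segments: int = 5) -> list[int]:
    """Count hits of a single-word term in each of n equal-length slices of a text."""
    tokens = TOKEN_RE.findall(text.lower())
    hits = [0] * n_segments
    term = term.lower()
    for position, token in enumerate(tokens):
        if token == term:
            # Map the token's position to a slice index 0 .. n_segments - 1.
            hits[position * n_segments // len(tokens)] += 1
    return hits
```

For a text whose result is `[1, 0, 0, 0, 1]`, an extract taken only from the opening fifth would capture just half of the occurrences, which is why whole texts give better coverage.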
Number of texts
· Choices can be made between collecting a few large texts or a larger number of smaller texts.
· Choices can also be made between selecting texts written by one or
two key writers or sources, or texts retrieved from different sources or
written by different authors.
· The choice depends on your research focus, e.g. whether you want to study overall language use or the idiosyncrasies and linguistic choices preferred by particular writers.
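When texts come from different writers or sources of different lengths, raw counts are not directly comparable; a common remedy is to normalize them to a rate per 1,000 tokens. Below is a minimal Python sketch, assuming each writer's texts have been joined into one string and using a simple letters-only tokenizer; `per_thousand` is a hypothetical helper name.

```python
import re

# Assumed tokenizer: lower-cased runs of letters with internal hyphens/apostrophes.
TOKEN_RE = re.compile(r"[a-z]+(?:['-][a-z]+)*")


def per_thousand(texts_by_source: dict[str, str], term: str) -> dict[str, float]:
    """Frequency of a single-word term per 1,000 tokens, for each source."""
    rates = {}
    for source, text in texts_by_source.items():
        tokens = TOKEN_RE.findall(text.lower())
        hits = tokens.count(term.lower())
        # Guard against empty texts to avoid dividing by zero.
        rates[source] = 1000 * hits / len(tokens) if tokens else 0.0
    return rates
```

Normalized rates let you see whether one writer genuinely prefers a form, rather than simply contributing more text to the corpus.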
Medium
· Texts can be spoken, written, or mixed.
· The choice depends on the research questions.
· Some practical factors should also be considered, e.g. compiling spoken corpora is time-consuming and requires special types of tagging.
Subject and text type
· Texts should mainly come from the specialized subject field under investigation, although this is less clear-cut for multidisciplinary subjects.
· Texts may come from different subjects if the research focus is on the study of particular language features rather than term extraction.
· Text types within a specialized subject field may range from ‘expert-to-expert’ texts to ‘expert-to-non-expert’ texts, in other words, from technical to popular texts.
Other considerations
· Authorship: Texts written by experts in a
field tend to present more reliable and authentic examples of specialized
language.
· Language: Specialized texts can be stored and retrieved in the form
of monolingual, comparable, or parallel corpora.
· Publication date: Texts should come from
recent publications unless queries are made in relation to particular periods
of time.
4. Sources of specialized texts
· Printed materials
· Word documents
· CD-ROMs
· Texts on the Web
· Online databases
5. Getting started with AntConc
Download the latest version of AntConc and watch the YouTube tutorials from http://www.antlab.sci.waseda.ac.jp/antconc_index.html
1. Run the program.
2. Open Files (browse and select the target files) or Open Dir (to select a target folder).
3. Choose a tool, e.g. Concordance or Word List.
4. Clear All Tools and Files before opening new files.
5. Use Save Output to Text File to save the results, e.g. concordance lines.
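AntConc's Concordance tool lists each hit as a keyword-in-context (KWIC) line. To make the steps above concrete, here is a small Python sketch of the same idea: load every .txt file in a folder (compare Open Dir), build KWIC lines for a search term, and write them to a text file (compare Save Output to Text File). It is an illustration under assumptions (UTF-8 plain-text files, whole-word case-insensitive matching, a fixed context width in characters), not how AntConc itself is implemented, and the function names are hypothetical.

```python
import re
from pathlib import Path


def load_corpus(folder: str, pattern: str = "*.txt") -> dict[str, str]:
    """Read every matching plain-text file in a folder (assumes UTF-8)."""
    return {p.name: p.read_text(encoding="utf-8") for p in sorted(Path(folder).glob(pattern))}


def kwic(corpus: dict[str, str], term: str, width: int = 30) -> list[str]:
    """Return keyword-in-context lines for whole-word, case-insensitive matches."""
    regex = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    lines = []
    for name, text in corpus.items():
        flat = " ".join(text.split())  # collapse line breaks and repeated spaces
        for m in regex.finditer(flat):
            left = flat[max(0, m.start() - width):m.start()]
            right = flat[m.end():m.end() + width]
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}  {name}")
    return lines


def save_lines(lines: list[str], out_path: str) -> None:
    """Write one concordance line per row to a UTF-8 text file."""
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, `save_lines(kwic(load_corpus("my_corpus"), "stent"), "stent_concordance.txt")` would write the concordance lines for a hypothetical folder `my_corpus`. Right-aligning the left context mimics the familiar concordance layout, where the search term sits in a fixed central column.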