1.2 What to Record

This section provides a brief overview of what you, as a language documenter, might collect and archive.  As you will see, what you decide to collect, record, and curate for your documentation corpus depends on your ultimate goals.  One goal shared by language documenters is to create a lasting, comprehensive record of the language as it is used in spoken form.  Recordings of naturalistic spoken interactions are therefore a major part of what language documenters collect.  Community documentation projects may also focus on photographs and on existing written materials in the language.

From Language Data to Language Archives

One reason for documenting a language is to provide resources for language maintenance and revitalization.  For that purpose, we collect audio, video, and text material that represents authentic use of the language.  We also create ways for people to access those materials, both intellectually and practically.  Intellectual access is made possible by transcriptions and translations, which allow recordings to be analyzed, described, and prepared for use in teaching or other applications.  Practical access is made possible when the collected materials are easy to locate and freely available, for example through a publicly accessible language archive, where they can be used by many people for a long time to come.  First we’ll talk a little about gathering language data; in later modules we’ll explore how to prepare a language documentation project so that it can be made accessible through a language archive.

Types of Language Interactions to Collect

In the sections below, we will review what you can document about your language.  We will provide tutorials, tips and examples.


You may notice that young people are forgetting words for traditional food items, plants, birds, religious events and similar domains specific to your community's culture.  You may find it useful to collect these words based on semantic domains or around particular events.  Think about the following:

  • What are some domains that are of interest to me and my community at this time? 
  • How can I collect words in that domain?
  • Are there traditional stories that contain interesting words from this domain?

Another type of wordlist is used by linguists to compare languages and see how closely they are related.  Here is an example wordlist, called the Leipzig-Jakarta List.  You could record each word in isolation, repeated three times, and then in a phrase such as “I like the word ___”.

1. fire                   35. 3rd person pronouns    68. skin/hide
2. nose                   36. to hit/beat            69. to suck
3. to go                  37. leg/foot               70. to carry
4. water                  38. horn                   71. ant
5. mouth                  39. this                   72. heavy
6. tongue                 40. fish                   73. to take
7. blood                  41. yesterday              74. old
8. bone                   42. to drink               75. to eat
9. 2nd person pronouns    43. black                  76. thigh
10. root                  44. navel                  77. thick
11. to come               45. to stand               78. long
12. breast                46. to bite                79. to blow
13. rain                  47. back                   80. wood
14. 1SG pronoun           48. wind                   81. to run
15. name                  49. smoke                  82. to fall
16. louse                 50. what?                  83. eye
17. wing                  51. child                  84. ash
18. flesh/meat            52. egg                    85. tail
19. arm/hand              53. to give                86. dog
20. fly                   54. new                    87. to cry/weep
21. night                 55. to burn                88. to tie
22. ear                   56. not                    89. to see
23. neck                  57. good                   90. sweet
24. far                   58. to know                91. rope
25. to do/make            59. knee                   92. shade/shadow
26. house                 60. sand                   93. bird
27. stone/rock            61. to laugh               94. salt
28. bitter                62. to hear                95. small
29. to say                63. soil                   96. wide
30. tooth                 64. leaf                   97. star
31. hair                  65. red                    98. in
32. big                   66. liver                  99. hard
33. one                   67. to hide                100. to crush/grind
34. who?
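
If you plan to read prompts from a screen or a printed sheet while recording, the elicitation routine described before the list (each word three times in isolation, then once in a carrier phrase) can be generated automatically.  The following is only a minimal sketch in Python: the function name, its parameters, and the English carrier phrase are illustrative assumptions, and you would replace the English glosses and the carrier phrase with forms in the language you are documenting.

```python
def elicitation_prompts(words, repetitions=3, carrier="I like the word {}."):
    """Return the prompts for a recording session, in reading order.

    `repetitions` and `carrier` are hypothetical parameter names chosen for
    this sketch; adjust them to suit your own recording plan.
    """
    prompts = []
    for word in words:
        # The word in isolation, repeated (three times by default).
        prompts.extend([word] * repetitions)
        # The same word placed inside the carrier phrase.
        prompts.append(carrier.format(word))
    return prompts


# The first three items of the list, used here as English glosses only.
for prompt in elicitation_prompts(["fire", "nose", "to go"]):
    print(prompt)
```

Keeping the carrier phrase as a parameter makes it easy to try different frames, since the same word can sound different at the end of a phrase than in the middle of one.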

Sentence Structure and Domains of Use

In the future, community members may access the wordlists you have collected.  They will want to know the appropriate domains of use, that is, the contexts in which to use the words.  To understand 'domains of use', think about the words you use in different parts of your life, for example at a cafe with friends or in a classroom with your fellow students and teacher.  What are some words used in each domain?  You will find that while many words are the same, some are different.  In addition to domains of use, speakers will need to know how to use these words appropriately.  Is there a polite way and a more direct way to use these words?  To provide this information, we suggest collecting a wide variety of genres.  It is important to collect not only wordlists but also connected speech, such as sentences.  Within a collection of connected speech from different genres, users of the language documentation project will be able to find words and see how they are used in sentences and in different domains of use.  Here are some examples of genres:

  • Conversations
  • Traditional folktales
  • Speeches
  • Blessings
  • Proverbs
  • Jokes
  • Personal histories
  • Instructions (e.g., on how to cook a meal, build a house, fish, trap an animal, grow vegetables)

These are just some examples.  Think about the following:

  • What kinds of connected text would you like to collect?
  • What kinds of words and sentences do you think you will find in these texts?
  • What are some rules of appropriateness in interaction you can think of?

Verbs: Actions by Whom and When and for How Long

A lot of crucial information is associated with the verb.  This includes:

  • Who did the action (I, you, he/she)
  • Whether the action was:
    • done in the past, being done right now, or to be done in the future (called tense)
    • ongoing or completed (called aspect)
    • possible or probable (called epistemic mood)
    • an obligation or a choice (called deontic mood)

It can be helpful to collect data on how this information is expressed.  We usually display such information in charts called verb paradigms.  A sample paradigm can be found in this article on Lamkang Verb Conjugation.
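
To make the idea of a paradigm chart concrete, here is a minimal Python sketch that builds an empty chart with one row per person and one column per tense, ready to be filled in during elicitation.  The persons and tenses are taken from the list above; the function names and the plain-text layout are assumptions made for this illustration, and your language may need more or different categories (for example aspect or mood).

```python
PERSONS = ["I", "you", "he/she"]
TENSES = ["past", "present", "future"]


def blank_paradigm(persons=PERSONS, tenses=TENSES):
    """Create an empty chart: one cell for each person and tense."""
    return {person: {tense: "" for tense in tenses} for person in persons}


def format_paradigm(verb_gloss, paradigm):
    """Render the chart as plain text, marking unfilled cells with '?'."""
    tenses = list(next(iter(paradigm.values())))
    width = 12  # column width in characters
    lines = [
        f"Verb: {verb_gloss}",
        "".join(h.ljust(width) for h in ["person", *tenses]),
    ]
    for person, forms in paradigm.items():
        cells = [forms[tense] or "?" for tense in tenses]
        lines.append("".join(c.ljust(width) for c in [person, *cells]))
    return "\n".join(line.rstrip() for line in lines)


# An empty chart for the verb glossed 'to go'; fill in each cell with the
# form a speaker gives you, e.g. chart["I"]["past"] = "<recorded form>".
chart = blank_paradigm()
print(format_paradigm("to go", chart))
```

The '?' cells make it easy to see at a glance which forms still need to be collected.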

Importance of More than Just Words

As the old saying goes, a picture is worth a thousand words.  Don't forget to take photographs and to write down the physical and social contexts of the interactions you record.  This may help you understand how the formality of the setting or multilingualism affects language use.  The extralinguistic parts of communication that you note down or capture in video recordings and photographs may range from hand gestures and facial expressions to the proper way to hold a traditional implement.  This information provides context for when and where specific terms are used, and it preserves non-verbal aspects of language use.
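
One way to keep such contextual notes together with each recording is a small structured record saved next to the audio or video file.  The Python sketch below is only an illustration: the field names, the file names, and the example values are all placeholders invented for this example, not a metadata standard, and an archive you deposit with may ask for different fields.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SessionContext:
    """Context notes for one recorded interaction (hypothetical fields)."""
    recording_file: str
    genre: str
    setting: str       # physical context, e.g. where the recording was made
    formality: str     # social context, e.g. "formal" or "informal"
    languages_used: list = field(default_factory=list)
    photographs: list = field(default_factory=list)
    nonverbal_notes: list = field(default_factory=list)


def to_json(context):
    """Serialize the notes so they can be stored beside the recording."""
    return json.dumps(asdict(context), indent=2, ensure_ascii=False)


# Placeholder values only; replace them with your own notes.
example = SessionContext(
    recording_file="session_001.wav",
    genre="instructions",
    setting="outdoor kitchen",
    formality="informal",
    languages_used=["<language being documented>", "<contact language>"],
    photographs=["session_001_photo_01.jpg"],
    nonverbal_notes=["speaker shows how to hold the implement"],
)
print(to_json(example))
```

Writing the notes in a consistent shape for every session makes it easier to search them later, for instance to find all informal recordings or all sessions that include photographs.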