TNU’s current oral testing practices are evaluated in relation to the four factors described in 4.1.1: (1) the test design stage, (2) the test operationalization stage, (3) the test administration stage, and (4) the use of test results.
As Table 4.1 shows, oral tests are explicitly identified as achievement tests from the very start. Clear identification of the test type at the beginning of a course is beneficial because teachers can integrate the test content into the teaching program. As pointed out by Brown (1994), Heaton (1990), Hughes (1989) and Ur (1996), achievement tests should be integrated into the teaching program and related directly to the classroom lessons or units, the syllabus or the curriculum. Students’ performance on an achievement test therefore reveals their achievement or progress at the end of a course of study (Bachman & Palmer, 1996), and an achievement test of speaking skill is a means of eliciting students’ progress in overall speaking ability after a course or term of study.
However, the product of this stage, which comprises four crucial components (a profile of students’ language ability, the construct or ability to be measured, sets of test tasks in the TLU domain, and a plan for evaluating test quality), as described in 4.1.1, has never been produced and presented to the teachers as a principled basis or set of guidelines for the other two stages. This indicates that the first stage of oral test development at TNU is far from consistent with the theoretical framework reviewed in 2.3.1, Chapter 2. This mismatch in turn leads to the staff’s improper practices in the other two stages.
- Test Operationalization Process
Apart from the mismatch between practice and theory mentioned above, an important fact shown in Table 4.1 is that the Department and the English Section have not provided any specific guidance, i.e. a blueprint, for the speaking test construction process, namely (1) the number of test tasks to be included in a speaking test, and (2) the specifications of each test task. These two factors are analysed critically in turn.
Firstly, as previously discussed, an achievement test of speaking skill is a means of eliciting students’ progress in overall speaking ability after a course of study. Yet most of the achievement speaking tests in use at TNU fail to serve this purpose because they make use of only one type of oral test or test task, Tests where the learner prepares in advance (Table 4.2), combined with only one elicitation technique, Oral Report (Table 4.3). This is partly because no blueprint is provided. Underhill (1987) points out that an oral test rarely consists of only one elicitation technique; it usually involves several techniques placed in a sequence. The reasons he provides for including more than one technique in an oral test are as follows:
- It is more authentic to use a mix of techniques, with the learner doing different things with the language....
- An oral test that consists only of Question and Answer, for example, will naturally favour learners who are good at answering questions....
- To help improve the consistency of assessment, a change of tasks during a test can be used as an opportunity to swap interviewers and so combine multiple tasks with multiple assessment....
- A live test with several different parts is more flexible and can be adapted quickly to meet changing circumstances or different needs....
(Underhill, 1987, p.38)
Such test tasks have probably been carefully discussed in class, so the students are expected to produce ‘well-prepared’ talk, and even the predictable questions can be prepared in advance. However, ‘the task(s) on which the student has to perform may be generally familiar in form to the student, but the student cannot ‘prepare’ a written version of what he will say’ (Brown & Yule, 1983, p.120). The student must prove to the assessors that in his test performance he has learned to use, not to repeat, what he has been taught. What we as examiners want to know when testing a student is not whether the student has learned what has been taught, but whether he is able to produce an extended piece of spoken English appropriate to the communicative situation he encounters (Brown & Yule, 1983, p.120).
This popular kind of oral test at TNU is therefore far from useful in measuring the students’ overall oral language proficiency, and can be said to lack construct validity and reliability (see 2.5, Chapter 2).
Secondly, the absence of specifications for particular test tasks, especially the components of oral ability to be tested, the adequate areas of language knowledge and a marking key, results to some extent in inadequate and unhelpful tests from the teachers or test designers. It can be said that communicative stress is not taken into consideration in oral test construction.
As can be seen in the four achievement speaking tests (see Appendices 1 & 2), none of the test questions/tasks (topics) is accompanied by external prompts helping the students to make a structured presentation, or by explicit instructions quantifying the language knowledge and ability needed to perform the tasks.
It is essential for test writers to provide clear instructions that help test takers to organise a spoken presentation, because students are always encouraged to produce effectively organised speech so that the listener finds it easy to follow what is being said (Brown & Yule, 1983, p.119).
Also, in order to write test tasks that fit students’ proficiency levels, test writers need to give explicit instructions quantifying language knowledge and ability. The quantification of performance on a particular task depends largely on the grading of tasks according to cognitive difficulty (Brown & Yule, 1983, p.121). To put it another way, the same task type can be made easier or more difficult. For example, describing a room with eight elements is more difficult than describing a room with five elements. Test designers and teachers of speaking skill should therefore always bear in mind informed judgements of the degree of this cognitive difficulty or communicative stress (Figure 2.2, Chapter 2) during the test operationalization process.
Besides, no official instructions on criteria for marking students’ test performance are provided; thus, the test writers/teachers are unaware of the importance of a scoring method for each test task, and they never design a marking key (see 2.4.3, Chapter 2) instructing assessors how to assess students’ performance on test tasks. As discussed in 2.4.3, in a marking key, language and skill categories are identified and awarded separate marks according to the test purpose(s). As Underhill (1987, p.94) points out, the aim of a marking key is ‘to save time and uncertainty by specifying in advance, as far as possible, how markers should approach the marking of each question or task’. With the help of a marking key and a level scale, assessors can mark a test more quickly and reliably, for each language or skill category is expected to be marked separately.
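The idea of awarding separate marks to each category can be illustrated with a minimal sketch. The category names and maximum marks below are hypothetical, chosen only for illustration; they are not taken from Underhill (1987) or from any marking key in use at TNU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    """One language or skill category in a marking key."""
    name: str
    max_mark: int


# Hypothetical marking key: categories and weights are assumptions,
# not an actual key from the source. Maximum total here is 10.
MARKING_KEY = [
    Category("fluency", 3),
    Category("grammatical accuracy", 3),
    Category("vocabulary", 2),
    Category("pronunciation", 2),
]


def total_mark(awarded, key=MARKING_KEY):
    """Sum the marks awarded separately for each category.

    Raises ValueError if a category has no mark or the mark is
    outside the range allowed by the key.
    """
    total = 0
    for category in key:
        if category.name not in awarded:
            raise ValueError(f"no mark awarded for {category.name!r}")
        mark = awarded[category.name]
        if not 0 <= mark <= category.max_mark:
            raise ValueError(
                f"mark for {category.name!r} must be between 0 and {category.max_mark}"
            )
        total += mark
    return total


# One assessor's separate marks for one (hypothetical) student.
example = {
    "fluency": 2,
    "grammatical accuracy": 3,
    "vocabulary": 1,
    "pronunciation": 2,
}
print(total_mark(example))  # 8 (out of a possible 10)
```

Because every category must receive its own mark within a fixed range, two assessors using the same key are at least constrained to judge the same aspects of performance, which is the sense in which a marking key supports more reliable marking.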
- Test Administration Process
Tables 4.1 and 4.3 indicate that speaking test administration at TNU has several shortcomings. These weak points include (1) a lack of standardisation in test administration, (2) a lack of reliability in marking test takers’ performance, and (3) the lack of a supportive testing environment.
First, before test administration there has been no official meeting, named ‘the standardisation meeting’ by Alderson, Clapham & Wall (1995), for the group of assessors to discuss and agree on how to mark each question/task. Perhaps the administrators tend to think that the assessors, as language teachers, must know how to elicit the students’ oral proficiency fully, so they do not need to be informed of what to do during the test. Even when the assessors are aware of the importance of this meeting, they are unable to hold it. This is partly because the staff’s insufficient knowledge of oral testing does not allow them to design an appropriate marking key and a reasonable description of mark categories with marking criteria.
Therefore, before test administration, test designers first need to supply a marking key and marking criteria for the mark categories, and a considerable amount of time must then be spent on discussion to reach agreement on how to mark each question/task. Alderson, Clapham & Wall (1995, p.112) maintain, ‘although this is likely to be expensive, it is the safest way of ensuring that enough discussion will take place for all examiners to understand thoroughly the level scale and the procedures for scoring.’ All of these measures aim to ensure the reliability of an achievement speaking test.
Second, Table 4.3 reveals that, during the students’ test performance, there was hardly any interaction between the assessors and the test takers: only 2 students out of 10 were asked questions, and each was asked just 1 or 2. Moreover, the duration of these 10 students’ test performance varied, averaging 2 minutes. As discussed in 2.1, Chapter 2, spoken language has two functions, interactional and transactional, both of which need to be incorporated into a speaking test. In fact, in most of the oral tests in use at TNU, including the achievement test mentioned above, the students are expected to produce only transactional instances of the language. Can such tests be considered able to measure test takers’ overall oral proficiency? The answer is surely no, because they reveal no interactive communication. This also means that the assessors gave scores on the students’ presentation alone, which likewise indicates a lack of validity and reliability (see 2.5, Chapter 2).
Last but not least, as regards a supportive testing environment, the oral tests were mostly administered in noisy rooms. Students should be put at ease before and during their performance, which can increase their confidence. Bachman & Palmer (1996) accordingly stress that it is crucial to maintain a supportive environment throughout the test, that is, to avoid distractions due to temperature, noise, excessive movement, etc. In order to do this, test administrators and assessors need to be in control of their techniques and create an atmosphere which will help each student to feel at ease (Alderson, Clapham & Wall, 1995, p.116). Students waiting for their turn should be seated in a comfortable room, not standing and talking in the corridor, so as not to affect the others’ performance.
The last factor under evaluation concerns how test results, i.e. students’ final scores for test performance, are used. As previously described in 4.1, students’ oral test scores are used to grade them in terms of their progress or achievement after a term or course of study. This is the most common purpose of all achievement tests, namely to make a final decision on students’ proficiency, which is kept in their study records in the form of grades. Furthermore, teachers and students are interested in receiving feedback on students’ progress, which helps students ‘guide their own subsequent learning’, and helps teachers ‘modify their teaching methods and materials so as to make them more appropriate for their students’ needs, interests and capabilities’ (Bachman & Palmer, 1996, p. 98). However, TNU students’ test scores have never been used either to evaluate the effectiveness of instructional programs or to improve teachers’ teaching methods and materials. In other words, oral testing at this institution has no effect on the teaching and learning of speaking skill, a situation which is named negative washback or backwash by Hughes (1989), Heaton (1988) and McNamara (2000). Inferences about students’ proficiency made from test performance can be really useful for assessing the efficiency of a teaching program as well as of teachers (Bachman & Palmer, 1996, p. 98).
In conclusion, the analysis of TNU’s current practices in developing oral language tests reveals a number of weaknesses, as follows:
- There is no principled basis for oral test operationalization and administration.
- Oral tests in use lack construct validity and reliability.
- There is a lack of consideration of communicative stress in oral test operationalization.
- There is a lack of standardisation in test administration.
- There is a lack of a supportive test-taking environment.
These current practices are thus far from being consistent with the theoretical framework for test development.