Before a test can be normed/standardized, it is important to conduct a pilot with a number of people from the target group (Mann & Haug, 2014, p. 133). For example, before using the previously mentioned NFA test (an interview adapted from the SLPI), which targets people learning NGT in a higher education institution, the test developers piloted their first version of the test with all the signed language teachers in the team. The goal of such a pilot is to see whether the test instructions, test items, procedure, rating procedure etc. work as planned. For example, when raters on a productive test differ widely in how they judge the signed language competences of an L2 learner, the rating scale/procedure needs to be revised. Or when students who have already graduated from a four-year BA programme perform worse on the same receptive test than first-year students, one needs to investigate the reason and revise the test.
After revising a test, test developers can proceed to the study proper. During the study the psychometric properties of the test, namely its validity and reliability, also need to be established (see section 1.2).
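As an illustration of the kind of reliability check that can be run on pilot data, the sketch below computes Cronbach's alpha for an item-based receptive test. This is only one common reliability estimate; the source does not prescribe a specific statistic, and the item scores used here are hypothetical.

```python
# Minimal sketch: estimating internal-consistency reliability (Cronbach's alpha)
# for a receptive test from pilot data. The item scores below are hypothetical.

def cronbach_alpha(item_scores):
    """item_scores: one list per test taker, with one score per item (e.g. 0/1)."""
    n_items = len(item_scores[0])
    totals = [sum(person) for person in item_scores]

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([person[i] for person in item_scores]) for i in range(n_items)]
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / var(totals))

# Hypothetical pilot data: 5 test takers x 4 multiple-choice items (1 = correct)
pilot = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
]
print(f"Cronbach's alpha: {cronbach_alpha(pilot):.2f}")  # a clearly low value would flag the test for revision
```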
Below is a schema (Figure 4) of the procedures that were followed to train the signed language teachers in working with the NFA. Because the NFA was adapted from the SLPI, the adaptation process is included here too.
Figure 4: Adaptation and training schema for the NFA at ISL&D (from Boers-Visker et al., 2014)
Once the documents had been translated from English into Dutch, the team was trained in two one-week sessions, spread over a one-year period. In between the training sessions with the SLPI expert from the USA, the teachers practised interviewing and scoring with each other monthly; teachers who were L2 learners of NGT served as candidates. In teams of two signed language teachers, all interviews were scored and then discussed with the whole group to establish and validate the scoring norms. For each interview the inter-rater reliability scores were calculated and discussed (via email) with the SLPI expert from NTID/RIT, USA. During the training sessions, the documents were fine-tuned to the grammatical features of NGT, terminology was unified across all documents, and both were discussed again in subsequent sessions. The whole process took approximately two years. When the team was confident that all teachers had satisfactorily mastered the interviewing process and that the scores were reliable, the NFA was offered to learners in a pilot study. This was done with parents learning NGT in an extra-curricular course. Currently the data are being analysed.
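The source does not state which inter-rater statistic was used. Purely as an illustration, the sketch below computes Cohen's kappa for two raters assigning levels to the same practice interviews; the level labels and ratings are hypothetical.

```python
# Minimal sketch (assumption: pairwise agreement summarised with Cohen's kappa;
# the source does not specify which inter-rater reliability statistic was used).
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """ratings_a, ratings_b: level labels assigned by two raters to the same interviews."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    labels = set(ratings_a) | set(ratings_b)
    expected = sum((counts_a[l] / n) * (counts_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Hypothetical levels assigned by two teachers to six practice interviews
rater_1 = ["A2", "B1", "B1", "A2", "B2", "B1"]
rater_2 = ["A2", "B1", "A2", "A2", "B2", "B1"]
print(f"Cohen's kappa: {cohen_kappa(rater_1, rater_2):.2f}")
```

In practice, low agreement on individual interviews would be discussed with the whole group and the expert, as described above, rather than resolved mechanically.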
Conclusion
The goal of this assessment framework is to provide guidance and concrete steps to signed language teachers and test developers developing signed language assessments aligned with the CEFR. Wherever possible we have offered hands-on material by giving concrete examples of signed language tests used in a CEFR-aligned curriculum. We encourage all colleagues to share their materials and experience with others so that we can move forward with CEFR-aligned curricula across Europe.
The first step in the testing cycle is to identify why a test needs to be developed. It may be needed to evaluate the knowledge, understanding, ability and/or skills of the learners (efsli, 2013, p. 12) or to evaluate changes in the curriculum (Mann & Haug, 2014, p. 131). For example, students enrolled in a signed language interpreters’ programme are required by programme regulations to take a course exam and a programme final exam in signed language before completing a course or graduating (Leeson, Wurm, & Vermeerbergen, 2011).
To assess the feasibility of test development (see 1.2) it is necessary to identify possible constraints: how much time is there to develop the test, how many test takers are to be evaluated, what is the available budget and what technology is available? For instance, for the assessment of Sign Language of the Netherlands (NGT) vocabulary, test software called Provisto (see Figure 2) was developed at the Institute for Sign, Language & Deaf Studies at the University of Applied Sciences Utrecht (UUAS) in the Netherlands. Constraints were the availability and usability of the technology, the time available to film NGT vocabulary within the deadlines, and the time available to carry out pilot tests to determine validity and reliability (see 1.2).
Score reporting concerns how the test taker will learn about his or her test results (Mann & Haug, 2014). Students graduating from a BA interpreting programme will learn whether they have passed or failed their final exam in signed language either via a fully Web-based signed language testing system (e.g., Haug et al., 2014) or through written notification sent by regular mail from their university.
Another important issue is the interpretation of test results, which informs decisions related to placement in a signed language class or to obtaining a qualification for a job.
Determining the appropriate rating procedure depends on the content and the testing method (Haug, 2011). The appropriate rating procedure differs between the assessment of signed language production and reception. For example, for a receptive test that uses a multiple-choice format (see Example 1 for Signed Language Reception above), the rating can be automatic when the test is implemented as a Web-based signed language test, e.g., the receptive test for DGS used for L1 testing (Haug et al., 2014).
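To illustrate why machine scoring of a multiple-choice receptive test is straightforward, here is a minimal sketch of how responses might be scored server-side in a Web-based test. The item IDs, answer key and responses are hypothetical and are not taken from the DGS test.

```python
# Minimal sketch: automatic scoring of a multiple-choice receptive test,
# as it might run in a Web-based testing system. Items and keys are hypothetical.

answer_key = {"item01": "B", "item02": "D", "item03": "A", "item04": "C"}

def score_responses(responses, key=answer_key):
    """responses: dict mapping item id to the option the test taker selected."""
    correct = sum(responses.get(item) == answer for item, answer in key.items())
    return {"correct": correct, "total": len(key), "percent": 100 * correct / len(key)}

# Example: one test taker's submitted answers
print(score_responses({"item01": "B", "item02": "A", "item03": "A", "item04": "C"}))
# -> {'correct': 3, 'total': 4, 'percent': 75.0}
```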
A rating procedure for signed language production is more difficult to achieve. One example is an interview, such as the NGT Functional Assessment (NFA), adapted from the Sign Language Proficiency Interview (SLPI) which was developed for ASL, in which a tester engages in a conversation with a test taker. The conversation is video-recorded and later analysed by three raters according to pre-defined criteria on a scoring sheet, such as the individual rater Worksheet B (see here for other ASL SLPI rating sheets). The quality of any rating procedure will be evaluated during the pilot and main study (see also 2.8).
The rating procedure for the NFA is illustrated in the schema below (Figure 3):
Figure 3: Schematic presentation of NGT Functional Assessment procedures (van den Broek, Boers-Visker, & van den Bogaerde, 2014)
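Purely as an illustration of how the judgements of three raters can be recorded and compared, here is a sketch of a rating record. The criteria, levels and candidate label are hypothetical; this is not the actual NFA/SLPI worksheet.

```python
# Minimal sketch: how one interview's ratings might be recorded so that the three
# raters' judgements can be compared. Criteria and levels are hypothetical.

interview_ratings = {
    "candidate": "L2-learner-017",
    "raters": {
        "rater_1": {"vocabulary": "B1", "grammar": "A2", "fluency": "B1", "overall": "B1"},
        "rater_2": {"vocabulary": "B1", "grammar": "B1", "fluency": "B1", "overall": "B1"},
        "rater_3": {"vocabulary": "A2", "grammar": "A2", "fluency": "B1", "overall": "A2"},
    },
}

def overall_levels(record):
    """Return the overall level each rater assigned, so disagreements can be discussed."""
    return [scores["overall"] for scores in record["raters"].values()]

print(overall_levels(interview_ratings))  # ['B1', 'B1', 'A2'] -> discuss before a final decision
```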
Above we provided an example of an interaction test for NGT level A2 (see Example 3 in 2.4). Here we provide a template for an assessment sheet, which was developed for this test. This assessment sheet was developed for, and used in, a workshop for teachers and other professionals (26 November 2013) organized by the ATERK team, which works in the Netherlands to align NGT teaching and assessment with the CEFR for signed languages.
Download Assessment Interaction test, level A2
In order to develop a test, it is necessary to provide procedures (or a “blueprint”) which describe what needs to be done and in what order (Mann & Haug, 2014, p. 133). The different steps follow from the purpose, the design and the content of the test as well as the test method, and form the framework for the test proper. Other issues are the test administrators’ familiarity with the test, the location etc., as well as establishing procedures for test archives and establishing the validity and reliability of the test.
A concrete example is to create a Word document which describes every step and “sub-step” that needs to be taken to develop the test. For example, the following information can be included in the test specifications: (1) the people who will be involved in the process, (2) the development of test items, (3) the materials to be used depending on the target group, (4) a description of the test procedure, (5) environmental factors such as the test administrator, the test site and the time of day, (6) psychometric properties, (7) the process of test development, (8) the pilot and main study (from Haug, 2012), and (9) the milestones that need to be achieved (one way of capturing these elements is sketched below).
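Alongside the Word document, the nine elements above could also be kept in a simple structured record so that completeness is easy to check. The field names and placeholder values below are our own illustration, not a prescribed format.

```python
# Minimal sketch: the nine test-specification elements captured as a structured
# record (placeholder values only), so the blueprint can be checked for completeness.

test_specification = {
    "people_involved": ["test developers", "signed language teachers", "raters"],
    "test_items": "development of receptive and productive items per CEFR level",
    "materials": "video-recorded NGT stimuli, adapted to the target group",
    "procedure": "description of the test administration steps",
    "environment": {"administrator": "trained teacher", "site": "test room", "time_of_day": "morning"},
    "psychometric_properties": ["validity", "reliability"],
    "development_process": "documented steps and sub-steps",
    "studies": ["pilot study", "main study"],
    "milestones": ["items filmed", "pilot completed", "revision done"],
}

missing = [field for field, value in test_specification.items() if not value]
print("Missing elements:", missing if missing else "none")
```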
Once the purpose of the language assessment has been established, its content needs to be defined. In the CEFR, the content of assessment is “predetermined” within the domains of:
Below are some examples of test items for the three domains, taken from CEFR-aligned curricula for different signed languages.
Download examples
The purpose of the test determines its type and form and is related to the validity of the test (see 1.2). The purpose should be clearly defined, and the testing method should be appropriate to the purpose. Is it an achievement test (see the test specification below or 1.1) or a proficiency test (efsli, 2013, pp. 12-13, or see 1.1), and does it concern formative or summative assessment?
PRO-Sign self-assessment example