Designing multiple choice test items

Source: Nguyễn Thùy Dương
Uploaded by: Gã Đầu Bạc
Uploaded: 09:44, 23-09-2014
File size: 1.2 MB
Downloads: 9
What will be covered
Multiple choice items
Alternatives in assessment
Quiz
You know not to run with scissors. Did you also know that scissors are linked to many different superstitions? Show how sharp you are by selecting the one FALSE superstition from the statements below.
A. Using scissors on New Year's Day will double your fortune.
B. Placing scissors under a patient's pillow will cut their pain.
C. Nailing a pair of scissors in the open position above a door will protect the house from witchcraft.
D. Dropping a pair of scissors is a warning that a lover is unfaithful.
Quiz answer: A
Information for this quiz comes from the commercial website Uncommon Scissors, whose Scissors History and Superstitions page explains that scissors may have been in use in Egypt "as far back as 1500 BC." It also notes that, centuries later, "as calligraphy spread throughout the Islamic countries, concave blades were developed to cut paper. Scissors became a part of everyone's life and not just for the use of guilds and the wealthy." Uncommon Scissors also shares many old superstitions about scissors, including how they could cut a patient's pain, protect a house from witchcraft, or signal a lover's infidelity. The one false statement above is the first one; the actual superstition maintains that using scissors on New Year's Day will "cut off fortune."
Designing multiple choice test items
Week 8
Multiple choice
MC items are all receptive, or selective: test-takers choose from a set of responses rather than creating a response.
(Other receptive item types include true-false questions and matching lists)
Every MC item has a stem, which presents the problem, followed by several options or alternatives (usually between three and five) to choose from.
One of those options, the key, is the correct response, while the others serve as distractors.
Multiple choice items
A preferred mode for large-scale tests, because MC items provide an "objective" means for determining correct or incorrect responses
Scoring procedures are streamlined (for either scannable computerized scoring or hand-scoring with a hole-punched grid) for fast turnaround time.
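The streamlined scoring described above can be sketched in a few lines of Python; the names (`ANSWER_KEY`, `score_sheet`) are illustrative, not drawn from any particular testing package:

```python
# Hypothetical five-item answer key: the "objective" basis for scoring.
ANSWER_KEY = ["A", "C", "B", "D", "A"]

def score_sheet(responses):
    """Count how many responses match the key, one point per item."""
    return sum(1 for given, key in zip(responses, ANSWER_KEY) if given == key)

# A test-taker who misses only item 4:
print(score_sheet(["A", "C", "B", "A", "A"]))  # prints 4
```

Because each response either matches the key or does not, no rater judgment is involved, which is exactly what makes MC scoring fast and reliable.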
Weaknesses in multiple-choice items
The technique tests only recognition knowledge.
Guessing may have a considerable effect on test scores.
The technique severely restricts what can be tested.
It is very difficult to write successful items.
Washback may be harmful.
Cheating may be facilitated.
However, the two principles that stand out in support of MC items are practicality and reliability.
Guidelines for designing MC items
1. Design each item to measure a specific objective.
2. State both stem and options as simply and directly as possible. Do not use superfluous words; a rule of succinctness is to remove needless redundancy from your options.
Guidelines (cont.)
3. Make certain that the intended answer is clearly the only correct one. Eliminating unintended possible answers is often the most difficult problem of designing MC items. With only a minimum of context in each stem, a wide variety of responses may be perceived as correct.
A CBT is a method of administering tests in which responses are … recorded, assessed, or both.
A. Electrically
B. Appropriately
C. Personally
D. Carefully
Guidelines (cont.)
4. Use item indices to accept, discard, or revise items
The appropriate selection and arrangement of suitable MC items on a test can best be accomplished by measuring items against three indices:
Item facility (IF) (or Item difficulty)
Item discrimination (ID)/Item differentiation
Distractor efficiency
Item facility (IF)
is the extent to which an item is easy or difficult for the proposed group of test-takers.
A too easy item (e.g., 99% of the test-takers get it right)
A too difficult item (99% get it wrong)
→ Does nothing to separate high-ability and low-ability test-takers
Item facility (IF)
The formula looks like this: IF = (number of test-takers answering the item correctly) ÷ (total number of test-takers responding to that item)
Item Facility (IF)
There is no absolute IF value that must be met, but appropriate test items will generally have IFs that range between .15 and .85.
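A minimal sketch of the IF calculation, using the standard formula (correct responses divided by total responses); `item_facility` and the sample data are illustrative:

```python
def item_facility(responses, key):
    """IF = (# of test-takers answering correctly) / (# responding)."""
    answered = [r for r in responses if r is not None]  # ignore omitted items
    return sum(1 for r in answered if r == key) / len(answered)

# Ten test-takers, seven of whom chose the key "C":
responses = ["C", "C", "A", "C", "B", "C", "C", "D", "C", "C"]
print(item_facility(responses, "C"))  # prints 0.7, inside the .15-.85 band
```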
Item Discrimination (ID)
is the extent to which an item differentiates between high- and low-ability test-takers.
An item on which high-ability students and low-ability students score equally well would have poor ID because it did not discriminate between the two groups.
An item that garners correct responses from most of the high-ability group and incorrect responses from most of the low-ability group has good discrimination power.
Item Discrimination (ID)
The formula: ID = (correct responses in the high-ability group − correct responses in the low-ability group) ÷ (number of test-takers in one comparison group)
ID = (7 − 2) / 10 = 0.50 → The result tells us that the item has a moderate level of ID.
A high discriminating level would approach 1.0, and no discriminating power at all would be zero.
In most cases, you would want to discard an item that scored near zero.
As with IF, no absolute rule governs the establishment of acceptable and unacceptable ID indices.
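The worked example above, (7 − 2) / 10 = 0.50, can be reproduced with a one-line function; `item_discrimination` is an illustrative name, and the divisor is taken to be the size of one comparison group:

```python
def item_discrimination(high_correct, low_correct, group_size):
    """ID = (correct in high group - correct in low group) / group size."""
    return (high_correct - low_correct) / group_size

# 7 of 10 high-ability and 2 of 10 low-ability test-takers answered correctly:
print(item_discrimination(7, 2, 10))  # prints 0.5, moderate discrimination

# An item both groups answer equally well does not discriminate at all:
print(item_discrimination(5, 5, 10))  # prints 0.0
```

Note that a negative ID (low group outperforming the high group) is an even stronger signal to discard or revise the item than an ID near zero.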
Distractor Efficiency
is the extent to which
the distractors “lure” a sufficient number of test-takers, especially lower-ability ones, and
those responses are somewhat evenly distributed across all distractors.
Note: C is the correct response
DE (cont.)
The item might be improved in two ways:
a) Distractor D doesn’t fool anyone. Therefore it probably has no utility. A revision might provide a distractor that actually attracts a response or two.
b) Distractor E attracts more responses (2) from the high-ability group than the low-ability group (0). Why are good students choosing this one? Perhaps it includes a subtle reference that entices the high group but is “over the head” of the low group, and therefore the latter students don’t even consider it.
The other two distractors (A and B) seem to be fulfilling their function of attracting some attention from the lower-ability students.
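The tallies discussed above can be checked mechanically by counting how each group distributed its wrong answers. The response lists here are hypothetical, constructed to mirror the pattern described (C is the key, D lures no one, and E attracts only high-ability students):

```python
from collections import Counter

# Hypothetical answer sheets for two groups of ten test-takers each.
high_group = ["C", "C", "C", "C", "E", "C", "C", "A", "C", "E"]
low_group  = ["A", "C", "B", "A", "B", "A", "C", "B", "A", "B"]

def distractor_counts(responses, key):
    """Tally how often each distractor (non-key response) was chosen."""
    return Counter(r for r in responses if r != key)

print(distractor_counts(high_group, "C"))  # Counter({'E': 2, 'A': 1})
print(distractor_counts(low_group, "C"))   # A and B split the errors; D never appears
```

With 7 correct in the high group and 2 in the low group, these same data also yield the ID of 0.50 computed earlier, so the two indices can be read off a single response table.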
Alternatives in assessment
Portfolios
Journals
Conferences and interviews
Observations
Self- and peer-assessments
Portfolios
Is “a purposeful collection of students’ work that demonstrates … their efforts, progress, and achievements in given areas” (Genesee & Upshur, 1996, p. 99)
Portfolios (cont.)
Portfolios include materials such as
Essays and compositions in draft and final forms;
Reports, project outlines;
Poetry and creative prose;
Artwork, photos, newspapers or magazine clippings;
Audio and/or video recordings of presentations, demonstrations, etc.
Journals, diaries, and other personal reflections;
Tests, test scores, and written homework exercises;
Notes on lectures; and
Self- and peer-assessments – comments, evaluations, and checklists
Attributes of Portfolios
Collecting
Reflecting
Assessing
Documenting
Linking
Evaluating
Benefits of Portfolios
Foster intrinsic motivation, responsibility and ownership,
Promote student-teacher interaction with the teacher as facilitator,
Individualize learning and celebrate the uniqueness of each student,
Provide tangible evidence of a student’s work,
Facilitate critical thinking, self-assessment, and revision processes,
Offer opportunities for collaborative work with peers,
Permit assessment of multiple dimensions of language learning.
Guidelines on portfolios development
State objectives clearly,
Give guidelines on what materials to include,
Communicate assessment criteria to students,
Designate time within the curriculum for portfolio development,
Establish periodic schedules for review and conferencing,
Designate an accessible place to keep portfolio,
Provide positive, washback-giving final assessments.
Journals
Is a log/account of one’s thoughts, feelings, reactions, assessments, ideas or progress toward goals, usually written with little attention to structure, form or correctness
Classroom-oriented journals are now known as dialogue journals, i.e., interaction between a reader (the teacher) and the student through dialogues or responses → Ts become acquainted with their Ss' learning progress and affective states
Sample (Handout #1)
Guidelines on journals development
Sensitively introduce Ss to the concept of journal writing,
State objectives of the journal,
Give guidelines on what kinds of topics to include,
Carefully specify the criteria for assessing or grading journals: e.g., effort exhibited in the thoroughness of Ss’ entries, processing of course content
Provide optimal feedback in your responses
Designate appropriate time frames and schedules for review
Provide formative, washback-giving final comments
Conferences and interviews
One-on-one interaction between T and S
Commenting on drafts of essays and reports
Reviewing portfolios
Responding to journals
Advising on a student’s plan for an oral presentation
Assessing a proposal for a project
Giving feedback on the results of performance on a test
Clarifying understanding of a reading
Exploring strategies-based options for enhancement or compensation
Focusing on aspects of oral production
Checking a student’s self-assessment of a performance
Setting personal goals for the near future
Assessing general progress in a course
Conferences and interviews (cont.)
T plays the role of a facilitator and guide, not of an administrator of a formal assessment
T is an ally who encourages self-reflection and improvement.
Conferences are formative, not summative; their primary purpose is to offer positive washback
Observations
A systematic, planned procedure for real-time, almost surreptitious recording of Ss' verbal and nonverbal behaviour
To assess Ss without their awareness (and possible consequent anxiety) so that the naturalness of their linguistic performance is maximized
Guidelines on Classroom Observations
Determine the specific objectives of the observation
Decide how many Ss will be observed at one time
Set up the logistics for making unnoticed observations
Design a system for recording observed performances: anecdotal records, checklists, rating scales
Do not overestimate the number of different elements you can observe at one time – keep them very limited
Plan how many observations you will make
Determine specifically how you will use the results
Sample checklist
Types of self- and peer-assessment
1. Assessment of (a specific) performance
2. Indirect assessment of (general) competence
3. Metacognitive assessment (more strategic in nature: setting goals and keeping an eye on the process of pursuing them)
4. Socioaffective assessment
5. Student-generated tests
Indirect assessment
Metacognitive assessment
Socioaffective assessment
Handout #2
Student-generated tests
Ss in small groups are directed to, e.g., create content questions on their reading passage, or generate their own lists of words, grammatical concepts, and content that they think are important over the course of a unit.
Student-generated tests can be a productive, intrinsically motivating, autonomy-building process.
Guidelines for self- and peer assessment
Tell Ss the purpose of the assessment
Define the task(s) clearly
Encourage impartial evaluation of performance or ability
Ensure beneficial washback through follow-up tasks
Conclusion:
Principled evaluation of alternatives to assessment
Van-Trao, Nguyen PhD - Hanoi University