Validity and reliability are undoubtedly the two pillars of assessment theory. As an adjunct professor of assessment, I have had to define these concepts and explain their significance to teacher candidates in all content areas. They often confuse the two, so I developed a few memorable examples that would stick in their minds long after they earned their teaching licenses.
What Does It Mean For a Test to be Valid?
Does the test measure what it purports to measure? For example, if you suspect you have the flu, would you take a pregnancy test? No, because it does not purport to measure whether you have the flu. To measure if you have the flu, you would need to take a test that was developed to determine whether you have the flu.
If you want a proficiency test, then be sure to select a test that has been proven by research to be a valid test of proficiency. If you want to test achievement, there are many assessment tools that do that. So be sure you know what you really want to test – then select the tool that has been demonstrated as valid for that purpose.
What Does It Mean For an Assessment to be Reliable?
Does the test provide consistent results? A test that is reliable produces stable, dependable results over time and across different conditions. You should be able to test the same individuals repeatedly with similar results. For example, a student will not go to bed one night, functioning at Novice Mid on the ACTFL Proficiency scale, and wake up the next morning and magically function at Advanced High. The words rely and reliable are related. Remember that you can rely on a Reliable test to provide consistent results.
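One common way to put a number on "consistent results" is test-retest reliability: give the same test to the same people twice and correlate the two sets of scores. The sketch below is only an illustration under stated assumptions. The six students and all of their scores are made up, and the Pearson correlation used here is just one of several reliability statistics, not a method described for any particular test.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same six students on two sittings of one test.
first_sitting = [52, 61, 70, 75, 83, 90]
consistent_retest = [54, 60, 72, 74, 85, 89]   # scores barely move
erratic_retest = [88, 55, 62, 91, 50, 73]      # scores jump around

r_consistent = pearson_r(first_sitting, consistent_retest)
r_erratic = pearson_r(first_sitting, erratic_retest)

print(f"consistent retest: r = {r_consistent:.2f}")
print(f"erratic retest:    r = {r_erratic:.2f}")
```

A correlation close to 1 means students keep roughly the same relative standing from one sitting to the next, which is what you would expect from a reliable test. A correlation near zero means the second score tells you little about the first.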
Why Does It Matter?
For one thing, it matters to test takers. They want to know that their score really means something to them and to admissions officers, educators, and prospective employers. If you cannot rely on something to test what it says it tests and provide consistent results, then why test at all?
Imagine you are on a diet. You weigh yourself every morning on the bathroom scale. Every day you get a different number that does not seem to correlate with what you ate yesterday or how your pants fit. Now you go for your annual checkup at the doctor’s office and when the nurse weighs you, the number is different again! The scale in the doctor’s office has been calibrated by professionals who use it day after day to weigh many different people. That scale is valid and reliable, so you can be sure that the number on the doctor’s office scale is the amount you really weigh!
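The scale story can be sketched with numbers. In this rough illustration, the true weight and every reading are invented, the spread of repeated readings stands in for reliability, and the average distance from the true weight stands in (loosely, for this analogy only) for validity.

```python
from statistics import mean, stdev

TRUE_WEIGHT = 70.0  # hypothetical true weight in kg

# Hypothetical readings from three scales over five mornings.
readings = {
    "erratic bathroom scale": [66.1, 73.8, 68.4, 74.9, 67.0],
    "consistent but miscalibrated": [73.0, 73.1, 72.9, 73.0, 73.1],
    "calibrated office scale": [70.0, 70.1, 69.9, 70.0, 70.1],
}

def summarize(values, true_value):
    """Return (bias, spread): mean error and standard deviation of the readings."""
    return mean(values) - true_value, stdev(values)

summary = {name: summarize(vals, TRUE_WEIGHT) for name, vals in readings.items()}

for name, (bias, spread) in summary.items():
    print(f"{name}: bias = {bias:+.2f} kg, spread = {spread:.2f} kg")
```

The middle scale is the instructive case: it gives nearly the same number every day, so it is consistent, yet that number is about three kilograms off. Consistency alone does not show that a tool measures what it claims to measure.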
Prove It!
So now, with my funny examples, you care that your assessment tool is valid and reliable, but how can you really know that? Check if the tool has been thoroughly researched by academics who determined it to be both valid and reliable, and then published their findings in a peer-reviewed journal. If not, then the tool has not been confirmed to be valid and reliable. Just saying something is valid and reliable does not demonstrate that.
Be wary of “research” that is conducted internally, with a small sample size, rather than by external researchers who are not affiliated with the assessment company. Such research does not prove validity or reliability. What it shows is that there has been NO external validation of the test’s validity and reliability. If there were, it would be publicized!
Establishing AAPPL’s Validity and Reliability
The research into AAPPL’s validity and reliability is well-known. AAPPL’s original design and test framework were based on the 2006 ACTFL Assessment of Uses and Needs, a survey of over 1,600 world language instructors and administrators regarding the assessments they used and the kinds of assessments they needed. Based on rigorous piloting and field testing and follow-on studies conducted for nearly a decade, the AAPPL represents effective practices in world language assessment.
Analyses of 9,000 test takers demonstrate that the AAPPL can reliably differentiate examinee results according to different levels as described by the AAPPL performance scores. In addition, item difficulty parameters reflect the targeted proficiency levels.
Cox and Malone (2018) further document AAPPL rater reliability and articulate a validity argument using evidence from over 10,000 test results. For a more detailed discussion of AAPPL validity and reliability, refer to:
Cox, T. L., & Malone, M. E. (2018). A validity argument to support the ACTFL Assessment of Performance toward Proficiency. Foreign Language Annals, 51(3), 548-574. Retrieved from
Be an Alert Consumer!
Think critically before you buy language assessments. Ask about external research that confirms claims of reliability and validity. Who did it? Where was it published? Then, read it yourself to be sure.
