Use this glossary to identify many of the terms used by educators when discussing assessment and related education topics.
- Accountability System
- Achievement Descriptors
- Achievement Gap
- Achievement Levels
- Achievement Standards
- Achievement Test
- Adaptive Assessment
- Adequate Yearly Progress (AYP)
- Age Appropriate
- Alternate Assessment
- Academic Assessment
- Assessment Literacy
- Assessment Method
- Authentic Assessment
- Baseline Data
- Benchmark Assessment
- Body of Evidence
- Consequential Relevance
- Criterion-Referenced Test (CRT)
- Curriculum-based Assessments
- Cut Score
- Depth of Knowledge (DOK)
- Diagnostic Assessment
- Formative Assessment
- Gap Analysis
- Grade Level
- High Stakes Testing
- Interim Assessment
- Large-Scale Assessments
- Multiple Measures
- Normative Data
- Norm-referenced Test (NRT)
- Percentile Score
- Performance Assessment
- Performance Descriptors
- Performance Levels
- Performance Standards
- Portfolio Assessment
- Progress Monitoring Tools
- Real-World Application
- Response Requirements
- Screening Assessment
- Self Assessment
- Standard Deviation
- Standard Error of Measurement (SEM)
- Standardized Test
- Standards-based Assessments
- Summative Assessment
- Test Forms
- Test Presentation
- Test Security
- Through-course Assessment
- Universal Design of Assessment
- Universal Screener
Accommodations are changes in the administration of an assessment, such as setting, scheduling, timing, presentation format, or response mode, including any combination of these, that do not change the construct the assessment is intended to measure or the meaning of the resulting scores. Accommodations are used for equity, not advantage, and serve to level the playing field. To be appropriate, assessment accommodations must be identified in the student’s Individualized Education Program (IEP) or Section 504 plan and used regularly during instruction and classroom assessment.
Accountability is the use of assessment results and other data to ensure that schools are moving in desired directions. Common elements include standards, indicators of progress toward meeting those standards, analysis of data, reporting procedures and rewards or sanctions.
Accountability System is a plan that uses assessment results and other data and outlines the goals and expectations for students, teachers, schools, districts, and states to demonstrate the established components or requirements of accountability. An accountability system typically includes rewards for those who exceed the goals and sanctions for those who fail to meet the goals.
Achievement Descriptors are narrative descriptions of performance levels that convey student performance at each achievement level. They further define content standards by connecting them to information that describes how well students are learning the knowledge and skills contained in the content standards. (See also Performance Descriptors.)
Achievement Gap is the difference in performance between two groups on a given assessment. It is usually expressed as a comparison of aggregate scores or of the percentage of students in each group reaching an achievement level such as proficiency.
Achievement Levels are measurements that distinguish an adequate performance from a novice or expert performance. Achievement levels provide a determination of the extent to which a student has met the content standards. (See also Performance Levels.)
Achievement Standards are a system that includes performance levels (e.g., unsatisfactory, proficient, advanced), descriptions of student performance for each level, examples of student work representing the entire range of performance for each level, and cut scores. A system of performance standards operationalizes and further defines content standards by connecting them to information that describes how well students are doing in learning the knowledge and skills contained in the content standards. (See also Performance Standards.)
Achievement Test is a test designed to identify the skills and estimate the knowledge a student has attained at a specific point in time. The results often are used to assist in determining the appropriate level of instruction for the student. Achievement tests are primarily used to determine proficiency/mastery, award credit or certification of some sort.
Adaptive Assessment is used to refer to a computer adaptive test (CAT), a computer-based assessment that responds to a student’s answers by increasing or decreasing the difficulty of the next question or set of questions. In some tests, the adjustment is based on the previous question; in others, it is based on cumulative responses. These tests require fewer items than fixed-form multiple-choice tests to reach equally precise results.
Adequate Yearly Progress (AYP) is a provision of the federal No Child Left Behind Act of 2001 (NCLB), which is legislation that requires schools, districts and states to demonstrate, using students’ test scores, that their students are making academic progress based on the percentage of students attaining proficiency on state standards.
Age Appropriate refers to the characteristics of the skills taught, the activities and materials selected, the assessment items used, and the language level employed; each should reflect the chronological age of the student.
Alignment refers to the similarity or match between or among content standards, achievement (performance) standards, curriculum, instruction and assessments in terms of breadth, depth and complexity of knowledge and skill expectations.
Alternate Assessment is an instrument used in gathering information on the standards-based performance and progress of students whose disabilities preclude their valid and reliable participation in general assessments. Alternate assessments measure the performance of a relatively small population of students unable to participate in the general assessment system, even with accommodations, as determined by the IEP team.
Academic Assessment is the process of obtaining information, usually in measurable terms, about knowledge and skills. Examples of these assessments include selected response, constructed response, short answer, extended written response, performance and personal communication.
Assessment Literacy is the knowledge of the basic principles of sound assessment practice—including terminology, development, administration, analysis and standards of quality.
Assessment Method refers to any one of multiple ways (deliberately selected) to collect information about student learning. The assessment method can range from informal, ongoing formative assessment that happens in a classroom regularly throughout the day, to formal summative assessment that follows a strict protocol. Observation is also a method of assessment, as is portfolio review. Assessment method is a broad term that covers many tools and protocols—what unites all academic assessment methods is that they are done intentionally in order to make an inference regarding a student-level or class-level trait.
Authentic Assessment is composed of performance tasks designed to simulate important, real-world challenges; therefore, it may show students what the “doing” of a subject or set of tasks is like. Authentic assessment can also include working on real-world tasks—not simply ones that simulate a task. Building an engine, giving someone a new hair style, and solving a local traffic problem are all examples of authentic assessment involving real-world tasks.
Baseline Data are the initial measures of performance against which future measures will be compared.
Benchmarks are specific statements of knowledge and skills within a content area continuum. These indicate what a student must possess to demonstrate a level of progress toward mastery of a standard.
Benchmark Assessment is given periodically (e.g., at the end of every quarter or as frequently as once per month) throughout a school year to establish baseline achievement data and measure progress toward a standard or set of academic standards and goals. Typically these assessments are formal, and may be computer-scored and administered. They provide teachers with information about which content standards have been mastered and which require additional instruction, identifying students’ strengths and needs. Well-articulated benchmark assessments can also be used to measure student progress over time.
Bias, or test bias, in a statistical context, is a systematic error in a test score. In discussing test fairness, bias is created by not allowing certain groups into the sample, not designing the test to allow all groups to participate equitably, selecting discriminatory material, testing content that has not been taught, etc. Bias usually favors one group of test takers over another, resulting in discrimination. (See also Fairness.)
Body of Evidence constitutes information or data that establish that a student can perform a particular skill or has mastered a specific content standard. The evidence must be either produced by the student or collected by someone who is knowledgeable about the student.
Breadth indicates the comprehensiveness of the content and skills embodied in the standards, curriculum, or assessments.
Consequential Relevance means that the usefulness of the assessment results justifies the investment of time and effort in administering and scoring the assessment, and then in understanding and meaningfully applying the information to adjust instruction and better support student learning.
Criterion-Referenced Test (CRT) is one that tests an individual student’s performance on previously identified content, skills, or other criteria. The student’s performance is evaluated in terms of a specific learning objective or content standard and not to the performance of other students (student-to-standard). The student’s performance is generally interpreted in terms of performance level descriptors linked to particular score ranges.
Curriculum is a document that describes what teachers do to convey grade-level knowledge and skills to a student.
Curriculum-based Assessments, often called Curriculum-based Measures (CBM), are assessments that are valid and reliable indicators of students’ generalized performance. Short in duration to allow frequent administration, with multiple forms that are inexpensive to create and produce, and sensitive to changes in student achievement over time, CBMs have been used primarily in basic skill areas in elementary schools.
Cut Score is a specified point on a score scale. Scores at or above that point are interpreted differently from scores below that point.
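As a minimal sketch of this idea (the cut score value and labels below are hypothetical, not from any actual assessment), interpreting scores against a cut score might look like:

```python
# Hypothetical cut score: scores at or above it are interpreted
# differently from scores below it.
PROFICIENT_CUT = 240

def interpret(score, cut=PROFICIENT_CUT):
    """Return the interpretation for a score relative to the cut score."""
    return "proficient" if score >= cut else "not yet proficient"

assert interpret(240) == "proficient"        # at the cut score
assert interpret(239) == "not yet proficient"
```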
Depth of Knowledge (DOK) is a system that can be used to classify items based on their level of complexity and cognitive demand. Dr. Norman L. Webb of the Wisconsin Center for Education Research developed his depth of knowledge categories as one of the criteria for establishing the alignment of assessments to a set of standards. There are four DOK categories: DOK 1 or “Recall,” DOK 2 or “Basic Application,” DOK 3 or “Strategic Thinking,” and DOK 4 or “Extended Thinking.”
Diagnostic Assessment is used to identify academic deficiencies and to determine what instructional path will move the student toward an acceptable level of performance in the specific area tested. Diagnostic assessments usually are administered in advance of instruction.
Disaggregation refers to the collection and reporting of student achievement results by particular subgroups (e.g., students with disabilities, limited-English proficient students) to ascertain the subgroup’s academic progress. Disaggregation makes it possible to compare subgroups or cohorts.
Exemplar refers to scored student work that demonstrates or exhibits the ideal for a particular rubric score point.
Formative Assessment is a planned process wherein both students and teachers continually gather evidence of learning and use it to change and adapt what happens in the classroom minute-to-minute and day-by-day. Used during instruction, this process permits educators and students to collect critical information about student and classroom progress and to uncover opportunities for review, to provide feedback and to suggest adjustments to the teacher’s approach to instruction and the student’s approach to learning.
Fairness means that all students, regardless of their individual characteristics, have the same chance to show what they understand, know or can do. Nothing about the assessment is systematically unfair to a group of students based on gender, culture, geographical location, linguistic heritage, physical capabilities, etc.
Gap Analysis is an investigation of differences in achievement performance between two or more different groups of students, such as general education students and students with disabilities.
Grade Level is the grade in which a student is enrolled.
High Stakes Testing refers to a test that has important consequences for students, teachers, schools, districts and/or states attached to the results. Consequences may include promotion, graduation, rewards or sanctions.
Interim Assessment may be administered multiple times between instances of summative assessment to measure progress towards meeting the summative expectations (interim benchmark assessment) or to measure growth on a continuum of learning (interim growth measures). These assessments help teachers look for patterns or trends and help identify instructional and resource needs. Interim assessments may be used to identify individual strengths and weaknesses and are useful for grouping students for instruction based on those strengths or weaknesses. (See also Benchmark Assessment.)
Large-Scale Assessments are tests administered simultaneously to large groups of students within the district or state.
Mean is the arithmetic average of a group of scores. The mean is sensitive to extreme scores when population samples are small.
Median is the middle score in a list of scores; it is the point at which half the scores are above and half the scores are below.
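The sensitivity of the mean to extreme scores, and the median's robustness to them, can be illustrated with hypothetical scores and Python's standard `statistics` module:

```python
# Illustrative sketch of the mean/median distinction; all scores hypothetical.
from statistics import mean, median

scores = [72, 75, 78, 80, 81]
assert mean(scores) == 77.2
assert median(scores) == 78

# One extreme score pulls the mean but leaves the median unchanged.
scores_with_outlier = [72, 75, 78, 80, 200]
assert mean(scores_with_outlier) == 101
assert median(scores_with_outlier) == 78
```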
Modifications are changes made to the test itself: reduced number of distracters, fewer items, etc.
Multiple Measures are measurements of student or school performance through more than one form or test. For students, these might include teacher observations, performance assessments or portfolios. For teachers, these might include classroom observation, student performance assessments and peer review. For schools, these might include dropout rates, absenteeism, college attendance rates or documented behavior problems.
Normative Data provide a reference point that allows educators to compare the performance of a class or grade level with that of students in the same grade across a wide variety of schools nationwide.
Norm-referenced Test (NRT) is an assessment wherein student performance is reported as a comparison with the performance of students in a larger “norming” group (student-to-student). Norms frequently are based on a national sample which is selected to represent, proportionally, the diversity of students in the United States. Performance of students in schools and districts is also compared with the performance of students in other schools and districts. Norm-referenced assessments typically are used to sort and compare students rather than measure achievement towards a standard.
Norms reflect the distribution of test scores for a sample of students used to compare individual student performance with performance of students in the norming sample. Typically the distribution will follow a normal curve in which 68 percent of the students fall within plus or minus one standard deviation (σ) of the mean (μ), with progressively fewer students on the tails of the distribution.
Student scores on a norm-referenced test most often are reported to teachers as percentiles (see Percentile Score), although other normalized scores are used in statistical analysis.
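The normal-curve facts above (about 68 percent of scores within one standard deviation of the mean) and the conversion of a score to a percentile can be sketched with Python's `statistics.NormalDist`; the norming-sample mean and standard deviation below are hypothetical:

```python
# Sketch of the normal curve described above; mu and sigma are hypothetical.
from statistics import NormalDist

norm = NormalDist(mu=200, sigma=15)

# Share of the norming sample within one standard deviation of the mean.
within_one_sd = norm.cdf(200 + 15) - norm.cdf(200 - 15)
assert round(within_one_sd, 2) == 0.68

# A percentile is the proportion of the norming sample scoring below a
# given score. A score one standard deviation above the mean lands at
# roughly the 84th percentile.
percentile = round(norm.cdf(215) * 100)
assert percentile == 84
```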
Percentage is used to represent a fraction of the whole. In student assessment it is the number of correct responses divided by the total number of items on the test, multiplied by 100.
Percentile Score is used in reports from norm-referenced tests to indicate where a student’s score falls in relationship to the proportion of students in the norming sample who had a lower score. A percentile score of 79 indicates that for every 100 students in the norming sample, 79 had a lower score than the student for which this score is being reported.
Performance Assessment measures student performance on complex tasks and projects rather than on short responses to test items. The individual being assessed is required to produce an artifact that demonstrates his or her learning. The process of creating the final product, the final product itself, or both may be scored, usually using a rubric that defines the attributes of various performance levels. These assessments are thought to be more “authentic” than multiple choice and similar tests.
Performance Descriptors are narrative descriptions of performance levels that convey student performance at each achievement level. They further define content standards by connecting them to information that describes how well students are learning the knowledge and skills contained in the content standards. (See also Achievement Descriptors.)
Performance Levels are measurements that distinguish one performance from others (e.g., an adequate performance from a novice or expert performance). Performance levels provide a determination of the extent to which a student has met the content standards. (See also Achievement Levels.)
Performance Standards are a system that includes performance levels (e.g., unsatisfactory, proficient, advanced), descriptions of student performance for each level, examples of student work representing the entire range of performance for each level, and cut scores. A system of performance standards operationalizes and further defines content standards by connecting them to information that describes how well students are doing in learning the knowledge and skills contained in the content standards. (See also Achievement Standards.)
Portfolio is a deliberate collection of student-generated or student-focused evidence that provides the basis for demonstrating the student’s mastery of a range of skills, performance level or improvement in skills over time. The portfolio evidence may include student work samples, photographs, videotapes, interviews, anecdotal records and observations.
Portfolio Assessment is an organized collection or documentation of student-generated or student-focused work that typically depicts the range of individual student skills.
Progress Monitoring Tools are generally short assessment instruments that are either General Outcome Measures (GOM) or Mastery Measures. GOMs are more typical and are used to reflect overall competence in the curriculum. GOMs are designed to measure increasing competence over short time intervals. Fluency measures are examples of GOMs with increasing fluency as the measure of progress. Mastery Measures indicate a student’s successive mastery of sequenced skills within a subject area. The increasing number of skills mastered is the measure of progress.
Real-World Application refers to the opportunity for a student to exhibit a behavior or complete a task that he or she would normally be expected to perform outside of the school environment.
Reliability is concerned with the consistency of test scores: making sure that different test forms in a single administration are equivalent, that retests of a given test are equivalent to the original test, and that test difficulty remains constant year to year, administration to administration.
Response Requirements are the types, kinds or methods of action required of a student to answer a question or testing item. The response may include, but not be limited to, reading, writing, speaking, creating and drawing.
Screening Assessment typically is used to identify readiness to enter specific programs and to identify deficits or risks that need to be addressed for the individual to be successful in the program. Screening may apply to academics, health, socialization, and other attributes required for the student to be successful in school. This is often used in relation to Response to Intervention (RtI) programs.
Self Assessment is the process in which students review their own work to identify strengths and needs for the purpose of improving their performance.
Standard Deviation is a statistic expressing the homogeneity or heterogeneity of instructional level within a group of students. The larger the standard deviation, the more academically diverse the group.
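A quick sketch with hypothetical scores: two groups share the same mean, but the more academically diverse group has the larger standard deviation.

```python
# Hypothetical scores for two groups with the same mean (200).
from statistics import mean, pstdev

homogeneous = [198, 200, 200, 202]   # similar instructional levels
diverse = [170, 190, 210, 230]       # much wider spread

assert mean(homogeneous) == mean(diverse) == 200
assert pstdev(homogeneous) < pstdev(diverse)
```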
Standard Error of Measurement (SEM) is an estimate of the precision of a test score. The smaller the standard error, the more precise the achievement estimate. Measuring any human attribute, whether physical or mental, produces an estimate with associated error. In the case of measuring academic achievement and growth, measurement error is defined in terms of a statistical range around the observed score. That is, the SEM defines a probability that the student’s true score lies within a given range: a 68% probability that it is within ±1 SEM of the observed score, and a 95% probability that it is within ±2 SEM of the observed score.
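The SEM bands can be sketched under a normal error model; the observed score and SEM values below are hypothetical:

```python
# Sketch of SEM score bands under a normal error model; values hypothetical.
from statistics import NormalDist

def score_band(observed, sem, z=1.0):
    """Range in which the true score falls with the probability implied by z."""
    return (observed - z * sem, observed + z * sem)

observed, sem = 215, 3           # hypothetical scale score and its SEM
low, high = score_band(observed, sem)
assert (low, high) == (212, 218)

# Probability captured by +/-1 SEM under the normal model (about 68%).
p = NormalDist().cdf(1) - NormalDist().cdf(-1)
assert round(p, 2) == 0.68
```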
Standardized Test is an assessment given in a consistent way to all students, or any test that uses uniform procedures for administration and scoring. Frequently, these are mass-produced, machine-scored tests.
Standards come in two types—content and achievement (performance). Content Standards are statements of the subject-specific knowledge and skills that schools are expected to teach students and indicate what students should know and be able to do. Achievement (Performance) Standards are indices of qualities that specify how adept or competent a student demonstration must be. They consist of four components: 1) levels that provide descriptive labels or narratives for student performance (e.g., advanced, proficient); 2) descriptions of what students at each particular level must demonstrate relative to the task; 3) examples of student work at each level that illustrate the range of performance within each level; and 4) cut scores that clearly separate each performance level from others.
Standards-based Assessments are assessments constructed to measure how well students have mastered specific content standards or skills.
Subgroup refers to a well-defined group of students. For example, NCLB identifies the following specific subgroups that must achieve adequate yearly progress: race/ethnicity groups, students with disabilities, limited-English proficient (LEP) students and economically disadvantaged students.
Summative Assessment is a culminating assessment (or experience), which measures and reports whether a student has learned a prescribed set of content. For example, state summative assessments provide evidence of progress towards standards and report it in terms of performance level descriptions that usually include grade-level proficiency among other levels of performance. End-of-year and end-of-course exams are also summative.
Test is a measuring device or procedure. Educational tests are typically composed of questions or tasks designed to elicit predetermined behavioral responses or to measure specific academic content standards.
Test Forms are parallel or alternate versions of a test that are considered interchangeable; that is, they measure the same constructs, are intended for the same purposes, and are administered using the same directions.
Test Presentation is the method, manner or structure by which test items or assessments are administered to the student.
Test Security refers to procedures established to ensure current or future confidentiality, fidelity and integrity of a test. Public access is limited and strictly monitored, with clearly outlined consequences for breaches in test security.
Through-course Assessment is based either on a defined subset of the standards or on all standards for a subject. One purpose of a through-course assessment is to move the summative assessment closer to the time the material was taught. Hence a through-course assessment may assess 25% of the standards three times a year and all the standards at the end. The most prevalent example of a through-course assessment is a mid-year exam or even a unit test. If administered four or five times throughout the year on all the standards, the resulting scores should show progress towards the standards (e.g., 25%, 50%, 75%, 90%, 100%). (See also Benchmark Assessment.)
Universal Design of Assessment is a method for developing an assessment to ensure accessibility by all students, regardless of ability or disability. Universal design is based on principles that originated in the field of architecture to consider user diversity during the conceptual stage of development.
Universal Screener is used to identify, in a total population of students, those students who may not have mastered a baseline of age-appropriate skills. A universal screener measures readiness and sometimes diagnoses areas of weakness that might require special interventions beyond standard classroom instruction. These are sometimes used to monitor a student’s progress over time.
Validity is the extent to which a test measures what it was designed to measure. Multiple types of validity exist. Common types include: 1) Construct Validity, which refers to the extent to which the characteristic to be measured relates to test scores that measure the behavior in situations where the construct is thought to be an important variable; 2) Content Validity, which refers to the extent to which the stimulus materials or situations that compose the test call for a range of responses representing the entire domain of skills, understandings or behaviors that the test is intended to measure; 3) Convergent Validity, which refers to the extent to which the assessment results positively correlate with the results of other measures designed to assess the same or similar constructs; 4) Criterion-Related Validity, which refers to the extent to which test scores of a group or subgroup compare with other criterion measures (ratings, classifications, other tests) assigned to the examinees; 5) Face Validity, which is a judgment of how relevant the test items appear to be; it relates more to what a test appears to measure than to what it actually measures; and 6) Consequential Validity, which is the extent to which the assessment produces the intended positive outcomes for students, e.g., improved instruction and improved student achievement.
These definitions are a mix of definitions from NWEA and those adapted from the Glossary of Assessment Terms and Acronyms Used in Assessing Special Education Students: A Report from the Assessing Special Education Students (ASES) State Collaborative on Assessment and Student Standards (SCASS) (Council of Chief State School Officers, 2003) and Assessment Literacy in a Standards-Based Urban Education Setting (Webb, Wisconsin Center for Education Research, 2002).