Wednesday, 5 August 2020

Statistical definitions for career assessment

Wikimedia Creative Commons (2020)
Carrying on the theme of assessment, Osborn and Zunker (2016) provide some great definitions of the statistical terms:
  1. Norms: "Norms represent the level of performance obtained by the individuals (normative sample) used in developing score standards. Norms can thus be thought of as typical or normal scores. Norms for some tests and inventories are based on the general population. Other norms are based on specific groups such as all 12th-grade students, 12th-grade students who plan to attend college, left-handed individuals, former drug abusers, former alcoholics, or individuals with physical disabilities" (Osborn & Zunker, 2016, p. 29).
  2. Norm Tables: These "provide specific definitions of normative groups. Such detailed descriptions of persons sampled in standardizing an inventory provide good data for comparing the norm samples with [specific] client groups. [It is useful to have...] score differences between age and ethnic groups and between individuals in different geographical locations. The more descriptive the norms are, the greater their utility and flexibility" (Osborn & Zunker, 2016, p. 29).
  3. National Norms (Country Profile): National population norms "are usually controlled in the sampling process to be balanced in geographical area, ethnicity, educational level, sex, age, and other factors. National norms may be helpful in determining underlying individual characteristics and patterns" (Osborn & Zunker, 2016, p. 30), and for identifying cultural difference. What is normal in the US is not necessarily normal in New Zealand, and less likely to be 'normal' for Maori.
  4. Local Norms: While we are aware of local differences within New Zealand, the idea of local norms is less applicable here. "requirements vary from one location to another, using local norms is recommended". For example, in the US, if a client wanted to know their chances of success at a local tertiary provider, the practitioner could tap into secondary graduate grade data from the client's high school, map it to the tertiary institute grades of previous graduates from the client's secondary school (and therefore where the client ranked), and develop an expectancy table showing where the client would be likely to perform at tertiary. This allows the client an opportunity to examine their fit with the tertiary institute's requirements. "Think of this as acclimatisation, or preparation" (Osborn & Zunker, 2016, p. 30).
  5. Score Profiles: These "provide a visual representation of the peaks and valleys in a person's test results" and "help to identify what falls within the 'normal' range, as well as indicators of where the individual scored higher or lower" (Osborn & Zunker, 2016, p. 30). However, we also need to be careful about generalising too much with score profiles. Osborn and Zunker note that we need to be careful when explaining variation; that we should be cautious about comparing too closely to norms while not taking enough notice of cultural difference; and that we should use ranges rather than specific scores to allow for bias and natural variation.
  6. Bell Curve (Normal Distribution): Normal distribution of a sample or a population is usually shown as a "bell-shaped curve [where] M represents the mean, or midpoint (50th percentile), with 4 standard deviations on each side of the mean" (Osborn & Zunker, 2016, p. 32). "It is useful at times to compare an individual's score to where the majority of the scores lie. For example, + 1 or -1 standard deviation from the mean will capture approximately 68% of the variance" (Osborn & Zunker, 2016, p. 33). Two standard deviations either side of the mean capture approximately 95% of scores, and three, approximately 99.7%.
  7. Percentiles: "The most common definition of a percentile is a number where a certain percentage of scores fall below that number. You might know that you scored 67 out of 90 on a test. But that figure has no real meaning unless you know what percentile you fall into. If you know that your score is in the 90th percentile, that means you scored better than 90% of people who took the test" (Glen, 2020a). "Percentile equivalents are direct and relatively easy to understand, which is a primary reason for their popularity. However, it is important to identify the norm reference group from which the percentile equivalents have been derived", otherwise we may "attach labels to these percentile equivalents" which are inappropriate, such as assuming a high score is an 'A', or that a low score is a 'D' (Osborn & Zunker, 2016, p. 34).
  8. Stanines: Divide the bell curve into 9 bands, with the band containing the mean numbered 5 and four bands each way, and we get stanines. Bands 2 to 8 are each half a standard deviation wide, with bands 1 and 9 taking the remaining tails. "A stanine ('standard nine') score is a way to scale scores on a nine-point scale. It can be used to convert any test score to a single-digit score" and "Stanines are also similar to normal distributions. You can think of these scores as a bell curve that has been sliced up into 9 pieces. These pieces are numbered 1 through 9, starting at the left hand section. However, where a standard normal distribution has a mean of 0 and a standard deviation of 1, stanines have a mean of 5 and a standard deviation of 2" (Glen, 2020b). "For example, stanine scores 1, 2, and 3 are considered below average; stanine scores 4, 5, and 6 are considered average; and stanines 7, 8, and 9 are above average" (Osborn & Zunker, 2016, pp. 35-36).
  9. Reliability: "Reliability looks at how consistently a test measures the construct under consideration, and the degree to which test scores are free from error", while acknowledging that our "client's observed score on a test is actually their true score with some error added". To assess reliability we consider reliability coefficients (internal reliability), and test-retest reliability (testing again) (Osborn & Zunker, 2016, p. 37). Internal reliability is commonly reported using Cronbach's alpha, while test-retest reliability is usually reported as the correlation between the two sets of scores. We may have good internal reliability, but poor test-retest; poor internal, and good test-retest; both good; or both poor. A number of test companies report internal reliability scores, where they are comparing their test against their own data.
  10. Validity: "Validity answers the question, 'Does the test measure what it purports to answer?'". There are three types of validity: content validity, where the test items adequately cover what is being tested; criterion-related validity, how much the test results are related to a particular outcome; and construct validity, where the test measures what it says it measures. Tests being evaluated for validity will usually have a lower coefficient score than for reliability, as they are compared to other tests (Osborn & Zunker, 2016, p. 37), and this is what we practitioners must be alert to when relying on these tests. Few companies obtain external validity scores, possibly - note this is just supposition - because they would score poorly.
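To make the bell curve, percentile and stanine ideas a little more concrete, here is a minimal Python sketch. It is only an illustration: the function names are my own, the norm group mean of 50 and standard deviation of 10 are made-up figures, and it assumes scores in the norm group are normally distributed. Test publishers often assign stanines from percentile bands in their own norm tables, so treat this as an approximation, not as any publisher's method.

```python
import math
from statistics import NormalDist


def coverage_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


def percentile_rank(score: float, norm_mean: float, norm_sd: float) -> float:
    """Percentage of the norm group expected to score below `score` (assumes normality)."""
    return 100 * NormalDist(norm_mean, norm_sd).cdf(score)


def stanine(score: float, norm_mean: float, norm_sd: float) -> int:
    """Convert a raw score to a stanine (1 to 9) via its z-score (assumes normality)."""
    z = (score - norm_mean) / norm_sd
    # Each stanine band is half a standard deviation wide and band 5 is centred
    # on the mean, so it runs from z = -0.25 up to z = +0.25.
    band = math.floor(z * 2 + 5.5)
    # Bands 1 and 9 are open-ended and absorb the tails.
    return max(1, min(9, band))


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} SD of the mean: {coverage_within(k):.1%}")

    # Hypothetical norm group: mean 50, standard deviation 10.
    client_score = 63
    print(f"percentile rank: {percentile_rank(client_score, 50, 10):.0f}")
    print(f"stanine: {stanine(client_score, 50, 10)}")
```

With these made-up norms, a score of 63 sits 1.3 standard deviations above the mean, which lands at roughly the 90th percentile and in stanine 8. Change the norm group and the same raw score can land somewhere quite different, which is why identifying the norm reference group matters so much.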
A very useful list of terms to understand.
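For anyone curious about what sits behind the reliability figures in a test manual, here is a rough Python sketch of two calculations mentioned under Reliability: Cronbach's alpha for internal reliability, and a simple correlation between two sittings as one common way to summarise test-retest reliability. The function names and all of the response data are hypothetical, invented purely for illustration.

```python
from statistics import mean, variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a table of responses (rows = people, columns = items)."""
    k = len(responses[0])                    # number of items
    items = list(zip(*responses))            # regroup the scores by item
    sum_item_vars = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sum((a - mx) ** 2 for a in x) ** 0.5
    spread_y = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical data: five clients answering a four-item scale (1 to 5).
    first_sitting = [
        [4, 5, 4, 4],
        [2, 2, 3, 2],
        [3, 3, 3, 4],
        [5, 4, 5, 5],
        [1, 2, 1, 2],
    ]
    # Hypothetical total scores for the same five clients a few weeks later.
    second_sitting_totals = [16, 10, 12, 19, 7]

    first_totals = [sum(row) for row in first_sitting]
    print(f"Cronbach's alpha: {cronbach_alpha(first_sitting):.2f}")
    print(f"test-retest correlation: {pearson_r(first_totals, second_sitting_totals):.2f}")
```

Both figures run up to 1, with higher values suggesting more consistent measurement. Neither says anything about validity: a test can give very consistent results and still not measure what it claims to.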


