Yes, once you have anchors in the form of vignettes, you can study the reasons for respondents' different understandings. In the chopit model, the thresholds between the response categories are explained with a set of explanatory variables. We can therefore estimate the effects of these variables on the thresholds. Another way to say this is that the model has multiple systematic components, predicting both the actual values of the concept being measured and the actual thresholds between the response categories, across respondents.
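The idea that thresholds depend on explanatory variables can be illustrated with a minimal sketch. It is not the chopit estimator itself: the function names, the coefficient values, and the particular parametrization (a linear first threshold, then positive exponential increments to keep the thresholds ordered) are assumptions made for illustration only.

```python
import math

def thresholds(gammas, covariates):
    """Return ordered cut points for one respondent.

    gammas: one coefficient vector per threshold (hypothetical values).
    covariates: the respondent's explanatory variables, e.g. [1.0, older].
    The first threshold is linear in the covariates; each later threshold
    adds a positive increment exp(gamma . x), so the cut points increase.
    """
    def dot(g):
        return sum(a * b for a, b in zip(g, covariates))

    taus = [dot(gammas[0])]
    for g in gammas[1:]:
        taus.append(taus[-1] + math.exp(dot(g)))
    return taus

def categorize(latent, taus):
    """Map a latent (actual) value to a response category 1..len(taus)+1."""
    return 1 + sum(1 for t in taus if latent >= t)

# Hypothetical coefficients: the second covariate shifts the first threshold up.
gammas = [[-1.0, 0.5], [0.0, 0.0], [0.0, 0.0]]

taus_a = thresholds(gammas, [1.0, 0.0])  # respondent with covariate = 0
taus_b = thresholds(gammas, [1.0, 1.0])  # respondent with covariate = 1

# Same actual level, different reported category because thresholds differ.
same_level = 0.2
category_a = categorize(same_level, taus_a)
category_b = categorize(same_level, taus_b)
```

In this toy setup the two respondents have the same actual level but choose different response categories, which is the kind of threshold variation the explanatory variables are meant to capture.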
Such a goal is probably not achievable across all domains of inquiry. It is probably not even workable for individual domains in many areas, although it is still important to try. Whether or not universal measurement devices (or universally applicable vignettes) can be invented, we still will often want to compare many aspects of health and other concepts across many different places. Our preference for how to do this in most situations is to get it right in specific contexts, and to build up to more generality when possible by comparing across different small sets of areas in separate... Read more about Are universally applicable, culture-independent survey questions possible?
The basic process of measurement involves comparing an object under study with some standard. Without the standard, we have no (valid or meaningful) measurement. Anchoring vignettes provide one possible standard, or anchor, to make measurements meaningful. They serve the same purpose as medical tests or other physical measurements when those are available. If you can afford to do the physical tests, and if they are accurate in the area you are measuring, then you have no need for vignettes as anchors.
For some concepts, direct physical measurement is infeasible. Consider...
Variables that predict thresholds help chopit if they are available. Chopit and our nonparametric procedure will both work without variables that can predict threshold variation, but each procedure would then require that the same respondents be asked both the self-assessments and the vignettes.
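To see why asking the same respondents both kinds of questions is enough, consider a minimal sketch of a nonparametric recoding in which each self-assessment is located relative to that respondent's own vignette ratings. The function name and the example responses are hypothetical, and the sketch assumes the respondent rates the vignettes in strictly increasing order; ties and order violations are not handled here.

```python
def relative_rank(self_response, vignette_responses):
    """Place a self-assessment relative to the respondent's vignette ratings.

    vignette_responses: ratings of J vignettes, listed from the lowest to the
    highest intended level, assumed strictly increasing for this respondent.
    Returns a value from 1 to 2*J + 1:
      1 = below the first vignette, 2 = equal to it,
      3 = between the first and second, and so on.
    """
    pairs = zip(vignette_responses, vignette_responses[1:])
    if any(a >= b for a, b in pairs):
        raise ValueError("sketch assumes strictly increasing vignette ratings")

    rank = 1
    for z in vignette_responses:
        if self_response < z:
            return rank
        if self_response == z:
            return rank + 1
        rank += 2
    return rank

# Two hypothetical respondents give the same raw self-assessment (3 on a
# 5-point scale) but use the response categories differently.
rank_a = relative_rank(3, [2, 4])  # self-assessment falls between the vignettes
rank_b = relative_rank(3, [4, 5])  # self-assessment falls below both vignettes
```

Because each respondent's vignette ratings carry that respondent's own use of the scale, the recoded values can differ even when the raw self-assessments are identical, and no threshold predictors are needed.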
Direct measurement, that is, measurement without statistical analysis, is preferable when possible. We have tried a variety of simpler strategies in a diverse array of national surveys, but none seems to do remotely as well as anchoring vignettes. For example, we tried asking which of a set of vignettes the respondent is most like, but we found that respondents had a difficult time remembering them all at the same time. Another possibility is to ask whether the respondent has a higher or lower level of health, efficacy, etc. than the first vignette, then the second, and so on. This is better, but it also does...
The one research area where our approach clearly does not work is educational testing. The difficulty with educational testing is that no matter how carefully you write the common test questions as anchors, test takers will differ in their responses to them according to both DIF and their knowledge or achievement. Anchoring vignettes solve the problem in other areas because a respondent's answer is only a function of DIF (and estimation variability), and so can be used to adjust the self-assessments. An appropriate anchoring vignette in educational testing would be a test question where all... Read more about Why will anchoring vignettes work when we know that putting educational achievement tests on a common scale has not been possible?
First, for simplicity and since statistical methods can deal with it in fairly straightforward ways, imagine that random perceptual and measurement error were nonexistent. Then what needs to happen for all the problems to be fixed is that respondents differ in their interpretation of the vignettes only due to DIF (differential item functioning, or interpersonal incomparability), whereas the responses to the self-assessments must differ due to DIF and the actual values (A) on the concept of interest. In...
Vignette answers are a function of both the actual level of the person in the vignette (θ, the same for all respondents) and the DIF applied by each respondent (differing over respondents). We can think of these answers as responses to the portions of the vignette text that are, respectively, (1) an integral part of describing θ and (2) words used to package these concepts. DIF is generated by the packaging, which human language of course prevents us from eliminating entirely. Fortunately, to meet the assumption of the model,...