
Developing and Validating the Numeric Understanding Measures (NUMs)

How numerate are you? See if you can answer the following question:

  • If 50 people in a town of 100,000 people catch a virus, what percent of the town has the virus? ______% of the town (answer at the end of the blog)

  • *Please note that this question is not part of any of the measures but was calibrated alongside the items in the Numeric Understanding Measures.*


Numeracy refers to a person’s ability to understand and use numbers, including basic probabilistic and mathematical concepts. It is often thought of as numeric literacy. Numeracy is important because it is associated with financial and health outcomes. Highly numerate people tend to make more money and are more likely to be employed than the less numerate. The less numerate are also more likely to have a chronic disease and to take more prescription drugs, all while being less able to follow complex health regimens. Innumeracy is a problem in the US, especially as data becomes more and more accessible. In fact, about 30% of US adults can only perform simple processes with numbers (counting, sorting, simple arithmetic, simple percentages), and they can only do so if there is little text and few distractors around the numbers.


Numeracy is assessed using a math test. However, existing numeracy measures have flaws. Some are too easy, making them less able to discriminate among the more highly numerate. Others are too difficult, so they cannot distinguish among people with lower skills. Still others have been widely used and publicized, so their answers can be easily found online. To solve these issues, we created 84 new math problems and calibrated them using item response theory (IRT), a method of modeling how latent traits are related to responses on items. IRT tells us how informative an item is and at what level of numeracy it is most informative. Using this information, we created three short new Numeric Understanding Measures (NUMs):

  • The A-NUM is adaptive, meaning it asks different questions based on the participant’s performance and requires the participants to answer four questions.

  • The 4-NUM measure also has four items, but all participants see the same four items.

  • The single-item measure uses one of the items from the adaptive measure.
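To make "how informative an item is and at what level of numeracy" concrete, here is a minimal sketch of item information and adaptive item choice. It assumes a two-parameter logistic (2PL) IRT model, which this post does not specify, and the three items with their parameters are hypothetical, not NUM items. It is an illustration of the general idea, not the A-NUM's actual selection rule.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct answer.

    theta: respondent's latent numeracy
    a: item discrimination, b: item difficulty (both hypothetical here)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical item bank: (label, discrimination a, difficulty b).
ITEMS = [("easy", 1.2, -1.5), ("medium", 1.5, 0.0), ("hard", 1.3, 1.5)]

def most_informative(theta, items=ITEMS):
    """Return the label of the item with the highest information at theta."""
    best = max(items, key=lambda item: item_information(theta, item[1], item[2]))
    return best[0]

if __name__ == "__main__":
    for theta in (-1.5, 0.0, 1.5):
        print(theta, most_informative(theta))
```

Under the 2PL model an item is most informative when the respondent's ability is near the item's difficulty, which is why an adaptive test can learn more from four well-chosen items than from four items that are far too easy or too hard for that person.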


To ensure that we measure what we mean to measure, we validated the new measures. Participants completed our new numeracy measures and two established numeracy measures (referred to here as Weller and Berlin) so that we could test whether all the numeracy measures measured the same thing. We used a confirmatory factor analysis and concluded that they did. Latent numeracy explains about 74% of the variability in A-NUM scores, 70% of the variability in 4-NUM scores, 65% in Weller scores, and 56% in Berlin scores.
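One way to read these percentages: in a single-factor model with standardized loadings and no cross-loadings (an assumption, since the model details are not given here), the share of variance explained by the latent factor is the squared loading. The sketch below back-computes the implied loadings from the rounded percentages above. These are derived illustrations, not values reported by the authors.

```python
import math

# Proportion of score variance explained by latent numeracy, from the text.
variance_explained = {"A-NUM": 0.74, "4-NUM": 0.70, "Weller": 0.65, "Berlin": 0.56}

def implied_loading(r_squared):
    """Standardized loading implied by variance explained (assumes a positive loading)."""
    return math.sqrt(r_squared)

# Implied loadings, rounded to two decimals; approximate because the inputs are rounded.
loadings = {name: round(implied_loading(r2), 2) for name, r2 in variance_explained.items()}

if __name__ == "__main__":
    for name, loading in loadings.items():
        print(f"{name}: {loading}")
```

For example, 74% of variance explained corresponds to an implied loading of about 0.86 for the A-NUM.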

Next, we tested whether the measures were correlated with similar constructs and less correlated with dissimilar constructs; that is, we tested convergent and discriminant validity, respectively. Think about convergent and discriminant validity as a continuum. Some variables should be highly correlated with the NUMs, like other measures of numeracy. Other variables might be fairly highly correlated, like subjective numeracy and its two subscales of numeric self-efficacy and numeric preference. We expected a non-verbal measure of fluid intelligence (Raven’s matrices) and a measure of crystallized intelligence (vocabulary) to have weaker correlations; Big 5 personality traits should be weakly or not at all related. Our new NUMs correlated as expected and showed correlations with these variables similar to those of more established measures of numeracy.

Lastly, we checked how much the NUMs predicted behaviors in tasks previously related to more established numeracy measures (predictive validity). We expected that more numerate individuals would more accurately interpret probabilities, better identify numeric information needed to interpret a product's benefit, show weaker framing effects, rate an inferior bet with a small loss as more attractive, and have more consistent risk perceptions. Overall, the new measures predicted behaviors for all the tasks that established measures also predicted. No measure, however, predicted framing effects, an unexpected finding.

The NUMs were created to measure a wide range of numeric ability using only a few items. We demonstrated convergent, discriminant, and predictive validity. Moreover, the new measures appear to assess numeracy in a more fine-grained manner than established measures. Importantly, these new measures consist of items not easily found online.

For more information about the development and validation of the NUMs, please see the article published in Judgment and Decision Making:

  • Silverstein, M., Bjälkebring, P., Shoots-Reinhard, B., & Peters, E. (2023). The numeric understanding measures: Developing and validating adaptive and nonadaptive numeracy scales. Judgment and Decision Making, 18, E19.

For information about how to use the NUMs in your research (including the items and Qualtrics coding to present and score them), please see

This work was supported by the National Science Foundation (Grant No. 2017651) and the Decision Sciences Collaborative at The Ohio State University.


Answer to the opening question: 0.05% of the town (50 ÷ 100,000 = 0.0005, which is 0.05%).


