
The quality of tests in psychological and educational assessment is of great scholarly and public interest. Item difficulty models are vital to generating test result interpretations based on evidence. A major determining factor of item difficulty in knowledge tests is the opportunity to learn about the facts and concepts in question. Knowledge is mainly conveyed through language. Exposure to language associated with facts and concepts might be an indicator of the opportunity to learn. Thus, we hypothesize that item difficulty in knowledge tests should be related to the probability of exposure to the item content in everyday life and/or academic settings and therefore also to word frequency. Results from a study with 99 political knowledge test items administered to N = 250 German seventh (age: 11–14 years) and tenth (age: 15–18 years) graders showed that word frequencies in everyday settings (SUBTLEX-DE) explain variance in item difficulty, while word frequencies in academic settings (dlexDB) alone do not. However, both types of word frequency combined explain a considerable amount of the variance in item difficulty. Items with words that are more frequent in both settings and, in particular, relatively frequent in everyday settings are easier. High word frequencies and relatively higher word frequency in everyday settings could be associated with a higher probability of exposure, lower conceptual complexity, and better readability of item content. Examining word frequency from different language settings can help researchers investigate test score interpretations and is a useful tool for predicting item difficulty and refining knowledge test items.
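To make the kind of analysis described above concrete, the following is a minimal, illustrative sketch rather than the study's actual code: it regresses simulated item difficulties on mean log word frequencies from an everyday corpus (such as SUBTLEX-DE) and an academic corpus (such as dlexDB), separately and combined, and compares the variance explained. All variable names and the toy data are hypothetical.

```python
# Illustrative sketch (not the study's analysis code): regress item difficulty
# on mean log word frequency from an everyday corpus and an academic corpus,
# alone and combined, and compare the variance explained (R^2).
# The toy data below are simulated for demonstration only.
import numpy as np

rng = np.random.default_rng(0)
n_items = 99                                    # number of knowledge test items
freq_everyday = rng.normal(3.0, 1.0, n_items)   # mean log10 frequency, everyday corpus
freq_academic = 0.6 * freq_everyday + rng.normal(1.5, 0.8, n_items)
# Easier items (lower difficulty) tend to contain more frequent words.
difficulty = 1.0 - 0.5 * freq_everyday - 0.1 * freq_academic + rng.normal(0, 0.5, n_items)

def r_squared(y, X):
    """R^2 of an OLS regression of y on X (intercept added)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

print("everyday only :", r_squared(difficulty, freq_everyday))
print("academic only :", r_squared(difficulty, freq_academic))
print("combined      :", r_squared(difficulty, np.column_stack([freq_everyday, freq_academic])))
```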

The design of high-quality assessments for knowledge, abilities, and competencies is a major research topic in educational assessment and psychometrics (American Psychological Association, APA Task Force on Psychological Assessment & Evaluation Guidelines, 2020; Care et al., 2018). A high-quality assessment should be based on a solid theory about the domain, and this theory should be able to explain why items are difficult or easy (Mislevy et al., 2003). Essentially, difficulty is a property of an item that describes how much skill, ability, or knowledge is required to solve the item (Embretson & Reise, 2013).

Domain-related and theory-based features of test items should explain item difficulty to allow valid interpretations of test results. Different approaches to identifying domain-related item features have proven successful in various fields. One way to organize these approaches is to divide them into structure-driven, complexity-driven, and exposure-driven approaches. Structure-driven approaches assume that a domain consists of distinct elements (i.e., concepts or skills) that have a defined relationship with one another, and solving an item requires some subset of these elements. This approach has been formally defined in (probabilistic) knowledge space theory (e.g., Stefanutti et al., 2012). For example, Tatsuoka (1990) described the domain of solving fraction problems on the basis of seven elements, termed skills (e.g., distinguishing whole numbers from fractions or converting whole numbers to fractions). The domain structure of fractions is hierarchical, because some skills are prerequisites for other skills (e.g., performing the basic fraction subtraction operation and distinguishing whole numbers from fractions are prerequisites for borrowing one from the whole number to the fraction). Other examples of concise knowledge structures can be found in stoichiometry (mathematical chemistry; Segedinac et al., 2018), stochastic problem solving (Stefanutti et al., 2012), or the laws of mechanics (Reif & Heller, 1982).
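As a toy illustration of such a structure-driven domain model, the sketch below encodes a handful of skills with prerequisite relations and items that each require a subset of skills, then lists which items every feasible knowledge state can solve. The skill names, prerequisite relation, and item assignments are invented for illustration and simplify the formal apparatus of knowledge space theory; they are not Tatsuoka's actual skill set.

```python
# Toy sketch of a structure-driven domain model (invented skills and items):
# each item requires a subset of skills, and some skills presuppose others,
# so feasible knowledge states must respect the prerequisite relation.
from itertools import chain, combinations

# Hypothetical skills for fraction subtraction, mapped to their prerequisites.
PREREQUISITES = {
    "basic_subtraction": set(),
    "distinguish_whole_vs_fraction": set(),
    "convert_whole_to_fraction": {"distinguish_whole_vs_fraction"},
    "borrow_from_whole": {"basic_subtraction", "distinguish_whole_vs_fraction"},
}

# Hypothetical item-to-required-skills mapping (a Q-matrix in dictionary form).
ITEMS = {
    "item_1": {"basic_subtraction"},
    "item_2": {"basic_subtraction", "convert_whole_to_fraction"},
    "item_3": {"borrow_from_whole", "convert_whole_to_fraction"},
}

def is_feasible_state(state: set[str]) -> bool:
    """A knowledge state is feasible if it contains the prerequisites of every skill it contains."""
    return all(PREREQUISITES[skill] <= state for skill in state)

def solvable_items(state: set[str]) -> set[str]:
    """Items whose required skills are all mastered in the given state."""
    return {item for item, required in ITEMS.items() if required <= state}

# Enumerate all feasible knowledge states and the items each one can solve.
skills = list(PREREQUISITES)
for subset in chain.from_iterable(combinations(skills, r) for r in range(len(skills) + 1)):
    state = set(subset)
    if is_feasible_state(state):
        print(sorted(state), "->", sorted(solvable_items(state)))
```

The enumeration makes the hierarchical character of the domain visible: states that include a skill without its prerequisites never appear, and items requiring higher-level skills become solvable only in the larger, prerequisite-complete states.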

