Candidates who take the CHI™ oral performance examination will not receive preliminary results upon completing the examination, since it requires human scoring. Official results are emailed within approximately six to eight weeks of the last date of the corresponding testing window (that is, once all testing is done, not from the date of the candidate’s own exam).

The CHI™ oral performance examination consists of:

  • seven items (or “vignettes”) requiring audio-recorded responses from the candidate, which are scored by human raters, and
  • one four-option, multiple-choice question assessing the candidate’s written translation abilities, which is scored electronically as a single correct response.

Raters score the examination by applying the Behaviorally Anchored Rating Scales, which were developed and validated by CCHI’s Subject Matter Experts under the guidance of a nationally recognized psychometrician.

Each candidate’s audio response is scored on the following four rating scales, which have equal weight and are applied independently (a small illustrative sketch follows the list):

  1. Lexical content: Raters evaluate how accurately the candidate preserves the “units of information” of the source speech/text. A unit of information can be an individual word, a group of words, or a phrase that communicates a single concept. On this scale, errors include omissions, additions, and the inaccurate translation of a unit of information.
  2. Register of speech: Register is the variety of language used for a particular purpose or in a particular social setting, including the level of formality chosen by the speaker. Raters evaluate how accurately the candidate preserves the register of the source speech/text, taking into account natural differences between languages.
  3. Grammar: Grammar is the set of rules that governs how sentences, phrases, and words are put together in a given language. Raters evaluate the candidate’s command of grammar in both languages. On this scale, errors include changes in verb tense or agreement, use of incorrect pronouns, inaccurate word order (syntax) in the target language, etc.
  4. Quality of speech: Quality of speech focuses on the physical characteristics of the speech produced by the candidate. On this scale, common errors include false starts, hesitations, numerous self-repairs, poor pronunciation, or a pace that hinders understanding.
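
To illustrate how four equally weighted, independently applied scales combine for a single response, here is a minimal Python sketch. The scale names come from the list above, but the rating values and the simple averaging are invented for illustration; CCHI does not publish the scales’ point values or combination rule here.

    # Illustrative only: the ratings and the averaging rule are assumptions.
    SCALES = ("lexical_content", "register", "grammar", "quality_of_speech")

    def response_score(ratings):
        """Combine the four independent, equally weighted scale ratings."""
        assert set(ratings) == set(SCALES)
        return sum(ratings.values()) / len(SCALES)

    print(response_score({"lexical_content": 5, "register": 4,
                          "grammar": 5, "quality_of_speech": 4}))  # -> 4.5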

All raters have undergone extensive training and are monitored by a psychometrician to ensure valid and reliable performance. Raters do not know candidates’ identities when scoring examinations.

Each oral response (i.e., the recording of the candidate interpreting one exam item/vignette) is scored by two raters independently. Raters do not score one candidate’s entire exam; they score individual responses, so as many as 14 raters may contribute to a single candidate’s exam. Additionally, if the two raters disagree by one point on a particular score for a particular response, that response is scored by a third rater. Raters cannot know whether a candidate passes or fails, because they never score a whole exam and have no access to other raters’ scores or to the final score.
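
The adjudication rule just described can be sketched as follows (Python; the rating values and the way a third rating is combined are assumptions, since CCHI does not state how the third score enters the final value):

    def final_rating(rater1, rater2, third_rater=None):
        """Two independent ratings per response; a one-point disagreement
        (per the rule above) sends the response to a third rater."""
        if abs(rater1 - rater2) >= 1:
            # Assumption: the three ratings are averaged.
            assert third_rater is not None, "third rating required"
            return (rater1 + rater2 + third_rater) / 3
        return (rater1 + rater2) / 2

    print(final_rating(4, 4))      # agreement -> 4.0
    print(final_rating(4, 5, 4))   # disagreement resolved by a third rater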

Total scores for each of the exam’s subdomains are weighted according to CCHI’s proprietary formula based on the exam specifications. The passing score (passing standard) is determined by teams of Subject Matter Experts (SMEs) and the CCHI Commissioners through a standard setting process (explained in detail below). The raw score is then scaled, via a mathematical formula, to a distribution of 300 to 600 with the passing score set at 450. Since different forms of the test may differ slightly in difficulty, a statistical procedure called equating is used to ensure that the passing score of 450 is comparable from form to form (see the explanation of the equating procedure below).
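
CCHI’s exact formula is proprietary, but the sketch below shows one common way such a scaling can work: a piecewise-linear map that pins the form’s raw passing point to the scaled score of 450. The raw_cut and raw_max numbers are made-up placeholders, not CCHI’s values.

    def scaled_score(raw, raw_cut, raw_max, lo=300, hi=600, passing=450):
        """Map a raw score onto the 300-600 scale so that raw_cut lands
        exactly on the scaled passing score of 450 (illustrative only)."""
        if raw <= raw_cut:
            return lo + (raw / raw_cut) * (passing - lo)
        return passing + (raw - raw_cut) / (raw_max - raw_cut) * (hi - passing)

    print(round(scaled_score(raw=130, raw_cut=130, raw_max=180)))  # -> 450
    print(round(scaled_score(raw=155, raw_cut=130, raw_max=180)))  # -> 525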

The Score Report, in addition to the overall test score, indicates how candidates scored on the exam subdomains (Interpret Consecutively, Interpret Simultaneously, and Sight Translate/Translate a written message) to help identify strengths and weaknesses for future study.

Keep in mind that the Score Report states two separate things: the overall test score, and how well you did in specific parts of the test. There is no relationship between the percentages reported for the parts of the test (subdomains) and the overall scaled score.

We report the percentage correct for three subdomains: consecutive interpreting, simultaneous interpreting, and sight translation/translation. The percentage correct for a part of the test (a subdomain, e.g., consecutive interpreting) is computed as the points you earned divided by the maximum points that can be earned in that part. For example, if the maximum for a subdomain is 72 points and you earned 51, your score report would show 71% out of the 100% possible in that subdomain.
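
In code, the worked example above is simply:

    # 51 points earned out of 72 possible in one subdomain.
    earned, possible = 51, 72
    print(f"{round(earned / possible * 100)}%")  # -> 71%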

Your total score is not the average of your performance in the subdomains; it is based on the full examination. There is no pass or fail status associated with an individual content area (subdomain). The percentages reported for subdomains are intended only as a guide and should be interpreted cautiously because of the small number of items in each content area. If you did not pass, the way to improve your score is to practice and improve all modes of interpreting. For more information on the domains, see the Test Content Outline.

Explanation of Standard Setting

To establish the passing score for the CHI™ exam, CCHI uses the Extended Modified Angoff method, which has an established history of producing credible passing standards for credentialing examinations, supplemented by the Beuk Relative-Absolute Compromise method.

The Extended Modified Angoff method involves two basic elements: the conceptualization of a minimally competent candidate, and the SMEs’ estimation of the average score such a candidate would receive on each item. A minimally competent candidate is an individual who would be able to demonstrate just enough knowledge and skill to pass the examination: such a candidate has enough interpreting skill to practice safely and competently, but does not demonstrate the skill level of an expert.

SMEs rate each test item, estimating the score a minimally competent candidate would get on it. They then compare their ratings with empirical data collected for each item during the pilot phase and discuss the ratings as a group, with the goal of reaching as close to a consensus as possible. The SMEs’ ratings are then averaged, and this “provisional cut score” is further reviewed and validated.
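
A minimal sketch of that averaging step, with invented ratings (real panels are larger, and the items differ):

    # Each row: one SME's estimate of the score a minimally competent
    # candidate would earn on each of the 7 oral items (invented numbers).
    sme_ratings = [
        [4.0, 3.5, 4.5, 3.0, 4.0, 3.5, 4.0],   # SME 1
        [3.5, 3.5, 4.0, 3.5, 4.5, 3.0, 4.0],   # SME 2
        [4.0, 4.0, 4.0, 3.0, 4.0, 3.5, 3.5],   # SME 3
    ]
    # Sum each SME's item estimates into an expected total, then average
    # across SMEs to get the provisional cut score.
    expected_totals = [sum(row) for row in sme_ratings]
    provisional_cut = sum(expected_totals) / len(expected_totals)
    print(round(provisional_cut, 2))  # -> 26.17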

To establish an operational cut score, SMEs are also asked to make a specific prediction about the test as a whole (the percentage of candidates they expect to pass). Using this prediction to adjust the panel-recommended rating is known as the Beuk Relative-Absolute Compromise method.
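
A simplified sketch of the Beuk calculation, assuming each SME supplies both a preferred cut score and an expected pass rate; all numbers are invented, and the real procedure is carried out by psychometricians:

    import numpy as np

    def beuk_compromise(candidate_scores, sme_cuts, sme_pass_rates):
        """Simplified Beuk (1984) relative-absolute compromise."""
        scores = np.asarray(candidate_scores)
        k_bar, p_bar = np.mean(sme_cuts), np.mean(sme_pass_rates)
        s_k, s_p = np.std(sme_cuts, ddof=1), np.std(sme_pass_rates, ddof=1)
        # Empirical pass rate as a function of a hypothetical cut score c.
        grid = np.linspace(scores.min(), scores.max(), 1001)
        f = np.array([(scores >= c).mean() for c in grid])
        # Compromise: where the line through (k_bar, p_bar) with slope
        # -s_p/s_k meets the empirical pass-rate curve.
        line = p_bar - (s_p / s_k) * (grid - k_bar)
        return grid[np.argmin(np.abs(f - line))]

    rng = np.random.default_rng(0)
    simulated = rng.normal(100, 15, 200)           # invented raw scores
    print(round(beuk_compromise(simulated,
                                [95, 100, 98, 102, 97],
                                [0.60, 0.55, 0.65, 0.50, 0.62]), 1))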

For more information about the standard setting methods, see:

  • Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 508-600). Washington, DC: American Council on Education.
  • Beuk, C. H. (1984). A method of reaching a compromise between absolute and relative standards in examinations. Journal of Educational Measurement, 21, 147-152.
  • Hambleton, R. K., & Plake, B. S. (1995). Using an extended Angoff procedure to set standards on complex performance assessments. Applied Measurement in Education, 8, 41-56.
  • Plake, B. S., & Cizek, G. J. (2012). Variations on a theme: The Modified Angoff, Extended Angoff, and Yes/No standard setting methods. In G. J. Cizek (Ed.), Setting performance standards: Foundations, methods and innovations (pp. 181-200). New York, NY: Routledge.

Explanation of Equating

Following best testing practices, CCHI administers several versions of the same exam (called test forms). One reason for this is fairness to candidates who take the exam for the first time and to those who retake it: ideally, each candidate should receive a version of the exam that is new to them.

Different test forms may be of slightly different difficulty because of natural variations in the language of the test items (e.g., the dialogues). Since it is important to be fair to all candidates regardless of which form they took, the test forms undergo a procedure called equating.

Equating is a mathematical calculation that ensures that the test forms place their passing points at the same level of candidate performance, i.e., that the forms are “equal” and “fair.” Test forms are equated to the “standard,” which is the form the SMEs used to establish the passing score; all subsequently developed forms are equated to it. Say the standard is Form 1, and Forms 2 and 3 are equated to Form 1. Because of the equating, Forms 2 and 3 will have different raw passing points, but these are then scaled to represent the same passing score of 450. As a result, a slightly easier form requires the candidate to earn more raw points (points on individual test items) to pass, and a slightly more difficult form allows the candidate to pass with fewer raw points.
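
To show the direction of that adjustment, here is a deliberately simplified mean-equating sketch; real equating designs are more sophisticated, and every number below is invented:

    raw_cut_form1 = 130.0   # raw passing point on the standard form (Form 1)
    mean_form1 = 128.0      # average raw score observed on Form 1
    mean_form2 = 133.0      # Form 2 runs slightly easier: scores sit higher

    # Shift Form 2's raw cut by the difference in observed difficulty, so a
    # passing performance means the same thing on both forms.
    raw_cut_form2 = raw_cut_form1 + (mean_form2 - mean_form1)
    print(raw_cut_form2)    # -> 135.0; both cuts then scale to 450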

Equating calculations are done by psychometricians and then reviewed and approved by CCHI.

As an analogy, if a second-grade mathematics test included both addition and multiplication problems, you might expect the addition problems to be easier and the multiplication problems to be harder. Let’s say Class 1 has an exam with 75 addition questions and 25 multiplication ones, whereas Class 2 has an exam with 65 addition and 35 multiplication questions. Then, to be fair to both classes, the final grades on the two exams would have to be mathematically adjusted. Let’s say each addition question is worth 1 point and each multiplication question is worth 4 points. Now imagine these four students (a short calculation reproducing their results follows the list):

  • Student A from Class 1, who correctly answers all addition questions and misses all multiplication problems, would have a final score of 75 and would have answered 75% of the questions correctly.
  • Student B from Class 1, who misses all the addition questions but answers all the multiplication problems correctly, would have a score of 100 but would have answered only 25% of the questions correctly.
  • Student C from Class 2, who correctly answers all addition questions and misses all multiplication problems, would have a final score of 65 and would have answered 65% of the questions correctly.
  • Student D from Class 2, who misses all the addition questions but answers all the multiplication problems correctly, would have a score of 140 but would have answered only 35% of the questions correctly.
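
The same four results, computed directly (each class has 100 questions in total, so the arithmetic is identical for both):

    ADD_PTS, MULT_PTS = 1, 4   # points per addition / multiplication question

    def result(addition_right, multiplication_right, n_questions=100):
        score = addition_right * ADD_PTS + multiplication_right * MULT_PTS
        pct_correct = 100 * (addition_right + multiplication_right) / n_questions
        return score, pct_correct

    print(result(75, 0))    # Student A -> (75, 75.0)
    print(result(0, 25))    # Student B -> (100, 25.0)
    print(result(65, 0))    # Student C -> (65, 65.0)
    print(result(0, 35))    # Student D -> (140, 35.0)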

To conclude, the subdomain percentages should be seen as an indication that you did better in one domain than in another on that particular test. You cannot compare percentages across tests, because each test has a mix of items of differing difficulty and, therefore, different weights in the final overall exam score.

When CCHI applies for accreditation or re-accreditation of its exams, the equating procedures are submitted for review to the accrediting body. Accreditation is a form of final review and confirmation that the accredited exam meets all the requirements for being fair and reliable.
