Assessing beginner Arabic proficiency in Saudi foundation programs: Aligning Arabic-as-a-foreign-language learning outcomes with the Imtāʿ reference framework at the preparatory level
Abstract
Saudi foundation (preparatory-year) programs increasingly host Arabic-as-a-foreign-language (AFL) learners whose progression must be reported in ways that are interpretable across institutions. Yet beginner assessment practices often remain local, outcome statements are under-specified, and score reports are weakly linked to externally described proficiency levels. This study advances a replicable alignment-and-linking approach that connects (i) course learning outcomes, (ii) assessment blueprints and rating scales, and (iii) score interpretations to the Imtāʿ Reference Framework at the preparatory level. We argue that such alignment strengthens validity arguments, improves fairness, and enables mobility by making beginner proficiency claims auditable and comparable.
Methodologically, the paper combines (a) outcomes-to-descriptor mapping by an expert panel, (b) development of an Imtāʿ-aligned beginner proficiency assessment covering listening, reading, interaction, and guided writing, (c) rater training and many-facet Rasch measurement for productive tasks, and (d) standard-setting (bookmark) to locate cut scores for preparatory sublevels. Results from a synthetic cohort dataset (N = 360) illustrate how the workflow yields high content relevance indices for mapped outcomes (median Aiken’s V = .86), stable measurement for the receptive test (Rasch person reliability = .82), manageable rater severity after calibration (MFRM severity range < 0.8 logits), and interpretable distributions of learners across Imtāʿ preparatory sublevels.
The study contributes a practical blueprint for institutions seeking to operationalize Imtāʿ at entry and early-exit points, with recommendations for outcome rewriting, item banking, rater certification, and reporting formats that communicate what learners can do with Arabic at the preparatory stage.
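The content relevance index reported above, Aiken's V, can be illustrated with a minimal sketch. It assumes a 1 to 5 relevance scale and uses hypothetical panel ratings and outcome labels (`LO1` to `LO3`); neither the scale nor the data comes from the study. For one outcome rated by n panelists on a scale from `low` to `high`, V is the sum of (rating - low) divided by n * (high - low).

```python
from statistics import median


def aikens_v(ratings, low=1, high=5):
    """Aiken's V for one outcome: sum(rating - low) / (n * (high - low))."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < low or r > high for r in ratings):
        raise ValueError("rating outside the assumed scale")
    return sum(r - low for r in ratings) / (len(ratings) * (high - low))


# Hypothetical relevance ratings from five panelists (assumed 1-5 scale).
panel_ratings = {
    "LO1": [5, 4, 5, 4, 5],
    "LO2": [4, 4, 5, 3, 4],
    "LO3": [5, 5, 5, 4, 5],
}

# One V per mapped outcome, then the median across outcomes.
v_by_outcome = {name: aikens_v(r) for name, r in panel_ratings.items()}
median_v = median(v_by_outcome.values())
```

V ranges from 0 (every panelist chose the lowest category) to 1 (every panelist chose the highest), so a median across outcomes summarizes how relevant the panel judged the mapped outcomes overall.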
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This open-access article is distributed under a Creative Commons Attribution (CC-BY) 4.0 license.
You are free to: Share — copy and redistribute the material in any medium or format. Adapt — remix, transform, and build upon the material for any purpose, even commercially. The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms: Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.