Is Cross-Marking A Way To Increase Rater Reliability?

Author :  

Year-Number: 2018-Volume 6 Issue 3
Language : English
Number of pages: 331-346


Abstract

Most error correction research has focused on whether teachers should correct errors in student writing, how they should do it, and how deep the correction should go. Recent research has therefore concentrated mostly on the pedagogical merits of error correction and its possible benefits for student learning. However, in contexts where graders score the same paper multiple times, little has been done to investigate whether those corrections influence other graders, or whether writing teachers' corrections on students' papers have a positive or negative impact on the reliability of the scores when raters see other graders' corrections on the papers they mark. This study set out to explore whether corrections made by graders affect the scores of colleagues who score the same papers a second time in order to obtain more accurate results and to ensure rating reliability. To that end, 12 writing teachers graded 20 student essays written by intermediate-level English learners. The participants were first asked to grade 10 papers without doing error correction, and those papers were re-scored after 3 weeks by the same graders; inter-rater and intra-rater reliability computations were carried out for this set of papers to establish the raters' actual reliability levels under normal circumstances. In the second stage, the graders were asked to score the other 10 papers, but this time they also made error corrections on the papers, and after 3 weeks the same teachers graded the same papers that had been corrected by their paired graders. The scores assigned to these papers by the same raters each time were compared statistically, and the effect of error correction on their scores was investigated. In conclusion, the results revealed that error marking and grader comments on writing papers may have a negative effect on raters' intra-rater reliability, whereas they could have a positive effect on inter-rater reliability when a pool of raters grades the same papers.
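
The abstract does not specify how the reliability computations were carried out. The sketch below is a minimal illustration only, not the authors' actual analysis: it uses hypothetical placeholder scores and simple Pearson correlations to show how intra-rater reliability (one rater scoring the same papers three weeks apart) and inter-rater reliability (two raters scoring the same papers) might be estimated.

    # Illustrative sketch, not the study's analysis; all score values are
    # hypothetical placeholders for 10 essays on a 0-100 scale.
    import numpy as np

    # One rater's scores for the same 10 essays, three weeks apart.
    rater_a_round1 = np.array([72, 65, 80, 58, 90, 77, 62, 85, 70, 68])
    rater_a_round2 = np.array([70, 66, 78, 60, 88, 75, 64, 84, 72, 67])

    # A second rater's scores for the same 10 essays.
    rater_b_round1 = np.array([68, 63, 82, 55, 87, 74, 60, 88, 69, 66])

    # Intra-rater reliability: consistency of one rater with themselves over time.
    intra = np.corrcoef(rater_a_round1, rater_a_round2)[0, 1]

    # Inter-rater reliability: agreement between two raters on the same papers.
    inter = np.corrcoef(rater_a_round1, rater_b_round1)[0, 1]

    print(f"intra-rater r = {intra:.2f}")
    print(f"inter-rater r = {inter:.2f}")

In practice, studies of this kind may also report intraclass correlations or generalizability coefficients rather than simple correlations; the correlation here is only the simplest stand-in.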

Keywords

