Measuring English Writing at Secondary Level. A Binational Comparative Study
Final Report Abstract
1. Summary
1.1. Research work conducted during funding period in relation to the objectives, milestones and hypotheses mentioned in the research plan
MEWS used a multi-level repeated measurement design to examine the English writing competences of learners in their penultimate year of baccalaureate education (11th grade) in Swiss and German upper secondary schools. It investigated the following research questions: (1) How proficient are learners in essay writing in English two years before their baccalaureate? (2) What is the effect of individual factors (e.g. motivation, intelligence), family background and extracurricular activities on English essay writing competences? (3) What is the effect of school and classroom factors (e.g. classroom instruction, school types, ‘baccalaureate rate’) on English essay writing competences? To answer research question 1, we focused on two key genres which figure prominently in the relevant curricula: argumentative writing and source-based writing. As planned, we used tasks from the TOEFL iBT writing test because its writing prompts and assessment rubrics are congruent with the relevant curricula in Germany and Switzerland. The writing assessment was conceptualised in co-operation with the Educational Testing Service (ETS) in Princeton, NJ, USA, which also scored the learner texts according to procedures similar to the TOEFL iBT writing assessment (see Sect. 5 for details). In accordance with the research plan, we tested students in these competences at the beginning and end of their 11th school year in Gymnasium. At both points, we also gathered a wide range of additional data at student, school and system level, as specified in the research plan. In generating and evaluating data in MEWS, we adhered closely to the methods and milestones described in the research plan.
Data collection took place as planned in September-October 2016 (T1) and June-July 2017 (T2) in seven Swiss cantons (Aargau, Basel Stadt, Basel Land, Luzern, St. Gallen, Schwyz, Zurich) and the German state of Schleswig-Holstein. Data were collected from n = 1,882 students in Switzerland (58% female; mean age at T1 = 17.56, SD = 0.91; mean age at T2 = 18.27, SD = 0.91) and n = 965 students in Germany (58.6% female; mean age at T1 = 16.91, SD = 0.56; mean age at T2 = 17.61, SD = 0.56). Students worked in a computer-based test environment on several tests covering writing, reading and listening skills as well as general cognitive ability. They also completed a questionnaire measuring background variables, individual characteristics and their perceptions of their English classes in school. As each of these students wrote four essays over the two time points, we collected and analyzed over 10,000 authentic student essays, making MEWS the largest and most detailed empirical study of English writing undertaken to date at upper secondary level in the participating countries. Results from the MEWS study have been published in international peer-reviewed journals, and publication is still ongoing (see Research Output). The first publication described the whole process of human scoring and machine scoring of student texts and examined the quality of the scoring models (Rupp et al., 2019). We then published a standard setting study which showed how the TOEFL writing tasks relate to educational curricula in Switzerland and Germany, and how the MEWS scores can be expressed within the Common European Framework of Reference for Languages (CEFR) (Fleckenstein et al., 2019). A further publication examined differences between various types of Gymnasium schools in Schleswig-Holstein relating to English learning (Köller et al., 2019).
A key study from the project was published in 2020 and contained the main results of our writing assessments. It showed that a majority of learners reach the minimal standard for English writing (CEFR level B2) one year before the end of upper secondary education (Keller et al., 2020). This paper also compared proficiency levels in the two countries (Germany and Switzerland) and reported on the influence of student-level background factors such as gender, economic background and language background. Doctoral theses in this project examined the influence of extracurricular learning activities and media use on school-related English competences (Maleika Krüger, Basel), learners’ genre-specific writing abilities through in-depth linguistic analysis (Oliver Meyer, Basel), and the influence of individual motivational factors on English writing (Jennifer Meyer, Kiel).
1.2. Relevance of main research results in relation to the research output
One main result of MEWS is that the majority of students already reach CEFR level B2 in English writing at T1, with only 35.3% of students failing to reach that minimal standard for upper secondary education. That percentage decreased to 28.2% at T2. This is a positive result, as it shows that CEFR level B2 is a realistic learning goal for English writing in upper secondary education. However, we also found very little learning progress in English writing over the course of one school year. Nearly 80% of the students who had not achieved CEFR level B2 at T1 did not achieve it at T2 either, indicating that there is a significant tier of students at risk of not being adequately prepared for the challenges of further education. In general, learning gains over one school year were relatively small (d ≈ 0.20). This corresponds to previous research on English receptive skills but indicates that more scope should be given to English writing at the end of upper secondary level.
We need larger learning gains in English writing at the end of upper secondary education, where this competence is a major factor in all the curricula of the relevant cantons and states. The MEWS study has shown a clear need for better teacher education in English as regards students’ writing skills. At the school and classroom level, we found a number of effects which are detailed in the empirical studies that came out of the project. For example, we found that both students’ general cognitive abilities and their socioeconomic status (SES) had a significant influence on their school achievement. However, the effect of cognitive abilities was much larger than that of SES, suggesting that cognitive entry characteristics are more important for achievement outcomes in upper secondary school than SES. We assume that this is because SES factors are ‘filtered out’ by the upper secondary educational system, which is quite selective in both Switzerland and Germany. Results might well be different if all school types were included, which suggests an avenue for further study. Surprisingly, we did not find significant effects for language background or gender. It is sometimes assumed in the literature that female students do slightly better in writing than male students, but it seems that upper secondary schools provide educational settings which foster both genders equally. At system level, we found a significant average score difference between Swiss and German students: Swiss students outperformed German students by more than a third of a standard deviation (Keller et al., 2020). Eight months of English learning corresponds to an increase of 20 points on the measurement scale (M = 500; SD = 100), suggesting that Swiss students were about 15 to 16 months ahead of German students.
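The conversion from a score gap to "months of learning" can be retraced with a few lines of arithmetic. The sketch below is only an illustration: it assumes that learning progresses linearly at the reported rate of 20 scale points per 8 months, and the gap values of 38 and 40 points are hypothetical figures chosen to be consistent with "more than a third of a standard deviation", not values taken from the report.

```python
# Reported in the text: scale SD = 100, gain of 20 points over 8 months.
SCALE_SD = 100.0
GAIN_POINTS = 20.0
GAIN_MONTHS = 8.0


def effect_size(points, sd=SCALE_SD):
    """Express a difference in scale points as a standardised effect (d)."""
    return points / sd


def months_equivalent(gap_points, gain_points=GAIN_POINTS, gain_months=GAIN_MONTHS):
    """Convert a score gap into months of learning.

    Assumption (not from the report): progress is linear at the
    observed rate of gain_points per gain_months.
    """
    points_per_month = gain_points / gain_months
    return gap_points / points_per_month


# Hypothetical gap values, for illustration only.
for gap in (38.0, 40.0):
    print(f"gap {gap:.0f} points: d = {effect_size(gap):.2f}, "
          f"about {months_equivalent(gap):.1f} months")
```

Under these assumptions, a gap of a little under 40 points corresponds to roughly 15 to 16 months, and the 20-point gain corresponds to the effect size of about 0.20 mentioned for one school year.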
The advantage of Swiss students disappears, however, once T1 writing performance is entered into the model, indicating that students in both Germany and Switzerland have about the same learning gains over the course of 8 months but started from different baselines at the beginning of the school year. This is in all likelihood due to the fact that the Swiss educational system is more restrictive at Gymnasium level than the German one. The data produced in the MEWS project are highly relevant at student, teacher and classroom level. For example, it becomes apparent that teacher education should focus on familiarising teachers more deeply with the relevant genres for tertiary education. Teachers need to become more familiar with the internal workings of key genres to support learners in mastering them. While students have sufficient general competencies in English, they often do not know the typical conventions of English argumentative essays. In interviews, teachers participating in the study said that they introduced argumentative writing only at the very end of upper secondary education, in year 12. Our results suggest that this genre should be introduced one year earlier, in 11th grade, so that students have more time to familiarize themselves with this key genre for tertiary education. The MEWS research team has therefore started to develop a free online learning platform for students to learn how to write argumentative essays in English, including scoring engines for automated in-time feedback. Furthermore, source-based writing, as operationalised by the “integrated” TOEFL tasks in this study, is also in need of attention. Students at T1 seemed unfamiliar with this type of task, which requires them to synthesise information from different sources of input (reading and listening). Again, this is a key competence for all types of university study and for employability in the modern world of business and commerce.
Students improved their competence in source-based writing more markedly than their competence in argumentative writing over one school year, which could be due to increased listening and reading competences at T2. We know from teacher interviews, however, that source-based writing as operationalised in the TOEFL test does not seem to be given much room in upper secondary English classrooms, which are typically focused on reading and discussing literary works. Without wanting to denigrate literary studies, source-based writing appears to be an important avenue for development both in English classrooms and in English teacher education. This includes a focus on teachers’ diagnostic competences to identify students who are not achieving the required levels of competence in this area.
1.3. Deviations from the research plan
Overall, there were no significant deviations from the submitted research plan. A small deviation occurred regarding data collection in the two countries. In Switzerland, educational departments were contacted for permission to gather data in schools. Where such permission was received, the research team contacted all relevant public schools in the canton and asked them to participate. Originally, the research team had planned to sample students within schools and classes. However, after feedback from schools it was decided to recruit whole classes in order to make the organization easier for schools. Thus, the sample in Switzerland was a convenience sample. Furthermore, it was left up to schools how many classes from that year should participate in the study. Where possible, classes were selected which had different ‘special subjects’ (i.e., subjects which receive special attention in the curriculum and are given extra lessons in certain semesters, such as modern languages, science, and economics) in order to maximise the representativeness of the sample. In Germany, all different types of Gymnasiums were also involved.
However, students were asked to volunteer for the study, thus reducing the degree to which the two national datasets can be compared. This limitation was due to systemic factors and privacy laws in the two countries and thus out of the control of the research team.
1.4. Contributions made by the project staff
MEWS was a co-operative project between the School of Education at the University of Applied Sciences and Arts Northwestern Switzerland (PH FHNW; Prof. S. Keller) and the Leibniz Institute for Science and Mathematics Education (IPN; Prof. O. Köller). It was supported in the D-A-CH program by both SNF and the German Research Foundation (DFG). Contributions by the research group of Prof. Köller are described in the next section. Swiss project partner Prof. Urs Moser and his team (IBE, UZH) were instrumental in planning and executing the design, data gathering methods, and sampling procedures of the Swiss part of the study. Most of the groundwork for the empirical studies was conducted by two doctoral students: Oliver Meyer (who is registered in mysnf) and Maleika Krüger (who was employed in the project but paid from FHNW overhead costs and therefore does not appear in mysnf). Oliver Meyer was instrumental in designing and implementing the classroom questionnaire. He also contributed heavily to organizing data collection in schools, both in the planning phase and during school visits at both measurement points. O. Meyer is currently working on his PhD study, which looks at detailed linguistic profiles of learner texts at different proficiency levels. Maleika Krüger was with the MEWS project from the very start. She was instrumental in planning the design, research questions, research methodology and data evaluation of this project. In the project phase, she was involved in organizing and implementing data gathering. She was also a leading researcher in the statistical evaluations and in compiling the dataset for the study.
She is currently finishing her PhD study on extramural influences (use of English media in students’ free time) on English writing competences. She focuses in particular on where, what, why and how often students engage with English media. She also investigates the influence of gender, country, language background and social background on English media use. She is due to hand in her dissertation in September 2020. Due to the diligent work of these two students, the costs for data gathering were significantly lower than planned in the research plan.
1.5. Major contributions made by project partners abroad
The project partner in Kiel (research group of Prof. Olaf Köller) made important and wide-ranging contributions to the project. Prof. Köller contributed the funding for the text scoring done by ETS in Princeton ($195,308). He also provided the funding for programming the online data collection tool (€85,000). With their substantial experience in large-scale educational measurement, Prof. Köller and his team were instrumental in planning and executing the design, data gathering methods, variables, testing instruments, questionnaires, and statistical evaluations of the project. PhD student Jennifer Meyer was intensively involved in data gathering and evaluation. She completed her PhD on the role of personality traits and motivation in predicting academic achievement and language learning. This PhD study won the faculty award at the University of Kiel in 2020. Dr. Johanna Fleckenstein joined the project after the second data gathering (T2) and was instrumental in planning and executing the “standard setting study” in which MEWS scores were anchored in the Common European Framework of Reference for Languages (CEFR; Fleckenstein et al., 2019). Dr. Fleckenstein also participated in subsequent publications and follow-up projects (see Scientific Output).
Significant contributions to the project were also made by the external project partner, Educational Testing Service in Princeton, USA. ETS was responsible for human scoring and machine scoring of the more than 10,000 essays within the MEWS project (Rupp et al., 2019). In this design, each student text was scored both by human raters and by automated essay scoring. First, all essays were scored by two independent, experienced human raters on the operational holistic TOEFL iBT scale from 0 to 5. Experienced raters from the operational TOEFL pool were hired to obtain statistically reliable human ratings that achieved a sufficient degree of construct coverage through the consistent application of the scoring rubrics. Inter-rater agreement, as measured by quadratic weighted kappa, was .64 and .67 at the two measurement points. For the machine scores, automated essay scoring (AES) models were developed by processing the digitally collected written responses via computational routines. The routines yield a set of statistical variables, called features, that can be used as predictor variables in statistical models to yield predicted human scores for these essays (Rupp et al., 2019). Each text was scored by e-rater®, the operational automated essay evaluation (AEE) engine of the TOEFL iBT test (Burstein, Tetreault, & Madnani, 2013). Human-machine agreement was satisfactorily high for all prompt-specific automated scoring models: correlations ranged from r = .761 to r = .809 for the independent prompts and from r = .698 to r = .825 for the integrated prompts in the two samples from Switzerland and Germany. By rating each student text with two human scores and one machine-based score (“h-h-m scoring”), MEWS achieved the elusive ‘gold standard’ of text scoring in large-scale educational assessment. The automated text measurement models produced in MEWS, in particular, are an important innovation of the project which we will continue to explore in follow-up projects.
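For readers unfamiliar with the two agreement statistics named above, the following minimal sketch shows how quadratic weighted kappa (used here for human-human agreement) and a Pearson correlation (used here for human-machine agreement) can be computed for scores on a 0 to 5 scale. It is a plain-Python illustration, not the procedure ETS used; the function names and the toy ratings are hypothetical and were invented for this example.

```python
import math
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_score=0, max_score=5):
    """Cohen's kappa with quadratic disagreement weights for integer scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of essays")
    n = len(rater_a)
    span = max_score - min_score
    # Observed disagreement: mean weighted squared distance of paired scores.
    observed = sum(((a - b) / span) ** 2 for a, b in zip(rater_a, rater_b)) / n
    # Expected disagreement if the two raters' score distributions were independent.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        count_a[i] * count_b[j] * ((i - j) / span) ** 2
        for i in count_a
        for j in count_b
    ) / (n * n)
    if expected == 0:
        return 1.0  # every essay received the same single score from both raters
    return 1.0 - observed / expected


def pearson_r(x, y):
    """Pearson correlation, e.g. between human and machine scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy ratings for eight essays (hypothetical, for illustration only).
human_1 = [3, 4, 2, 5, 3, 4, 1, 3]
human_2 = [3, 3, 2, 5, 4, 4, 2, 3]
machine = [3.2, 3.6, 2.1, 4.7, 3.4, 3.9, 1.8, 2.9]

kappa = quadratic_weighted_kappa(human_1, human_2)
mean_human = [(a + b) / 2 for a, b in zip(human_1, human_2)]
r = pearson_r(mean_human, machine)
print(f"QWK = {kappa:.3f}, human-machine r = {r:.3f}")
```

Quadratic weights penalise a disagreement of two scale points four times as heavily as a disagreement of one point, which is why this statistic is commonly preferred to unweighted kappa for ordinal rating scales. Correlating the machine score with the mean of the two human scores is one possible choice made for this sketch; the report does not state which human score the correlations were based on.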
As a consequence of the MEWS project, the research groups of Prof. Stefan Keller and Prof. Olaf Köller have become competence clusters for automated essay evaluation in Europe, and we will pursue this line of research vigorously in the years to come. Among other things, this will happen in the TrACE project (Training Assessment Competences in English), which we submitted to SNF in April 2020.
Publications
- Keller, S. (2016). Measuring Writing at Secondary Level (MEWS). Eine binationale Studie. Babylonia 3/2016, 46-48
- Rupp, A., Casabianca, J., Krüger, M., Keller, S., & Köller, O. (2019). Automated Essay Scoring at Scale: A Case Study in Switzerland and Germany. Wiley Online Library (See online at https://doi.org/10.1002/ets2.12249)
- Fleckenstein, J., Keller, S., Krüger, M., Tannenbaum, R., & Köller, O. (2019). Linking TOEFL iBT® Writing Rubrics to CEFR Levels: Cut Scores and Validity Evidence from a Standard Setting Study. Assessing Writing, 41, 1-13 (See online at https://doi.org/10.1016/j.asw.2019.100420)
- Köller, O., Fleckenstein, J., Meyer, J., Paeske, A.L., Krüger, M., Rupp, A., & Keller, S. (2019). Schreibkompetenzen im Fach Englisch in der gymnasialen Oberstufe. Zeitschrift für Erziehungswissenschaften, 22, 1281-1312 (See online at https://doi.org/10.1007/s11618-019-00910-3)
- Keller, S., Fleckenstein, J., Krüger, M., Köller, O., & Rupp, A. (2020). English writing skills of students in upper secondary education: Results from an empirical study in Switzerland and Germany. Journal of Second Language Writing 1/2020, 1-13 (See online at https://doi.org/10.1016/j.jslw.2019.100700)
- Keller, S. (2020). Wie gut schreiben Lernende auf der gymnasialen Oberstufe Englische Texte? Gymnasium Helveticum 2/2020, 8-10