标准设定(Standard setting)是定义标准界限的方法和过程,在标准参照测验开发过程中起着重要作用。通过标准设定,可以得到标准参照测验的临界分数,据此测试者将被划分为“通过”或“未通过”,或者是被划分为更多的有序表现类别。
Body of Work(BoW)方法是Holistic standard-setting methods中的一种,主要特征是由专家对不同水平的考生作答进行评估,并直接对每个作答样本做出(类别型)评判。该方法主要用于写作、口语、结构性作答等题型。
在标准设定过程中,评分员的评定数据需要即刻分析并以报告的形式呈现。本文为使用BoW方法进行标准设定的实例,运用Rmarkdown ,对12个评分员参与的写作(50篇样文)标准设定结果给出实时分析报告。
以下为代码:
---title: "写作标准设定(Body of work)"author: - BENEdocumentclass: ctexartoutput: rticles::ctex: fig_caption: yes number_sections: yes toc: yesclassoption: "hyperref"---$$ $$# 反馈卡```{r setup, results='hide',include=FALSE}#####设置全局选项,echo = TRUE表示在输出文件中显示代码#####knitr::opts_chunk$set(echo = TRUE)``````{r, include=FALSE}###设置全局选项,tinytex.verbose = TRUE表示在输出中显示详细信息###options(tinytex.verbose = TRUE)``````{r, echo=FALSE, message = FALSE, warning = FALSE}##清除内存##rm(list = ls()) ##载入相关R包##library(readxl)library(knitr)library("kableExtra")library("tinytex")library(reshape)library(plyr)library(dplyr)library('scales')##载入分析文件##dat1 = read_excel("dat1.xlsx",col_names = TRUE)dat11 = dat1[,-c(1,2)]ff1 = function(x) (length(which(x==1)))ff2 = function(x) (length(which(x==2)))ff3 = function(x) (length(which(x==3)))ff4 = function(x) (length(which(x==4)))###评分有四类,计算所有评分员在各类上的评分数量###one = apply(dat11,1,ff1)two = apply(dat11,1,ff2)thr = apply(dat11,1,ff3)fou = apply(dat11,1,ff4)cc = cbind(one,two,thr,fou)##四类时使用#cc = cbind(one,two)##两类时使用tt = apply(cc,1,sum)ccc = cbind(dat1[,2],one,two,thr,fou,tt)##四类时使用#ccc = cbind(dat1[,2],one,two,tt)##两类时使用###计算二、三、四类的评分比例###lp1 = apply(ccc[,c(3:5)],1,sum)/ccc$tt##四类时使用lp2 = apply(ccc[,c(4:5)],1,sum)/ccc$tt##四类时使用lp3 = apply(as.data.frame(ccc[,5]),1,sum)/ccc$tt##四类时使用#lp = round(apply(as.data.frame(ccc[,3]),1,sum)/ccc$tt,2)##两类时使用####low#####lpp1 = cbind(dat1[,2],tt,lp1)lpp1[lpp1==0]<-NAlpp1[lpp1==1]<-NAlpp1<-na.omit(lpp1)####mid#####lpp2 = cbind(dat1[,2],tt,lp2)lpp2[lpp2==0]<-NAlpp2[lpp2==1]<-NAlpp2<-na.omit(lpp2)####high#####lpp3 = cbind(dat1[,2],tt,lp3)lpp3[lpp3==0]<-NAlpp3[lpp3==1]<-NAlpp3<-na.omit(lpp3)####定义百分数转换到对数的函数####fe = function(x) (log(x/(1-x)))####逻辑回归-计算low-cut point#####le1 = apply(as.data.frame(lpp1$lp1),2,fe)lll1 = glm(lp1~point,family=binomial(link='logit'),weights=tt,data=lpp1)###cutp1 = round(abs(-lll1$coefficients[1]/lll1$coefficients[2]),1)rr1 = summary(lll1)###pval1 = round(rr1$coef[8],4)####逻辑回归-计算mid-cut point#####le2 = apply(as.data.frame(lpp2$lp2),2,fe)lll2 = glm(lp2~point,family=binomial(link='logit'),weights=tt,data=lpp2)###cutp2 = round(abs(-lll2$coefficients[1]/lll2$coefficients[2]),1)rr2 = summary(lll2)###pval2 = round(rr2$coef[8],4)####逻辑回归-计算high-cut point#####le3 = apply(as.data.frame(lpp3$lp3),2,fe)lll3 = glm(lp3~point,family=binomial(link='logit'),weights=tt,data=lpp3)###cutp3 = round(abs(-lll3$coefficients[1]/lll3$coefficients[2]),1)rr3 = summary(lll3)###pval3 = round(rr3$coef[8],4)###计算评分信度###v1 = melt(as.data.frame(dat11))Tvar1 = var(v1$value)item_mean11 = apply(dat11, 1, mean)Ivar1 = as.numeric(var(item_mean11)) rater_consis1 = Ivar1/Tvar1 ###ICC信度###rater_consis1a = (1- (sum(apply(dat11, 2, var)))/var(apply(dat11, 1, sum))) ###Cronbach a信度######形成评分反馈表###ddd = ccc[3:5]/length(colnames(dat11))back1 = cbind(dat1[1],ccc,ddd)colnames(back1)=c("ID","point","onen","twon","thrn","foun","tt","two","thr","fou")back1[,which(colnames(back1)=="two")] = percent(back1$two,accuracy = TRUE)back1[,which(colnames(back1)=="thr")] = percent(back1$thr,accuracy = TRUE)back1[,which(colnames(back1)=="fou")] = percent(back1$fou,accuracy = TRUE)ss1 = apply(dat11,2,ff1)ss2 = apply(dat11,2,ff2)ss3 = apply(dat11,2,ff3)ss4 = apply(dat11,2,ff4)ave1234 = cbind(ss1,ss2*2,ss3*3,ss4*4)aves = apply(ave1234,1,mean)sdp = sd(aves)*2sdn = sd(aves)*(-2)means = mean(aves)upp = sdp+meansdownn = sdn+meansssst = ifelse(aves<upp&aves>downn,"NO","YES")strick = cbind(aves,ssst)maxs = max(aves)mins = min(aves)maxmin = c(maxs,mins)meansd = c(round(mean(aves),2),round(sd(aves),2))###形成评分员严厉度表###strickf = rbind(strick,maxmin,meansd)rownames(strickf) = c(colnames(dat11),"最大值&最小值","平均值&标准差")```\Large 本轮的评分员一致性**ICC信度**为**`r round(rater_consis1,2)`**, **Cronbach a信度**为**`r round(rater_consis1a,2)`**。划定的合格Cut score为**`r cutp1`**分,计算pvalue为**`r round(pval1,3)`**(小于0.05表明整个评分过程无明显异常)。划定的低中Cut score为**`r cutp2`**分,计算pvalue为**`r round(pval2,3)`**(小于0.05表明整个评分过程无明显异常)。划定的中高Cut score为**`r cutp3`**分,计算pvalue为**`r round(pval3,3)`**(小于0.05表明整个评分过程无明显异常)。$$ $$```{r, echo=FALSE, message = FALSE, warning = FALSE}###评分反馈表格化###kable(back1,"latex", booktabs = T,longtable = TRUE, col.names=c("ID","Point","Below-N","Low-N","Mid-N","High-N","Rater-N","Low-P","Mid-P","High-P"), caption = "评定反馈表", align='ccccccccccc')%>% kable_styling(latex_options = c("striped","repeat_header","scale_down",full_width = T ,position = "center", font_size =3))%>% column_spec(1, bold = T)%>% column_spec(2:length(colnames(back1)), width = "0.8cm")%>% row_spec(0, bold = T,italic = T,font_size =8)%>% row_spec(1:length(rownames(back1)), font_size =10)%>% footnote(general = "本表列名依次为:作文标号、作文得分、评定未达标数量、低水平数量、中水平数量、高水平数量、评分员数量、低水平比例、中水平比例、高水平比例。",threeparttable = T,fixed_small_size = 3)###评分员严厉度表格化###kable(strickf,"latex", booktabs = T,longtable = TRUE, col.names=c("严厉度","是否异常"), caption = "评分员严厉度", align='ccc')%>% kable_styling(latex_options = c("striped","repeat_header"),font_size = 14)%>% column_spec(1, bold = T) %>% row_spec(0, bold = T) %>% row_spec((nrow(strickf)-1):nrow(strickf), bold = T, color = "black", background = "#ADD8E6")%>% footnote(alphabet = "数值越大表明评定越宽松。 ")```$$ $$# 反馈分布图\Large 下列图片为本轮所有评分者将不同作文进行分类后的结果分布图。若一篇作文两个及以上类别的直方图高度接近,表明评分者评定较不一致,需进行讨论。```{r, echo=FALSE, message = FALSE, warning = FALSE,out.width ='\\textwidth',fig.align ='center'}library('ggplot2')data=t(back1[,-c(7:10)])rownames(data) = c("Level","Total","Below","Basic", "Mid","High")colnames(data) = data[1,]dat=data[-c(1,2),]dat = as.data.frame(dat) Level <- factor(rownames(dat), levels=c("Below","Basic", "Mid","High"), ordered=TRUE) for (i in 1:ncol(dat)){ p = ggplot(dat, aes(x=Level, y=as.numeric(dat[,i]))) + geom_histogram(stat="identity", fill=ifelse(sd(as.numeric(dat[,i]))<=4, "red","lightblue"), colour="black")+ labs(y='Number of raters', x=colnames(data)[i])+ scale_y_continuous(breaks=seq(0,15,1))+ theme(axis.text.x = element_text(size = 14,color="black"), axis.text.y = element_text(size = 15,color="black"), axis.title.x=element_text(size=14)) print(p)}```# 原始评分表(附表)```{r, echo=FALSE, message = FALSE, warning = FALSE}kable(dat1,"latex", booktabs = T,longtable = TRUE, align='cccccccccccccc')%>% kable_styling(latex_options = c("striped","repeat_header"),font_size = 8)```