标准设定(Standard setting)是定义标准界限的方法和过程,在标准参照测验开发过程中起着重要作用。通过标准设定,可以得到标准参照测验的临界分数,据此测试者将被划分为“通过”或“未通过”,或者是被划分为更多的有序表现类别。
Body of Work(BoW)方法是Holistic standard-setting methods中的一种,主要特征是由专家对不同水平的考生作答进行评估,并直接对每个作答样本做出(类别型)评判。该方法主要用于写作、口语、结构性作答等题型。
在标准设定过程中,评分员的评定数据需要即刻分析并以报告的形式呈现。本文为使用BoW方法进行标准设定的实例,运用Rmarkdown ,对12个评分员参与的写作(50篇样文)标准设定结果给出实时分析报告。
以下为代码:
---
title: "写作标准设定(Body of work)"
author:
- BENE
documentclass: ctexart
output:
rticles::ctex:
fig_caption: yes
number_sections: yes
toc: yes
classoption: "hyperref"
---
$$ $$
# 反馈卡
```{r setup, results='hide',include=FALSE}
#####设置全局选项,echo = TRUE表示在输出文件中显示代码#####
knitr::opts_chunk$set(echo = TRUE)
```
```{r, include=FALSE}
###设置全局选项,tinytex.verbose = TRUE表示在输出中显示详细信息###
options(tinytex.verbose = TRUE)
```
```{r, echo=FALSE, message = FALSE, warning = FALSE}
##清除内存##
rm(list = ls())
##载入相关R包##
library(readxl)
library(knitr)
library("kableExtra")
library("tinytex")
library(reshape)
library(plyr)
library(dplyr)
library('scales')
##载入分析文件##
dat1 = read_excel("dat1.xlsx",col_names = TRUE)
dat11 = dat1[,-c(1,2)]
ff1 = function(x) (length(which(x==1)))
ff2 = function(x) (length(which(x==2)))
ff3 = function(x) (length(which(x==3)))
ff4 = function(x) (length(which(x==4)))
###评分有四类,计算所有评分员在各类上的评分数量###
one = apply(dat11,1,ff1)
two = apply(dat11,1,ff2)
thr = apply(dat11,1,ff3)
fou = apply(dat11,1,ff4)
cc = cbind(one,two,thr,fou)##四类时使用
#cc = cbind(one,two)##两类时使用
tt = apply(cc,1,sum)
ccc = cbind(dat1[,2],one,two,thr,fou,tt)##四类时使用
#ccc = cbind(dat1[,2],one,two,tt)##两类时使用
###计算二、三、四类的评分比例###
lp1 = apply(ccc[,c(3:5)],1,sum)/ccc$tt##四类时使用
lp2 = apply(ccc[,c(4:5)],1,sum)/ccc$tt##四类时使用
lp3 = apply(as.data.frame(ccc[,5]),1,sum)/ccc$tt##四类时使用
#lp = round(apply(as.data.frame(ccc[,3]),1,sum)/ccc$tt,2)##两类时使用
####low#####
lpp1 = cbind(dat1[,2],tt,lp1)
lpp1[lpp1==0]<-NA
lpp1[lpp1==1]<-NA
lpp1<-na.omit(lpp1)
####mid#####
lpp2 = cbind(dat1[,2],tt,lp2)
lpp2[lpp2==0]<-NA
lpp2[lpp2==1]<-NA
lpp2<-na.omit(lpp2)
####high#####
lpp3 = cbind(dat1[,2],tt,lp3)
lpp3[lpp3==0]<-NA
lpp3[lpp3==1]<-NA
lpp3<-na.omit(lpp3)
####定义百分数转换到对数的函数####
fe = function(x) (log(x/(1-x)))
####逻辑回归-计算low-cut point#####
le1 = apply(as.data.frame(lpp1$lp1),2,fe)
lll1 = glm(lp1~point,family=binomial(link='logit'),weights=tt,data=lpp1)
###
cutp1 = round(abs(-lll1$coefficients[1]/lll1$coefficients[2]),1)
rr1 = summary(lll1)
###
pval1 = round(rr1$coef[8],4)
####逻辑回归-计算mid-cut point#####
le2 = apply(as.data.frame(lpp2$lp2),2,fe)
lll2 = glm(lp2~point,family=binomial(link='logit'),weights=tt,data=lpp2)
###
cutp2 = round(abs(-lll2$coefficients[1]/lll2$coefficients[2]),1)
rr2 = summary(lll2)
###
pval2 = round(rr2$coef[8],4)
####逻辑回归-计算high-cut point#####
le3 = apply(as.data.frame(lpp3$lp3),2,fe)
lll3 = glm(lp3~point,family=binomial(link='logit'),weights=tt,data=lpp3)
###
cutp3 = round(abs(-lll3$coefficients[1]/lll3$coefficients[2]),1)
rr3 = summary(lll3)
###
pval3 = round(rr3$coef[8],4)
###计算评分信度###
v1 = melt(as.data.frame(dat11))
Tvar1 = var(v1$value)
item_mean11 = apply(dat11, 1, mean)
Ivar1 = as.numeric(var(item_mean11))
rater_consis1 = Ivar1/Tvar1 ###ICC信度###
rater_consis1a = (1- (sum(apply(dat11, 2, var)))/var(apply(dat11, 1, sum))) ###Cronbach a信度###
###形成评分反馈表###
ddd = ccc[3:5]/length(colnames(dat11))
back1 = cbind(dat1[1],ccc,ddd)
colnames(back1)=c("ID","point","onen","twon","thrn","foun","tt","two","thr","fou")
back1[,which(colnames(back1)=="two")] = percent(back1$two,accuracy = TRUE)
back1[,which(colnames(back1)=="thr")] = percent(back1$thr,accuracy = TRUE)
back1[,which(colnames(back1)=="fou")] = percent(back1$fou,accuracy = TRUE)
ss1 = apply(dat11,2,ff1)
ss2 = apply(dat11,2,ff2)
ss3 = apply(dat11,2,ff3)
ss4 = apply(dat11,2,ff4)
ave1234 = cbind(ss1,ss2*2,ss3*3,ss4*4)
aves = apply(ave1234,1,mean)
sdp = sd(aves)*2
sdn = sd(aves)*(-2)
means = mean(aves)
upp = sdp+means
downn = sdn+means
ssst = ifelse(aves<upp&aves>downn,"NO","YES")
strick = cbind(aves,ssst)
maxs = max(aves)
mins = min(aves)
maxmin = c(maxs,mins)
meansd = c(round(mean(aves),2),round(sd(aves),2))
###形成评分员严厉度表###
strickf = rbind(strick,maxmin,meansd)
rownames(strickf) = c(colnames(dat11),"最大值&最小值","平均值&标准差")
```
\Large
本轮的评分员一致性**ICC信度**为**`r round(rater_consis1,2)`**, **Cronbach a信度**为**`r round(rater_consis1a,2)`**。
划定的合格Cut score为**`r cutp1`**分,计算pvalue为**`r round(pval1,3)`**(小于0.05表明整个评分过程无明显异常)。
划定的低中Cut score为**`r cutp2`**分,计算pvalue为**`r round(pval2,3)`**(小于0.05表明整个评分过程无明显异常)。
划定的中高Cut score为**`r cutp3`**分,计算pvalue为**`r round(pval3,3)`**(小于0.05表明整个评分过程无明显异常)。
$$ $$
```{r, echo=FALSE, message = FALSE, warning = FALSE}
###评分反馈表格化###
kable(back1,"latex", booktabs = T,longtable = TRUE,
col.names=c("ID","Point","Below-N","Low-N","Mid-N","High-N","Rater-N","Low-P","Mid-P","High-P"),
caption = "评定反馈表", align='ccccccccccc')%>%
kable_styling(latex_options = c("striped","repeat_header","scale_down",full_width = T
,position = "center", font_size =3))%>%
column_spec(1, bold = T)%>%
column_spec(2:length(colnames(back1)), width = "0.8cm")%>%
row_spec(0, bold = T,italic = T,font_size =8)%>%
row_spec(1:length(rownames(back1)), font_size =10)%>%
footnote(general = "本表列名依次为:作文标号、作文得分、评定未达标数量、低水平数量、中水平数量、高水平数量、评分员数量、低水平比例、中水平比例、高水平比例。",threeparttable = T,fixed_small_size = 3)
###评分员严厉度表格化###
kable(strickf,"latex", booktabs = T,longtable = TRUE,
col.names=c("严厉度","是否异常"), caption = "评分员严厉度",
align='ccc')%>%
kable_styling(latex_options = c("striped","repeat_header"),font_size = 14)%>%
column_spec(1, bold = T) %>%
row_spec(0, bold = T) %>%
row_spec((nrow(strickf)-1):nrow(strickf), bold = T, color = "black", background = "#ADD8E6")%>%
footnote(alphabet = "数值越大表明评定越宽松。 ")
```
$$ $$
# 反馈分布图
\Large
下列图片为本轮所有评分者将不同作文进行分类后的结果分布图。
若一篇作文两个及以上类别的直方图高度接近,表明评分者评定较不一致,需进行讨论。
```{r, echo=FALSE, message = FALSE, warning = FALSE,out.width ='\\textwidth',fig.align ='center'}
library('ggplot2')
data=t(back1[,-c(7:10)])
rownames(data) = c("Level","Total","Below","Basic", "Mid","High")
colnames(data) = data[1,]
dat=data[-c(1,2),]
dat = as.data.frame(dat)
Level <- factor(rownames(dat), levels=c("Below","Basic", "Mid","High"), ordered=TRUE)
for (i in 1:ncol(dat)){
p = ggplot(dat, aes(x=Level, y=as.numeric(dat[,i]))) +
geom_histogram(stat="identity",
fill=ifelse(sd(as.numeric(dat[,i]))<=4, "red","lightblue"),
colour="black")+
labs(y='Number of raters', x=colnames(data)[i])+
scale_y_continuous(breaks=seq(0,15,1))+
theme(axis.text.x = element_text(size = 14,color="black"),
axis.text.y = element_text(size = 15,color="black"),
axis.title.x=element_text(size=14))
print(p)
}
```
# 原始评分表(附表)
```{r, echo=FALSE, message = FALSE, warning = FALSE}
kable(dat1,"latex", booktabs = T,longtable = TRUE,
align='cccccccccccccc')%>%
kable_styling(latex_options = c("striped","repeat_header"),font_size = 8)
```