标准设定(Body of work方法)—R语言应用范例(Rmarkdown & Logistic回归)

标准设定(Standard setting)是定义标准界限的方法和过程,在标准参照测验开发过程中起着重要作用。通过标准设定,可以得到标准参照测验的临界分数,据此测试者将被划分为“通过”或“未通过”,或者是被划分为更多的有序表现类别。

Body of Work(BoW)方法是Holistic standard-setting methods中的一种,主要特征是由专家对不同水平的考生作答进行评估,并直接对每个作答样本做出(类别型)评判。该方法主要用于写作、口语、结构性作答等题型

在标准设定过程中,评分员的评定数据需要即刻分析并以报告的形式呈现。本文为使用BoW方法进行标准设定的实例,运用Rmarkdown ,对12个评分员参与的写作(50篇样文)标准设定结果给出实时分析报告。

以下为代码:

---
title: "写作标准设定(Body of work)"
author:
  - BENE
documentclass: ctexart
output:
  rticles::ctex:
    fig_caption: yes
    number_sections: yes
    toc: yes
classoption: "hyperref"
---
 
$$ $$
 
# 反馈卡
 
 
```{r setup, results='hide',include=FALSE}
 
#####设置全局选项,echo = TRUE表示在输出文件中显示代码#####
knitr::opts_chunk$set(echo = TRUE)
 
```
 
```{r, include=FALSE}
 
###设置全局选项,tinytex.verbose = TRUE表示在输出中显示详细信息###
options(tinytex.verbose = TRUE)
 
```
 
 
```{r, echo=FALSE, message = FALSE, warning = FALSE}
 
##清除内存##
rm(list = ls()) 
 
##载入相关R包##
library(readxl)
library(knitr)
library("kableExtra")
library("tinytex")
library(reshape)
library(plyr)
library(dplyr)
library('scales')
 
##载入分析文件##
dat1 = read_excel("dat1.xlsx",col_names = TRUE)
 
dat11 = dat1[,-c(1,2)]
 
ff1 =  function(x) (length(which(x==1)))
ff2 =  function(x) (length(which(x==2)))
ff3 =  function(x) (length(which(x==3)))
ff4 =  function(x) (length(which(x==4)))
 
###评分有四类,计算所有评分员在各类上的评分数量###
one = apply(dat11,1,ff1)
two = apply(dat11,1,ff2)
thr = apply(dat11,1,ff3)
fou = apply(dat11,1,ff4)
 
cc = cbind(one,two,thr,fou)##四类时使用
#cc = cbind(one,two)##两类时使用
tt = apply(cc,1,sum)
ccc = cbind(dat1[,2],one,two,thr,fou,tt)##四类时使用
#ccc = cbind(dat1[,2],one,two,tt)##两类时使用
 
###计算二、三、四类的评分比例###
lp1 = apply(ccc[,c(3:5)],1,sum)/ccc$tt##四类时使用
lp2 = apply(ccc[,c(4:5)],1,sum)/ccc$tt##四类时使用
lp3 = apply(as.data.frame(ccc[,5]),1,sum)/ccc$tt##四类时使用
#lp = round(apply(as.data.frame(ccc[,3]),1,sum)/ccc$tt,2)##两类时使用
 
####low#####
lpp1 = cbind(dat1[,2],tt,lp1)
lpp1[lpp1==0]<-NA
lpp1[lpp1==1]<-NA
lpp1<-na.omit(lpp1)
 
####mid#####
lpp2 = cbind(dat1[,2],tt,lp2)
lpp2[lpp2==0]<-NA
lpp2[lpp2==1]<-NA
lpp2<-na.omit(lpp2)
 
####high#####
lpp3 = cbind(dat1[,2],tt,lp3)
lpp3[lpp3==0]<-NA
lpp3[lpp3==1]<-NA
lpp3<-na.omit(lpp3)
 
####定义百分数转换到对数的函数####
fe =  function(x) (log(x/(1-x)))
 
 
####逻辑回归-计算low-cut point#####
le1 = apply(as.data.frame(lpp1$lp1),2,fe)
lll1 = glm(lp1~point,family=binomial(link='logit'),weights=tt,data=lpp1)
 
###
cutp1 = round(abs(-lll1$coefficients[1]/lll1$coefficients[2]),1)
rr1 = summary(lll1)
###
pval1 = round(rr1$coef[8],4)
 
####逻辑回归-计算mid-cut point#####
le2 = apply(as.data.frame(lpp2$lp2),2,fe)
lll2 = glm(lp2~point,family=binomial(link='logit'),weights=tt,data=lpp2)
###
cutp2 = round(abs(-lll2$coefficients[1]/lll2$coefficients[2]),1)
rr2 = summary(lll2)
###
pval2 = round(rr2$coef[8],4)
 
####逻辑回归-计算high-cut point#####
le3 = apply(as.data.frame(lpp3$lp3),2,fe)
lll3 = glm(lp3~point,family=binomial(link='logit'),weights=tt,data=lpp3)
###
cutp3 = round(abs(-lll3$coefficients[1]/lll3$coefficients[2]),1)
rr3 = summary(lll3)
###
pval3 = round(rr3$coef[8],4)
 
 
###计算评分信度###
v1 = melt(as.data.frame(dat11))
Tvar1 = var(v1$value)
item_mean11 = apply(dat11, 1, mean)
Ivar1 = as.numeric(var(item_mean11)) 
 
rater_consis1 = Ivar1/Tvar1 ###ICC信度###
rater_consis1a = (1- (sum(apply(dat11, 2, var)))/var(apply(dat11, 1, sum))) ###Cronbach a信度###
 
 
###形成评分反馈表###
ddd = ccc[3:5]/length(colnames(dat11))
back1 = cbind(dat1[1],ccc,ddd)
colnames(back1)=c("ID","point","onen","twon","thrn","foun","tt","two","thr","fou")
back1[,which(colnames(back1)=="two")] = percent(back1$two,accuracy = TRUE)
back1[,which(colnames(back1)=="thr")] = percent(back1$thr,accuracy = TRUE)
back1[,which(colnames(back1)=="fou")] = percent(back1$fou,accuracy = TRUE)
 
ss1 = apply(dat11,2,ff1)
ss2 = apply(dat11,2,ff2)
ss3 = apply(dat11,2,ff3)
ss4 = apply(dat11,2,ff4)
 
ave1234 = cbind(ss1,ss2*2,ss3*3,ss4*4)
aves = apply(ave1234,1,mean)
sdp = sd(aves)*2
sdn = sd(aves)*(-2)
means = mean(aves)
upp = sdp+means
downn = sdn+means
 
ssst = ifelse(aves<upp&aves>downn,"NO","YES")
 
strick = cbind(aves,ssst)
maxs = max(aves)
mins = min(aves)
maxmin = c(maxs,mins)
meansd = c(round(mean(aves),2),round(sd(aves),2))
 
###形成评分员严厉度表###
strickf = rbind(strick,maxmin,meansd)
rownames(strickf) = c(colnames(dat11),"最大值&最小值","平均值&标准差")
 
```
 
 
\Large 
本轮的评分员一致性**ICC信度**为**`r round(rater_consis1,2)`**, **Cronbach a信度**为**`r round(rater_consis1a,2)`**。
 
 
 
划定的合格Cut score为**`r cutp1`**分,计算pvalue为**`r round(pval1,3)`**(小于0.05表明整个评分过程无明显异常)。
 
 
划定的低中Cut score为**`r cutp2`**分,计算pvalue为**`r round(pval2,3)`**(小于0.05表明整个评分过程无明显异常)。
 
 
划定的中高Cut score为**`r cutp3`**分,计算pvalue为**`r round(pval3,3)`**(小于0.05表明整个评分过程无明显异常)。
 
$$ $$
 
 
```{r, echo=FALSE, message = FALSE, warning = FALSE}
 
 
###评分反馈表格化###
kable(back1,"latex", booktabs = T,longtable = TRUE,
      col.names=c("ID","Point","Below-N","Low-N","Mid-N","High-N","Rater-N","Low-P","Mid-P","High-P"),
      caption = "评定反馈表", align='ccccccccccc')%>%
  kable_styling(latex_options = c("striped","repeat_header","scale_down",full_width = T
                                  ,position = "center", font_size =3))%>%
  column_spec(1, bold = T)%>%
  column_spec(2:length(colnames(back1)), width = "0.8cm")%>%
  row_spec(0, bold = T,italic = T,font_size =8)%>%
  row_spec(1:length(rownames(back1)), font_size =10)%>%
  footnote(general  = "本表列名依次为:作文标号、作文得分、评定未达标数量、低水平数量、中水平数量、高水平数量、评分员数量、低水平比例、中水平比例、高水平比例。",threeparttable = T,fixed_small_size = 3)
 
 
###评分员严厉度表格化###
kable(strickf,"latex", booktabs = T,longtable = TRUE,
      col.names=c("严厉度","是否异常"), caption = "评分员严厉度", 
      align='ccc')%>%
  kable_styling(latex_options = c("striped","repeat_header"),font_size = 14)%>%
  column_spec(1, bold = T) %>%
  row_spec(0, bold = T) %>%
  row_spec((nrow(strickf)-1):nrow(strickf), bold = T, color = "black", background = "#ADD8E6")%>%
  footnote(alphabet = "数值越大表明评定越宽松。 ")
 
 
 
```
 
$$ $$
 
# 反馈分布图
 
\Large 
下列图片为本轮所有评分者将不同作文进行分类后的结果分布图。
 
若一篇作文两个及以上类别的直方图高度接近,表明评分者评定较不一致,需进行讨论。
 
 
 
```{r, echo=FALSE, message = FALSE, warning = FALSE,out.width ='\\textwidth',fig.align ='center'}
 
 
library('ggplot2')
 
data=t(back1[,-c(7:10)])
rownames(data) = c("Level","Total","Below","Basic", "Mid","High")
colnames(data) = data[1,]
dat=data[-c(1,2),]
 
dat =  as.data.frame(dat)
  
Level <- factor(rownames(dat), levels=c("Below","Basic", "Mid","High"), ordered=TRUE)
 
  
for (i in 1:ncol(dat)){
  
  p = ggplot(dat, aes(x=Level, y=as.numeric(dat[,i]))) + 
    geom_histogram(stat="identity",
                   fill=ifelse(sd(as.numeric(dat[,i]))<=4, "red","lightblue"), 
                   colour="black")+
    labs(y='Number of raters', x=colnames(data)[i])+
    scale_y_continuous(breaks=seq(0,15,1))+
    theme(axis.text.x = element_text(size = 14,color="black"),
          axis.text.y = element_text(size = 15,color="black"),
          axis.title.x=element_text(size=14))
  
  print(p)
}
 
 
```
 
 
# 原始评分表(附表)
 
```{r, echo=FALSE, message = FALSE, warning = FALSE}
 
kable(dat1,"latex", booktabs = T,longtable = TRUE,
      align='cccccccccccccc')%>%
  kable_styling(latex_options = c("striped","repeat_header"),font_size = 8)
 
```
 
 
 

最终生成的PDF如下:

 
 
 

分析数据表:

 

发表评论

您的电子邮箱地址不会被公开。