Title: | Information Analysis for Test and Rating Scale Data |
---|---|
Description: | Develop, evaluate, and score multiple choice examinations, psychological scales, questionnaires, and similar types of data involving sequences of choices among one or more sets of answers. This version of the package should be considered as brand new. Almost all of the functions have been changed, including their argument list. See the file NEWS.Rd in the Inst folder for more information. Using the package does not require any formal statistical knowledge beyond what would be provided by a first course in statistics in a social science department. There the user would encounter the concept of probability and how it is used to model data and make decisions, and would become familiar with basic mathematical and statistical notation. Most of the output is in graphical form. |
Authors: | James Ramsay [aut, cre], Juan Li [ctb], Marie Wiberg [ctb], Joakim Wallmark [ctb], Spencer Graves [ctb] |
Maintainer: | James Ramsay <[email protected]> |
License: | GPL (>= 2) |
Version: | 3.3.3 |
Built: | 2024-11-15 05:22:51 UTC |
Source: | https://github.com/jamesramsay5/testgardener |
The test or rating scale data have already been processed by function make_dataList or other code to produce the list object dataList. The user defines a list vector ParameterList which stores results from a set of cycles of estimating surprisal curves followed by estimating optimal score index values for each examinee or respondent. These score index values are within the interval [0,100]. The number of analysis cycles is the length of the parmList list vector.
Analyze(index, indexQnt, dataList, NumDensBasis=7, ncycle=10, itdisp=FALSE, verbose=FALSE)
index |
A vector of |
indexQnt |
A vector of length |
dataList |
A list that contains the objects needed to analyse the test or rating scale with the following fields:
|
NumDensBasis |
The number of basis functions for representing the score density. |
ncycle |
The number of cycles executed by function |
itdisp |
If TRUE, the progress of the iterations within each cycle for estimating index is reported. |
verbose |
If TRUE, the stages of analysis within each cycle for estimating index are reported. |
The cycling process is described in detail in the references, and displayed in R code in the vignette SweSATQuantitativeAnalysis.
The list vector parmList, where each member is a named list object containing the results of an analysis cycle. These results are:
index: |
The optimal estimates of the score index values for the
examinees/respondents. This is a vector of length |
indexQnt: |
A vector of length 2*nbin+1 containing bin boundaries alternating with bin centers. |
SfdList: |
A list vector containing results from the estimation of surprisal
curves. The list vector is of length |
meanF: |
For each person, the mean of the optimal fitting function values. |
binctr: |
A vector of length |
bdry: |
A vector of length |
freq: |
A vector of length |
pdf_fd |
Functional probability curves |
logdensfd: |
A functional data object defining the estimate of the log of the probability density function for the distribution of the score index values. |
C: |
The normalizing value for probability density functions. A density value is computed by dividing the exponential of the log density value by this constant. |
denscdf: |
The values over a fine mesh of the cumulative probability
distribution function. These values start at 0 and end with 1 and are increasing.
Ties are often found at the upper boundary, so that using these values for
interpolation purposes may require using the vector |
indcdf |
Equally spaced index values to match the number in denscdf. |
Qvec |
Locations of the marker percents. |
index |
The positions of each test taker on the score index continuum. |
Fval: |
A vector of length N containing the values of the negative log likelihood fitting criterion. |
DFval: |
A vector of length N containing the values of the first derivative of the negative log likelihood fitting criterion. |
D2Fval: |
A vector of length N containing the values of the second derivative of the negative log likelihood fitting criterion. |
active: |
A vector of length N of the activity status of the values of index. If convergence was not achieved, the value is TRUE, otherwise FALSE. |
infoSurp: |
The length of the space curve defined by the surprisal curves. |
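The roles of C and denscdf described above can be illustrated with a small base-R sketch. It is not the package's code: the log density values in logdens are made up for illustration, the normalizing constant is computed with the trapezoidal rule, and tied cumulative values are dropped with duplicated() before interpolating, which is only one possible way of handling ties.

```r
# Hypothetical log density values on a fine mesh over [0,100] (illustration only)
indfine <- seq(0, 100, len = 101)
logdens <- -((indfine - 50) / 20)^2

# Normalizing constant C: trapezoidal integral of exp(log density)
edens <- exp(logdens)
h     <- indfine[2] - indfine[1]
C     <- h * (sum(edens) - 0.5 * (edens[1] + edens[101]))

# A density value is exp(log density) divided by C
dens <- edens / C

# Cumulative distribution over the mesh, scaled to start at 0 and end at 1
cdf <- cumsum(c(0, h * (dens[-1] + dens[-101]) / 2))
cdf <- cdf / cdf[101]

# Ties would break interpolation, so keep only the first of any repeated value
keep <- !duplicated(cdf)
med  <- approx(cdf[keep], indfine[keep], xout = 0.5)$y
```

Because the made-up density is symmetric about 50, the interpolated median med should fall at about 50.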
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
make_dataList,
TG_analysis,
index_distn,
index2info,
index_fun,
Sbinsmth
## Not run:
#  Example 1:  Input choice data and key for the short version of the
#  SweSAT quantitative multiple choice test with 24 items and 1000 examinees
#  input the choice data as 1000 strings of length 24
#  setup the input data list object
dataList <- Quant_13B_problem_dataList
#  define the initial examinee indices and bin locations
index    <- dataList$percntrnk
indexQnt <- dataList$indexQnt
#  Set the number of cycles (default 10 but here 5)
ncycle <- 5
parmListvec <- Analyze(index, indexQnt, dataList, ncycle=ncycle, verbose=TRUE)
#  two column matrix containing the mean fit and arclength values
#  for each cycle
HALsave <- matrix(0,ncycle,2)
for (icycle in 1:ncycle) {
  HALsave[icycle,1] <- parmListvec[[icycle]]$meanF
  HALsave[icycle,2] <- parmListvec[[icycle]]$infoSurp
}
#  plot the progress over the cycles of mean fit and arc length
par(mfrow=c(2,1))
plot(1:ncycle, HALsave[,1], type="b", lwd=2, xlab="Cycle Number", ylab="Mean H")
plot(1:ncycle, HALsave[,2], type="b", lwd=2, xlab="Cycle Number", ylab="Arc Length")
## End(Not run)
Used in dataSimulation, this function sets up an N by n matrix of index values that specify the index of the option chosen by an examinee or respondent for a specific question.
chcemat_simulate(index.pop, SfdList)
index.pop |
A vector containing population score index values at which data are to be simulated. |
SfdList |
A numbered list object produced by a TestGardener analysis of a test. Its length
is equal to the number of items in the test or questions in the scale.
Each member of
|
For each question and each examinee a vector of random multinomial integer values is generated using the probability transforms of the surprisal curves and the examinee's score index value.
An N by n matrix of integer index values.
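The sampling step described above can be sketched for a single item in base R. This is an illustration under stated assumptions rather than the package's implementation: the probabilities in P_true are invented, and sample() stands in for whatever multinomial generator the package uses. The probability transform of a surprisal vector S in M-bits is M^(-S).

```r
set.seed(123)

# One hypothetical item with M options and made-up option probabilities
# at a fixed score index value
M      <- 4
P_true <- c(0.1, 0.2, 0.3, 0.4)

# Surprisal in M-bits, and its probability transform back to probabilities
S <- -log(P_true) / log(M)
P <- M^(-S)

# Simulate the option indices chosen by N examinees sharing this index value
N       <- 1000
choices <- sample(1:M, N, replace = TRUE, prob = P)

# Observed proportions should be close to P
prop <- tabulate(choices, nbins = M) / N
```

Repeating this for every item, with P evaluated at each examinee's own score index value, would fill an N by n matrix of option indices.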
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
Estimate the bias and mean squared error of sum scores, score index values index and test information values using simulated data.
dataSimulation(dataList, parmList, nsample = 1000)
dataList |
The list object set up by function |
parmList |
The list object containing objects computed by function
|
nsample |
The number of simulated samples. |
A named list object containing objects produced from analyzing the simulations, one set for each simulation:
sumscr: |
Sum score estimates |
index: |
Score index estimates |
mu: |
Expected sum score estimates |
info: |
Total arc length estimates |
index.pop: |
True or population score index values |
mu.pop: |
Expected sum score population values |
info.pop: |
Total arc length population values |
n: |
Number of items |
nindex: |
Number of index values |
indfine: |
Fine mesh over score index range |
Qvec: |
Five marker percentages: 5, 25, 50, 75 and 95 |
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
Plots the probability density function of a set of score values that are not at the score boundaries as a smooth curve, and also plots the proportions of score values at both boundaries as points. The score values are typically either the score index values index or the information score values, whose upper limit is infoSurp.
density_plot(scrvec, scrrng, Qvec, xlabstr=NULL, titlestr=NULL, scrnbasis=15, nfine=101)
scrvec |
A vector of N score values |
scrrng |
A vector of length 2 containing boundary values |
Qvec |
A vector of length 5 containing the score values corresponding to the marker percentages 5, 25, 50, 75 and 95. |
xlabstr |
Label for abscissa |
titlestr |
Label for plot |
scrnbasis |
The number of spline basis functions used for representing the smooth density function |
nfine |
Number of plotting points |
A plot of the density function and a list vector densfine containing:
densfine: |
Density values over a mesh of equally-spaced values of length 101. |
N_min: |
The number of examinees estimated to have zero information. |
N_max: |
The number of examinees estimated to have full information. |
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
#  Example 1.  Display probability density curve for the
#  score index values for the short SweSAT multiple choice test with
#  24 items and 1000 examinees
index <- Quant_13B_problem_parmList$index
Qvec  <- Quant_13B_problem_parmList$Qvec
#  save the graphics settings so that they can be restored afterwards
oldpar <- par(no.readonly=TRUE)
par(mfrow=c(2,1))
#  plot the density for the score indices within interval c(0,100)
density_plot(index, c(0,100), Qvec, xlabstr="Score index",
             titlestr="SweSAT 13B Theta Density",
             scrnbasis=11, nfine=101)
#  arc length or information values
scopevec <- Quant_13B_problem_infoList$scopevec
Qinfovec <- Quant_13B_problem_infoList$Qinfovec
infoSurp <- Quant_13B_problem_infoList$infoSurp
#  plot the density for the information values within interval c(0,infoSurp)
density_plot(scopevec, c(0,infoSurp), Qinfovec, xlabstr="Information",
             titlestr="SweSAT 13B Information Density",
             scrnbasis=11, nfine=101)
#  restore the original graphics settings
par(oldpar)
DFfun computes the first and second derivatives of the negative log likelihoods for a set of examinees. Items can be either binary or multi-option. The analysis is within the closed interval [0,100].
DFfun(index, SfdList, chcemat)
index |
Initial values for score indices in [0,n]/[0,100]. Vector of size N. |
SfdList |
A numbered list object produced by a TestGardener analysis of
a test. Its length is equal to the number of items in the test or questions
in the scale. Each member of |
chcemat |
An |
A named list for results DF and D2F:
DF: |
First derivatives of the negative log likelihood values, vector of size N |
D2F: |
Second derivatives of the negative log likelihood values, vector of size N |
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
make_dataList,
index_fun,
Ffun,
Ffuns_plot
#  Example 1:
#  Compute the first and second derivative values of the objective function
#  for locating each examinee for the 24-item short form of the
#  SweSAT quantitative test on the percentile score index continuum.
#  Use only the first five examinees.
chcemat <- Quant_13B_problem_dataList$chcemat
SfdList <- Quant_13B_problem_parmList$SfdList
index   <- Quant_13B_problem_parmList$index
DFfunResult <- DFfun(index[1:5], SfdList, chcemat[1:5,])
DFval  <- DFfunResult$DF
D2Fval <- DFfunResult$D2F
Entropy is a scalar measure of how much information is required to predict the outcome of a choice number 1 exactly, and consequently is a measure of item effectiveness suitable for multiple choice tests and rating scales. Joint entropy is a scalar measure of the cross-product of multinomial vectors 1 and 2. Mutual entropy is a measure of the co-dependency of items 1 and 2, and thus the analogue of the negative log of a squared correlation. This function computes all four types of entropies for two specified items: the entropy of each item, their joint entropy and their mutual entropy.
entropies(index, m, n, chcemat, noption)
index |
A vector of length N containing score index values for each test taker. |
m |
The index of the first choice. |
n |
The index of the second choice. |
chcemat |
The data matrix containing the indices of chosen options for each test taker. |
noption |
A vector containing the number of options for all items. |
A named list object containing the four entropy values for the two items:
I_m: |
The entropy of item m. |
I_n: |
The entropy of item n. |
J_nm: |
The joint entropy of items m and n. |
I_nm: |
The mutual entropy of items m and n. |
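The relationship among these four quantities can be sketched with empirical proportions in base R. This is an illustration only: it ignores the index argument, uses base-2 logarithms, and takes mutual entropy to be the standard mutual information (the two item entropies minus the joint entropy), all of which are assumptions and may differ from how entropies() computes its results. The choice vectors u1 and u2 are simulated.

```r
# Entropy of a probability vector, ignoring zero cells (base 2, an assumption)
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

set.seed(1)
N  <- 500
# Hypothetical option choices for two 3-option items; item 2 copies item 1
# about 70 percent of the time so that the items are co-dependent
u1 <- sample(1:3, N, replace = TRUE)
u2 <- ifelse(runif(N) < 0.7, u1, sample(1:3, N, replace = TRUE))

# Joint proportions and the four entropy measures
Pjoint  <- table(u1, u2) / N
H1      <- entropy(rowSums(Pjoint))     # entropy of item 1
H2      <- entropy(colSums(Pjoint))     # entropy of item 2
Hjoint  <- entropy(as.vector(Pjoint))   # joint entropy
Hmutual <- H1 + H2 - Hjoint             # mutual entropy
```

Under this definition the mutual entropy is zero when the two items are independent and grows as one item's choices become more predictable from the other's.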
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
#  Load needed objects
chcemat <- Quant_13B_problem_dataList$chcemat
index   <- Quant_13B_problem_parmList$index
noption <- matrix(5,24,1)
#  compute mutual entropies for all pairs of the first 6 items
Mvec    <- 1:6
Mlen    <- length(Mvec)
Hmutual <- matrix(0,Mlen,Mlen)
for (i1 in 1:Mlen) {
  for (i2 in 1:i1) {
    Result <- entropies(index, Mvec[i1], Mvec[i2], chcemat, noption)
    Hmutual[i1,i2] <- Result$Hmutual
    Hmutual[i2,i1] <- Result$Hmutual
  }
}
print("Matrix of mutual entropies (off-diagonal) and self-entropies (diagonal)")
print(round(Hmutual,3))
The value of the item entropy curve at a point theta is the expected value of the surprisal curve values. Entropy is a measure of the randomness of the surprisal value, which is maximized when all the surprisal curves have the same value and has a minimum of zero if all but a single curve has probability zero. This minimum is unattainable in the calculation, but the entropy can be arbitrarily close to it.
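The two extremes described above can be checked with a short base-R sketch. It is not taken from the package: item_entropy is a hypothetical helper, and the probability vectors are invented. Surprisal is measured in M-bits, so the entropy is the probability-weighted mean of the surprisal values.

```r
# Hypothetical helper: entropy of one item at a single score index value,
# given the vector P of option probabilities at that value
item_entropy <- function(P) {
  M <- length(P)
  S <- -log(P) / log(M)   # surprisal values in M-bits
  sum(P * S)              # expected surprisal
}

# All options equally likely: all surprisal values are equal (maximum entropy)
H_max <- item_entropy(rep(1/5, 5))

# Almost all probability on one option: entropy is close to, but above, zero
H_low <- item_entropy(c(0.996, rep(0.001, 4)))
```

With base-M logarithms the maximum is 1 up to rounding, which is consistent with entropy curves ranging over [0,1].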
Entropy_plot(scrfine, SfdList, Qvec, dataList, plotindex=1:n, plotrange=c(min(scrfine),max(scrfine)), height=1.0, value=0, ttlsz=NULL, axisttl=NULL, axistxt=NULL)
scrfine |
A vector of length nfine (usually 101) containing equally spaced points spanning the plotrange. Used for plotting. |
SfdList |
A numbered list object produced by a TestGardener analysis of
a test. Its length is equal to the number of items in the test or questions
in the scale. Each member of |
Qvec |
The five marker percentile values. |
dataList |
A list vector containing objects essential to an analysis. |
plotindex |
A set of integers specifying the numbers of the items or questions to be displayed. |
plotrange |
A vector of length 2 containing the plot boundaries within or over the score index interval c(0,100). |
height |
A positive real number defining the upper limit on the ordinate for the plots. |
value |
Number required by ggplot2. Defaults to 0. |
ttlsz |
Title font size. |
axisttl |
Axis title font size. |
axistxt |
Axis text(tick label) font size. |
An entropy curve for each question indexed in the plotindex argument. A request for a keystroke is made for each question. The answer to a question strongly defines the optimal position of an estimated score index value where the curve is high. Values of entropy curves typically range over [0,1].
The plots of the entropy curves specified in plotindex are produced as a side effect.
If saveplot is TRUE, the plots of item entropy curves specified in plotindex are bundled into a single postscript or .pdf file and the file name is defined by paste(dataList$titlestr,i,'-entropy.pdf',sep=""). The file is then output as a returned value.
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
Sensitivity_plot,
Power_plot,
Ffuns_plot,
ICC_plot
#  Example 1.  Display the item entropy curves for the
#  short SweSAT multiple choice test with 24 items and 1000 examinees
#  plot the entropy curve for the first item
dataList <- Quant_13B_problem_dataList
SfdList  <- Quant_13B_problem_parmList$SfdList
Qvec     <- Quant_13B_problem_parmList$Qvec
scrfine  <- seq(0,100,len=101)
oldpar   <- par(no.readonly=TRUE)
Entropy_plot(scrfine, SfdList, Qvec, dataList, plotindex=1)
par(oldpar)
A surprisal vector of length M is minus the log to a positive integer base M of a set of M multinomial probabilities. Surprisal curves are functions of a one-dimensional index set, such that at any value of the index set the values of the curves are a surprisal vector. See Details below for further explanations.
eval.surp(evalarg, Sfdobj, Zmat, nderiv = 0)
evalarg |
a vector or matrix of argument values at which the functional data object is to be evaluated. |
Sfdobj |
a functional data object of dimension |
Zmat |
An |
nderiv |
An integer defining a derivative of |
A surprisal M-vector is information measured in M-bits. Since a multinomial probability vector must sum to one, it follows that the surprisal vector S must satisfy the constraint log_M(sum(M^(-S))) = 0. That is, surprisal vectors lie within a curved M-1-dimensional manifold.
Surprisal curves are defined by a set of unconstrained M-1 B-spline functional data objects defined over an index set that are transformed into surprisal curves defined over the index set.
Let C be a K by M-1 coefficient matrix defining the B-spline curves, where K is the number of B-spline basis functions.
Let an M by M-1 matrix Z have orthonormal columns. Matrices satisfying these constraints are generated by function zerobasis().
Let the N by K matrix B be a matrix of B-spline basis values evaluated at N evaluation points using function eval.basis().
Let the N by M matrix X = B * C * t(Z).
Then the N by M matrix S of surprisal values is S = -X + outer(log(rowSums(M^X))/log(M),rep(1,M)).
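The transformation from unconstrained coefficients to surprisal values can be traced numerically in base R. The sketch below is illustrative only and avoids the fda package: B is a random stand-in for the basis matrix that eval.basis() would return, C holds random coefficients, and Z is built with a QR decomposition as a hypothetical substitute for zerobasis(), on the assumption that its columns are orthonormal and also sum to zero.

```r
set.seed(1)
M <- 4    # number of options
K <- 5    # number of basis functions
N <- 11   # number of evaluation points

# Stand-ins for the basis value matrix and the coefficient matrix
B <- matrix(runif(N * K), N, K)
C <- matrix(rnorm(K * (M - 1)), K, M - 1)

# M by M-1 matrix with orthonormal columns, each orthogonal to the constant vector
Z <- qr.Q(qr(cbind(1, diag(M)[, -1])))[, -1]

# Unconstrained values X and the surprisal matrix S
X <- B %*% C %*% t(Z)
S <- -X + outer(log(rowSums(M^X)) / log(M), rep(1, M))

# Probability transform of the surprisal values, and the constraint log_M(sum(M^(-S)))
P          <- M^(-S)
constraint <- log(rowSums(M^(-S))) / log(M)
```

Each row of P sums to one, so every entry of constraint is zero up to rounding, which is the manifold condition stated above.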
An N by M matrix S of surprisal values at points evalarg, or their first or second derivatives.
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
# see example in man/smooth.surp.Rd
A fast grid of values of the fitting function or one of its first two derivatives is constructed for use in function indexsearch.
Fcurve(SfdList, chcevec, nderiv=0)
SfdList |
A list vector containing specifications of surprisal curves for each item. |
chcevec |
An N by n matrix containing indices of chosen options for each test taker. |
nderiv |
Integer 0, 1 or 2 to indicate which level of derivative to use. |
A vector of length 101 containing grid values of a derivative of the fitting function
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
#  Compute a grid of values of the objective function for locating each
#  examinee or respondent for the 24-item short form of the SweSAT
#  quantitative test on the percentile score index continuum [0,100].
chcemat <- Quant_13B_problem_dataList$chcemat
SfdList <- Quant_13B_problem_parmList$SfdList
index   <- Quant_13B_problem_parmList$index
n       <- ncol(chcemat)
#  Fitting function for the first examinee
j <- 1
chcevec <- as.numeric(chcemat[j,])
Fcurve1 <- Fcurve(SfdList, chcevec, 0)
#  First derivative of the fitting function for the first examinee
DFcurve1 <- Fcurve(SfdList, chcevec, 1)
#  Second derivative of the fitting function for the first examinee
D2Fcurve1 <- Fcurve(SfdList, chcevec, 2)
oldpar <- par(no.readonly=TRUE)
par(mfrow=c(3,1))
indfine <- seq(0,100,len=101)
plot(indfine, Fcurve1, type="l", xlab="", ylab="Fitting curve",
     main="Examinee 1")
plot(indfine, DFcurve1, type="l", xlab="", ylab="First derivative")
points(index[1], 0, pch="o")
abline(0,0,lty=2)
plot(indfine, D2Fcurve1, type="l", xlab="Score index", ylab="Second derivative")
abline(0,0,lty=2)
points(index[1], 0, pch="o")
par(oldpar)
Ffun computes the negative log likelihoods for a set of examinees, each at a single value index.
Ffun(index, SfdList, chcemat)
index |
A vector of size |
SfdList |
A numbered list object produced by a TestGardener analysis of
a test. Its length is equal to the number of items in the test or questions
in the scale. Each member of |
chcemat |
An |
A vector of length N of negative log likelihood values.
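Because a surprisal value is minus the log of a probability, a negative log likelihood of this kind is a sum of the surprisal values of the options that were chosen. The base-R sketch below illustrates that idea with invented data; it is not the body of Ffun. Slist is a hypothetical list of surprisal matrices already evaluated at each examinee's score index, and natural logarithms are assumed throughout.

```r
set.seed(2)
N <- 5   # examinees
n <- 3   # items
M <- 4   # options per item

# Slist[[i]]: N by M matrix of surprisal values for item i, one row per
# examinee, evaluated at that examinee's score index (made-up probabilities)
Slist <- lapply(1:n, function(i) {
  P <- matrix(runif(N * M), N, M)
  P <- P / rowSums(P)
  -log(P)
})

# Hypothetical choice matrix: index of the option chosen for each item
chcemat <- matrix(sample(1:M, N * n, replace = TRUE), N, n)

# Negative log likelihood: sum over items of the surprisal of the chosen option
Fval <- rep(0, N)
for (i in 1:n) {
  Fval <- Fval + Slist[[i]][cbind(1:N, chcemat[, i])]
}
```

The two-column matrix index cbind(1:N, chcemat[, i]) picks, for each examinee, the surprisal value in the column of the option that examinee chose.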
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
make_dataList,
index_fun,
Ffun,
Ffuns_plot
#  Example 1:  Compute the values of the objective function for locating each
#  examinee or respondent for the 24-item short form of the SweSAT quantitative
#  test on the percentile score index continuum [0,100].
#  Use only the first five examinees
chcemat <- Quant_13B_problem_dataList$chcemat
SfdList <- Quant_13B_problem_parmList$SfdList
index   <- Quant_13B_problem_parmList$index
Fval    <- Ffun(index[1:5], SfdList, chcemat[1:5,])
These plots indicate whether an appropriate minimum of the fitting criterion was found. The value of index should be at the function minimum, the first derivative should be close to zero there, and the second derivative should be positive. If these conditions are not met, it may be worthwhile to use function index_fun initialized with an approximate minimum value of the score index index to re-estimate the value of index.
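The three conditions can also be checked numerically. The sketch below uses a made-up fit criterion Ftoy and central finite differences in base R; in practice the derivative values would come from the package (for example from DFfun), so everything here is a hypothetical stand-in.

```r
# Hypothetical fit criterion with its minimum at a score index of 40
Ftoy <- function(theta) (theta - 40)^2 / 200 + 3

theta_hat <- 40      # candidate estimate of the score index
h         <- 1e-3    # step size for the finite differences

# Central-difference approximations to the first and second derivatives
DF  <- (Ftoy(theta_hat + h) - Ftoy(theta_hat - h)) / (2 * h)
D2F <- (Ftoy(theta_hat + h) - 2 * Ftoy(theta_hat) + Ftoy(theta_hat - h)) / h^2

# An interior minimum needs a near-zero slope and positive curvature
is_minimum <- abs(DF) < 1e-4 && D2F > 0
```

A candidate that fails either test is a reason to re-estimate that examinee's score index from a better starting value.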
Ffuns_plot(evalarg, index, SfdList, chcemat, plotindex=1)
evalarg |
A vector containing the score index values to be evaluated. |
index |
The vector of length |
SfdList |
The list vector of length |
chcemat |
The entire |
plotindex |
A subset of the integers |
The curves are displayed in three vertically organized panels along with values of index and the values and first two derivative values of the fit criterion. If more than one index value is used, a press of the Enter or Return key moves to the next index value.
A list vector is returned which is of the length of argument plotindex. Each member of the vector is a gg or ggplot object for the associated plotindex value. Each plot can be displayed using the print command.
The plots of the fit criterion and its derivatives are produced as a side effect even if no output object is specified in the call to the function.
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
#  Example 1.  Display fit criterion values and derivatives for the
#  short SweSAT multiple choice test with 24 items and 1000 examinees
chcemat   <- Quant_13B_problem_dataList$chcemat
index     <- Quant_13B_problem_parmList$index
SfdList   <- Quant_13B_problem_parmList$SfdList
plotindex <- 1:3
indfine   <- seq(0,100,len=101)
Ffuns_plot(indfine, index, SfdList, chcemat, plotindex)
This is an S3 object that contains information essential for plotting probability and surprisal curves for a single multiple choice or rating question. Bin probabilities and surprisal values can also be plotted.
ICC(x, M, Sfd, Zmat, Pbin, Sbin, Pmatfine, Smatfine, DSmatfine, D2Smatfine, PStdErr, SStdErr, ItemArcLen, itemStr=NULL, optStr=NULL)
x |
An item number. |
M |
The number of options for this item, including an option for missing or illegal values if required. |
Sfd |
A functional surprisal curve object defined by |
Zmat |
An |
Pbin |
A |
Sbin |
A |
Pmatfine |
A 101 by |
Smatfine |
A 101 by |
DSmatfine |
A 101 by |
D2Smatfine |
A 101 by |
PStdErr |
A 101 by |
SStdErr |
A 101 by |
ItemArcLen |
The scope or arc length of the item curve. |
itemStr |
A string that is the name of the item. |
optStr |
A character vector containing labels for the item options. |
The name ICC for this object is an acronym for the term "item characteristic curve" widely used in the psychometric community.
Function ICC is set up after the initialization process in function make_dataList() has created the members of dataList. Within this list is object SfdList, which contains a functional data object Sfd for each item. Both the initial coefficient matrices and the subsequent estimates of them are available from Sfd$coefs, and therefore are available in the ICC object. These coefficient matrices are K by M-1 where K is the number of basis functions and M is the number of options for an item.
The values returned are simply those in the argument list. The S3 ICC object checks each of these and makes available the S3 commands or methods str, print and plot that apply the corresponding ICC versions of these operations.
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
ICC_plot plots each item in argument plotindex in turn after function Sbinsmth() has used spline smoothing to estimate item and option characteristic curves.
ICC_plot(scrfine, SfdList, dataList, Qvec, binctr=NULL, data_point = FALSE, ci = FALSE, plotType="S", Srng=c(0,5), DSrng=c(-0.2, 0.2), plotindex=1:n, titlestr = NULL, itemscopevec = rep(0, length(plotindex)), plotTitle = TRUE,autoplot = FALSE, plotMissing = TRUE, plotrange=c(min(scrfine),max(scrfine)), shaderange = NULL, ttlsz = NULL, axisttl = NULL, axistxt = NULL, lgdlab = NULL, lgdpos = "bottom")
scrfine |
A vector of 101 plotting points. |
SfdList |
A numbered list object produced by a TestGardener analysis of
a test. Its length is equal to the number of items in the test or
questions in the scale. Each member of |
dataList |
A list that contains the objects needed to analyse the test or rating scale. |
Qvec |
A vector of five marker percentile values. For plotting over information, this is replaced by Qinfovec returned as parmList$Qinfovec. |
binctr |
A vector of bin center values. If the plot is over
arc length or information, |
data_point |
A logical value indicating whether to plot the data points. |
ci |
A logical value indicating whether to plot the confidence limits. |
plotType |
Type(s) of plot: "S" for surprisal (the default shown in the usage line), "P" for probability, "DS" for sensitivity, or any combination of the three. |
Srng |
A vector of length 2 specifying the plotting range for surprisal values. |
DSrng |
A vector of length 2 specifying the plotting range for sensitivity values. |
plotindex |
A vector of indices of items to be plotted. |
titlestr |
plot title |
itemscopevec |
A numeric vector containing item scope values. |
plotTitle |
indicator of showing the plot title, default as TRUE |
autoplot |
indicator for plotting all items in a batch |
plotMissing |
A logical value indicating whether to plot the extra option for missing or spoiled responses. |
plotrange |
A vector of length 2 containing the plot boundaries of the score index interval. |
shaderange |
a list of length 2 vector(s); set if users want to gray out specific score range(s) |
ttlsz |
Title font size. |
axisttl |
Axis title font size. |
axistxt |
Axis text(tick label) font size. |
lgdlab |
Legend label font size. |
lgdpos |
legend position, could be set as "None" to remove the legend. |
A list vector is returned which is of the length of argument plotindex. Each member of the vector is a gg or ggplot object for the associated plotindex value. Each plot can be displayed using the print command.
The plots of item power are produced as a side effect even if no output object
is specified in the call to the function.
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
ICC, Sensitivity_plot, Power_plot, Entropy_plot, Sbinsmth
# Example 1. Display the item surprisal curves for the
# short SweSAT multiple choice test with 24 items and 1000 examinees
dataList <- Quant_13B_problem_dataList
SfdList <- Quant_13B_problem_parmList$SfdList
Qvec <- Quant_13B_problem_parmList$Qvec
binctr <- Quant_13B_problem_parmList$binctr
infoSurpvec <- Quant_13B_problem_infoList$infoSurpvec
Qinfovec <- Quant_13B_problem_infoList$Qinfovec
bininfoctr <- Quant_13B_problem_infoList$bininfoctr
titlestr <- "Quant_13B_problem"
# plot the curves for the first question over the score index
oldpar <- par(no.readonly=TRUE)
indfine <- seq(0,100,len=101)
ICC_plot(indfine, SfdList, dataList, Qvec, binctr, data_point = TRUE, plotType = c("S", "P"), Srng=c(0,4), plotindex=1)
# plot the curves for the first question over test information
ICC_plot(infoSurpvec, SfdList, dataList, Qinfovec, bininfoctr, data_point = TRUE, plotType = c("S", "P"), Srng=c(0,4), plotindex=1)
par(oldpar)
Computes the cumulative distribution function, the probability density function, and the log probability density function as fd objects by spline smoothing of the score values indexdens using the basis object logdensbasis. The norming constant C is also output.
The score values may be score index values index, expected test score values mu, or arc length locations on the test information or scale curve. The argument functional data object logdensfd should have a range that is appropriate for the score values being represented: for score indices, [0,100]; for expected test scores, the range of observed or expected scores; and for test information curve locations, the interval [0,infoSurp].
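The relation between a log density, its norming constant, the density and the cumulative distribution can be sketched in base R. This is only an illustration under stated assumptions: the quadratic log density, the helper `trapz` and the trapezoidal integration are hypothetical stand-ins, not the spline smoothing that index_distn itself performs.

```r
# Toy log density W(x) on the score index interval [0,100]; this quadratic
# is an illustrative assumption, not a TestGardener estimate.
nfine   <- 101
indfine <- seq(0, 100, length.out = nfine)
logdens <- -((indfine - 55) / 20)^2

# Hypothetical helper: trapezoidal rule on a mesh.
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# Norming constant C and the probability density exp(W)/C.
C       <- trapz(indfine, exp(logdens))
pdffine <- exp(logdens) / C

# Cumulative distribution: running trapezoidal integral, starting at zero.
cdffine <- c(0, cumsum(diff(indfine) * (head(pdffine, -1) + tail(pdffine, -1)) / 2))

# Marker locations: invert the CDF by linear interpolation.
pvec <- c(0.05, 0.25, 0.50, 0.75, 0.95)
Qvec <- approx(cdffine, indfine, xout = pvec)$y
```

Because the density is strictly positive, the cumulative values have no ties, which is what makes the interpolation step well defined.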
index_distn(indexdens, logdensbasis, pvec=c(0.05, 0.25, 0.50, 0.75, 0.95), nfine = 101)
indexdens |
A vector of score index, test score, or arc length values. In the score index case, these are usually only the values in the interior of the interval [0,100]. |
logdensbasis |
A functional basis object for representing the log density
function. The argument may also be a functional data object
( |
pvec |
A vector of length NL containing the marker percentages. |
nfine |
The number of values in a fine grid, default as 101. |
A named list containing:
pdf_fd: |
An fd object for the probability density function values over the fine mesh. |
cdffine: |
A vector of cumulative probability values beginning with zero and ending with 1. It must not have ties. |
pdffine: |
A vector of probability values. |
logdensfd: |
A functional data object ( |
C: |
The normalization constant for computing the probability
density function with the command |
denscdf: |
A set of unique values of the cumulative probability
function defined over an equally spaced mesh of score index values
of the same length as |
indcdf: |
A vector of values within [0,100] corresponding to the
values in |
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
index_fun, index2info, mu, scoreDensity
# Example 1. Estimate the score index density for the
# short SweSAT multiple choice test with 24 items and 1000 examinees
# Assemble information for estimating index density
indfine <- seq(0,100,len=101)
SfdList <- Quant_13B_problem_parmList$SfdList
index <- Quant_13B_problem_parmList$index
N <- length(index)
# Define the density for only interior index values
inside <- index > 0 & index < 100
indexdens <- index[inside]
logdensbasis <- fda::create.bspline.basis(c(0,100), 15)
index_distnList <- index_distn(index[inside], logdensbasis)
denscdf <- as.numeric(index_distnList$denscdf)
indcdf <- as.numeric(index_distnList$indcdf)
# adjusted marker score index values are computed by interpolation
markers <- c(.05, .25, .50, .75, .95)
Qvec <- pracma::interp1(denscdf, indcdf, markers)
result <- density_plot(indexdens, c(0,100), Qvec)
The percentile score index values are estimated for each person. The estimates minimize the negative log likelihoods, which are a type of surprisal. The main optimization method is a safe-guarded Newton-Raphson method.
For any iteration the method uses only those scores that are within the interior of the interval [0,100] or at a boundary with a first derivative that would take a step into the interior, and have second derivative values exceeding the value of argument crit. Consequently the number of values being optimized decreases on each iteration, and iterations cease when either all values meet the convergence criterion or are optimized on a boundary, or when the number of iterations reaches itermax.
At that point, if there are any interior scores still associated with either non-positive second derivatives or values that exceed crit, the minimizing value along a fine mesh is used.
If itdisp is TRUE, the number of values to be estimated is printed for each iteration.
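The active-set logic described above can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the quadratic fit function, the vector `target`, the function name `safeguarded_newton`, and the use of the absolute first derivative as the convergence test are all hypothetical choices made for illustration.

```r
# Toy fit function with known derivatives; `target` plays the role of the
# unconstrained optimum for each person (illustrative assumption).
target <- c(-10, 30, 62.5, 120)
Ffun   <- function(index) (index - target)^2 / 200
DFfun  <- function(index) (index - target) / 100
D2Ffun <- function(index) rep(1 / 100, length(index))

safeguarded_newton <- function(index, itermax = 20, crit = 0.001) {
  iter <- 0
  repeat {
    DF  <- DFfun(index)
    D2F <- D2Ffun(index)
    # Interior values, plus boundary values whose descent step points inward.
    interior <- index > 0 & index < 100
    inward   <- (index == 0 & DF < 0) | (index == 100 & DF > 0)
    # Assumed convergence test: absolute first derivative below crit.
    active <- (interior | inward) & D2F > crit & abs(DF) > crit
    if (!any(active) || iter >= itermax) break
    step <- DF[active] / D2F[active]
    # Safeguard: clamp each Newton update to the interval [0,100].
    index[active] <- pmin(pmax(index[active] - step, 0), 100)
    iter <- iter + 1
  }
  list(index_out = index, Fval = Ffun(index), iter = iter)
}

result <- safeguarded_newton(c(50, 50, 50, 50))
```

In this toy case the first and last persons have optima outside [0,100], so their values are held at the boundaries and drop out of the active set, while the two interior values converge in one step.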
index_fun(index, SfdList, chcemat, itermax = 20, crit = 0.001, itdisp = FALSE)
index |
A vector of size |
SfdList |
A list vector of length equal to the number of questions. Each member contains eight results for the surprisal curves associated with a question. |
chcemat |
A matrix with number of rows equal to the number of examinees or respondents, and number of columns equal to the number of items. The values in the matrix are indices of choices made by each respondent to each question. |
itermax |
Maximum number of iterations for computing the optimal index values. Default is 20. |
crit |
Criterion for convergence of optimization. Default is 0.001. |
itdisp |
If TRUE, results are displayed for each iteration. |
A named list with these members:
index_out: |
A vector of optimized score index values. |
Fval: |
The negative log likelihood criterion. |
DFval: |
The first derivative of the negative likelihood. |
D2Fval: |
The second derivative of the negative likelihood. |
iter: |
The number of iterations used. |
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
index_distn, Ffun, DFfun, index2info, scoreDensity
# Optimize the indices defining the data fits for the first five examinees
# input the choice indices in the 1000 by 24 choice index matrix
chcemat <- Quant_13B_problem_chcemat
# First set up the list object for surprisal curves computed from
# initial index estimates.
SfdList <- Quant_13B_problem_dataList$SfdList
# Their initial values are the percent rank values ranging over [0,100]
index_in <- Quant_13B_problem_dataList$percntrnk[1:5]
# set up choice indices for first five examinees
chcemat_in <- chcemat[1:5,]
# optimize the initial indices
indexfunList <- index_fun(index_in, SfdList, chcemat_in)
# optimal index values
index_out <- indexfunList$index_out
# The surprisal data fit values
Fval_out <- indexfunList$Fval
# The surprisal data fit first derivative values
DFval_out <- indexfunList$DFval
# The surprisal data fit second derivative values
D2Fval_out <- indexfunList$D2Fval
# The number of index values that have not reached the convergence criterion
active_out <- indexfunList$active
Multiple minima are found quite often in the data fitting function that is minimized using function index_fun, and in roughly 10 percent of the estimates there is a minimum that is lower than that detected. The function searches a mesh of 101 points for minima, computes the fitting function at the minima, and assigns the location of the global minimum as the replacement index if the location differs by more than 0.5 from the value identified by index_fun. The function values and their first two derivatives are also replaced.
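The replacement rule can be sketched in base R. This is a hedged illustration only: the quartic fit function with two local minima is a hypothetical example, and it is shared by both toy persons, whereas in a real analysis each examinee has a separate fit function.

```r
# Toy fit function with local minima near 19.3 and 79.3; the one near 19.3
# is the global minimum (illustrative assumption).
Ffun <- function(index) (index - 20)^2 * (index - 80)^2 / 1e6 + index / 200

indfine <- seq(0, 100, length.out = 101)
index   <- c(79.3, 19.3)   # the first value is stuck at the higher local minimum

# Location of the global minimum on the 101-point mesh.
grid_min <- indfine[which.min(Ffun(indfine))]

# Replace only those values that differ from the mesh minimum by more than 0.5.
changeindex <- which(abs(index - grid_min) > 0.5)
index_new   <- index
index_new[changeindex] <- grid_min
```

The second value is left untouched because it already lies within 0.5 of the mesh minimum, so the finer estimate from the optimizer is preserved.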
index_search(SfdList, chcemat, index, Fval, DFval, D2Fval, indexind=1:N)
SfdList |
A list vector containing specifications of surprisal curves for each item. |
chcemat |
An |
index |
A vector containing all the score index values. |
Fval |
A vector containing the |
DFval |
A vector containing the |
D2Fval |
A vector containing the |
indexind |
A vector containing indices of values to be processed. |
A named list object containing objects produced from analyzing the simulations, one set for each simulation:
index: |
A vector containing all the score index values including those that are altered. |
Fval: |
A vector containing the N function values including those that are altered. |
DFval: |
A vector containing the N first derivative values including those that are altered. |
D2Fval: |
A vector containing the N second derivative values including those that are altered. |
changeindex: |
Indices of the index values that are altered |
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
# Search for values of index that are not at the global minimum of the
# fitting function and replace them as well as their function and
# derivative values associated with the fine grid value nearest
# the global minimum.
chcemat <- Quant_13B_problem_chcemat
key <- Quant_13B_problem_key
SfdList <- Quant_13B_problem_parmList$SfdList
index <- Quant_13B_problem_parmList$index
Fval <- Quant_13B_problem_parmList$Fval
DFval <- Quant_13B_problem_parmList$DFval
D2Fval <- Quant_13B_problem_parmList$D2Fval
Result <- index_search(SfdList, chcemat, index, Fval, DFval, D2Fval)
changeindex <- Result$changeindex
print(paste("Number changed =",length(changeindex)))
change <- index[changeindex] - Result$index[changeindex]
The one-dimensional psychometric model defines a space curve within the vector space defined by the total collection of option surprisal curves. This curve is a valuable resource since positions along the curve are defined in bits and positions on the curve are subject to the same strict properties that apply to physical measurements.
Function index2info is required to convert objects defined over the score index continuum c(0,100) to the same objects over the arc length continuum c(0,infoSurp), and also vice versa. Since the arc length or information continuum is along a space curve that is invariant under strictly monotone transformations of the score index index, and is also a metric, it is an ideal choice for the abscissa in all plots.
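The conversion between the score index and arc length can be sketched in base R for a single toy item. Everything here is an assumption made for illustration: the softmax option probabilities, the use of base-3 logarithms for a three-option item, the polygonal approximation to arc length, and the helper names `index2arc` and `arc2index` are hypothetical and are not the computations inside index2info.

```r
indfine <- seq(0, 100, length.out = 101)

# Toy option probabilities for one 3-option item (illustrative assumption):
# a softmax of three linear functions of the score index.
eta  <- cbind(0.00 * indfine, 0.03 * indfine - 1, 0.06 * indfine - 4)
Pmat <- exp(eta) / rowSums(exp(eta))

# Surprisal as minus the base-3 logarithm of probability (assumed base).
Smat <- -log(Pmat, base = 3)

# Arc length: cumulative Euclidean distance travelled along the surprisal curve.
seglen      <- sqrt(rowSums(diff(Smat)^2))
infoSurpvec <- c(0, cumsum(seglen))
infoSurp    <- infoSurpvec[length(infoSurpvec)]

# Hypothetical helpers mapping score index to arc length and back.
index2arc <- function(index) approx(indfine, infoSurpvec, xout = index)$y
arc2index <- function(arc)   approx(infoSurpvec, indfine, xout = arc)$y

roundtrip <- arc2index(index2arc(c(25, 33.3, 75)))
```

Since the cumulative arc length is strictly increasing in the score index, the mapping can be inverted by interpolation in either direction.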
index2info(index, Qvec, SfdList, binctr, itemindex=1:n, plotrng=c(0,100), shortwrd)
index |
A vector of score index, test score, or arc length values, one for each examinee or respondent. |
Qvec |
A vector of locations of the five marker percentages. |
SfdList |
A numbered list object produced by a TestGardener analysis of
a test. Its length is equal to the number of items in the test or questions
in the scale. Each member of |
binctr |
A vector of locations of the bin centers. |
itemindex |
A vector containing the indices of the items to be used. |
plotrng |
A vector of length 2 containing the starting score index and end score index values of the range to be plotted. |
shortwrd |
If TRUE only vectors infoSurp and infoSurpvec are returned in
order to speed up the computation within cycles in function |
A named list object containing these results of the analysis:
infoSurp |
The length of the test information or scale curve. |
infoSurpvec |
Positions on the test information or scale curve corresponding to a fine mesh of score index values (typically 101 values between 0 and 100). |
infoSurpfd |
Functional data object representing the relation between the score index abscissa and the infoSurp or information ordinate. |
scopevec |
A vector of positions on the test information or scale curve
corresponding to the input score index values in argument |
Qvec_al |
Values in arc length of the five marker percentages. |
binctr_al |
Values in arc length of the bin centers. |
Sfd.info |
A functional data object representing the relation between the infoSurp or information abscissa and the score index ordinate. |
Sdim.index |
The dimension of the overspace, which is equal to the sum of the
number of options in the items specified in |
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
# Example 1. Display the scope or information curve for the
# short SweSAT multiple choice test with 24 items and 1000 examinees.
# The scope curve is constructed using the complete analysis cycles.
# Set up the required arguments using the converged parmList object.
indfine <- seq(0,100,len=101)
index <- Quant_13B_problem_parmList$index
Qvec <- Quant_13B_problem_parmList$Qvec
SfdList <- Quant_13B_problem_parmList$SfdList
binctr <- Quant_13B_problem_parmList$binctr
# Carry out the construction of the information results.
infoList <- index2info(index, Qvec, SfdList, binctr)
# Plot the shape of the information curve
oldpar <- par(no.readonly=TRUE)
Scope_plot(infoList$infoSurp, infoList$infoSurpvec)
par(oldpar)
The list object dataList contains 22 objects that supply all of the information required to analyze the data. These include initial values of the score indices in object theta and the bin boundaries and centres in object thetaQnt. The returned named list object contains 22 named members, which are described in the value section below.
make_dataList(chcemat, scoreList, noption, sumscr_rng=NULL, titlestr=NULL, itemlabvec=NULL, optlabList=NULL, nbin=nbinDefault(N), NumBasis=7, jitterwrd=TRUE, PcntMarkers=c( 5, 25, 50, 75, 95), verbose=FALSE)
chcemat |
An N by n matrix. Column |
scoreList |
Either a list of length n, each containing a vector of
length |
noption |
A numeric vector of length |
sumscr_rng |
A numeric vector of length two containing the initial and final values for the interval over which test scores are to be plotted. Default is minimum and maximum sum score. |
titlestr |
A title string for the data and their analyses. Default is NULL. |
itemlabvec |
A character vector containing labels for the items. Default is NULL and item position numbers are used. |
optlabList |
A list vector of length |
nbin |
The number of bins for containing proportions of examinees choosing options. The default is computed by a function that uses the number of examinees. |
NumBasis |
The number of spline basis functions used to represent surprisal curves. The default shown in the usage line is 7. |
jitterwrd |
A logical constant: TRUE implies adding a small random value to each sum score value prior to computing percent rank values. |
PcntMarkers |
Used in plots of curves to display marker or reference percentage points for abscissa values in plots. |
verbose |
If TRUE details of calculations are displayed. |
The score range defined by scrrng should contain all of the sum score values, but can go beyond their boundaries if desired. For example, it may be that no examinee gets a zero sum score, but for reporting and display purposes using zero as the lower limit seems desirable.
The number of bins is chosen so that at least about 25 initial percentage ranks fall within a bin. For larger samples, the number per bin is also larger, making the proportions of choice more accurate. The number of bins can be set by the user, or by a simple algorithm used to adjust the number of bins to the number N of examinees.
The number of spline basis functions used to represent a surprisal curve should be small for small sample sizes, but can be larger when larger samples are involved.
There must be at least two basis functions, corresponding to two straight lines. The norder of this simple spline would not exceed 1, corresponding to taking only a single derivative of the resulting spline. But this rule is bent here to allow higher derivatives, which will automatically have values of zero, in order to allow these simple linear basis functions to be used. This permits direct comparisons of TestGardener models with the many classic item response models that use two or fewer parameters per item response curve.
Adding a small value to discrete values before computing ranks is considered a useful way of avoiding any biases that might arise from the way the data are stored. The small values used leave the rounded jittered values fixed, but break up ties for sum scores.
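The effect of jittering on ties can be sketched in base R. This is an illustration under stated assumptions: the toy sum scores, the uniform jitter on (-0.25, 0.25) and the percent rank formula 100 * rank / N are hypothetical choices and may differ from what make_dataList does internally.

```r
set.seed(1)
scrvec <- c(12, 15, 15, 9, 20, 15, 12, 18)   # toy sum scores with ties
N      <- length(scrvec)

# Jitter small enough that rounding recovers the original sum score.
scrjit <- scrvec + runif(N, -0.25, 0.25)

# Percent ranks from the jittered scores (assumed formula); ties are broken.
percntrnk <- 100 * rank(scrjit) / N
```

Without the jitter, the three examinees with a sum score of 15 would share one averaged rank; with it, each receives a distinct initial score index value.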
It can be helpful to see in a plot where special marker percentages 5, 25, 50, 75 and 95 percent of the interval [0,100] are located. The median abscissa value is at 50 per cent for initial percent rank values, for example, but may not be located at the center of the interval after iterations of the analysis cycle.
A named list with named members as follows:
chcemat: |
A matrix of response data with N rows and n columns where
N is number of examinees or respondents and n is number of items.
Entries in the matrices are the indices of the options chosen.
Column i of chcemat is expected to contain only the integers
|
optList: |
A list vector containing the numerical score values assigned to the options for this question. |
key: |
If the data are from a test of the multiple choice type where the right answer is scored 1 and the wrong answers 0, this is a numeric vector of length n containing the indices of the right answers. Otherwise, it is NULL. |
Sfd: |
An fd object defining the surprisal curves. |
noption: |
A numeric vector of length n containing the numbers of options for each item. |
nbin: |
The number of bins for binning the data. |
scrrng: |
A vector of length 2 containing the limits of observed sum scores. |
scrfine: |
A fine mesh of test score values for plotting. |
scrvec: |
A vector of length N containing the examinee or respondent sum scores. |
itemvec: |
A vector of length n containing the question or item sum scores. |
percntrnk: |
A vector of length N containing the sum score percentile ranks. |
thetaQnt: |
A numeric vector of length 2*nbin + 1 containing
the bin boundaries alternating with the bin centers. These are initially
defined as |
Sdim: |
The total dimension of the surprisal scores. |
PcntMarkers: |
The marker percentages for plotting: 5, 25, 50, 75 and 95. |
grbg: |
A logical vector of length number of questions. TRUE for an item indicates that a garbage option must be added to the score values, and FALSE indicates that there are no illegal or missing responses and the number of options is equal to number of score values. |
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
TG_analysis, Analyze, index_distn, index2info, index_fun, Sbinsmth
# Example 1: Input choice data and key for the short version of the
# SweSAT quantitative multiple choice test with 24 items and 1000 examinees
# input the choice data as 1000 strings of length 24
# set up index and key data
chcemat <- Quant_13B_problem_chcemat
key <- Quant_13B_problem_key
# number of examinees and of items
N <- nrow(chcemat)
n <- ncol(chcemat)
# number of options per item and option weights
noption <- rep(0,n)
for (i in 1:n) noption[i] <- 4
scoreList <- list() # option scores
for (item in 1:n){
  scorei <- rep(0,noption[item])
  scorei[Quant_13B_problem_key[item]] <- 1
  scoreList[[item]] <- scorei
}
# Use the input information to define the
# big three list object containing info about the input data
dataList <- make_dataList(chcemat, scoreList, noption)
Compute the expected test score by substituting probabilities of choices for the 0-1 indicator variable values. Binary items are assumed to be coded as two-choice items.
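The substitution of probabilities for indicator values can be sketched in base R. This is a toy illustration: the probability matrices in `Plist` are made-up numbers for two examinees and two items, and the code works directly with probabilities instead of evaluating surprisal curves as function mu does.

```r
# Toy option probabilities (illustrative assumption): one matrix per item,
# one row per examinee and one column per option.
Plist <- list(
  matrix(c(0.7, 0.2, 0.1,
           0.1, 0.3, 0.6), nrow = 2, byrow = TRUE),
  matrix(c(0.5, 0.5,
           0.2, 0.8), nrow = 2, byrow = TRUE)
)

# Option scores: the last option is the right answer for both items.
scoreList <- list(c(0, 0, 1), c(0, 1))

# Expected score: for each item, probabilities weighted by option scores,
# then summed over items.
muvec <- Reduce(`+`, Map(function(P, score) as.numeric(P %*% score),
                         Plist, scoreList))
```

Each examinee's expected score is a sum over items of the probability-weighted option scores, so it varies smoothly with the score index even though observed sum scores are discrete.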
mu(index, SfdList, scoreList)
index |
Initial values for score indices in the interval [0,100].
A vector of size |
SfdList |
A numbered list object produced by a TestGardener analysis of a test. Its length
is equal to the number of items in the test or questions in the scale.
Each member of |
scoreList |
A numbered list of length n. Each member contains the weights assigned to each option for that item or question. |
A vector of test score values.
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
# Example 1. Compute expected sum score values for the
# short SweSAT multiple choice test with 24 items and 1000 examinees
scoreList <- Quant_13B_problem_dataList$scoreList
SfdList <- Quant_13B_problem_parmList$SfdList
index <- Quant_13B_problem_parmList$index
muvec <- mu(index, SfdList, scoreList)
# use a single plotting panel
par(mfrow=c(1,1))
hist(muvec,11)
The expected score mu(index) is a function of the score index index. A diagonal dashed line is displayed to show the linear relationship to the score range interval.
mu_plot(mufine, scrrng, titlestr)
mufine |
A mesh of 101 equally spaced values of mu as a function of index. |
scrrng |
A vector of length 2 containing the score range. |
titlestr |
A string containing the title of the data. |
A gg or ggplot object defining the plot of the expected test score mu as a function of the score index index. This is displayed by the print command.
The plot is automatically displayed as a side effect even if no return object is
specified in the calling statement.
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
Item surprisal power curves are the square root of the sum over options of the squared surprisal sensitivity curves.
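The definition above can be sketched in base R for a single toy item. The softmax probabilities, the base-3 surprisal and the central-difference derivative are assumptions made for illustration; the package works with derivatives of the fitted functional data objects instead.

```r
indfine <- seq(0, 100, length.out = 101)

# Toy option probabilities for one 3-option item (illustrative assumption).
eta  <- cbind(0.00 * indfine, 0.03 * indfine - 1, 0.06 * indfine - 4)
Pmat <- exp(eta) / rowSums(exp(eta))

# Surprisal as minus the base-3 logarithm of probability (assumed base).
Smat <- -log(Pmat, base = 3)

# Sensitivity: first derivative of each surprisal curve, approximated by
# central differences at the 99 interior mesh points.
h     <- indfine[2] - indfine[1]
DSmat <- (Smat[3:101, ] - Smat[1:99, ]) / (2 * h)

# Power: square root of the sum over options of squared sensitivities.
power <- sqrt(rowSums(DSmat^2))
```

The power value is large where at least one option's surprisal is changing quickly, which is where the item discriminates most sharply between nearby score index values.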
Power_plot(scrfine, SfdList, Qvec, dataList, plotindex=1:n, plotrange=c(min(scrfine),max(scrfine)), height=0.5, value=0, ttlsz=NULL, axisttl=NULL, axistxt=NULL)
scrfine |
A vector of length nfine (usually 101) containing equally spaced points spanning the plotrange. Used for plotting. |
SfdList |
A numbered list object produced by a TestGardener analysis of a test.
Its length is equal to the number of items in the test or questions in
the scale. Each member of |
Qvec |
The five marker percentile values. |
dataList |
A list vector containing objects essential to an analysis. |
plotindex |
A set of integers specifying the numbers of the items or questions to be displayed. |
plotrange |
A vector of length 2 containing the plot boundaries within or over the score index interval c(0,100). |
height |
A positive real number defining the upper limit on the ordinate for the plots. |
value |
Number required by ggplot2. Defaults to 0. |
ttlsz |
Title font size. |
axisttl |
Axis title font size. |
axistxt |
Axis text (tick label) font size. |
A surprisal power curve for each question indexed in the plotindex argument.
A request for a keystroke is made for each question. The answer to a question strongly defines the optimal position of an estimated score index value wherever its power curve is high. Values of power curves typically range over [0,0.5].
The plots of the power curves specified in plotindex are produced as a side effect.
If saveplot is TRUE, the plots of item power curves specified in plotindex are bundled into a single postscript or .pdf file and the file name is defined by paste(dataList$titlestr,i,'-power.pdf',sep="").
The file is then output as a returned value.
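The definition given above (power as the square root of the sum over options of the squared sensitivity values) can be illustrated with a small base R sketch. Everything in it is an assumption made for illustration: the three-option item is hypothetical, its probabilities come from a made-up multinomial logistic model, and the derivative is a simple finite difference rather than the derivative of a fitted functional data object.

```r
# Hypothetical three-option item evaluated on a mesh over [0,100]
indfine <- seq(0, 100, len = 101)
M <- 3
eta <- cbind(0, (indfine - 40) / 15, (indfine - 60) / 8)
P <- exp(eta) / rowSums(exp(eta))   # option probabilities, rows sum to 1
S <- -log(P, base = M)              # surprisal in M-bits

# First derivative by finite differences (one-sided at the two ends)
num_deriv <- function(y, arg) {
  k <- length(arg)
  c((y[2] - y[1]) / (arg[2] - arg[1]),
    (y[3:k] - y[1:(k - 2)]) / (arg[3:k] - arg[1:(k - 2)]),
    (y[k] - y[k - 1]) / (arg[k] - arg[k - 1]))
}
DS <- apply(S, 2, num_deriv, arg = indfine)   # sensitivity curves, 101 by M

# Power: square root of the sum over options of squared sensitivities
power <- sqrt(rowSums(DS^2))
```

Plotting `power` against `indfine` shows where on the score index this hypothetical item is most informative.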
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
Sensitivity_plot, Entropy_plot, Ffuns_plot, ICC_plot
# Example 1. Display the item power curves for the
# short SweSAT multiple choice test with 24 items and 1000 examinees
# plot the power curve for the first item
dataList <- Quant_13B_problem_dataList
SfdList <- Quant_13B_problem_parmList$SfdList
Qvec <- Quant_13B_problem_parmList$Qvec
scrfine <- seq(0,100,len=101)
oldpar <- par(no.readonly=TRUE)
Power_plot(scrfine, SfdList, Qvec, dataList, plotindex=1)
par(oldpar)
These data are for a randomly selected subset of 1000 examinees.
Quant_13B_problem_chcemat
A matrix object with 1000 rows and 24 columns. The integers indicate which answer was chosen for each question by the examinee associated with the row.
The data are for 1000 randomly selected examinees taking 24 math analysis multiple choice questions.
Quant_13B_problem_dataList
A named list.
A named list with 19 members:
A matrix of response data with N rows and n columns where
N is the number of examinees or respondents and n is the number of items.
Entries in the matrices are the indices of the options chosen.
Column i of chcemat is expected to contain only the integers 1,...,noption.
If the data are from a test of the multiple choice type where the right answer is scored 1 and the wrong answers 0, this is a numeric vector of length n containing the indices of the right answers. Otherwise, it is NULL.
A string containing a title for the analysis.
The number of persons tested.
The number of questions or items.
A numeric vector of length n containing the numbers of options for each item.
The total dimension of the surprisal scores.
A vector of length n indicating which option for each item contains missing or illegal choice values. If 0, there is no such option.
A list vector of length n with each object a numeric vector of weights assigned to each option for each item.
The number of bins for binning the data.
The number of spline basis functions.
A basisfd object defining the surprisal curves.
A character vector with a title string for each item.
A list vector of length n with a character vector of labels for each object within each item.
A vector of length N containing the examinee or respondent sum scores.
A vector of length n containing the item sum scores.
A numeric vector of length N containing small jitters to each sum score to break up ties.
A vector of length 2 containing the limits of observed sum scores.
A list vector containing essential objects for each item.
A fine mesh of test score values for plotting.
A numeric vector of length 2*nbin + 1 containing the bin boundaries alternating with the bin centers. These are initially defined as seq(0,100,len=2*nbin+1).
A vector of length N containing the sum score percentile ranks.
The marker percentages for plotting: 5, 25, 50, 75 and 95.
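Several of the members described above (sum scores, jitters, score range, percentile ranks) can be derived from a choice matrix and a key. The following is a minimal sketch in base R, not the package's own set-up code. The data are simulated, the variable names are illustrative, and the jitter size and the percentile-rank formula `100*(rank - 0.5)/N` are assumptions chosen for this example.

```r
set.seed(42)
N <- 200        # number of examinees (simulated)
n <- 24         # number of items
noption <- 4    # options per item

# Simulated choice matrix and answer key (purely random, for illustration)
chcemat <- matrix(sample.int(noption, N * n, replace = TRUE), N, n)
key <- sample.int(noption, n, replace = TRUE)

# Right answers score 1 and wrong answers 0
right <- chcemat == matrix(key, N, n, byrow = TRUE)
scrvec <- rowSums(right)       # examinee sum scores
itemscrvec <- colSums(right)   # item sum scores
scrrng <- range(scrvec)        # limits of the observed sum scores

# Small random jitters break ties among the integer sum scores
scrjit <- scrvec + rnorm(N) * 0.1

# Percentile ranks in (0,100) computed from the jittered scores
percntrnk <- 100 * (rank(scrjit) - 0.5) / N
```

Because the jittered scores have no ties, every examinee receives a distinct starting value for the score index.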
The data are for 1000 examinees randomly selected from those who took the 2013
quantitative subtest of the SweSAT university entrance exam.
The questions are only the 24 math analysis questions, and each question has
four options.
The analysis results are after 10 cycles of alternating between estimating
surprisal curves and estimating percentile score index values.
The objects in list object Quant_13B_problem_infoList are required for plotting results over the arc length or information domain rather than the score index domain. This domain is preferred because such plots are invariant with respect to changes in the score index domain. It also has a metric structure so that differences are comparable no matter where they fall within the information domain.
Quant_13B_problem_infoList
A named list containing eight objects.
The object Quant_13B_problem_infoList is a named list with these members:
The total length of the information domain measured in M-bits, where M is the number of options for a question.
The log derivative functional data object defining a strictly increasing set of arc length values corresponding to a set of score index values.
A mesh of equally-spaced values of indefinite integrals of sum of norms of surprisal derivatives.
The N arc length values corresponding to the N estimated score index values assigned to N examinees.
The arc length positions corresponding to the marker percentages 5, 25, 50, 75 and 95.
A vector of score index values resulting from using function monfd with equally spaced arc length values and Sfd.info.
The dimension of the overspace containing the surprisal curves.
A vector that contains the indices of the right answers among the options for the 24 questions.
Quant_13B_problem_key
The data are for 1000 examinees randomly selected from those who took the 2013 quantitative subtest of the SweSAT university entrance exam. The questions are only the 24 math analysis questions, and each question has four options. The analysis results are after 10 cycles of alternating between estimating surprisal curves and estimating percentile score index values.
Quant_13B_problem_parmList
A named list.
The object Quant_13B_problem_parmList
is a named list with these members:
index: |
A vector of length |
indexQnt: |
A vector of length 2*nbin + 1 containing bin boundaries alternating with bin centres. |
SfdList: |
A list vector of length equal to the number of questions. Each member contains eight results for the surprisal curves associated with a question. |
logdensfd: |
A functional data object representing the logarithm of the density of the percentile rank score index values. |
C: |
The norming constant: the density function is |
densfine: |
A fine mesh of probability density values of the percentile rank score index. |
denscdf: |
A fine mesh of cumulative probability distribution values used for interpolating values. |
Qvec: |
The score index values associated with the five marker percentages 5, 25, 50, 75 and 95. |
binctr: |
A vector of length nbin containing the centres of the bins. |
bdry: |
A vector of length nbin+1 containing the boundaries of the bins. |
freq: |
An nbin by M matrix of frequencies with which options are chosen. |
Smax: |
A maximum surprisal value used for plotting purposes. |
Hval: |
The value of the fitting criterion |
DHval: |
The value of the first derivative of the fitting criterion
|
D2Hval: |
The value of the second derivative of the fitting criterion
|
active: |
A logical vector of length N indicating which estimates of index are converged (FALSE) or not converged (TRUE). |
infoSurp: |
The length in bits of the test information curve. |
infofine: |
A mesh of 101 equally spaced positions along the test information curve. |
Qinfovec: |
The positions of the five marker percentages on the test information curve. |
scopevec: |
A vector of length N containing the positions of each examinee or respondent on the test information curve. |
The surprisal curves for each item are fit to the surprisal transforms of choice probabilities for each of a set of bins of current performance values index. The error sums of squares are minimized by the surprisal optimization smooth.surp in the fda package. The output is a list vector of length n containing the functional data objects defining the curves.
Sbinsmth(index, dataList, indexQnt=seq(0,100, len=2*nbin+1), wtvec=matrix(1,n,1), iterlim=20, conv=1e-4, dbglev=0)
index |
A vector of length N containing current values of score index percentile values. |
dataList |
A list that contains the objects needed to analyse the test or rating scale. |
indexQnt |
A vector of length 2*nbin+1 containing the sequence of bin boundary and bin centre values. |
wtvec |
A vector of length |
iterlim |
The maximum number of iterations used in optimizing surprisal curves. Defaults to 20. |
conv |
Convergence tolerance. Defaults to 0.0001. |
dbglev |
Level of output within |
The function first bins the data in order to achieve rapid estimation of the option surprisal curves. The argument indexQnt contains the sequence of bin boundaries separated by the bin centers, so that it is of length 2*nbin + 1 where nbin is the number of bins.
These bin values are distributed over the percentile interval [0,100] so that the lowest boundary is 0 and highest 100. Prior to the call to Sbinsmth these boundaries are computed so that the numbers of values of index falling in the bins are roughly equal. It is important that the number of bins be chosen so that the bins contain at least about 25 values.
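The construction of indexQnt described above can be sketched in base R. This is an illustration under stated assumptions rather than the package's own code: the index values are simulated, the boundaries are taken from empirical quantiles, and the bin centers are taken to be the midpoints between adjacent boundaries.

```r
set.seed(1)
N <- 1000
nbin <- 20
index <- runif(N, 0, 100)   # simulated score index values

# Boundaries at equally spaced quantiles give roughly equal bin counts
bdry <- quantile(index, probs = seq(0, 1, len = nbin + 1), names = FALSE)
bdry[1] <- 0            # lowest boundary is 0
bdry[nbin + 1] <- 100   # highest boundary is 100

# Bin centers, here assumed to be midpoints between adjacent boundaries
binctr <- (bdry[-1] + bdry[-(nbin + 1)]) / 2

# Interleave boundaries and centers into a vector of length 2*nbin + 1
indexQnt <- numeric(2 * nbin + 1)
indexQnt[seq(1, 2 * nbin + 1, by = 2)] <- bdry
indexQnt[seq(2, 2 * nbin, by = 2)] <- binctr

# Check the rule of thumb of at least about 25 values per bin
bincount <- table(cut(index, bdry, include.lowest = TRUE))
min(bincount)
```

If the smallest count falls well below 25, reducing nbin is the natural remedy.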
After the values of index are binned, the proportions of respondents in each bin who choose each option are computed for each question. Proportions of zero are given NA values.
The positive proportions are then converted to surprisal values where surprisal = -log_M (proportion) where log_M is the logarithm with base M, the number of options associated with a question. Bins with zero proportions are assigned a surprisal that is appropriately large in the sense of being in the range of the larger surprisal values associated with small but positive proportions. This surprisal value is usually about 4.
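The binning and surprisal transformation just described can be sketched for a single item in base R. The data below are simulated, the equally spaced bin boundaries are a simplification, and filling empty cells with the largest observed surprisal is only one assumed way of choosing an "appropriately large" value; it is not claimed to be the rule the package applies.

```r
set.seed(7)
N <- 1000
nbin <- 10
M <- 4                       # number of options for this item
index <- runif(N, 0, 100)    # simulated score index values

# Simulated choices: option 1 becomes more likely as index increases
prob1 <- plogis((index - 50) / 15)
choice <- ifelse(runif(N) < prob1, 1L, sample(2:M, N, replace = TRUE))

# Assign each examinee to a bin and count choices per bin and option
bdry <- seq(0, 100, len = nbin + 1)
bin <- cut(index, bdry, include.lowest = TRUE, labels = FALSE)
freq <- table(factor(bin, levels = 1:nbin), factor(choice, levels = 1:M))

# Proportions within each bin; zero proportions become NA
Pbin <- unclass(freq) / rowSums(freq)
Pbin[Pbin == 0] <- NA

# Surprisal with logarithm to base M
Sbin <- -log(Pbin, base = M)

# Replace missing surprisals by a large value within the observed range
Sbin[is.na(Sbin)] <- max(Sbin, na.rm = TRUE)
```

A proportion of 1/M maps to a surprisal of exactly one M-bit, which gives the scale a convenient reference point.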
The next step is to fit the surprisal values for each question by a functional data object that is smooth, passes as closely as possible to an option's surprisal values, and has values consistent with being a surprisal value. The function smooth.surp() is used for this purpose. The arc length of the item information curve is also computed.
Finally the curves and other results for each question are saved in object SfdList, a list vector of length n, and the list vector is returned.
The optimized numbered list object SfdList with length n that provides data on the probability and surprisal data and curves.
The 12 objects for each item are as follows:
Sfd: |
A surprisal functional data object that is used for plotting. It also contains the coefficient matrix and functional data basis that define the object. |
M: |
The number of options, including if needed a final option which is for the missing and illegitimate responses. |
Pbin: |
A |
Sbin: |
A |
indfine: |
A fine mesh of 101 equally spaced score index values over the interval [0,100]. |
Pmatfine: |
A 101 by |
Smatfine: |
A 101 by |
DSmatfine: |
A 101 by |
D2Smatfine: |
A 101 by |
PSrsErr: |
The standard error for probability over the fine mesh. |
PSrsErr: |
The standard error for surprisal over the fine mesh. |
itemScope: |
The length of the item info curve. |
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
# Example 1. Display the initial probability and surprisal curves for the
# first item in the short SweSAT multiple choice test with 24 items and
# 1000 examinees.
# Note: The scope is 0 at this point because it is computed later
# in the analysis.
dataList <- Quant_13B_problem_dataList
index <- dataList$percntrnk
# Carry out the surprisal smoothing operation
SfdResult <- Sbinsmth(index, dataList)
## Not run:
# Set up the list object for the estimated surprisal curves
SfdList <- SfdResult$SfdList
# The five marker percentage locations for (5, 25, 50, 75, 95)
binctr <- dataList$binctr
Qvec <- dataList$PcntMarkers
# plot the curves for the first question
scrfine <- seq(0,100,len=101)
ICC_plot(scrfine, SfdList, dataList, Qvec, binctr, data_point = TRUE, plotType = c("S", "P"), Srng=c(0,3), plotindex=1)
## End(Not run)
Set up objects needed for analyses of nominal data.
Sbinsmth_nom(bdry_nom, SfdList_nom)
bdry_nom |
Vector of length two containing the initial and final values of the score index. |
SfdList_nom |
A list vector of length equal to the number of items. Each object is a list object containing the number of options and the nominal parameter matrix estimated by the mirt package. |
Called twice.
index
as a function of arc length.
Arc length or scope is the distance along the space curve traced out as score index index increases from 0 to 100. It is measured in bits and remains unchanged if the score index continuum is modified.
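The invariance claimed above can be checked numerically. The sketch below uses base R and a hypothetical two-option item whose probabilities follow a logistic curve (an assumption for illustration only). The length of the surprisal space curve is approximated by summing the lengths of short straight segments, once on an equally spaced mesh and once on a monotonically warped mesh that spans the same interval.

```r
# Polyline approximation to the length of a curve whose points are the rows of Smat
arc_length <- function(Smat) {
  sum(sqrt(rowSums(diff(Smat)^2)))
}

# Hypothetical two-option item: surprisal curves in 2-bits
surp_curve <- function(index) {
  p <- plogis((index - 50) / 10)
  -log2(cbind(p, 1 - p))
}

index1 <- seq(0, 100, len = 2001)    # equally spaced score index values
index2 <- 100 * (index1 / 100)^2     # a monotone warping of [0,100]

len1 <- arc_length(surp_curve(index1))
len2 <- arc_length(surp_curve(index2))
c(len1, len2)
```

The two lengths agree up to discretization error, since both meshes trace the same curve from index 0 to index 100 and differ only in how the points are spaced along it.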
Scope_plot(infoSurp, infoSurpvec, titlestr=NULL)
infoSurp |
This is a positive real number indicating the total length of the space curve. It is expressed in terms of numbers of bits. |
infoSurpvec |
A vector of length 101 containing equally-spaced arc-length distances along the test information curve. |
titlestr |
A string for the title of the data. |
A gg or ggplot object defining the plot of infoSurp along the test information curve as a function of the score index index. This is displayed by the print command.
The plot is automatically displayed as a side effect even if no return object is specified in the calling statement.
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
#
# Example 1. Display the arc length curve for the
# SweSAT multiple choice test with 24 items and 1000 examinees
#
infoSurpvec <- Quant_13B_problem_infoList$infoSurpvec
infoSurp <- Quant_13B_problem_infoList$infoSurp
oldpar <- par(no.readonly=TRUE)
Scope_plot(infoSurp, infoSurpvec)
# restore the graphics parameters saved above
par(oldpar)
The tasks of function index.density() and plotting the density are combined. The score density is plotted both as a histogram and as a smooth curve. All the score types may be plotted: sum scores, expected test scores, percentile score index values, and locations on the test information or scale curve. The plot is output as a ggplot2 plot object, which is actually plotted using the print command.
scoreDensity(scrvec, scrrng=c(0,100), ndensbasis=15, ttlstr=NULL, pltmax=0)
scrvec |
A vector of strictly increasing bin boundary values, with the first at the lowest plotting value and the last at the upper boundary. The number of bins in the histogram is one less than the number of bndry values. |
scrrng |
A vector of length 2 containing lower and upper boundaries on scores, which defaults to c(0,100). |
ndensbasis |
The number of spline basis functions to be used to represent the smooth density curve. |
ttlstr |
A string object used as a title for the plot. Defaults to none. |
pltmax |
An upper limit on the vertical axis for plotting. Defaults to the maximum curve value. |
A ggplot2 plot object dens.plot that can be displayed using command print(dens.plot).
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
index_fun, index2info, mu, index_distn
# Example 1. Display probability density curves for the
# short SweSAT multiple choice test with 24 items and 1000 examinees
SfdList <- Quant_13B_problem_parmList$SfdList
index <- Quant_13B_problem_parmList$index
Qvec <- Quant_13B_problem_parmList$Qvec
# plot the density for the score indices within interval c(0,100)
index_int <- index[0 < index & index < 100]
oldpar <- par(no.readonly=TRUE)
scoreDensity(index_int)
par(oldpar)
After the simulated data matrices have been analyzed, prepare the objects necessary for the performance plots produced by functions RMSEbias1.plot and RMSEbias2.plot.
scorePerformance(dataList, simList)
dataList |
A list that contains the objects needed to analyse the test or rating scale with the following fields:
|
simList |
A named list containing these objects:
|
A named list containing these objects:
A matrix with row dimension nchcemat
, the number of
population score index values and column dimension nsample
, the
number of simulated samples.
An nchcemat
by nsample
matrix of estimated score
index values.
An nchcemat
by nsample
matrix of estimated expected
score values.
An nchcemat
by nsample
matrix of estimated test
information curve values.
A vector of population score index values.
A vector of expected scores computed from the population score index values.
A vector of test information values computed from the population score index values.
The number of questions.
The five marker percentile values.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
A sensitivity curve for an option is the first derivative of the corresponding surprisal curve. Its values can be positive or negative, and the size of the departure from zero at any point on the curve is the amount of information contributed by that curve to locating the value of an examinee or respondent on the score index continuum.
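Combining this definition with the definition of surprisal used elsewhere in this manual, surprisal = -log_M (probability), gives an explicit expression for a sensitivity curve. The symbols are notation introduced here for illustration only: θ stands for the score index, and P_m and S_m for the probability and surprisal curves of option m of a question with M options.

```latex
S_m(\theta) = -\log_M P_m(\theta)
\quad\Longrightarrow\quad
\frac{dS_m}{d\theta}(\theta)
  = -\frac{1}{\ln M}\,\frac{P_m'(\theta)}{P_m(\theta)}
```

The sensitivity is therefore negative where the probability of the option is increasing and positive where it is decreasing, and a given rate of change in probability produces a larger sensitivity when the probability itself is small.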
Sensitivity_plot(scrfine, SfdList, Qvec, dataList, plotindex=1:n, plotrange=c(min(scrfine),max(scrfine)), key=NULL, titlestr=NULL, saveplot=FALSE, width=c(-0.2,0.2), ttlsz=NULL, axisttl=NULL, axistxt=NULL, lgdlab=NULL)
scrfine |
A vector of length nfine (usually 101) containing equally spaced points spanning the plotrange. Used for plotting. |
SfdList |
A numbered list object produced by a TestGardener analysis of a test.
Its length is equal to the number of items in the test or questions in
the scale. Each member of |
Qvec |
The values of the five marker percentiles. |
dataList |
A list that contains the objects needed to analyse the test or rating scale. |
plotindex |
A set of integers specifying the numbers of the items or questions to be displayed. |
plotrange |
A vector of length 2 containing the plot boundaries within or over the score index interval c(0,100). |
key |
An integer vector of indices of right answers. If the data are rating scales, this can be NULL. |
titlestr |
A title string for plots. |
saveplot |
A logical value indicating whether the plot should be saved to a pdf file. |
width |
A vector of length 2 defining the lower and upper limits on the ordinate for the plots. |
ttlsz |
Title font size. |
axisttl |
Axis title font size. |
axistxt |
Axis text (tick label) font size. |
lgdlab |
Legend label font size. |
Sensitivity curves for each question indexed in the plotindex argument.
A request for a keystroke is made for each question.
A list vector is returned which is of the length of argument plotindex. Each member of the vector is a gg or ggplot object for the associated plotindex value. Each plot can be displayed using the print command.
The plots of the sensitivity curves are produced as a side effect even if no output object is specified in the call to the function.
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
Power_plot, Entropy_plot, ICC_plot
# Example 1. Display the option sensitivity curves for the
# short SweSAT multiple choice test with 24 items and 1000 examinees.
dataList <- Quant_13B_problem_dataList
SfdList <- Quant_13B_problem_parmList$SfdList
Qvec <- Quant_13B_problem_parmList$Qvec
scrfine <- seq(0,100,len=101)
oldpar <- par(no.readonly=TRUE)
Sensitivity_plot(scrfine, SfdList, Qvec, dataList, plotindex=1)
par(oldpar)
Simulation of data using a previous analysis requires only an ICC vector and two objects computed by function theta.distn, along with a specification of the number of simulated persons.
SimulateData(nsim, indfine, denscdf, SfdList)
nsim |
Number of persons having simulated choices. |
indfine |
The score index values within [0,100] that are
associated with the cumulative probability values in
|
denscdf |
The cumulative probability values within [0,1]. The values have to be discrete, begin with 0 and end with 1. |
SfdList |
List vector of length n of list vectors for item objects. |
Arguments indfine
and denscdf
can be obtained from
the original analysis, but also can be specified to describe
a different distribution of score index values.
An nsim by n matrix of integers including 1 and 2 that specify each person's option choice for each item.
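The general idea of simulating choices from a score index distribution and a set of item probability curves can be sketched in base R. This is not the code of SimulateData, whose internals are not shown here. It assumes a hypothetical cumulative distribution for the score index, two hypothetical items defined by made-up multinomial logistic curves, inverse-CDF sampling of the index, and nearest-mesh-point lookup of the option probabilities.

```r
set.seed(123)
nsim <- 500
indfine <- seq(0, 100, len = 101)

# Hypothetical cumulative probabilities: begin with 0, end with 1
denscdf <- pbeta(indfine / 100, 2, 2)

# Inverse-CDF sampling of score index values by linear interpolation
u <- runif(nsim)
index_sim <- approx(denscdf, indfine, xout = u)$y

# Hypothetical items: a 101 by M matrix of option probabilities for each
make_Pmat <- function(slopes) {
  eta <- outer(indfine - 50, slopes) / 20
  exp(eta) / rowSums(exp(eta))
}
PmatList <- list(make_Pmat(c(0, 1, 2)), make_Pmat(c(0, -1, 1, 3)))
n <- length(PmatList)

# Sample one option per person and item from the probabilities
# at the mesh point nearest to each simulated index value
row <- round(index_sim) + 1L   # the mesh spacing is 1
chcemat <- matrix(0L, nsim, n)
for (i in seq_len(n)) {
  Pmat <- PmatList[[i]]
  for (j in seq_len(nsim)) {
    chcemat[j, i] <- sample.int(ncol(Pmat), 1, prob = Pmat[row[j], ])
  }
}
```

Replacing denscdf with a different distribution function changes the simulated population while leaving the items untouched.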
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
dataSimulation, chcemat_simulate
# example code to be set up
ICC
object.
An N by n matrix of positive integer choice index values is transformed to an nbin by M matrix of probability values by iteratively minimizing the sum of squared errors for bin values.
smooth.ICC(x, item, index, dataList, indexQnt=seq(0,100, len=2*nbin+1), wtvec=matrix(1,n,1), iterlim=20, conv=1e-4, dbglev=0)
x |
An ICC object |
item |
Index of item being set up. |
index |
A vector of length N containing score index values for each person. |
dataList |
A list object set up by function |
indexQnt |
A vector of length 2*nbin + 1 containing, in sequence, the lower boundary of a bin, its midpoint, and the upper boundary. |
wtvec |
A vector of length n containing weights for items. |
iterlim |
An integer specifying the maximum number of optimizations. |
conv |
A convergence criterion a little larger than 0. |
dbglev |
One of integers 0 (no optimization information), 1 (one line per optimization) or 2 (complete optimization display). |
An S3 class ICC object for a single item.
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
# example code to be set up
Surprisal is -log(probability), where the logarithm is taken to base M, the dimension of the multinomial observation vector. The surprisal curves for each question are estimated by fitting the surprisal values of binned data using curves whose values are within the (M-1)-dimensional surprisal subspace that lies within the space of non-negative M-dimensional vectors.
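The reason the surprisal vectors occupy only M-1 dimensions is that the underlying probabilities sum to one, which imposes a single constraint on the M surprisal values: the sum over options of M^(-surprisal) equals 1. A short base R check follows, with three arbitrary probability vectors chosen here for illustration.

```r
M <- 3

# Three hypothetical probability vectors, one per row
P <- rbind(c(0.70, 0.20, 0.10),
           c(1/3,  1/3,  1/3),
           c(0.05, 0.05, 0.90))

# Surprisal in M-bits
S <- -log(P, base = M)

# The constraint that defines the surprisal subspace
constraint <- rowSums(M^(-S))

# Converting back recovers the probabilities
Pback <- M^(-S)
```

Each row of `S` is a non-negative vector, every entry of `constraint` equals 1 up to rounding error, and the equal-probability row has surprisal 1 for every option.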
smooth.surp(argvals, y, Bmat0, Sfd, Zmat, wtvec=NULL, conv=1e-4, iterlim=50, dbglev=0)
argvals |
Argument value array of length N, where N is the number of observed curve values for each curve. It is assumed that these argument values are common to all observed curves. If this is not the case, you will need to run this function inside one or more loops, smoothing each curve separately. |
y |
A |
Bmat0 |
A |
Sfd |
A functional data object used to initialize the optimization process. |
Zmat |
An |
wtvec |
A vector of weights to be used in the smoothing. |
conv |
A convergence criterion. |
iterlim |
The maximum number of iterations allowed in the minimization of error sum of squares. |
dbglev |
Either 0, 1, or 2. This controls the amount of information printed out on each iteration, with 0 implying no output, 1 intermediate output level, and 2 full output. If either level 1 or 2 is specified, it can be helpful to turn off the output buffering feature of S-PLUS. |
A named list of class surpFd
with these members:
PENSSE |
The final value of the penalized fitting criterion. |
DPENSSE |
The final gradient of the penalized fitting criterion. |
D2PENSSE |
The final hessian of the fitting criterion. |
SSE |
The final value of the error sum of squares. |
DSSE |
The final gradient of the error sum of squares. |
D2SSE |
The final hessian of the error sum of squares. |
DvecSmatDvecB |
The final cross derivative DvecSmatDvecX times DvecXmatDvecB of the surprisal curve and the basis coordinates. |
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
oldpar <- par(no.readonly=TRUE)
# evaluation points
x <- seq(-2,2,len=11)
# evaluate a standard normal distribution function
p <- pnorm(x)
# combine with 1-p
mnormp <- cbind(p,1-p)
M <- 2
# convert to surprisal values
mnorms <- -log2(mnormp)
# plot the surprisal values
matplot(x, mnorms, type="l", lty=c(1,1), col=c(1,1), ylab="Surprisal (2-bits)")
# add some log-normal error
mnormdata <- exp(log(mnorms) + rnorm(11)*0.1)
# set up a b-spline basis object
nbasis <- 7
sbasis <- create.bspline.basis(c(-2,2),nbasis)
# define an initial coefficient matrix
cmat <- matrix(0,7,1)
# set up a fd object for surprisal smoothing
Sfd <- fd(cmat, sbasis)
Zmat <- matrix(c(1,-1),2,1)
# smooth the noisy data
result <- smooth.surp(x, mnormdata, cmat, Sfd, Zmat)
# plot the data and the fits of the two surprisal curves
xfine <- seq(-2,2,len=51)
sfine <- eval.surp(xfine, result$Sfd, Zmat)
matplot(xfine, sfine, type="l", lty=c(1,1), col=c(1,1))
points(x, mnormdata[,1])
points(x, mnormdata[,2])
# convert the surprisal fit values to probabilities
pfine <- 2^(-sfine)
# check that they sum to one
apply(pfine,1,sum)
par(oldpar)
A test or scale analysis produces a space curve that varies within the space of possible option curves of dimension Sdim. Fortunately, it is usual that most of the shape variation in the curve is within only two or three dimensions, and these can be identified by using functional principal components analysis.
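The idea that a curve in a space of dimension Sdim may vary mostly within two or three dimensions can be illustrated with an ordinary principal components analysis of the curve sampled on a fine mesh. This sketch uses base R `prcomp` on discretized values, which is a stand-in for, and not the same as, a functional principal components analysis of functional data objects. The three two-option items are hypothetical.

```r
indfine <- seq(0, 100, len = 101)

# Surprisal curves (in 2-bits) for a hypothetical two-option item
make_item <- function(loc, scale) {
  p <- plogis((indfine - loc) / scale)
  -log2(cbind(p, 1 - p))
}

# Stack the option curves of three items: a 101 by 6 matrix, so Sdim is 6 here
Smat <- cbind(make_item(30, 10), make_item(50, 15), make_item(70, 8))

# Principal components of the sampled curve
pca <- prcomp(Smat, center = TRUE, scale. = FALSE)

# Proportions of variance accounted for by each component
varprop <- pca$sdev^2 / sum(pca$sdev^2)
round(100 * varprop, 1)
```

Inspecting the leading entries of `varprop` shows how much of the shape variation a two- or three-dimensional display would retain.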
Spca(SfdList, nharm=2, Sdim=NULL, rotate=TRUE)
SfdList |
A numbered list object produced by a TestGardener analysis
of a test. Its length is equal to the number of items in the test or
questions in the scale. Each member of |
Sdim |
Interval over which curve is plotted. All if Sdim == NULL. |
nharm |
The number of principal components of the test information or scale curve to be used to display the curve. Must be either 2 or 3. |
rotate |
If true, rotate principal components of the test information or scale curve to be used to display the curve to VARIMAX orientation. |
A named list with these members:
harmvarmxfd |
Functional data objects for the principal components of the curve shape. |
varpropvarmx |
Proportions of variance accounted for by the principal components |
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
# Example 1. Display the test information curve for the
# short SweSAT multiple choice test with 24 items and 1000 examinees
# plot a two-dimension version of manifold curve
Sdim <- Quant_13B_problem_dataList$Sdim
SfdList <- Quant_13B_problem_parmList$SfdList
index <- Quant_13B_problem_parmList$index
infoSurp <- Quant_13B_problem_parmList$infoSurp
# save the graphics settings so they can be restored at the end
oldpar <- par(no.readonly=TRUE)
Results <- Spca(SfdList, nharm=2, rotate=FALSE)
varprop <- Results$varpropvarmx
print("Proportions of variance accounted for and their sum:")
print(round(100*c(varprop,sum(varprop)),1))
# plot a three-dimension version of manifold curve
SfdList <- Quant_13B_problem_parmList$SfdList
index <- Quant_13B_problem_parmList$index
infoSurp <- Quant_13B_problem_parmList$infoSurp
Results <- Spca(SfdList, nharm=3, rotate=FALSE)
varprop <- Results$varpropvarmx
print("Proportions of variance accounted for and their sum:")
print(round(100*c(varprop,sum(varprop)),1))
# restore the graphics settings
par(oldpar)
A test or scale analysis produces a space curve that varies within the space of possible option curves of dimension Sdim. Fortunately, most of the shape variation in the curve usually lies within only two or three dimensions, and these can be identified by using functional principal components analysis.
Spca_plot(harmvarmxfd, nharm=2, titlestr=NULL)
harmvarmxfd |
Functional data objects for the principal components of the curve shape. |
nharm |
Number of principal components. |
titlestr |
A string for the title of the plot. Defaults to NULL. |
The side effect is a two- or three-dimensional plot of the principal component approximation of the information curve using the plotly package. Function plot_ly does not return a value, but does render the graphic.
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
# Example 1. Display the test information curve for the
# short SweSAT multiple choice test with 24 items and 1000 examinees
# plot a two-dimension version of manifold curve
SfdList <- Quant_13B_problem_parmList$SfdList
index <- Quant_13B_problem_parmList$index
arclength <- Quant_13B_problem_parmList$arclength
Results <- Spca(SfdList, nharm=2, rotate=TRUE)
varprop <- Results$varpropvarmx
titlestr <- "SweSAT problem items"
# save the graphics settings so they can be restored at the end
oldpar <- par(no.readonly=TRUE)
Spca_plot(Results$harmvarmxfd, nharm=2, titlestr)
print("Proportions of variance accounted for and their sum:")
print(round(100*c(varprop,sum(varprop)),1))
# plot a three-dimension version of manifold curve
SfdList <- Quant_13B_problem_parmList$SfdList
index <- Quant_13B_problem_parmList$index
arclength <- Quant_13B_problem_parmList$arclength
Results <- Spca(SfdList, nharm=3, rotate=TRUE)
varprop <- Results$varpropvarmx
Spca_plot(Results$harmvarmxfd, nharm=3, titlestr)
print("Proportions of variance accounted for and their sum:")
print(round(100*c(varprop,sum(varprop)),1))
# restore the graphics settings
par(oldpar)
TestGardener is designed to permit the analysis of choice data from multiple choice tests and rating scales using information as an alternative to the usual models based on probability of choice.
Probability and information are related by the simple transformation "information = -log probability". Another term for information is "surprisal."
The advantage of information methodology, often used in the engineering and physical sciences, is that information is measurable, and therefore is on what is called a "ratio scale" in the social sciences. That is, information or surprisal has a lower limit of zero, is unbounded above, and can be added, subtracted and rescaled with a positive multiplier.
The disadvantage of probability as a basis for representing choice is that differences near its two boundaries are on very different scales than those near 0.5, and our visual and other sensory systems, which are adapted to magnitudes, have many problems in assessing the nonlinear probability continuum.
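The transformation and its two consequences can be illustrated with a few lines of base R. This is only a sketch: the probabilities are invented, and the base-M logarithm (surprisal in "M-bits" for an item with M options) is an assumption carried over from the `-log2` example for two options elsewhere in this manual.

```r
# Hypothetical choice probabilities for one item with M = 4 options
p <- c(0.70, 0.15, 0.10, 0.05)
M <- length(p)

# Surprisal in M-bits: minus the base-M logarithm of probability
s <- -log(p, base = M)

# The transformation is invertible: probability = M^(-surprisal)
p_back <- M^(-s)

# Surprisal adds over independent events: the surprisal of a joint
# event equals the sum of the two separate surprisals
s_joint <- -log(p[1] * p[2], base = M)
s_sum   <- s[1] + s[2]

# Equal-sized probability steps are very unequal on the surprisal scale
step_mid  <- -log(0.49, base = M) - (-log(0.50, base = M))
step_edge <- -log(0.01, base = M) - (-log(0.02, base = M))

print(round(s, 3))
print(c(step_mid = step_mid, step_edge = step_edge))
```

The same 0.01 change in probability corresponds to a much larger change in surprisal near the boundary than near 0.5, which is the scale problem described above.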
TestGardener uses highly adaptable and computationally efficient spline basis functions to represent item characteristic curves for both probability and surprisal. Spline bases permit as much flexibility as the task requires, and also can control the smoothness and the order of differentiation.
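To give a feel for what a spline basis representation looks like, here is a minimal sketch that uses `bs()` from the splines package shipped with R, not the fda basis objects that TestGardener itself relies on. The coefficient values and the choice of seven cubic basis functions over [0,100] are illustrative assumptions.

```r
library(splines)  # distributed with base R

# Fine mesh over the score index interval [0, 100]
x <- seq(0, 100, length.out = 101)

# Seven cubic B-spline basis functions (an illustrative choice)
B <- bs(x, df = 7, degree = 3, intercept = TRUE)

# Hypothetical coefficients: one per basis function
coefs <- c(2.0, 1.6, 1.1, 0.8, 0.5, 0.3, 0.1)

# A curve is a linear combination of the basis functions
curve_vals <- as.vector(B %*% coefs)

# B-spline basis functions sum to one at every point of the mesh
basis_sums <- as.vector(rowSums(B))
print(range(basis_sums))
```

Increasing `df` adds flexibility, while the polynomial `degree` governs how many derivatives of the curve are continuous.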
The higher variability revealed by information or surprisal curves reveals many more insights into choice behavior than the usual simple curve employed in standard probability-based item response theory.
The use of information as a measure also implies a measure of inter-item covariation called mutual entropy. Entropy is a function whose value at any point is, for a given item, the sum over that item's curves of the product of probability and surprisal; that is, the probability-weighted average of the item's surprisal curves.
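The single-item entropy curve described in the last sentence can be sketched in base R as follows. The probability curves are invented (a softmax of made-up linear functions), and the base-M logarithm is an assumption; the sketch does not cover the inter-item mutual entropy.

```r
# Hypothetical item with M = 4 options over a score index mesh
M <- 4
theta <- seq(0, 100, length.out = 101)

# Made-up linear predictors, one column per option
logits <- cbind( 0.04 * (theta - 50),
                 0    * theta,
                -0.01 * (theta - 50),
                -0.03 * (theta - 50))

# Probability curves: each row sums to one
P <- exp(logits) / rowSums(exp(logits))

# Surprisal curves in M-bits
S <- -log(P, base = M)

# Entropy curve: sum over options of probability times surprisal
entropy <- rowSums(P * S)

print(round(range(entropy), 3))
```

In M-bits the entropy of an item lies between 0 and 1, reaching 1 where all options are equally likely (here at the middle of the mesh).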
Graphical display is a large part of the TestGardener capacity, with extensive use of the ggplot2 and plotly packages.
The test information curve is the trajectory of joint variation of all the surprisal curves within the ambient space whose dimension is the total number of curves. Usually, however, a very high percentage of the shape variation in the curve can be represented in either two or three dimensions using the singular value decomposition of a matrix of total curve values over a fine mesh. The resulting approximation is converted to a set of surprisal curve values.
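The idea of a low-dimensional approximation by singular value decomposition can be sketched with base R's `svd()`. This is not the code inside `TestInfo_svd`: the curve values are simulated, and centring each curve before the decomposition is an assumption made for the illustration.

```r
set.seed(1)

# Fine mesh and a hypothetical matrix of curve values (mesh points x curves)
nfine   <- 101
scrfine <- seq(0, 100, length.out = nfine)
ncurve  <- 12
shape1  <- scrfine / 100
shape2  <- (scrfine / 100 - 0.5)^2
Smat <- outer(shape1, runif(ncurve, 0.5, 2)) +
        outer(shape2, runif(ncurve, -3, 3)) +
        matrix(rnorm(nfine * ncurve, sd = 0.01), nfine, ncurve)

# Centre each curve (assumption), then decompose
Scen <- scale(Smat, center = TRUE, scale = FALSE)
sv   <- svd(Scen)

# Percentage of total variation carried by each dimension
pcnt <- 100 * sv$d^2 / sum(sv$d^2)

# Rank-nharm approximation of the centred curve values
nharm   <- 2
Sapprox <- sv$u[, 1:nharm] %*% diag(sv$d[1:nharm], nrow = nharm) %*%
           t(sv$v[, 1:nharm])

print(round(pcnt[1:3], 2))
```

Because the simulated curves are built from two underlying shapes, the first two dimensions account for nearly all of the variation, which mirrors the situation described above for real test data.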
TestInfo_svd(scrfine, SfdList, itemindex=1:n, nharm=2)
scrfine |
A fine mesh of values over which the image is plotted. This is usually either the score index theta or the test arc length. |
SfdList |
A list vector of length n, the number of test items. Each list in the vector contains values of the surprisal curves for that item. |
itemindex |
A vector of item indices to be used in the approximation. |
nharm |
The number of dimensions in the approximation, usually either two or three. |
The approximation is returned as a surprisal functional data object, along with the percentages of the total variation fit by each dimension in the approximation.
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
Given an integer-valued choice index matrix and a vector of numbers of item options, the function cycles through a set of iterations involving surprisal curve estimation followed by estimation of test taker index values.
TG_analysis(chcemat, scoreList, noption, sumscr_rng=NULL, titlestr=NULL, itemlabvec=NULL, optlabList=NULL, nbin=nbinDefault(N), NumBasis=7, NumDensBasis=7, jitterwrd=TRUE, PcntMarkers=c( 5, 25, 50, 75, 95), ncycle=10, itdisp=FALSE, verbose=FALSE)
chcemat |
An N by n matrix. Column i must contain the integers from 1 to M_i, where M_i is the number of options for item i. If missing or illegitimate responses exist for item i, the column must also contain an integer greater than M_i that is used to identify such responses. Alternatively, the column can use NA for this purpose. Because missing and illegitimate responses are normally rare, they are given a different and simpler estimation procedure for their surprisal values. chcemat is mandatory. |
scoreList |
A list of length n, each element containing a vector of length M_i that assigns numeric weights to the options for that item. In the special case of multiple choice items where the correct option has weight 1 and all others weight 0, a single integer can identify the correct answer. If all the items are of the multiple choice type, scoreList may instead be a numeric vector of length n containing the right answer indices. scoreList is mandatory because these weights define the person scores for the surprisal curve estimation process. |
noption |
A numeric vector of length |
sumscr_rng |
A vector of length 2 indicating the initial and final sum score values. Default is NULL, in which case the whole sum score range is used. |
titlestr |
A title string for the data and their analyses. Default is NULL. |
itemlabvec |
A character vector containing labels for the items. Default is NULL, in which case item position numbers are used. |
optlabList |
A list vector of length n, each element i of which is a character vector of length M_i. Default is NULL, and option numbers are used. |
nbin |
The number of bins containing proportions of choices. |
NumBasis |
The number of spline basis functions to use for surprisal values. Defaults to 7. |
NumDensBasis |
The number of spline basis functions to use for score probability density function. Defaults to 7. |
jitterwrd |
A logical object indicating whether a small jittering perturbation should be used to break up ties. Defaults to TRUE. |
PcntMarkers |
A vector of percentages inside of [0,100] that appear in plots. Defaults to c(5, 25, 50, 75, 95). |
ncycle |
The number of cycles in the analysis. Defaults to 10. |
itdisp |
Display results for function |
verbose |
Extra displays are provided. Defaults to FALSE. |
This function in package TestGardener processes at a minimum two objects:
(1) A matrix chcemat that contains indices of choices made in a sequence of choice situations (its number of columns n) by a set of persons making the choices (its number of rows N); and
(2) A list vector scoreList of length n containing numerical weights or scores for each choice available within each of n choice situations (referred to as items).
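How these two objects fit together can be seen in a small base R sketch. The data, the answer key and the sum score calculation are invented for illustration and are not taken from the package; the sketch only shows how the weights in scoreList turn the choice indices in chcemat into a score for each person.

```r
# Hypothetical choice index matrix: N = 5 persons (rows), n = 3 items (columns)
chcemat <- matrix(c(1, 2, 4,
                    3, 2, 1,
                    1, 1, 4,
                    2, 2, 4,
                    1, 2, 3), nrow = 5, byrow = TRUE)
N <- nrow(chcemat)
n <- ncol(chcemat)

# Four options per item and a made-up answer key
noption <- rep(4, n)
key     <- c(1, 2, 4)

# scoreList: weight 1 for the keyed option, 0 for the others
scoreList <- lapply(seq_len(n), function(i) {
  w <- rep(0, noption[i])
  w[key[i]] <- 1
  w
})

# Sum score for each person: add the weights of the options chosen
sumscr <- sapply(seq_len(N), function(j) {
  sum(sapply(seq_len(n), function(i) scoreList[[i]][chcemat[j, i]]))
})

print(sumscr)
```

For rating scales the weights in each element of scoreList would typically differ across options rather than being a single 1 among 0s.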
The function returns three large lists containing objects that can be used to assess:
(1) the probability that a choice will be made, and
(2) the quantity of information, called surprisal, that the choice made reveals about the performance or experience of the person making the choice.
Four list objects, each containing objects that are required for various displays, tables and other results:
parmList |
A list object containing objects useful for displaying results that involve the score index continuum: |
SfdList: A list object of length n, each containing objects for an item for displaying that item's surprisal curves as defined by the score index values after the analysis. See the help page for function Analyze for a description of these objects.
Qvec: A vector containing the positions on the score index continuum of the marker percentages defined in the arguments of function make_dataList().
binctr: A vector of length nbin containing the positions on the score index continuum of the bin centres.
indexScore: A vector of length N containing the positions on the score index continuum of each person.
infoSurp: The length of the test or scale information continuum in M-bits.
infoList |
A list object containing objects useful for displaying results that involve the scale information continuum: |
infofine: A fine mesh of 101 values that is used to plot the scale information continuum.
scopevec: A vector of length N containing the positions on the scale information continuum of each person.
Qinfovec: A vector containing the positions on the scale information continuum of the marker percentages defined in the arguments of function make_dataList().
infobinctr: A vector of length nbin containing the positions on the scale information continuum of the bin centres.
HALsave |
A |
analysisListvec |
A list of length |
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.
make_dataList,
Analyze,
index_distn,
index2info,
index_fun,
Sbinsmth
## Not run:
# Example 1: Input choice data and key for the short version of the
# SweSAT quantitative multiple choice test with 24 items and 1000 examinees
# input the choice indices in the 1000 by 24 choice index matrix
chcemat <- Quant_13B_problem_chcemat
# set up the key data
key <- Quant_13B_problem_key
# number of examinees and of items
N <- nrow(chcemat)
n <- ncol(chcemat)
# number of options per item and option weights
noption <- rep(0,n)
for (i in 1:n) noption[i] <- 4
scoreList <- list() # option scores
for (item in 1:n){
  scorei <- rep(0,noption[item])
  scorei[key[item]] <- 1
  scoreList[[item]] <- scorei
}
# Analyze the data and return the big three list objects:
# dataList (input info), parmList (score index info) and
# infoList (information or scope info)
TGresult <- TG_analysis(chcemat, scoreList, noption,
                        NumBasis=4, ncycle=10, verbose=TRUE)
## End(Not run)
Like the regular S-PLUS function density, this function computes a probability density function for a sample of values of a random variable. However, in this case the density function is defined by a functional parameter object logdensfdPar along with a normalizing constant C.
The density function $p(indexdens)$ has the form
$p(indexdens) = \exp[W(indexdens)] / C$
where function $W(indexdens)$ is defined by the functional data object logdensfdPar and $C$ is the normalizing constant.
## S3 method for class 'fd' TG_density(indexdens, logdensfd, conv=0.0001, iterlim=20, active=1:nbasis, dbglev=0)
indexdens |
a set of observations, which may be in one of two forms:
The first option corresponds to all $f_i = 1$. |
logdensfd |
a functional data object specifying the initial value, basis object, roughness penalty and smoothing parameter defining function $W(t).$ |
conv |
a positive constant defining the convergence criterion. |
iterlim |
the maximum number of iterations allowed. |
active |
a logical vector of length equal to the number of coefficients
defining |
dbglev |
either 0, 1, or 2. This controls the amount of information printed out on each iteration, with 0 implying no output, 1 an intermediate output level, and 2 full output. If levels 1 and 2 are used, it is helpful to turn off the output buffering option in S-PLUS. |
The goal of the function is to provide a smooth density function estimate that approaches some target density by an amount that is controlled by the linear differential operator Lfdobj and the penalty parameter. For example, if the third derivative of $W(t)$ is penalized heavily, this will force the function to approach a quadratic, which in turn will force the density function itself to be nearly normal or Gaussian. Similarly, to each textbook density function there corresponds a $W(t)$, and to each of these in turn there corresponds a linear differential operator that will, when applied to $W(t)$, produce zero as a result.
To plot the density function or to evaluate it, evaluate Wfdobj, exponentiate the resulting vector, and then divide by the normalizing constant C.
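The evaluation recipe above can be sketched in base R without any functional data objects. Here `W` is a hypothetical closed-form function standing in for the evaluated Wfdobj, and C is taken to be the integral of exp(W) over [0,100], which is an assumption consistent with dividing by C.

```r
# Hypothetical log-density function W on the interval [0, 100]
W <- function(t) -((t - 60)^2) / (2 * 15^2)

# Normalizing constant: the integral of exp(W) over the interval (assumption)
C <- integrate(function(t) exp(W(t)), lower = 0, upper = 100)$value

# Evaluate W on a fine mesh, exponentiate, and divide by C
tfine <- seq(0, 100, length.out = 101)
pfine <- exp(W(tfine)) / C

# Check: the resulting density integrates to one over the interval
total <- integrate(function(t) exp(W(t)) / C, lower = 0, upper = 100)$value
print(total)
```

With a fitted result, the values of `W(tfine)` would instead come from evaluating the returned Wfdobj over the mesh.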
a named list containing the following members:
Wfdobj |
a functional data object defining function $W(indexdens)$ that optimizes the fit to the data of the monotone function that it defines. |
C |
the normalizing constant. |
Flist |
a named list containing three results for the final converged solution: (1) f: the optimal function value being minimized, (2) grad: the gradient vector at the optimal solution, and (3) norm: the norm of the gradient vector at the optimal solution. |
iternum |
the number of iterations. |
iterhist |
a |
Juan Li and James Ramsay
Ramsay, J. O., Li J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.
Ramsay, J. O., Li J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.