Psychometrics by Simulation: Matching Test Information

# Imagine that you are asked to create a test from an item pool that you have.

# The item pool has 100 items and you have to create a test of 20 items in length.

# Easy right?

# The trick is that the test creators want the information on the test to match that of previous tests.

# They give you points on the information curve that your test must approximate in terms of information. These points are given for every half of a point change in theta.

info = c(0, .05, .1, .5, 1.4, 3.5, 5.8, 6, 6.25, 6, 5.8, 5.5,4.5, 3,1.75, .5 , .25)
theta = c(-4, -3.5, -3,-2.5, -2 ,-1.5, -1 ,-.5, 0, .5, 1, 1.5, 2,2.5, 3, 3.5, 4)

plot(theta, info, type="b", main="Test Information Curve to Match", ylab="Information", ylim=c(0,7), lwd=3)

lines(theta, info-.5, col="red", lwd=3)
lines(theta, info+.5, col="red", lwd=3)

# You are allowed a .5 above or below movement in actual information.

# You are given a set of 100 items. I will randomly generate items for a 3PL model, however your items would be given to you.

npool = 100

pool <- data-blogger-escaped-a="abs(rnorm(npool)*.25+1.1)," data-blogger-escaped-b="rnorm(npool)," data-blogger-escaped-c="abs(rnorm(npool)/7.5+.1))</p" data-blogger-escaped-cbind="cbind" data-blogger-escaped-item="1:npool,">summary(pool)

# You have to choose 20 items from these 100 to construct a test which has information that best matches the target information function.

nitems = 20

# Looking good.

# Each item has a item characteristic curve (ICC) of:
PL3 = function(theta,a, b, c) c+(1-c)*exp(a*(theta-b))/(1+exp(a*(theta-b)))

# and information function defined as:
PL3.info = function(theta, a, b, c) a^2 *(PL3(theta,a,b,c)-c)^2/(1-c)^2 * (1-PL3(theta,a,b,c))/PL3(theta,a,b,c)

# There are many potential ways of solving this problem. I propose to devise a maximization routine that evaluates each potential item and how it helps to shrink the difference between the target test information and that of the current. Each item will be added individually until the test has 20 items.

# First I want to calculate the information at each theta for each item.
nthetas = length(theta)

# This matrix will contain the item pool information values
pool.info <- data-blogger-escaped-matrix="matrix" data-blogger-escaped-ncol="nthetas)</p" data-blogger-escaped-nrow="npool,">

colnames(pool.info) <- data-blogger-escaped-heta=",theta,sep=" data-blogger-escaped-p="p" data-blogger-escaped-paste="paste">
# These must be calculated for each item but for all thetas simultaneously.
for(i in 1:npool) {
pool.info[i,] <- data-blogger-escaped-i="i" data-blogger-escaped-p="p" data-blogger-escaped-pl3.info="pl3.info" data-blogger-escaped-pool="pool" data-blogger-escaped-theta="theta">}

criteria <- data-blogger-escaped-function="function" data-blogger-escaped-item.info="item.info" data-blogger-escaped-p="p" data-blogger-escaped-scale="1)" data-blogger-escaped-test.gain="test.gain"> # test.gain is the potential information that the test still needs
# If this information is greater than 0 then this item can add as much as test.gain to the overall test information.
if (test.gain>0) return_score=min(test.gain^scale, item.info^scale)
# If the test has nothing to gain, then this item will add nothing.
if (test.gain<=0) return_score= 0
return_score
}

criteria <- data-blogger-escaped-function="function" data-blogger-escaped-item.info="item.info" data-blogger-escaped-p="p" data-blogger-escaped-scale="1)" data-blogger-escaped-test.gain="test.gain"> if (test.gain-item.info>0) return_score=test.gain^scale
if (test.gain-item.info<0 data-blogger-escaped-return_score="max(test.gain,0)^scale-2*(item.info-test.gain)^scale</p"> return_score
}

# I generated several different sets of criteria for evaluating each item.

# I found the second criteria to be the best because it gives weight to gains approaching the target score but also give double that weight in gains above the target score. This is useful because information can be added but the maximization routine does not subtract.

# I have added a z loop into the program to have the program scan over multiple scale factors for solutions.
zz = zfail = zsum = zvalue=0
zseq = seq(.25,15,.10)
# This is a fairly fine gradient to search over.

results <- data-blogger-escaped-matrix="matrix" data-blogger-escaped-ncol="nitems+3," data-blogger-escaped-nrow="length(zseq))</p">colnames(results) <- data-blogger-escaped-ails="ails" data-blogger-escaped-c="c" data-blogger-escaped-ifference_sum="ifference_sum" data-blogger-escaped-nitems="nitems" data-blogger-escaped-p="p" data-blogger-escaped-paste="paste" data-blogger-escaped-sep="" data-blogger-escaped-value="value">

for (z in zseq) {
zz=zz+1

# Let us create the empty test pool.
test.info <- data-blogger-escaped-matrix="matrix" data-blogger-escaped-ncol="nthetas)</p" data-blogger-escaped-nrow="nitems,">colnames(test.info) <- data-blogger-escaped-c="c" data-blogger-escaped-nfot=",theta, sep=" data-blogger-escaped-p="p" data-blogger-escaped-paste="paste">

# The current copies start out as copies of the pool and the pool's information but they shrink as more items are selected.
pool.current = pool
pool.info.current = pool.info

# Now we are ready to think about starting to select items for our test.
# Initially we have no information. We would like to select the item that satisfies our criteria for best item.

scale.power = z

# Generate an matrix to hold the score for the current pool of items
pool.score <- data-blogger-escaped-p="p" data-blogger-escaped-pool.info.current="pool.info.current">
# First let's figure out how much we need to gain on the test.
test.gain = (info-test.info[1,])

# Calculate the scores for all items
for(i in 1:nrow(pool.current)) for(ii in 1:nthetas) pool.score[i,ii] <- data-blogger-escaped-nbsp="nbsp" data-blogger-escaped-p="p"> criteria(pool.info.current[i,ii], test.gain[ii], scale=scale.power)

# Now we sum across items
sum.score <- data-blogger-escaped-1="1" data-blogger-escaped-apply="apply" data-blogger-escaped-p="p" data-blogger-escaped-pool.score="pool.score" data-blogger-escaped-sum="sum">
# The first item that we pick is the item with the highest sum score
item <- data-blogger-escaped-item="item" data-blogger-escaped-pool.current="pool.current" data-blogger-escaped-sum.score="=max(sum.score)][1]</p"># Placing a [1] ensures that in the unlikely event of a tie, that R picks the first item as the winner.

# We add the information from the item that we have selected to the test.
test.info[1,] <- data-blogger-escaped-item="=item[1],]+0</p" data-blogger-escaped-pool.current="pool.current" data-blogger-escaped-pool.info="pool.info"># Note here we are using the original pool to add information. This is because the restricted pool would not be targeted correctly with item# as the item number decreases in the available pool.

# Now the tricky part is modifying the item pool to remove the first item so that it is not selected again.
pool.info.current = pool.info.current[pool.current$item!=item[1],]
pool.current = pool.current[pool.current$item!=item[1],]

plot(theta, info, type="b", main="Test Information Curve to Match", ylab="Information", ylim=c(0,7), lwd=3)
lines(theta, info-.5, col="red", lwd=3)
lines(theta, info+.5, col="red", lwd=3)

lines(theta, test.info[1,], type="b", col=gray(.45), lwd=1.5)

# Now we repeat the above process 19 more times.

for(v in 2:(nitems)) {
# Generate an matrix to hold the score for the current pool of items
pool.score <- data-blogger-escaped-p="p" data-blogger-escaped-pool.info.current="pool.info.current">
# First let's figure out how much we need to gain on the test.
test.gain = (info-test.info[v-1,])/3
if (v>9) test.gain = (info-test.info[v-1,])/(16-v)
if (v>15) test.gain = (info-test.info[v-1,])

# Calculate the scores for all items
for(i in 1:nrow(pool.current)) for(ii in 1:nthetas) pool.score[i,ii] <- data-blogger-escaped-nbsp="nbsp" data-blogger-escaped-p="p"> criteria(pool.info.current[i,ii], test.gain[ii], scale=scale.power)

# Now we sum across items
sum.score <- data-blogger-escaped-1="1" data-blogger-escaped-apply="apply" data-blogger-escaped-p="p" data-blogger-escaped-pool.score="pool.score" data-blogger-escaped-sum="sum">
# The item that we pick is the item with the highest sum score
item[v] <- data-blogger-escaped-item="item" data-blogger-escaped-pool.current="pool.current" data-blogger-escaped-sum.score="=max(sum.score)]</p">
# We add the information from the item that we have selected to the test.
# We are adding our information to the information from the previous items.
test.info[v,] <- data-blogger-escaped-item="=item[v],]+test.info[v-1,]</p" data-blogger-escaped-pool.current="pool.current" data-blogger-escaped-pool.info.current="pool.info.current">
# We once again shrink the pool
pool.info.current = pool.info.current[pool.current$item!=item[v],]
pool.current = pool.current[pool.current$item!=item[v],]

lines(theta, test.info[v,], type="b",lwd=1.5)
}

zvalue[zz] = z
zsum[zz] = sum(abs(info-test.info[20,]))
zfail[zz] = sum(abs(info-test.info[20,])>.5)

results[zz,] <- data-blogger-escaped-c="c" data-blogger-escaped-item="item" data-blogger-escaped-p="p" data-blogger-escaped-sort="sort" data-blogger-escaped-zfail="zfail" data-blogger-escaped-zsum="zsum" data-blogger-escaped-zvalue="zvalue" data-blogger-escaped-zz="zz">}

results
results[order(results[,3]),]

# Using this search algorithm I was able to find several solutions. However, there is no guarantee that this algorithm would have found soluations. A better algorithm for this type of problem would be one capable of evaluating all of the outcomes simultaneously. A genetic algorithm might be fitting though I know that linear programming is often used to solve this problem.

# Note, also, we are randomly generating the item pools so different pools of will produce different results. Control this by setting the seed at the beginning.

Psychometrics by Simulation

Sunday, November 11, 2012

Matching Test Information

No comments:

Post a Comment