Sunday, November 11, 2012

Matching Test Information

# Imagine that you are asked to create a test from an item pool that you have.

# The item pool has 100 items and you have to create a test of 20 items in length.

# Easy, right?

# The trick is that the test creators want the information on the test to match that of previous tests.

# They give you points on the information curve that your test must approximate.  These points are given for every half-point change in theta.

info  = c( 0,  .05,  .1,  .5, 1.4, 3.5, 5.8,   6, 6.25,   6, 5.8, 5.5, 4.5,   3, 1.75,  .5, .25)
theta = c(-4, -3.5,  -3,-2.5,  -2,-1.5,  -1, -.5,    0,  .5,   1, 1.5,   2, 2.5,    3, 3.5,   4)

plot(theta, info, type="b", main="Test Information Curve to Match", ylab="Information", ylim=c(0,7), lwd=3)

 lines(theta, info-.5, col="red", lwd=3)
 lines(theta, info+.5, col="red", lwd=3)

 # The actual test information is allowed to move .5 above or below the target.




# You are given a set of 100 items.  I will randomly generate item parameters for a 3PL model; in practice, your items would be given to you.

npool = 100

pool <- as.data.frame(cbind(item=1:npool, a=abs(rnorm(npool)*.25+1.1),
                            b=rnorm(npool), c=abs(rnorm(npool)/7.5+.1)))
summary(pool)


# You have to choose 20 items from these 100 to construct a test which has information that best matches the target information function.

nitems = 20

# Looking good.

# Each item has an item characteristic curve (ICC) of:
PL3 = function(theta,a, b, c) c+(1-c)*exp(a*(theta-b))/(1+exp(a*(theta-b)))

# and information function defined as:
PL3.info = function(theta, a, b, c) a^2 *(PL3(theta,a,b,c)-c)^2/(1-c)^2 * (1-PL3(theta,a,b,c))/PL3(theta,a,b,c)

# There are many potential ways of solving this problem.  I propose a maximization routine that evaluates each potential item by how much it shrinks the difference between the target test information and that of the current test.  Items will be added one at a time until the test has 20 items.

# First I want to calculate the information at each theta for each item.
nthetas = length(theta)

# This matrix will contain the item pool information values
pool.info <- matrix(NA, nrow=npool, ncol=nthetas)

colnames(pool.info) <- paste("theta=", theta, sep="")
# These must be calculated for each item but for all thetas simultaneously.
for(i in 1:npool) {
  pool.info[i,] <- PL3.info(theta, pool$a[i], pool$b[i], pool$c[i])
}


criteria <- function(item.info, test.gain, scale=1) {
  # test.gain is the potential information that the test still needs.
  # If this is greater than 0 then this item can add as much as test.gain to the overall test information.
  if (test.gain>0) return_score = min(test.gain^scale, item.info^scale)
  # If the test has nothing to gain, then this item will add nothing.
  if (test.gain<=0) return_score = 0
  return_score
}

criteria <- function(item.info, test.gain, scale=1) {
  if (test.gain-item.info>0) return_score = test.gain^scale
  if (test.gain-item.info<=0) return_score = max(test.gain,0)^scale - 2*(item.info-test.gain)^scale
  return_score
}

# I generated several different criteria functions for evaluating each item.

# I found the second criterion to be the best because it gives weight to information gains up to the target but applies double that weight as a penalty to gains above the target.  This matters because the routine only ever adds items; it cannot subtract information once added.
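
# For instance, under the second criterion with test.gain=1 and scale=1 (example
# values of my own), an item at or below the target receives the full score
# test.gain^scale, while an overshooting item is docked double for the excess:
criteria(.5, 1, scale=1)   # 1: falls short of the target, no penalty
criteria(1.5, 1, scale=1)  # 0: the .5 overshoot costs 2*.5 = 1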

# I have added a z loop so that the program scans over multiple scale factors when searching for solutions.
zz = zfail = zsum = zvalue=0
zseq = seq(.25,15,.10)
# This is a fairly fine gradient to search over.

results <- matrix(NA, nrow=length(zseq), ncol=nitems+3)
colnames(results) <- c("Value", "Difference_sum", "Fails", paste("item", 1:nitems, sep=""))

for (z in zseq) {
zz=zz+1

# Let us create the test information matrix, which starts at zero information.
test.info <- matrix(0, nrow=nitems, ncol=nthetas)
colnames(test.info) <- paste("Infot=", theta, sep="")

# The current pool and its information matrix start out as copies of the originals, but they shrink as more items are selected.
pool.current = pool
pool.info.current = pool.info

# Now we are ready to think about starting to select items for our test.
# Initially we have no information.  We would like to select the item that satisfies our criteria for best item.

scale.power = z

# Generate a matrix to hold the score for the current pool of items
pool.score <- pool.info.current*0
# First let's figure out how much we need to gain on the test.
test.gain = (info-test.info[1,])

# Calculate the scores for all items
for(i in 1:nrow(pool.current)) for(ii in 1:nthetas) pool.score[i,ii] <-
            criteria(pool.info.current[i,ii], test.gain[ii], scale=scale.power)

# Now we sum across items
sum.score <- apply(pool.score, 1, sum)
# The first item that we pick is the item with the highest sum score
item <- pool.current$item[sum.score==max(sum.score)][1]
# Placing a [1] ensures that in the unlikely event of a tie, R picks the first item as the winner.

# We add the information from the item that we have selected to the test.
test.info[1,] <- pool.info[pool.current$item==item[1],]+0
# Note that we index the pool by item number rather than by row position, because row positions shift as the available pool shrinks.

# Now the tricky part is modifying the item pool to remove the first item so that it is not selected again.
pool.info.current = pool.info.current[pool.current$item!=item[1],]
pool.current = pool.current[pool.current$item!=item[1],]

 plot(theta, info, type="b", main="Test Information Curve to Match", ylab="Information", ylim=c(0,7), lwd=3)
 lines(theta, info-.5, col="red", lwd=3)
 lines(theta, info+.5, col="red", lwd=3)



 lines(theta, test.info[1,], type="b", col=gray(.45), lwd=1.5)

# Now we repeat the above process 19 more times.

for(v in 2:(nitems)) {
  # Generate a matrix to hold the score for the current pool of items
  pool.score <- pool.info.current*0
  # First let's figure out how much we need to gain on the test.
  # Early picks only target a fraction of the remaining gap so that no single region overshoots too quickly; the final picks target it fully.
            test.gain = (info-test.info[v-1,])/3
  if (v>9)  test.gain = (info-test.info[v-1,])/(16-v)
  if (v>15) test.gain = (info-test.info[v-1,])


  # Calculate the scores for all items
  for(i in 1:nrow(pool.current)) for(ii in 1:nthetas) pool.score[i,ii] <-
                       criteria(pool.info.current[i,ii], test.gain[ii], scale=scale.power)

  # Now we sum across items
  sum.score <- apply(pool.score, 1, sum)
  # The item that we pick is the item with the highest sum score
  item[v] <- pool.current$item[sum.score==max(sum.score)][1]
  # We add the information from the item that we have selected to the test.
  # We are adding our information to the information from the previous items.
  test.info[v,] <- pool.info.current[pool.current$item==item[v],]+test.info[v-1,]
  # We once again shrink the pool
  pool.info.current = pool.info.current[pool.current$item!=item[v],]
  pool.current = pool.current[pool.current$item!=item[v],]

  lines(theta, test.info[v,], type="b",lwd=1.5)
}

zvalue[zz] = z
zsum[zz] = sum(abs(info-test.info[20,]))
zfail[zz] = sum(abs(info-test.info[20,])>.5)

results[zz,] <- c(zvalue[zz], zsum[zz], zfail[zz], sort(item))
}



results
results[order(results[,3]),]

# Using this search algorithm I was able to find several solutions.  However, there is no guarantee that this algorithm will find a solution.  A better algorithm for this type of problem would be one capable of evaluating all of the items simultaneously.  A genetic algorithm might be fitting, though I know that linear programming is often used to solve this problem.
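
# As a rough sketch of the linear-programming approach (assuming the lpSolve
# package is available; any 0/1 integer-programming solver would do), let x_i be
# a binary indicator for whether item i is selected.  We require exactly nitems
# items and information within .5 of the target at every theta, so any feasible
# solution matches the target within tolerance:
library(lpSolve)
const.mat <- rbind(rep(1, npool),  # sum of the x_i
                   t(pool.info),   # test information at each theta (upper bound)
                   -t(pool.info))  # test information at each theta (lower bound)
const.dir <- c("=", rep("<=", nthetas), rep("<=", nthetas))
const.rhs <- c(nitems, info + .5, -(info - .5))
lp.out <- lp("max", rep(0, npool), const.mat, const.dir, const.rhs, all.bin=TRUE)
lp.out$status                # 0 indicates a feasible test was found
which(lp.out$solution == 1)  # the selected items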

# Note also that we are randomly generating the item pool, so different pools will produce different results.  Control this by setting the seed at the beginning.
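
# For example (the particular seed value is arbitrary), placing this line before
# the pool is generated reproduces the same pool on every run:
set.seed(2012)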

Saturday, November 10, 2012

Item Information - IRT


# Interestingly, one of the founders of Item Response Theory (Frederic Lord) developed his own concept of information.  His approach was both unique and interesting but ended up leaving us with the same Fisher information.

# Fisher information measures, for any particular value of a parameter, how much information you have about that parameter at that value.  It is a difficult subject that I struggle with.

# It might be useful to look at how it can be found.

# First keep in mind that for Fisher Information we are primarily concerned with Maximum Likelihood.

# For maximum likelihood we are primarily concerned with maximizing ln(pdf(theta)) by choosing theta.

# Define L = ln(pdf(theta))

# The score is equal to the first derivative of L with respect to theta.  Define Score = S = dL/dtheta = d ln(pdf(theta))/d theta

# We know that if the MLE maximization routine has worked properly then the score evaluated at the estimate is equal to zero.

# Now the Fisher Information is equal to the expected value of the score squared (the second moment) given theta:

# I = E(S^2|theta)

# Now, the tricky part is thinking how to make sense of this thing we call information.

# First let's think how it is used.

# For one thing, the standard error of estimation is se = 1/I^.5
# In a similar vein, it can be shown (this is the Cramér-Rao bound) that the variance of any unbiased estimator f of theta has a lower limit of 1/I:  Var(f(theta)) >= 1/I

# Thus we can see that there is an inverse relationship between information and the variance of our estimator.
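
# As a quick numeric illustration (example values of my own): since test
# information is the sum of the items' information, a test with total
# information 5 at some theta measures that theta with standard error 1/sqrt(5):
1/sqrt(5)  # approximately .45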

# How do we reconcile this?

# Item Response theory has some very nice applications of information that I believe shed light on other uses of information.

# For the Rasch model the item information for one item is:  I(theta) = p(theta)*(1-p(theta))

# Where p is the probability of getting the item correct.
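
# As a check on this formula (a minimal sketch): for a single dichotomous item
# the log-likelihood of response x is x*ln(p) + (1-x)*ln(1-p).  Under the Rasch
# model dp/dtheta = p*(1-p), so the score works out to S = x - p and
# I = E(S^2) = E[(x-p)^2] = p*(1-p).  We can verify by simulation at theta=0, b=1:
p.check <- exp(0-1)/(1+exp(0-1))
x.check <- rbinom(10^5, 1, p.check)
mean((x.check-p.check)^2)  # simulated E(S^2)
p.check*(1-p.check)        # analytic p*(1-p)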

# Let's map this out for a range of thetas assuming the difficulty parameter is one.

theta = seq(-4,6,.1)

# Now let's define the Rasch function:

rasch = function(theta,b) exp(theta-b)/(1+exp(theta-b))

# Let's also define the information value.

rasch.info = function(theta,b) rasch(theta,b)*(1-rasch(theta,b))

plot(theta, rasch.info(theta,1), ylab = "Information", main="The most information is gained at theta=1 (1PL)", type="l")



# We can see that information peaks at theta=1.  What does this mean for practical purposes?  If you want to learn the most about a student's ability, give them a question whose difficulty is at their ability level.  If you instead give them a question that is far too easy or far too hard, then whether they do well or poorly on it you have not learned much about their ability level on the theta scale.

# Why?  Because someone who is close on the theta scale is nearly equally likely to do just as well on that question.

# We can see this by looking at the item characteristic curve:

plot(theta, rasch(theta,1), main="Item Characteristic Curve", ylab="Probability of Correct Response", type="l")



# We can see that the change in the probability of getting the item correct is largest at theta=1.  However, as theta gets very large or very small there is very little change in the probability (within each respective range) of getting the item correct as a result of a small change in theta.

# Another way of thinking about this is: if we had two students of about the same ability and we wanted to know which was stronger, we would want to give them the question whose difficulty was most closely aligned with their ability.

# Let's look at a few more different IRT models.

# The 2 parameter logistic model is very similar to the single parameter:

# I(theta) = a^2*p(theta)*(1-p(theta))

# The only difference is that the two parameter model allows for there to be more or less information generated from each item as a result of the "discriminatory" power (a) of the item.  It is worth noting, though, that a high-a item does not strictly dominate a low-a item in terms of information for all values of theta.  This is because the p*(1-p) term is also a function of theta.  Let's see this in action:

PL2 = function(theta,a, b) exp(a*(theta-b))/(1+exp(a*(theta-b)))

PL2.info = function(theta, a, b) a^2*PL2(theta,a,b)*(1-PL2(theta,a,b))

plot(theta, PL2.info(theta,2,1), ylab = "Information", main="Larger a creates a larger peak but shallower tails", type="l")

lines(theta, PL2.info(theta,.75,1))



# We can see that we greatly prefer a=2 for theta values between -1 and 3.

# However, the item with less discriminatory power may have more information in the tails.  This is somewhat difficult to understand.  As a test example, imagine two different questions.  One is an arithmetic question for which students must demonstrate knowledge of long division.  The other is a word problem in which students piece together components and demonstrate that they understand the concepts behind the math.  The first question may be the better one for identifying whether a student knows long division.  The second question may be worse at identifying mastery of one particular skill, but it demonstrates the student's ability to pull various math concepts together into a coherent answer.  Thus we might not be able to infer much about arithmetic ability from a correct answer, but any student answering this question correctly, whatever their ability level, tells us something about their ability.

# We will also look briefly at the 3 parameter Logistic Model:

# I(theta) = a^2 *(p-c)^2/(1-c)^2 * (1-p)/p

# It can be shown that as c increases, the information function monotonically decreases (see the overlay sketch at the end of this post).  This makes sense in that the guessing parameter c works against information.  As c gets larger, the likelihood of the student getting the question right by pure chance also gets larger.

PL3 = function(theta,a, b, c) c+(1-c)*exp(a*(theta-b))/(1+exp(a*(theta-b)))

PL3.info = function(theta, a, b, c) a^2 *(PL3(theta,a,b,c)-c)^2/(1-c)^2 * (1-PL3(theta,a,b,c))/PL3(theta,a,b,c)

plot(theta, PL3.info(theta,1.2,1,.4), ylab = "Information", main="3PL information a=1.2, b=1, c=.4", type="l")



# We can see that the 3 parameter logistic model has an asymmetric form.  This is the result of the guessing parameter.  The problem with respondents being able to guess correctly is that we end up with less information about low-ability respondents than we would like.
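
# As a quick visual check of the monotonicity claim above (parameter values are
# my own choice): overlaying information curves for c = 0, .2, and .4 shows each
# larger c lying entirely below the smaller ones.
plot(theta, PL3.info(theta,1.2,1,0), type="l", ylab="Information",
     main="3PL information falls everywhere as c rises")
lines(theta, PL3.info(theta,1.2,1,.2), lty=2)
lines(theta, PL3.info(theta,1.2,1,.4), lty=3)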