--- title: "Computational Cognitive Science 2018-2019" references: - author: - family: Farrell given: Simon - family: Lewandowsky given: Stephan id: Farrell issued: year: 2018 publisher: Cambridge University Press title: Computational Modeling of Cognition and Behavior type: book - URL: https://doi.org/10.1016/0022-2496(88)90043-0 author: - family: Smith given: Philip L. - family: Vickers given: Douglas container-title: Journal of Mathematical Psychology id: Smith88 issue: 2 issued: year: 1988 page: 135-168 publisher: Elsevier title: The accumulator model of two-choice discrimination type: article-journal volume: 32 output: pdf_document: default word_document: default html_document: default --- --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` # Tutorial 2: Parameter Estimation ## General Information The exercises in this tutorial are designed so that you can do them by directly editing the R-markdown (.Rmd) file. If you want to execute code in R-markdown you need to put your code in a R-cell like so: ```{r} 2+3 ``` You can run cells in R-markdown using the little green "play" arrow in the cell, the "run" button at the top of the editor window, or keyboard shortcuts visible from the run button's menu. Running a cell will show its output directly below the cell. Alternatively, you can generate pdf or html documents by executing `Knit` (in the toolbar in R-studio), which is a great way to communicate your results. You can also copy your code into a script file and execute those or simply perform one-off calculations in the R console. Be sure you're running R commands from the **console** window, rather than the **terminal** window. ## Fitting Model Parameters Imagine you have conducted an experiment on memory recall. You have developed a novel model that you believe can explain the behavior you found in the experiment, e.g., what words people tend to remember. Instances of your model are specified by sets of parameters. ##### **Question:** Within this context, what is the goal of maximum likelihood estimation (MLE)? #### **Question:** Briefly describe an alternative way of fitting model parameters and compare it to MLE. Why might you choose one over the other? #### **Question:** Could you compute the probability of a particular set of parameter values for your model, given the data, using only its likelihood function? If so, how? If not, why not? ## Fitting Reaction Times We will continue working with our replication of the @Smith88 data. To simplify our task somewhat we will only look at data for the mean angle of 22.5 degrees. We will also ignore if the participants' responses were correct or not. This is a debatable decision, especially if one wants to provide a general model of the cognitive process underlying the task. However, it should be less of an issue for this particular condition, as most responses for this angle should be correct. ```{r load} data <- read.csv("data.csv", header = TRUE) ``` #### *Exercise*: Display the number of judgments for each mean angle (`absMean`). Then confirm that most of the responses in the mean angle 22.5 condition were correct. One nice way to get counts in R is to use the `table()` command. We now store all rows for which the mean angle was 22.5 degrees (both positive and negative angles). This is the data set we will be working with in this tutorial. ```{r data22} data.22 <- data[data$absMean == 22.5,] ``` In this tutorial we will focus exclusively on the decision times (or reaction times [RTs]). 
We now store all rows for which the mean angle was 22.5 degrees (both positive and negative angles). This is the data set we will be working with in this tutorial.

```{r data22}
data.22 <- data[data$absMean == 22.5, ]
```

In this tutorial we will focus exclusively on the decision times (or reaction times [RTs]). Before doing any more advanced analysis it is a good idea to look at the data.

#### *Exercise*: Plot the distribution of decision times. As you have seen in tutorial 1, R provides many useful plotting functions. Here a histogram would be appropriate (`hist()`).

#### *Exercise*: If our model of RTs predicts a distribution, what are some important qualities that this distribution should have?

We discussed the random walk model in tutorial 1, which allowed us to account for both the choice (left or right) and the decision times in speeded-decision tasks. In this tutorial we will use a different approach and model the decision times with a Weibull distribution. The Weibull distribution has a few advantages over the previous random walk model. First, the random walk model assumes discrete time steps and therefore cannot be directly applied to our data. Second, using a standard statistical distribution allows us to conveniently and directly compute quantities like moments and likelihoods. Finally, the minimum of several independent Weibull random variables (with the same shape parameter) is itself Weibull distributed, so one could argue that a Weibull can account for speeded decisions that arise as a race between the two options in our experiment.

Let's get an intuition for what the Weibull distribution looks like. The density function of the Weibull, as well as a random number generator and the quantile function, are available in R via `dweibull()`, `rweibull()`, and `qweibull()`. As you can see from the documentation of these functions, two parameters determine the Weibull density: its shape and scale.

#### *Exercise*: Plot the density of the Weibull distribution for the following shape and scale parameters: [(shape = 0.1, scale = 1), (1, 1), (1.5, 1), (1.5, 3), (5, 3), (50, 3)]. Plot a line plot (`plot(..., type="l")`) for the density obtained for the range `seq(0, 10, by=0.1)`, using `dweibull(range, shape, scale)`. How do the parameters influence the shape of the distribution?

## Fitting Decision Times with the Weibull Distribution

We will augment the Weibull distribution with an additional parameter: a shift parameter. The shift parameter shifts the entire distribution, changing its mean but leaving the overall shape (after the point it is shifted to) unchanged. The shift parameter has been used in previous research, and it allows us to represent an initial interval that might correspond to the time required to encode the stimulus. With this additional parameter the mean of the shifted Weibull is:

```{r wb-mean}
weibull.mean <- function(shape, scale, shift=0) {
  scale * gamma(1 + 1/shape) + shift
}
```

We will start our parameter estimation by fitting the mean of the shifted Weibull to the decision times in our data set using the root-mean-squared error (RMSE) that you saw in the lectures. We will use the default optimization procedure in R, `optim()`. Also note that we are using the parameter order `[shape, scale, shift]` and not `[shift, scale, shape]` as in @Farrell.

```{r wb-mean-opt}
wb.rmse <- function(data, par) {
  if (par[1] <= 0 | par[2] <= 0) {
    # for non-positive shape/scale parameters the Weibull is undefined,
    # so we return a high error
    10000000
  } else {
    # squared error between the model mean and the observed mean; with a single
    # fitted quantity, taking the square root would not change the minimizer
    (weibull.mean(par[1], par[2], par[3]) - mean(data$RT))^2
  }
}
```
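If `optim()` is new to you, here is a minimal sketch of its interface on a toy problem that is not part of the tutorial itself: minimizing a simple quadratic. The names (`toy`, `p`) are purely illustrative:

```{r optim-demo}
# optim(starting values, function to minimize); fn receives one parameter vector
toy <- optim(par = c(0, 0), fn = function(p) (p[1] - 3)^2 + (p[2] + 1)^2)
toy$par    # location of the minimum found, close to (3, -1)
toy$value  # objective value at that point, close to 0
```

By default `optim()` uses the Nelder-Mead method, the same procedure you will use in the exercises below.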
#### *Exercise*: Fit the Weibull using the `optim` function for the RMSE function above on the decision time (`RT`) data (`data.22`). The `optim` function requires you to pass starting values to the optimization algorithm; set these to (255, 255, 0).

#### *Exercise*: What are the parameters resulting from the optimization?

#### *Exercise*: What is the corresponding mean of the Weibull for those parameters? What is the error for the optimized parameters?

#### *Exercise*: How well does the mean match the experimental data? Plot the fitted density and the experimental data (you might want to exclude the more extreme RTs of the participants for this plot). Is this a satisfying fit?

One way to improve our approach is to take the shape of the distribution into account. Often, this is done by fitting not only the mean but also several quantiles of the data.

```{r wb-fit-qp}
q_p <- c(.1, .3, .5, .7, .9)

wb.qrmse <- function(data, par, qp) {
  # shape and scale must be strictly positive; the shift may be zero
  if (par[1] <= 0 | par[2] <= 0 | par[3] < 0) {
    return(10000000)
  }
  q_pred <- qweibull(qp, shape=par[1], scale=par[2]) + par[3]
  sqrt(mean((q_pred - quantile(data, qp))^2))
}
```

#### *Exercise*: Fit the shifted Weibull to the decision-time data. Use the RMSE of the 5 quantiles `q_p` instead of the mean for optimization. Again, use `(255, 255, 0)` as starting values for the optimization.

#### *Exercise*: What parameters did the optimization find? Compare them to the parameters we found using only the mean for the RMSE.

#### *Exercise*: What is the RMSE for the model trained on the quantiles? Compare it to the RMSE for the model trained only on the mean.

#### *Exercise*: Plot the results of the optimization and the participants' decision times. How well does the mean match the experimental data?

#### *Exercise*: A nice way to display the accuracy of model predictions is to plot the true values against the predictions. This is usually called a Q-Q plot. Plot the Q-Q plot for the quantiles of our data (`quantile()`) against the quantiles of the fitted Weibull (`qweibull()`). Additionally, plot a line, $f(x) = x$, on top of the data (try `abline()` in R). What does this line correspond to?

#### *Exercise*: By default `optim` uses the Nelder-Mead optimization procedure, which you saw in the lecture. The optimization procedure requires us to provide starting guesses for the parameters. Evaluate the results of the optimization (for the quantile optimization) for a range of starting values. For scale and shape generate 250 random uniform starting values between 0.1 and 500 (`runif()`). For the shift generate 250 random uniform values between 0 and 10. Loop through all starting values, calculate the RMSE of the parameters found from each, and store the errors in a vector. Then plot and summarize (`summary()`) the optimized errors. How much do they vary?

With three free parameters there are many near-equivalent solutions for the task, and the optimization procedures we have used can lead to different results for different starting values. As you have seen, to avoid sub-optimal parameter estimates it is a good idea to run optimization procedures from many starting values.

When we fit the mean of the Weibull to the data we saw that achieving a low error does not guarantee that our model parameters capture the underlying data. In general, it is a good idea to check how model predictions look and how closely they match the actual data, as well as subgroups within the data.

### Fitting Individual Participants

Let's check how well our model captures individual participants. To make our task a bit easier we will exclude participants who did not contribute at least 20 decisions.

```{r ll-d}
t22 <- table(data.22$sessionId)  # gives us counts for each id
t22 <- t22[t22 >= 20]
t22ids <- unique(names(t22))
```
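The plotting loop below expects the model quantiles of your fitted shifted Weibull (called `qm` in its comments). A minimal sketch of how you might compute them, assuming your quantile-based fit is stored in an `optim()` result named `fit` (a hypothetical name) with parameters ordered `[shape, scale, shift]`:

```{r model-quantiles, eval=FALSE}
# `fit` is a hypothetical name for the optim() result of the quantile-based
# fit above; adapt it to whatever you called your own fit
qm <- qweibull(q_p, shape = fit$par[1], scale = fit$par[2]) + fit$par[3]
```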
Now we plot the QQ plots for these 5 participants.

```{r ppqqp}
plot.participantQQ <- function(x, model_q, q_p, ...) {
  qe <- quantile(x$RT, q_p)
  par(pty = "s")
  r0 <- min(qe[1], model_q[1])
  r1 <- max(qe[-1], model_q[-1])
  plot(qe, model_q, pch = 16,
       xlim = c(r0, r1), ylim = c(r0, r1),
       main = sprintf("Id: %s, n:%s", substring(x$sessionId[1], 1, 5), nrow(x)),
       xlab = "Empirical Quantiles", ylab = "Model Quantiles")
  abline(0, 1)
}

par(mfrow = c(2, 3))
for (id in t22ids) {
  pd <- data.22[data.22$sessionId == id, ]
  # call the plot.participantQQ function with the model quantiles you fitted,
  # e.g. if you stored the quantiles in a variable qm:
  # plot.participantQQ(pd, qm, q_p)
}
```

As you can see, our model does not capture these participants very well. This is not very surprising, since participants' decision times differ and it is challenging to account for this variability if we only fit a single model. Therefore, researchers have attempted to characterize individual differences in their data by modelling each participant individually and then describing the estimated parameters for these individual models. In addition to fitting individual participants, we will now also use maximum likelihood to estimate our parameters.

```{r wb-ll}
wb.lldev <- function(par, data) {
  # We are deviating from F&L's practice of assigning a very high (but finite)
  # NLL when parameters are invalid or the data would have probability zero.
  # You may want to consider some of the pros and cons of their approach.
  if (any(par <= 0) || any(data - par[3] < 0)) {
    return(Inf)
  }
  ll <- dweibull(data - par[3], shape=par[1], scale=par[2])
  # Sometimes people will compute the "deviance", or twice the NLL. This term
  # is relevant in some model selection settings, but the difference isn't so
  # important here, and the MLE is unchanged by this transformation.
  -2 * sum(log(ll))
}
```

#### *Question*: Why do we return the negative log-likelihood? Why do we subtract the shift parameter from the data?

We will fit each participant using maximum likelihood and store the optimized parameter values in the `ppRTsParameters` data frame.

```{r fit-ll-wb-pp}
shapes <- c()
scales <- c()
shifts <- c()
ids <- c()

for (id in t22ids) {
  d <- data.22[data.22$sessionId == id, ]
  d <- d[d$RT < 1000, ]  # exclude the most extreme RTs
  optP <- optim(c(100, 1000, .1), fn = function(x) wb.lldev(x, d$RT))
  shapes <- c(shapes, optP$par[1])
  scales <- c(scales, optP$par[2])
  shifts <- c(shifts, optP$par[3])
  ids <- c(ids, id)
}

ppRTsParameters <- data.frame(shapes, scales, shifts, ids)
```

#### *Exercise*: Summarize the participant parameters. How much do the fitted parameters differ? Is there anything peculiar about them? If so, what might be going on?

#### *Exercise*: Plot the fitted parameters and the real data.

#### *Question*: Setting aside idiosyncrasies of the current case, what are the advantages of fitting models for each participant individually? Are there any disadvantages? What would be the ideal way to account for individual differences?

# References