---
title: "Computational  Cognitive  Science 2019-2020"
references:
- author:
  - family: Farrell
    given: Simon
  - family: Lewandowsky
    given: Stephan
  id: Farrell
  issued:
    year: 2018
  publisher: Cambridge University Press
  title: Computational Modeling of Cognition and Behavior
  type: book
- URL: https://doi.org/10.1016/0022-2496(88)90043-0
  author:
  - family: Smith
    given: Philip L.
  - family: Vickers
    given: Douglas
  container-title: Journal of Mathematical Psychology
  id: Smith88
  issue: 2
  issued:
    year: 1988
  page: 135-168
  publisher: Elsevier
  title: The accumulator model of two-choice discrimination
  type: article-journal
  volume: 32
output:
  html_document: default
  word_document: default
  pdf_document: default
---
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Tutorial 2: Parameter Estimation

## General Information

The exercises in this tutorial are designed so that you can do them by directly editing the R-markdown (.Rmd) file. 
If you want to execute code in R-markdown you need to put your code in a R-cell like so:


```{r}
2+3
```

You can run cells in R-markdown using the little green "play" arrow in the cell, the "run" button at the top of the editor window, or keyboard shortcuts visible from the run button's menu. Running a cell will show its output directly below the cell. Alternatively, you can generate pdf or html documents by executing `Knit` (in the toolbar in R-studio), which is a great way to communicate your results.

You can also copy your code into a script file and execute those or simply perform one-off calculations in the R console. In this case, be sure you're running R commands from the **console** window, rather than the **terminal** window.

#### Package: GGplot2 for Data Visualization

This tutorial will use ggplot2, a powerful library for data visualization. It doesn't come with base R, so we'll have to install it -- although if you're using a laboratory computer it may already be there. 

This command will load the library if it's available:
```{r load_ggplot}
require(ggplot2)
```

If that command gave a warning message, you need to install the package.
To do that, run the command below, then try `require(ggplot2)` again.
```{r install_ggplot, eval=FALSE}
install.packages("ggplot2", dependencies = TRUE)
```

GGplot2 follows a specific approach to data visualization known as the 'grammar of graphics.' We will use examples from the library here without going in-depth into the overall principles, but you may find it helpful to refer to the package's [documentation](https://ggplot2.tidyverse.org/).

## Fitting Reaction Times

We will continue working with our replication of the @Smith88 data. To simplify our task somewhat we will only look at data for the mean angle of 22.5 degrees. We will also ignore if the participants' responses were correct or not. This is a debatable decision, especially if one wants to provide a general model of the cognitive process underlying the task. However, it should be less of an issue for this particular condition, as most responses for this angle should be correct.


```{r load}
data <- read.csv("data.csv", header = TRUE)
```

To verify our hunch that most of the 22.5 angle judgments were correct, we can visualize the distribution of correct and incorrect judgments for each angle. The code below produces a histogram of judgments and steps through some key elements of data visualization with ggplot2 along the way.

```{r histogram}
ggplot(data) + # the core 'ggplot' function takes the data as input
  aes( # 'aesthetics' mapping: which aspects of the data will we plot?
    x=absMean, # the values of the 'absMean' column will be plotted on the x axis
    fill=factor(correct)) + # the 'correct' column will set the color fill of the bars
      # ^ the 'factor' function converts from numbers (0,1) to strings ("0","1")
  geom_histogram( # 'geometry' layer: what kind of plot is this?
    bins=10) + # specify a sensible number of bins for this histogram
  ggtitle("Correct and incorrect judgments by mean angle") # add title
  
```

#### *Exercise*: Use exact counts to confirm that most of the responses in the mean angle 22.5 condition were correct. One nice way to get counts in R is to use the `table()` command. 


```{r table-counts1}
table(data$absMean)
```
```{r table-counts2}
table(data$absMean, data$correct)
```


#### *Exercise*: The plot above clearly shows a relationship between accuracy, in terms of number of correct responses, and average angle of the stimuli. Can you think of other factors in the data which might have some relation to mean accuracy? If so, try to make a plot with these factors to check.

**Solution:**  *Orientation of the stimuli, toward left or right, might also have an effect on accuracy, and would be worth visualizing. You could look at accuracy by orientation overall:*

```{r histogram_orientation}
ggplot(data) +
  aes(x=orientation, fill=factor(correct)) + 
  stat_count(geom="bar") + # ggplot won't do histograms for non-continuous variables,
    # ^ this is a workaround to visualize tabular counts
  ggtitle("Correct and incorrect judgments by orientation")
```

*This doesn't look too interesting - accuracy seems roughly the same for both orientations, maybe a bit more on the left, hard to say.* 

*However, the earlier histogram shows that mean angle has a large effect on accuracy, so if we don't incorporate both factors we will likely overlook key interactions. Here's a plot of accuracy by orientation and mean angle:*

```{r histogram_ext}
ggplot(data) +
  aes(x=absMean, fill=factor(correct)) + 
  geom_histogram(bins=10) + 
  ggtitle("Correct and incorrect judgments by mean angle and orientation") +
  facet_grid(orientation ~ .) 
    # ^ facet_grid here divides into a plot for each value of 'orientation'
```

*Clearly, there's an interaction between orientation and mean angle: participants are more likely to give wrong responses when the mean is low, but they're even more likely to give low responses when the mean is low **and** the stimuli are oriented toward the left.*

*Perhaps this is related to other factors not reported in the data, e.g. handedness? Plotting the accuracy rate for individual participants would also be interesting at this stage, left as a further exercise to the reader.*

We now store all rows for which the mean angle was 22.5 degrees (both positive and negative angles). This is the data set we will be working with in this tutorial.


```{r data22}
data.22 <- data[data$absMean == 22.5,]
```

In this tutorial we will focus exclusively on the decision times (or reaction times [RTs]). Before doing any more advanced analysis it is a good idea to look at the data.

#### *Exercise*: Plot the distribution of decision times. You can modify the plotting code above.

```{r rt-dist}
ggplot(data.22) + # mandatory: change data source
  aes(x=RT, # mandatory: change variable plotted on x axis
    fill=factor(correct)) + # optional: keep color fill, although it's not needed here
  geom_histogram(
    bins=50) + # optional: guess a good # of bins for this variable
  ggtitle("Reaction time distribution for judgments at mean 22.5° angle") 
```

*A more useful plot would exclude the outliers, giving a fuller picture of the data:*

```{r rt-dist-trimmed}
ggplot(data.22[data.22$RT < 10000,]) + # mandatory: change data source
  aes(x=RT, # mandatory: change variable plotted on x axis
    fill=factor(correct)) + # optional: keep color fill, although it's not needed here
  geom_histogram(
    bins=50) + # optional: guess a good # of bins for this variable
  ggtitle("Reaction time distribution for judgments at mean 22.5° angle, RT < 10K") 
```

#### *Exercise*: If our model of RTs predicts a distribution - what are some important qualities that this distribution should have? 

**Solution:** *It should have qualities that are shared by real response times, e.g., it should be real-valued, positive, continuous,  and asymmetric. We might also want it to accommodate our theoretical commitments about reaction times, e.g., that there's a floor on reaction times imposed by nerve transmission times (via, say, a shift). It might also be good to account for extreme outliers, but this could be left to a mixture model.*

We discussed the random walk model in tutorial 1, which allowed us to account for both the choice (left or right) and the decision times in speeded-decision tasks. In this tutorial we will use a different approach and model the decision times with a Weibull distribution.

The Weibull distribution has a few advantages over the previous random walk model. First, the random walk model assumes discrete time steps - and therefore cannot be directly applied to our data. Second, using a standard statistical distribution allows us to conveniently and directly compute quantities like moments and likelihoods. Finally, a Weibull represents the distribution of the minimum of several Weibull distributions and as such one could argue can account for the speeded-decisions between the two options in our experiment.

Let's get an intuition for what the Weibull distribution looks like. The density function of the Weibull, as well as a random number generators and quantiles are available in R via `dweibull()`, `rweibull()` and `qweibull()`. As you can see from the documentation of these functions, two parameters determine the Weibull density - its scale and shape. We can visualize the effect of varying these parameters over a range of inputs.

```{r weibull}
x_range <- seq(0,10, by=0.1)

ggplot() +
  aes(x = x_range) + # set same x range for all lines
  geom_line(aes(y = dweibull(x_range, shape=0.1, scale=1)), 
            color="red") + # plot *red* line for weibull with shape & scale (0.1, 1)
  annotate("text", x=8, hjust=0, 
           y=.9, col="red", label="(0.1,1)") + # red annotation for (0.1, 1)
  geom_line(aes(y = dweibull(x_range, shape=1, scale=1)),
            color="blue") + # plot *blue* line for weibull with shape & scale (1, 1)
  annotate("text", x=8, hjust=0, 
           y=.85, col="blue", label="(1,1)") +  # lower, blue annotation for (1, 1)
  ylab("Weibull density") + # label y axis
  theme_bw() # optional: give plot a nicer theme

```

#### *Exercise*: Extend the plot above by adding the density of the Weibull distribution for the following shape and scale parameters: [(shape = 1.5, scale = 1), (1.5, 3), (5,3), (9, 3)], using `dweibull(range, shape, scale)`. How do the parameters influence the shape of the distribution?

**Solution:** *A small value for the shape parameter gives a more skewed distribution (when the shape parameter equals 1, we obtain an exponential), and increasing the shape parameter produces a more normal (bell-shaped) distribution. The scale stretches out the distribution (from F&L).*

```{r wb-plot-dens-A}
# solution A: write it all out as one big plot
# adding layers incrementally like this can be useful during data exploration,
#  although with this many you probably want to do some more code streamlining
#   (see next solution for a better approach overall)

ggplot() +
  aes(x = x_range) + # set same x range for all lines
  geom_line(aes(y = dweibull(x_range, shape=0.1, scale=1)), 
            color="red") + # plot *red* line for weibull with shape & scale (0.1, 1)
  annotate("text", x=8, hjust=0, 
           y=.9, col="red", label="(0.1,1)") + # red annotation for (0.1, 1)
  geom_line(aes(y = dweibull(x_range, shape=1, scale=1)),
            color="blue") + # plot *blue* line for weibull with shape & scale (1, 1)
  annotate("text", x=8, hjust=0, 
           y=.85, col="blue", label="(1,1)") +  # lower, blue annotation for (1, 1)
  geom_line(aes(y = dweibull(x_range, shape=1.5, scale=1)),
            color="orange") + 
  annotate("text", x=8, hjust=0, 
           y=.8, col="orange", label="(1.5,1)") +    
  geom_line(aes(y = dweibull(x_range, shape=1.5, scale=3)),
            color="yellow") + 
  annotate("text", x=8, hjust=0, 
           y=.75, col="yellow", label="(1.5,3)") +  
  geom_line(aes(y = dweibull(x_range, shape=5, scale=3)),
            color="green") + 
  annotate("text", x=8, hjust=0, 
           y=.7, col="green", label="(5,3)") +  
  geom_line(aes(y = dweibull(x_range, shape=9, scale=3)),
            color="purple") + 
  annotate("text", x=8, hjust=0, 
           y=.65, col="purple", label="(9,3)") +  
  ylab("Weibull density") + # label y axis
  theme_bw()  # optional: give plot a nicer theme
```

```{r wb-plot-dens-B}
# solution B: build desired mappings directly into a data frame,
#   and let ggplot do everything else (preferred style)
#   - note coloring and legend are handled automatically, with no repeated code!

# build data frame with specified inputs
n_lines = 6 # number of distributions we're plotting
len_x = length(x_range) # number of steps in the range
wb_df <- data.frame(x = rep(x_range, n_lines),
                    shape = rep(c(0.1, 1, 1.5, 1.5, 5, 9), each=len_x),
                    scale = rep(c(1, 1, 1, 3, 3, 3), each=len_x))
# add derived values
wb_df$shape_scale = with(wb_df, paste0("(", shape, ",", scale, ")"))
wb_df$y = with(wb_df, dweibull(x, shape, scale)) # calculate density for each set of inputs


ggplot(wb_df) +
  aes(x=x, y=y, 
      color=shape_scale) + # setting 'color' aesthetic tells ggplot to group on this
  geom_line() + # so geom layer automatically plots separate lines grouped by color input
  ylab("Weibull density") +  
  theme_bw() 
```


## Fitting Decision Times with the Weibull Distribution

We will augment the Weibull distribution by an additional parameter - a shift parameter. The shift parameter shifts the entire distribution, changing its mean but leaving the overall shape (after the point it is shifted to) unchanged. The shift parameter has been used in previous research and it allows to represent an initial interval that might correspond to the time required to encode the stimulus.

With this additional parameter the mean of the shifted Weibull is:

```{r wb-mean}
weibull.mean <- function(shape, scale, shift=0) {
  scale * gamma(1+1/shape) + shift
}
```

We will start our parameter estimation by fitting the mean of the shifted Weibull to the decision times in our data set using the root-mean-squared error (RMSE) that you saw in the lectures. We will use the default optimization procedure in R `optim()`. Also note that we are using the parameter order `[shape, scale, shift]` and not `[shift, scale, shape]` as in @Farrell.

```{r wb-mean-opt}
wb.rmse <- function(rtData, par) {
  if (par[1] <= 0 | par[2] <0){
    # for negative shape/scale parameters the Weibull is undefined
    # so we return a high error
    10000000
  }
  else {
    sqrt((weibull.mean(par[1], par[2], par[3]) - mean(rtData))^2)
  }
}
```

#### *Exercise*: Fit the Weibull using the `optim` function for the RMSE function above on the decision time (`RT`) data (`data.22`). The `optim` function requires you to pass starting values to the optimization algorithm, set these to (255, 255, 0).

**Solution:**
```{r wb-mean-opt1}
rmseOpt <- optim(par = c(225, 225, 0), fn = wb.rmse, rtData = data.22$RT)
```

#### *Exercise*: What are the parameters resulting from the optimization? 

```{r wb-mean-par}
rmseOpt$par
```

#### *Exercise*: What is the corresponding mean of the Weibull for those parameters? What is the error for the optimized parameters? 

```{r wb-mean-fit1}
weibull.mean(rmseOpt$par[1],rmseOpt$par[2],rmseOpt$par[3])
wb.rmse(data.22, rmseOpt$par)
```

#### *Exercise*: How well does the mean match the experiment data? Plot the fitted density and the experiment data (you might want to exclude the more extreme RTs of the participants for this plot). Is this a satisfying fit? Do you have any intuitions as to how it might be improved?

```{r wb-fit-m-eval}
rts <- data.22[data.22$RT < 10000,]$RT
rt_range <- seq(0, max(rts), by=1) # range of RTs: input for weibull

# here we layer estimated densities over the (rescaled) histogram of RTs
# so we take advantage of the flexibility to set different inputs to each layer
ggplot() + 
  geom_histogram(aes(x=rts, # this layer gets participant RTs
                     y=..density..), # histogram defaults to counts: this scales it for density
                 bins=50) + 
  annotate("text", x=4000, hjust=0, y=.014, label="Participant RTs") +
  geom_line(aes(x=rt_range, # this layer gets the estimated weibull
                y=dweibull(rt_range-rmseOpt$par[3], #shifted inputs
              rmseOpt$par[1],
              rmseOpt$par[2])),
            col="red") +
  annotate("text", x=4000, hjust=0, y=.012, col="red", label="RMSE Mean Fit Weibull") +
  theme_bw() +
  ggtitle("Participant RTs and Wiebull density RMSE-fit to mean")
```

One way to improve our approach is to take into account the shape of the distribution. Often, this is done by not only fitting the mean, but fitting several quantiles of the data.

```{r wb-fit-qp}
q_p <- c(.1,.3,.5,.7,.9)

wb.qrmse <- function(rtData, par, qp) {
  if(any(par<=0)){
    return(10000000)
  }
  q_pred <- qweibull(qp, shape=par[1], scale=par[2]) + par[3]
  sqrt(mean((q_pred-quantile(rtData, qp))^2))
}
```


#### *Exercise*: Fit the shifted Weibull to the decision-time data. Use the RMSE of the 5 quantiles `q_p` instead of the mean for optimization. Again, use `(225, 255, 0)` as starting values for the optimization.

```{r wb-qp-opt}
rmseQuantOpt <- optim(par = c(225, 255, 0), fn = function(x) wb.qrmse(data.22$RT, x, q_p))
```

#### *Exercise*: What parameters did the optimization find? Compare them to the parameters we found using only the mean for the RMSE.

```{r wb-fit-mqe-res}
rmseQuantOpt$par
```

#### *Exercise*: What is the RMSE for the model trained on the quantiles? Compare it to the RMSE for the model trained only on the mean.

```{r wb-rmse}
wb.qrmse(data.22$RT, rmseQuantOpt$par, q_p)
```

#### *Exercise*: Plot the results of the optimization and the participants' decision times - how well does the mean match the experiment data?

```{r wb-fit-mq-eval}
# using rts and rt_range from cell above

ggplot() + 
  geom_histogram(aes(x=rts, y=..density..), bins=50) + # this layer gets participant RTs
  annotate("text", x=4000, hjust=0, y=.0011, label="Participant RTs") +
  geom_line(aes(x=rt_range, # this layer gets the estimated weibull
                y=dweibull(rt_range-rmseQuantOpt$par[3], #shifted inputs
              rmseQuantOpt$par[1],
              rmseQuantOpt$par[2])),
            col="red") +
  annotate("text", x=4000, hjust=0, y=.001, col="red", label="RMSE Quantile Fit Weibull") +
  theme_bw() +
  ggtitle("Participant RTs and Wiebull density RMSE-fit to quantiles")

```

#### *Exercise*: A nice way to display the accuracy of model predictions is to plot the true values against the predictions. This is usually called a Q-Q plot. Plot the Q-Q plot for the quantiles for our data (`quantile()`) against the quantiles of the fitted Weibull (`qweibull()`). Additionally, plot a line, $f(x) = x$ on top of the data (try `geom_abline()` in ggplot2). What does this line correspond to?

```{r wb-fit-qq}
qm <- qweibull(q_p, rmseQuantOpt$par[1],rmseQuantOpt$par[2]) + rmseQuantOpt$par[3]
qe <- quantile(rts)

ggplot() +
  geom_point(aes(qe, qm)) +
  geom_abline(aes(intercept=0, slope=1)) +
  ylim(range(qe)) + # to ensure entire empirical range is plotted
  labs(x="Empirical Quantiles", y="Model Quantiles",
       title="Empirical vs Model Quantiles") +
  theme_bw()
```

#### *Exercise*: By default `optim` uses the Nelder-Mead optimization procedure, that you saw in the lecture. The optimization procedure requires us to provide starting guesses for the parameters. Evaluate the results of the optimization (for the quantile optimization) for a range of starting values. For scale and shape generate 250 random uniform starting values between 0.1 and 500 (`runif()`). For the shift generate 250 random uniform values between 0 and 10. Loop through all starting values and calculate the RMSE error of the found parameters and store the errors in an array. Then plot and summarize (`summary()`) the optimized errors - how much do they vary?

```{r errors}

scaleStart <- runif(250, min=0.1, max=500)
shapeStart <- runif(250, min=0.1, max=500)
shiftStart <- runif(250, min=0, max=10)

params <- cbind(scaleStart, shapeStart, shiftStart) # make a matrix of start parameters

# N.B. this procedure involves iterating the optimization process 
#      over our 250 randomly chosen start parameters
# in other languages, e.g. Python, this would be more naturally implemented in a for loop
#   (and it can be successfully implemented as a for loop here)
#   but R functions are often optimized to run efficiently over vectors,
#     which also makes the code easier to parallelize if needed
#   so R code is often stylistically more functional:
#     the preference is to *apply* a function over a vector or matrix, 
#     then another function to the resulting output,
#     rather than looping over the matrix and 
#     performing several steps in each iteration.
#   the following code reflects this stylistic preference.
#     documentation for the 'apply' family of functions 
#     (i.e. apply, sapply, lapply, tapply) can be a useful reference.

# run optimization for all starting parameter values
opt <- apply(params, 1, # this tells apply to iterate over rows
               optim, fn = function(x) wb.qrmse(data.22$RT, x, q_p))
# extract optimized parameter values from results
opt_par <- sapply(opt, function(x) x[["par"]]) 
# get error for optimized parameters
err <- apply(opt_par, 2, # iterate over columns
             wb.qrmse, rtData=data.22$RT, qp=q_p)

ggplot() +
  geom_point(aes(x=1:length(err), y=err)) +
  labs(title="Optimized Parameter RMSE",
       x = "Optimization", y="RMSE") +
  theme_bw()
```

```{r}
summary(err)
```

With three free parameters there are many equivalent solutions for the task and the optimization procedures we have used  can lead to different results for different starting values. As you have seen, to avoid sub-optimal parameter estimates, it is a good idea to run optimization procedures for many starting values. 

When we fit the mean of the Weibull to the data we saw that achieving a low error does not guarantee that our model parameters capture the underlying data. In general, it is a good idea to check how model predictions look and how closely they match the actual data, as well as subgroups in our data.

### Fitting Individual Participants

Let's check how well our model captures individual participants. To make our task a bit easier we will exclude participants who made fewer than 20 decisions.

```{r ll-d}
t22 <- table(data.22$sessionId) #gives us counts for each id
t22 <- t22[t22 >= 20]

t22ids <- unique(names(t22))
```

Now we plot the QQ plots for these 5 participants.

```{r ppqqp}
# make new data frame with data only from these individuals
data.t22 <- data.22[data.22$sessionId %in% t22ids, ]
# remove factor levels for excluded participants
data.t22$sessionId <- data.t22$sessionId[ , drop=TRUE]
# get RT quantiles for each individual
qe.t22 <- with(data.t22, tapply(RT, sessionId, quantile, probs=q_p))
# turn into data frame
data.qe.t22 <- data.frame(participant = rep(names(qe.t22), each=length(q_p)),
                          p_n = rep(paste0("P", 1:length(t22), ", N=", t22), 
                                    each=length(q_p)), # label N for each subject
                          quantile = rep(q_p, length(qe.t22)),
                          model_q = rep(qm, length(qe.t22)),
                          rt = unlist(qe.t22, use.names = FALSE))

ggplot(data.qe.t22) +
  aes(x=rt, y=model_q, col=participant) +
  geom_point() +
  geom_abline(aes(slope=1, intercept=0)) +
  expand_limits(x=max(data.qe.t22$model_q), # this is a tweak to make sure
                y=min(data.qe.t22$rt)) + # the correct data range is plotted
  theme_bw() +
  guides(col="none") + # hide legend for color
  facet_wrap(p_n ~ .) + # separate plots for each individual, indicating N trials
  ggtitle("QQ plots for 5 individual participants vs model quantiles")
```

As you can see, our model does not capture these participants very well. This is not surprising, since participants' decision times differ and it is challenging to account for this variability if we only fit a single model. Therefore, researchers have attempted to characterize individual differences in their data by modelling each participant individually and then describing the estimated parameters for these individual models.
In addition to fitting individual participants, we will now use maximum likelihood to estimate our parameters.

#### *Exercise*: What is the goal of maximum likelihood estimation? On a high level, how does it differ from our approach up to this point? What are some reasons for and against using MLE?

**Solution:** *The goal of MLE is to estimate optimal values for each of the model's parameters by maximizing the likelihood of the data given the parameters. MLE provides an answer to the question "which model instance best fits the data?" when "best fits" is defined as "gives the most evidence for". If $\theta$ is a particular assignment of values to the model’s parameters and y represents the observed data then the MLE answer to this question is given by:*

$$\hat{\theta} = argmax_{\theta} L(\theta|y) = argmax_{\theta}P(y|\theta)$$

*Until now, our modeling approach has been to find the parameters which minimized the root mean squared error (RMSE) between the model's predictions and the observed values. For MLE, we define a probability distribution and estimate parameters via likelihood maximization.*

*We may prefer to use MLE over error minimization because MLE is rooted in statistical theory and comes with some attractive properties such as parametrization invariance, consistency, efficiency, and asymptotic normality. RMSE might be preferable in cases where it's difficult to define a probability distribution.*


```{r wb-ll}
wb.lldev <- function(par, data)  {
  # We are deviating from F&W's practice of assigning a very high (but finite)
  # NLL when parameters are invalid or the data will have probability zero.
  # You may want to consider some of the pros and cons of their approach.
  if (any (par<=0) || any(data - par[3] < 0)) {
    return (Inf)
  }
  ll <- dweibull(data - par[3], shape=par[1], scale=par[2])
  # Sometimes people will compute the "deviance", or twice the NLL. This term is
  # relevant in some model selection settings, but the difference isn't so
  # important here, and the MLE is unchanged by this transformation.
  -2*sum(log(ll))
}
```

#### *Question*: Why do we return the negative log-likelihood? Why do we subtract the shift parameter from the data?
**Solution:** *The `optim` function minimizes the error. Since the log values will be between $-infty$ and 0 for probabilities 0 and 1 we need the negative log. The shift parameter is a constant which offsets the entire distribution, so it is subtracted from each data point. The density can then be calculated using the shifted values.*

We will fit each participant using maximum likelihood and store the optimized parameter values in the `ppRTsParameters` data frame.

```{r fit-ll-wb-pp}
fit.wb.lldev <- function(p_id, df=data.t22) {
  d <- df[df$sessionId == p_id,]$RT # get RT data for participant
  d <- d[d < 1000] # remove outliers
  optP <- optim(c(100,1000,.1), fn = function(x) wb.lldev(x, d)) # run optimization
  return(optP[["par"]]) # return extracted parameters
}

ppRTsParameters <- t(sapply(t22ids, fit.wb.lldev))
colnames(ppRTsParameters) <- c("shape", "scale", "shift") # we know the parameters are in this order
```

### *Exercise*: Summarize the participant parameters. How much do the found parameters differ? Is there anything peculiar about the fitted parameters? If so, what might be going on?

```{r llppwb}
summary(ppRTsParameters)
```

### *Exercise*: Plot the fitted parameters and the real data.

```{r pp-llplot}
maxRT <- 4000 # we only calculate & plot up to this
# calculate fitted density per participant
calc.wb.dens <- function(pars) dweibull(1:maxRT-pars[["shift"]], pars[["shape"]], pars[["scale"]])
dens <- apply(ppRTsParameters, 1, calc.wb.dens)
ppRTDens <- data.frame(sessionId=rep(colnames(dens), each=maxRT), 
                       range=rep(1:maxRT, length(colnames(dens))),
                       fit=c(dens), 
                       row.names=NULL)

ggplot() + 
  geom_histogram(aes(x=RT, y = ..density..),
               data=data.t22[data.t22$RT < maxRT,],
               bins=70) + 
  geom_line(aes(x=range, 
                y=fit),
            data=ppRTDens,
            col="red") +
  theme_bw() +
  facet_wrap(sessionId ~ .) +
  ggtitle("Fitted density for individual participant RTs and MLE-fit Wiebull")

# not sure what happened to the third participant data there, whoops
```

```{r pp-llplot2}
# another approach using base R & for loops, which somehow keeps the third participant:

pp <- data.frame(id=rownames(ppRTsParameters), ppRTsParameters, row.names=NULL)
par(mfrow = c(3, 2)) 
for (id in t22ids) {
  pd <- data.22[data.22$sessionId == id, ]
  maxRT <- max(pd$RT/3)
  fitP <- pp[pp$id == id,]
  xs <- seq(0, 8000, by = 1)
  ds <- dweibull(xs-fitP$shift, shape=fitP$shape, scale=fitP$scale)
  
  hist(pd$RT, main="Participant RTs", xlab = "RT [ms]",
     breaks =30, freq = FALSE)
  
  lines(xs, ds)
}
```

#### *Exercise*: Setting aside idiosyncrasies of the current case, what are advantages of fitting models for each participant individually? Are there any disadvantages? What would be the ideal way to account for individual differences?

**Solution:** Participants differ considerably - thus having one shared model will not do the data justice. However, we also do not really believe that all participants are different. Therefore, it would be preferable to find sets of participants that share the same behavior.

# References