Towards cognitively plausible data science in language research (2016), Milin, Divjak, Dimitrijevic and Baayen
Identify difficult and easy forms (from lemma to plural form)
Check if human participants also react differently to independently identified difficult and easy forms
Compare NDL learning model to TiMBL and human results
MVDM (Modified Value Difference Metric) computes the distance between two values of a feature to reflect their patterns of co-occurrence with categories
Using MVDM adds an unsupervised learning component to MBL (Hoste, 2005), because it essentially clusters feature values and uses that information
Using larger values of k with MVDM is helpful
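A minimal sketch of the value difference idea described above (named MVDM in TiMBL's documentation): the distance between two feature values is the sum, over classes, of the absolute difference in their conditional class probabilities. The function name, the toy data, and the assumption that both values occur at least once in the data are illustrative, not taken from the paper or from TiMBL's implementation.

```python
from collections import Counter, defaultdict

def mvdm_distance(values, labels, v1, v2):
    """Sum over classes of |P(class | v1) - P(class | v2)|, estimated from counts.

    Assumes v1 and v2 each occur at least once in `values`.
    """
    counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        counts[value][label] += 1
    n1 = sum(counts[v1].values())
    n2 = sum(counts[v2].values())
    return sum(
        abs(counts[v1][c] / n1 - counts[v2][c] / n2)
        for c in set(labels)
    )

# Hypothetical toy data: one feature with values a/b/c, two classes X/Y.
values = ["a", "a", "b", "b", "c", "c"]
labels = ["X", "X", "X", "Y", "Y", "Y"]

d_ab = mvdm_distance(values, labels, "a", "b")  # a is always X, b is mixed
d_ac = mvdm_distance(values, labels, "a", "c")  # a is always X, c is always Y
```

Values that pattern alike with the classes end up close together, which is the sense in which the metric clusters feature values.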
Easy words that are frequent tokens (forms) are reacted to faster
Maybe this interaction doesn’t occur with difficult words because there is less variation in the frequency of the difficult words?
This seems similar to results with regular past tense forms in English:
Strikingly, TiMBL’s inflectional class probabilities turn out to be predictive in production and comprehension, i.e., for lexical decision latencies.
Two Grapheme to Lexeme Measures
G2L-Diversity: sum of the absolute values of the activations of all possible outcomes, given a set of input cues.
Input cues that activate many different outcomes give rise to a highly diverse activation vector, which in turn indicates a high degree of Uncertainty about the intended outcome.
G2L-Prior: sum of the absolute values of the weights on the connections from all cues to a given outcome.
independent of the actual cues encountered in the input
reflects the prior availability of an outcome, its entrenchment in the learning network
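The two measures defined above can be sketched from a cue-to-outcome weight table. The weights, cue trigraphs, and outcome names below are made up for illustration, and the activation of an outcome is assumed to be the sum of the weights from the active cues to that outcome.

```python
# Hypothetical toy weights: weights[cue][outcome]
weights = {
    "#ha": {"HAND": 0.5, "HAT": 0.25},
    "han": {"HAND": 0.5, "HAT": -0.25},
    "and": {"HAND": 0.25, "HAT": 0.0},
}

def activations(weights, cues):
    """Activation of each outcome: summed weights from the active cues."""
    outcomes = {o for row in weights.values() for o in row}
    return {
        o: sum(weights[c].get(o, 0.0) for c in cues if c in weights)
        for o in outcomes
    }

def g2l_diversity(weights, cues):
    """Sum of absolute activations over all outcomes, given the input cues."""
    return sum(abs(a) for a in activations(weights, cues).values())

def g2l_prior(weights, outcome):
    """Sum of absolute weights from all cues to one outcome (input-independent)."""
    return sum(abs(row.get(outcome, 0.0)) for row in weights.values())

diversity_one_cue = g2l_diversity(weights, ["#ha"])
diversity_two_cues = g2l_diversity(weights, ["#ha", "han"])
prior_hand = g2l_prior(weights, "HAND")
prior_hat = g2l_prior(weights, "HAT")
```

Note that the diversity depends on which cues are present, while the prior is computed over the whole weight column for an outcome regardless of the input.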
TiMBL assigns higher probabilities to forms belonging to lemmas with letter trigraphs that yield more diverse activations
Those trigraphs belong to a rich exemplar space in the memory
it would be expected that higher probabilities would result in shorter response latencies
However, NDL’s G2L-Diversity was in fact positively correlated with RTs, indicating inhibition, i.e., slower recognition.
TiMBL probabilities are intended to capture the likelihood of a form’s occurrence in production.
in comprehension (lexicality judgments), high trigraph diversity may hurt performance
Spontaneous recovery from extinction
After a Conditioned Stimulus (CS) has become associated with a given Conditioned Response (CR), this association is extinguished (unlearned)
Theoretically, it cannot arise again without retraining
But in real life, sometimes seemingly completely forgotten associations are reactivated
shows extinction is not unlearning
responses that disappear are not necessarily forgotten
Suggests loss of activation is not simply the mirror of acquiring associations
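To make the theoretical prediction concrete, here is a minimal sketch assuming a single-cue Rescorla-Wagner update (the learning rule that NDL builds on). The learning rate and trial counts are arbitrary illustrative choices. Under this rule, extinction simply drives the association strength back towards zero, so nothing is left in the model that could produce spontaneous recovery.

```python
def rescorla_wagner(v, lam, rate=0.2, trials=50):
    """Update one association strength v towards the asymptote lam.

    lam = 1.0 models trials where the outcome occurs (acquisition),
    lam = 0.0 models trials where it is absent (extinction).
    """
    history = []
    for _ in range(trials):
        v += rate * (lam - v)
        history.append(v)
    return history

acquisition = rescorla_wagner(0.0, lam=1.0)             # CS paired with outcome
extinction = rescorla_wagner(acquisition[-1], lam=0.0)  # CS presented alone
```

In this sketch the extinction curve is the mirror image of the acquisition curve, which is exactly the symmetry that the spontaneous recovery findings call into question.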
Given two conditioned stimuli (CSs), where one is more salient, the more salient CS will develop a strong association with the CR (Conditioned Response)
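This effect is usually called overshadowing. A minimal sketch, assuming the Rescorla-Wagner rule with a shared prediction error and per-cue salience parameters (the cue names and salience values are hypothetical): both cues are presented together on every trial, and the more salient one ends up with most of the associative strength.

```python
def compound_conditioning(saliences, lam=1.0, beta=1.0, trials=100):
    """Train several cues presented together on every trial.

    All cues share one prediction error; each cue's update is scaled
    by its own salience (alpha).
    """
    strengths = {cue: 0.0 for cue in saliences}
    for _ in range(trials):
        # Error is computed once per trial from the summed prediction.
        error = lam - sum(strengths.values())
        for cue, alpha in saliences.items():
            strengths[cue] += alpha * beta * error
    return strengths

# Hypothetical saliences: a loud tone and a dim light.
result = compound_conditioning({"tone": 0.4, "light": 0.1})
```

Because the cues compete for a fixed amount of associative strength, the final strengths split in proportion to the saliences in this setup.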
Some linguistic things can be learned with NDL and this might show us something about the problem
What made NDL so nice for animal learning might not scale up to linguistic phenomena