Milin et al.

  • Towards cognitively plausible data science in language research (2016), Milin, Divjak, Dimitrijevic and Baayen
  • Identify difficult and easy forms (from lemma to plural form)
  • Check if human participants also react differently to independently identified difficult and easy forms
  • Compare NDL learning model to TiMBL and human results
  • MVDM (Modified Value Difference Metric) computes the distance between two values of a feature to reflect their patterns of co-occurrence with categories
  • Using MVDM adds an unsupervised learning component to MBL (Hoste, 2005), because it essentially clusters feature values and uses that information
  • Using larger values of k with MVDM is helpful
  • Easy words that are frequent tokens (forms) are reacted to faster
  • Maybe this interaction doesn’t occur with difficult words because there is less variation in the frequency of the difficult words?
  • This seems similar to results with regular past tense forms in English:
  • Strikingly, TiMBL’s inflectional class probabilities turn out to be predictive in production and comprehension, i.e., for lexical decision latencies.
  • Two Grapheme to Lexeme Measures
  • G2L-Diversity: sum of the absolute values of the activations of all possible outcomes, given a set of input cues.
  • Input cues that activate many different outcomes give rise to a highly diverse activation vector, which in turn indicates a high degree of uncertainty about the intended outcome.
  • G2L-Prior: sum of the absolute values of the weights on the connections from all cues to a given outcome.
  • independent of the actual cues encountered in the input
  • reflects the prior availability of an outcome, its entrenchment in the learning network
  • TiMBL assigns higher probabilities to forms belonging to lemmas with letter trigraphs that yield more diverse activations
  • Those trigraphs belong to a rich exemplar space in the memory
  • it would be expected that higher probabilities would result in shorter response latencies
  • However, NDL’s G2L-Diversity was in fact positively correlated with RTs, indicating inhibition, i.e., slower recognition.
  • TiMBL probabilities are intended to capture the likelihood of a form’s occurrence in production.
  • in comprehension (lexicality judgments), high trigraph diversity may hurt performance
  • Spontaneous recovery from extinction
  • After a conditioned stimulus (CS) has become associated with a given Conditioned Response (CR), this association is unlearned
  • Theoretically, it cannot arise again without retraining
  • But in real life, sometimes seemingly completely forgotten associations are reactivated
  • shows extinction is not unlearning
  • responses that disappear are not necessarily forgotten
  • Suggests loss of activation is not simply the mirror of acquiring associations
  • Given two conditioned stimuli (CS), where one is more salient, the more salient CS will develop a strong association with the CR (Conditioned Response)
  • Some linguistic things can be learned with NDL and this might show us something about the problem
  • What made NDL so nice for animal learning might not scale up to linguistic phenomena
  • Inductive approaches to cognition
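
The two grapheme-to-lexeme measures defined in the notes above can be illustrated with a small sketch. This is only an illustration under stated assumptions: the weight values, the trigraph cues, the outcome labels, and the function names (`activations`, `g2l_diversity`, `g2l_prior`) are all made up here. In an actual NDL model the cue-to-outcome weights would be estimated from a corpus, not written by hand.

```python
# Hypothetical cue-to-outcome weights (letter trigraphs -> lexemes).
# These numbers are invented for illustration; a real NDL network
# would learn them from data.
weights = {
    "#ha": {"HAND": 0.4, "HAT": 0.3},
    "han": {"HAND": 0.5, "HAT": -0.1},
    "and": {"HAND": 0.3, "AND": 0.6},
}


def activations(cues, weights):
    """Sum the weights from the given input cues to every outcome."""
    act = {}
    for cue in cues:
        for outcome, w in weights.get(cue, {}).items():
            act[outcome] = act.get(outcome, 0.0) + w
    return act


def g2l_diversity(cues, weights):
    """Sum of absolute activations over all outcomes, given the input cues."""
    return sum(abs(a) for a in activations(cues, weights).values())


def g2l_prior(outcome, weights):
    """Sum of absolute weights from ALL cues to one outcome.

    Does not depend on which cues are actually present in the input.
    """
    return sum(abs(row.get(outcome, 0.0)) for row in weights.values())


cues_for_hand = ["#ha", "han", "and"]
diversity = g2l_diversity(cues_for_hand, weights)
prior_hand = g2l_prior("HAND", weights)
prior_hat = g2l_prior("HAT", weights)
```

In this toy network the cues for "hand" also activate HAT and AND, so the diversity value is larger than the activation of HAND alone. The prior of HAT counts the negative weight from "han" by its absolute value, which is why it differs from HAT's summed activation.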