Elman 1990
- The network learned generalizations
- Examine the hidden-unit activation pattern for each word; measure the distance between each pattern and every other pattern (Euclidean distance)
- Use these distances to build a hierarchical clustering of the words
- Network learned semantic classes
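The distance-then-cluster procedure above can be sketched as follows. This is a minimal illustration, not Elman's actual data: the activation vectors below are made up by hand (in the real simulation they would be averaged hidden-unit states from a trained SRN), and the word list is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Hypothetical averaged hidden-unit activation vectors, one per word.
# In Elman (1990) these come from a trained network; here they are
# invented so the clustering step itself can run.
words = ["dog", "cat", "eat", "chase", "cookie", "sandwich"]
activations = np.array([
    [0.9, 0.1, 0.2],   # dog      (animate noun)
    [0.8, 0.2, 0.1],   # cat      (animate noun)
    [0.1, 0.9, 0.8],   # eat      (verb)
    [0.2, 0.8, 0.9],   # chase    (verb)
    [0.7, 0.1, 0.6],   # cookie   (inanimate noun)
    [0.6, 0.2, 0.7],   # sandwich (inanimate noun)
])

# Pairwise Euclidean distances between every pair of activation patterns.
distances = pdist(activations, metric="euclidean")

# Agglomerative hierarchical clustering over those distances.
tree = linkage(distances, method="average")

# Cut the tree into two top-level clusters; with these toy vectors
# the split falls along the noun/verb line.
labels = fcluster(tree, t=2, criterion="maxclust")
for word, label in zip(words, labels):
    print(word, label)
```

With real SRN activations the same cut produces the noun/verb (and animate/inanimate) groupings Elman reports, which is what licenses the claim that the network "learned semantic classes."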
- If the input to a simulation is preselected to avoid problems, one has instantiated an expert filtering system.
- In order to create word classes from surface structure alone, it appears the input must be filtered in just the right way
- Instead of genuine semantic representations, the model substitutes distributional information for semantics
- This is not what humans know about word classes.
- If the simulation’s goals are accomplished by avoiding pronouns, then we have the equivalent of a pronoun filter
- Some strings in English are both nouns and verbs, e.g. smell, break
- The simulation did not learn what children learn
- Yes, the input was oversimplified, but it’s not clear that adding these additional features would make the model perform worse
- Language is very redundant, so certain simplifications actually remove helpful features
- Categories can ‘emerge’ via statistical regularities
- Basic RNN architectures can find these regularities
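The "basic RNN architecture" here is Elman's simple recurrent network: the hidden layer is copied back as a context input at the next time step. A minimal forward-pass sketch in numpy, with all sizes, weights, and the toy input sequence invented for illustration (no training loop shown):

```python
import numpy as np

# Minimal Elman-style simple recurrent network (SRN) forward pass.
# The defining feature: the new hidden state depends on the current
# input AND the previous hidden state (the "context" layer).
rng = np.random.default_rng(0)

n_in, n_hidden, n_out = 5, 4, 5   # one-hot words in/out, small hidden layer
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # context weights
W_hy = rng.normal(scale=0.1, size=(n_out, n_hidden))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run(sequence):
    """Feed a sequence of one-hot vectors; return hidden states and outputs."""
    h = np.zeros(n_hidden)            # context starts at zero
    hiddens, outputs = [], []
    for x in sequence:
        h = sigmoid(W_xh @ x + W_hh @ h)  # input + copied-back context
        y = sigmoid(W_hy @ h)             # next-word prediction scores
        hiddens.append(h)
        outputs.append(y)
    return np.array(hiddens), np.array(outputs)

# Toy "sentence": words 0, 2, 1 as one-hot vectors.
seq = [np.eye(n_in)[i] for i in (0, 2, 1)]
hiddens, outputs = run(seq)
print(hiddens.shape, outputs.shape)
```

Because the context feeds back, the same word produces different hidden states in different positions; after next-word-prediction training, it is these position-sensitive hidden states whose clustering yields the emergent word classes.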