Hashing

  • In machine learning, a mechanism for bucketing categorical data, particularly when the number of possible categories is large but the number of categories that actually appear in the dataset is comparatively small.
  • For example, Earth is home to about 60,000 tree species. You could give each of the 60,000 species its own categorical bucket. Alternatively, if only 200 of those tree species actually appear in a dataset, you could use hashing to divide tree species into perhaps 500 buckets.
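
The tree-species example can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name hash_bucket, the choice of SHA-256 as the hash function, and the sample species names are all assumptions made for the sketch. A stable hash from hashlib is used rather than Python's built-in hash(), because the built-in is randomized per process for strings and would assign different buckets on each run.

```python
import hashlib

NUM_BUCKETS = 500  # bucket count taken from the example above


def hash_bucket(category: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a category name to a bucket index in [0, num_buckets).

    Hypothetical helper: SHA-256 is an arbitrary choice of stable hash.
    """
    digest = hashlib.sha256(category.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce it
    # modulo the number of buckets.
    return int.from_bytes(digest[:8], "big") % num_buckets


# Illustrative species names (assumed, not taken from any dataset).
species = ["baobab", "red maple", "coast redwood"]
buckets = {name: hash_bucket(name) for name in species}

for name, bucket in buckets.items():
    print(f"{name} -> bucket {bucket}")
```

The mapping is deterministic, so a given species always lands in the same bucket. Two different species can still share a bucket (a collision); choosing more buckets than the number of categories that actually appear, such as 500 buckets for 200 species, makes collisions less frequent but does not rule them out.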