Document Triage

  • Characters in the file must be MACHINE READABLE (Character Encoding)
  • Character Encoding Identification (ASCII, Unicode, ...)
  • Language Identification (English, French, ...)
  • Text Sectioning
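
The triage steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production method: the function names (guess_encoding, guess_language, split_sections), the candidate encodings, the tiny stopword lists, and the rule that a blank line separates sections are all hypothetical choices made for this example.

```python
import codecs
import re

# Byte-order marks to check, longest first: the UTF-32 LE mark begins
# with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Illustrative stopword lists (assumption); real systems use far larger models.
STOPWORDS = {
    "English": {"the", "and", "of", "is", "to"},
    "French": {"le", "la", "et", "les", "des", "est"},
}


def guess_encoding(raw: bytes) -> str:
    """Guess an encoding: BOM first, then strict trial decoding."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    for name in ("ascii", "utf-8"):
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    # Fallback (assumption): latin-1 maps every byte, so it never fails.
    return "latin-1"


def guess_language(text: str) -> str:
    """Pick the language whose stopwords occur most often in the text."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(word in stops for word in words)
        for lang, stops in STOPWORDS.items()
    }
    return max(scores, key=scores.get)


def split_sections(text: str) -> list[str]:
    """Split text into sections at blank lines (assumed section boundary)."""
    return [part.strip() for part in re.split(r"\n\s*\n", text) if part.strip()]
```

A typical use would decode the raw bytes with the guessed encoding, then pass the resulting string to guess_language and split_sections. The order matters: language identification and sectioning both operate on characters, so they only make sense once the bytes have been decoded correctly.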