16 Commits (2944039f73a2d0e22663e9641205a8eee6b59307)

Author | SHA1 | Message | Date
Your Name | 77eb51a1bf | actorizer: totally revamped way of finding actors | 4 years ago
Your Name | 38ff4dcbf0 | ud_update: small fix to file naming | 4 years ago
Your Name | 4b4d860235 | class_update: remove dfm_gen multicore option | 4 years ago
Erik de Vries | 1a4ba19546 | actorizer: Removed udmodel dependencies, commented code, changed nested lists to flat lists | 6 years ago
Erik de Vries | 41c86ea116 | actorizer, ud_update: Updated ud parsing and actorizer to work based on character positions. This code is used for local testing | 6 years ago
Erik de Vries | 34531b0da8 | out_parser: added option to clean output using regex to remove numbers and non-words | 6 years ago
Erik de Vries | 0a3bdb630b | actorizer, dfm_gen, ud_update: unified output parsing from _source and highlight fields into a single function (out_parser) | 6 years ago
Erik de Vries | 9e5a1e3354 | ud_update: removed mc.preschedule = F | 6 years ago
Erik de Vries | c7560d7e32 | ud_update: Removed . at end of text, and added mc.preschedule = F for testing | 6 years ago
Erik de Vries | 37df81b8ff | ud_update: fixed merged output field to always contain an (extra) dot (period) at the end of the document | 6 years ago
Erik de Vries | c32c9e5ad3 | ud_update: fix to deal with non-existing column names | 6 years ago
Erik de Vries | 8ffbddc073 | actorizer, ud_update: implemented 'ver' variable for keeping track of updates | 6 years ago
Erik de Vries | ae23456736 | actorizer, ud_update: Updated merging of document fields to properly deal with missing punctuation at the end of fields (e.g. a title without punctuation at the end of the string) | 6 years ago
Erik de Vries | 54dfb6a8ca | actorizer: major fix to ud parsing, changed regex to remove html tags to only include tags with a maximum of 20 characters in them | 6 years ago
Erik de Vries | 34a6adf64e | changed udpipe output variable from tokens to ud | 6 years ago
Erik de Vries | 061da17c2a | ud_update: Added function to lemmatize documents | 6 years ago