12 Commits (e70b6ccf7a127e436385de102cf8665f2a7f8457)

Author SHA1 Message Date
Erik de Vries e70b6ccf7a actorizer: fixed sentence_count and out_parser calls
6 years ago
Erik de Vries 4407a99774 actorizer: fix to get actual number of sentence occurrences of actor
6 years ago
Erik de Vries 96e869fa6b actorizer: previous commit was wrong, only add is an option, removed type variable
6 years ago
Erik de Vries 98219c807c actorizer: Added type option, to choose between setting or adding to the actor variables, defaults to add (should normally not be changed)
6 years ago
Erik de Vries e3b57ed9e3 actorizer: added clean = F to have the exact same behavior in ud_update and actorizer
6 years ago
Erik de Vries 0a3bdb630b actorizer, dfm_gen, ud_update: unified output parsing from _source and highlight fields into a single function (out_parser)
6 years ago
Erik de Vries 8ffbddc073 actorizer, ud_update: implemented 'ver' variable for keeping track of updates
6 years ago
Erik de Vries ae23456736 actorizer, ud_update: Updated merging of document fields to properly deal with missing punctuation at the end of fields (e.g. a title without punctuation at the end of the string)
6 years ago
Erik de Vries 54dfb6a8ca actorizer: major fix to ud parsing, changed regex to remove html tags to only include tags with a maximum of 20 characters in them
6 years ago
Erik de Vries 8caf53b90a actorizer: switched to single core processing for debugging
6 years ago
Erik de Vries c63409238b actorizer: print row numbers for debugging
6 years ago
Erik de Vries 39005c7518 elasticizer: Updated bulk size to 1024 (a power of 2) and set a timeout of 900s every 500000 updates
6 years ago