Erik de Vries
ce5f812252
dfm_gen, merger: Added option for generating lemma_upos hybrids for merged field
...
merger: Added custom clean option (sometimes not cleaning is preferred, even with lemmas)
merger, out_parser: Updated regex for filtering out non-words to also include email addresses (containing both @ and .)
6 years ago
actorizer.Rd
actorizer, ud_update: implemented 'ver' variable for keeping track of updates
6 years ago
bulk_writer.Rd
actorizer, ud_update: Updated merging of document fields to properly deal with missing punctuation at the end of fields (e.g. a title without punctuation at the end of the string)
6 years ago
class_update.Rd
class_update; dfm_gen; merger: updated functions to accept text parameter for both old style 'lemmas' and new style 'ud'
6 years ago
dfm_gen.Rd
dfm_gen, merger: Added option for generating lemma_upos hybrids for merged field
6 years ago
dupe_detect.Rd
elasticizer: renamed size parameter to batch_size, created max_batch parameter to limit the number of results returned
6 years ago
elastic_update.Rd
Major overhaul to ES bulk update integration. Added support for both setting and appending to variables
6 years ago
elasticizer.Rd
elasticizer: renamed size parameter to batch_size, created max_batch parameter to limit the number of results returned
6 years ago
lemma_writer.Rd
lemma_writer: new function to write raw lemmas (without punctuation) to a text file. It is structured as an elasticizer update function (despite not updating anything on the server)
6 years ago
merger.Rd
dfm_gen, merger: Added option for generating lemma_upos hybrids for merged field
6 years ago
modelizer.Rd
Updated elasticizer docs, created modelizer and class_update functions
6 years ago
out_parser.Rd
dfm_gen, out_parser: updated documentation
6 years ago
query_gen_actors.Rd
elasticizer: Updated bulk size to 1024 (a power of 2) and set a timeout of 900s every 500000 updates
6 years ago
query_string.Rd
elasticizer: renamed size parameter to batch_size, created max_batch parameter to limit the number of results returned
6 years ago
ud_update.Rd
actorizer, ud_update: implemented 'ver' variable for keeping track of updates
6 years ago