elasticizer: updated dump handling to create a dump for every batch, instead of one big file at the end
out_parser: streamlined code
query_gen_actors: only include relevant fields
ud_update: changed function parameters to work with elasticizer dump function
dfm_gen: remove multicore, update merger() code
elasticizer: changed filenaming scheme for dump option
merger: Fixed bug where an NA lemma would cause the entire document to become NA. Now the NA lemmas are filtered out before merging
ud_update: removed parallel processing, changed script to save bulk updates in .Rds files instead of sending them straight away
class_update: check if there are idf values associated with model, before applying weights
estimator: make use of preproc() function for data preprocessing
preproc: function containing all logic with regards to text data preprocessing and weighting
cv_generator: generate folds for nested cv
dfm_gen: added optional lowercasing parameter
estimator: estimate model and performance based on parameters
feat_select: select features based on textstat_keyness
metric_gen: convert output from estimator to model performance metrics
modelizer: updated for new pipeline
modelizer_old: old model pipeline
out_parser: now correctly exported
merger: Added custom clean option (sometimes not cleaning is preferred, even with lemmas)
merger, out_parser: Updated regex for filtering out non-words to also include email addresses (containing both @ and .)
query_string: renamed x parameter to query, added fields parameter to select what fields to return and random boolean parameter to define whether the returned results should be randomized
out_parser: function to parse raw text output into a single field, either from _source or highlight fields
dupe_detect: updated function to use 'ver' parameter for versioning
modelizer: Minor update to feature keyness, using absolute values now to determine the most informative features for a class (so features that are either strongly postively or negatively related to the class)
bulk_writer: Added the 'ver' parameter to include a short version string with each update. Mostly to deal with updates that do not complete successfully on all data
query_gen_actors: Added an additional generator for the "Institution" type (for EU support)
actorizer: Created an updater function to search for actors and use UDPipe to parse the results