25 Commits (18d47762d2b1866ec38da426deaf527196dcc875)

Author SHA1 Message Date
Your Name 4b4d860235 class_update: remove dfm_gen multicore option
4 years ago
Your Name a3b6e19646 revised modeling pipeline:
5 years ago
Your Name d9f936c566 modelizer: tf-idf application updated, final model now also includes idf values from training set, explicitly setting positive category in binary classification for confusion matrices, minor code fixes
5 years ago
Erik de Vries e594185719 dfm_gen: set default cores to 1
5 years ago
Erik de Vries 28989f2bc4 dfm_gen: yet another fix for codes
5 years ago
Erik de Vries 0757b6bf8b dfm_gen: re-added codes variable
5 years ago
Erik de Vries 2fc48cc2f7 dfm_gen: fixed absence of out$codes field
5 years ago
Erik de Vries b249ff22de dfm_gen.R: fixed junk mutation
5 years ago
Erik de Vries 0d05765ca7 dfm_gen: removed last remains of summer sample exceptions
5 years ago
Erik de Vries e199b23227 dfm_gen: removed exceptions for NO summer codes
5 years ago
Erik de Vries 8051a81b66 actorizer, dfm_gen, modelizer, out_parser: replaced all instances of detectCores by cores parameter (which defaults to detectCores)
5 years ago
Erik de Vries 88fc4ec53c dfm_gen: changed out_parser call to mamlr:::out_parser
6 years ago
Erik de Vries ce5f812252 dfm_gen, merger: Added option for generating lemma_upos hybrids for merged field
6 years ago
Erik de Vries 1955692346 dfm_gen, out_parser: updated documentation
6 years ago
Erik de Vries 34531b0da8 out_parser: added option to clean output using regex to remove numbers and non-words
6 years ago
Erik de Vries 0a3bdb630b actorizer, dfm_gen, ud_update: unified output parsing from _source and highlight fields into a single function (out_parser)
6 years ago
Erik de Vries 9f3418ef37 class_update; dfm_gen; merger: updated functions to accept text parameter for both old style 'lemmas' and new style 'ud'
6 years ago
Erik de Vries 993f39957a dfm_gen: word cutoff is now the final step in the script; the earlier placement caused bugs when mutating code variables
6 years ago
Erik de Vries 02b8a8c1da dfm_gen & merger: Changed word cutoff point to be a general setting in dfm_gen. Cuts off at the last [.?!] before the cutoff point (so returned documents end at a sentence boundary and are shorter than the cutoff).
6 years ago
Erik de Vries 3e66c7e1cd Updated dfm_gen to have all topic vectors as numeric variables
6 years ago
Erik de Vries 413ad02a87 Set default to "lemmas" for dfm_gen
6 years ago
Erik de Vries 311838b34b Updated dfm_gen to only create derivative codes if majorTopic actually exists, and set docvars to NULL when no majorTopic codes
6 years ago
Erik de Vries dc4daf9de4 Added line to replace multiple whitespace characters in full text by a single regular whitespace
6 years ago
Erik de Vries 0e45c0f2d1 Added option for fulltext vs lemmas merged field
6 years ago
Erik de Vries 4bbe84ab83 First release of mamlr package
6 years ago
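
Two of the commits above describe text-processing steps concretely enough to illustrate: dc4daf9de4 (replace runs of whitespace with a single regular space) and 02b8a8c1da (cut a document at the last [.?!] before the word cutoff). The following is a minimal Python sketch of those two ideas, not the package's actual R code. The function names, the assumption that the cutoff is counted in whitespace-separated words, and the fallback to a hard cut when no sentence-ending character is found are all hypothetical.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace characters with one regular space."""
    return re.sub(r"\s+", " ", text)


def cut_at_sentence(text: str, max_words: int) -> str:
    """Truncate text to at most max_words words, ending at a sentence boundary.

    Assumption: the cutoff is measured in whitespace-separated words.
    """
    words = text.split()
    if len(words) <= max_words:
        # Already within the cutoff: return the text unchanged.
        return text

    # Hard cut at the word limit first.
    truncated = " ".join(words[:max_words])

    # Index of the last sentence-ending character (. ? !) in the hard cut.
    last_end = max(truncated.rfind(ch) for ch in ".?!")
    if last_end == -1:
        # Assumption: with no sentence end available, keep the hard cut.
        return truncated

    # Keep everything up to and including that character.
    return truncated[: last_end + 1]


if __name__ == "__main__":
    raw = "One two three.   Four five\nsix. Seven eight nine ten."
    clean = collapse_whitespace(raw)
    print(cut_at_sentence(clean, 8))
```

Because the sketch treats any ".", "?" or "!" as a sentence end, abbreviations such as "e.g." would also trigger a cut; a real implementation may need a more careful sentence splitter.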