Erik de Vries | 9f3418ef37 | class_update; dfm_gen; merger: updated functions to accept text parameter for both old style 'lemmas' and new style 'ud' | 6 years ago
Erik de Vries | 993f39957a | dfm_gen: word cutoff now as final step in script, caused bugs with mutating code variables | 6 years ago
Erik de Vries | 02b8a8c1da | dfm_gen & merger: Changed word cutoff point to be a general setting in dfm_gen. Cuts off at the last [.?!] before the cutoff point (so returns documents at a sentence, shorter than cutoff). | 6 years ago
Erik de Vries | 3e66c7e1cd | Updated dfm_gen to have all topic vectors as numeric variables | 6 years ago
Erik de Vries | 413ad02a87 | Set default to "lemmas" for dfm_gen | 6 years ago
Erik de Vries | 311838b34b | Updated dfm_gen to only create derivative codes if majorTopic actually exists, and set docvars to NULL when no majorTopic codes | 6 years ago
Erik de Vries | dc4daf9de4 | Added line to replace multiple whitespace characters in full text by a single regular whitespace | 6 years ago
Erik de Vries | 0e45c0f2d1 | Added option for fulltext vs lemmas merged field | 6 years ago
Erik de Vries | 4bbe84ab83 | First release of mamlr package | 6 years ago