229 Commits (e3c8d04984ceca6f653d838c6e5e56a61042f144)
 

Author | SHA1 | Message | Date
Erik de Vries | 755a58d84d | dupe_detect: fix to prevent errors when a query returns no results | 6 years ago
Erik de Vries | 887f1aa774 | dupe_detect: fix for empty results dataframe (no duplicates for given date and newspaper) | 6 years ago
Erik de Vries | 993f39957a | dfm_gen: word cutoff now as final step in script, caused bugs with mutating code variables | 6 years ago
Erik de Vries | 085252abda | documentation: updated dupe_detect and merger | 6 years ago
Erik de Vries | 02b8a8c1da | dfm_gen & merger: Changed word cutoff point to be a general setting in dfm_gen. Cuts off at the last [.?!] before the cutoff point (so returns documents at a sentence, shorter than cutoff). | 6 years ago
Erik de Vries | 4a713ddc23 | bulk_writer: setting names(x) <- NULL when there is only one value (list or otherwise) to be updated. | 6 years ago
Erik de Vries | 6bb8f9b635 | class_update: added explicit httr::: references | 6 years ago
Erik de Vries | f543d658bd | Major overhaul to ES bulk update integration. Added support for both setting and appending to variables | 6 years ago
Erik de Vries | 4adae2bbc6 | Fixed bug in dupe_detect caused by switch from cutoff to cutoff_lower/upper | 6 years ago
Erik de Vries | 4cd46d1a5e | dupe_detect: added support for both lower and upper cutoff point | 6 years ago
Erik de Vries | 11d8b31c60 | Added generic actor search query generator. Updated elasticizer and elastic_update to connect either to the remote server, or a local ES instance | 6 years ago
Erik de Vries | 3e66c7e1cd | Updated dfm_gen to have all topic vectors as numeric variables | 6 years ago
Erik de Vries | 20d7510a89 | Merge branch 'master' of https://git.thijsdevries.net/edevries/mamlr | 6 years ago
Erik de Vries | adc4b3c639 | Updated feature selection in modelizer function (see comment on lines 166/167) | 6 years ago
Erik de Vries | 919e71ac68 | Updated feature selection in modelizer function (see comment on lines 166/167) | 6 years ago
Erik de Vries | 65f8c26ec6 | Renamed dupe_detect, and added return output | 6 years ago
Erik de Vries | db418d7396 | Add query_string function for generating query_string queries | 6 years ago
Erik de Vries | d203de0b2a | Updated elasticizer docs, created modelizer and class_update functions | 6 years ago
Erik de Vries | c815dc7f2b | Duplicate detection first commit | 6 years ago
Erik de Vries | 1f06b0b716 | Lowered R version req to 3.3.1 | 6 years ago
Erik de Vries | 015411feaf | Added refresh=wait_for to bulk update url. This should make update scripts less demanding on the server side, because the server only replies after refreshing (happens every second) | 6 years ago
Erik de Vries | 413ad02a87 | Set default to "lemmas" for dfm_gen | 6 years ago
Erik de Vries | 217ee76568 | V 0.1 for elasticizer function with updater support | 6 years ago
Erik de Vries | a273524105 | Added support for custom update function to elasticizer | 6 years ago
Erik de Vries | 311838b34b | Updated dfm_gen to only create derivative codes if majorTopic actually exists, and set docvars to NULL when no majorTopic codes | 6 years ago
Erik de Vries | dc4daf9de4 | Added line to replace multiple whitespace characters in full text by a single regular whitespace | 6 years ago
Erik de Vries | 0e45c0f2d1 | Added option for fulltext vs lemmas merged field | 6 years ago
Erik de Vries | 4cfb508a50 | Fix to DESCRIPTION | 6 years ago
Erik de Vries | 4bbe84ab83 | First release of mamlr package | 6 years ago