Commit Graph

  • 0e8c127b86 bulk_writer: fixes for JSON generation and added exception for use of 'tokens' varname; class_update/elastic_update: Moved response checking to elastic_update; dupe_detect: Finalized dupe_detect Erik de Vries 2018-11-29 14:19:46 +0100
  • 755a58d84d dupe_detect: fix to prevent errors when a query returns no results Erik de Vries 2018-11-28 16:52:05 +0100
  • 887f1aa774 dupe_detect: fix for empty results dataframe (no duplicates for given date and newspaper) Erik de Vries 2018-11-22 17:29:50 +0100
  • 993f39957a dfm_gen: word cutoff now as final step in script, caused bugs with mutating code variables Erik de Vries 2018-11-16 15:40:05 +0100
  • 085252abda documentation: updated dupe_detect and merger Erik de Vries 2018-11-16 10:14:47 +0100
  • 02b8a8c1da dfm_gen & merger: Changed word cutoff point to be a general setting in dfm_gen. Cuts off at the last [.?!] before the cutoff point (so documents end at a sentence boundary and are shorter than the cutoff). Erik de Vries 2018-11-15 16:40:15 +0100
  • 4a713ddc23 bulk_writer: setting names(x) <- NULL when there is only one value (list or otherwise) to be updated. Erik de Vries 2018-11-14 20:52:48 +0100
  • 6bb8f9b635 class_update: added explicit httr::: references Erik de Vries 2018-11-13 17:01:56 +0100
  • f543d658bd Major overhaul to ES bulk update integration. Added support for both setting and appending to variables Erik de Vries 2018-11-13 15:03:33 +0100
  • 4adae2bbc6 Fixed bug in dupe_detect caused by switch from cutoff to cutoff_lower/upper Erik de Vries 2018-11-12 14:14:21 +0100
  • 4cd46d1a5e dupe_detect: added support for both lower and upper cutoff point Erik de Vries 2018-11-12 12:58:04 +0100
  • 11d8b31c60 Added generic actor search query generator. Updated elasticizer and elastic_update to connect either to the remote server, or a local ES instance Erik de Vries 2018-11-12 11:31:06 +0100
  • 3e66c7e1cd Updated dfm_gen to have all topic vectors as numeric variables Erik de Vries 2018-11-07 15:39:50 +0100
  • 20d7510a89 Merge branch 'master' of https://git.thijsdevries.net/edevries/mamlr Erik de Vries 2018-11-07 15:14:01 +0100
  • adc4b3c639 Updated feature selection in modelizer function (see comment on lines 166/167) Erik de Vries 2018-11-07 15:10:10 +0100
  • 919e71ac68 Updated feature selection in modelizer function (see comment on lines 166/167) Erik de Vries 2018-11-07 15:10:10 +0100
  • 65f8c26ec6 Renamed dupe_detect, and added return output Erik de Vries 2018-11-06 15:15:45 +0100
  • db418d7396 Add query_string function for generating query_string queries Erik de Vries 2018-11-06 14:17:35 +0100
  • d203de0b2a Updated elasticizer docs, created modelizer and class_update functions NOJunk Erik de Vries 2018-11-06 13:40:31 +0100
  • c815dc7f2b Duplicate detection first commit Erik de Vries 2018-11-05 16:25:02 +0100
  • 1f06b0b716 Lowered R version req to 3.3.1 Erik de Vries 2018-11-05 12:15:38 +0100
  • 015411feaf Added refresh=wait_for to bulk update url. This should make update scripts less demanding on the server side, because the server only replies after refreshing (happens every second) Erik de Vries 2018-10-24 12:15:06 +0200
  • 413ad02a87 Set default to "lemmas" for dfm_gen Erik de Vries 2018-10-23 16:17:17 +0200
  • 217ee76568 V 0.1 for elasticizer function with updater support Erik de Vries 2018-10-23 14:28:37 +0200
  • a273524105 Added support for custom update function to elasticizer Erik de Vries 2018-10-23 14:23:30 +0200
  • 311838b34b Updated dfm_gen to only create derivative codes if majorTopic actually exists, and set docvars to NULL when no majorTopic codes Erik de Vries 2018-10-23 10:40:28 +0200
  • dc4daf9de4 Added line to replace multiple whitespace characters in full text by a single regular whitespace Erik de Vries 2018-10-23 10:27:21 +0200
  • 0e45c0f2d1 Added option for fulltext vs lemmas merged field Erik de Vries 2018-10-23 10:22:23 +0200
  • 4cfb508a50 Fix to DESCRIPTION Erik de Vries 2018-10-22 12:29:31 +0200
  • 4bbe84ab83 First release of mamlr package Erik de Vries 2018-10-22 12:07:53 +0200