35 Commits (e76a914dd28bec3536385642a3aa7c834c21f937)

Author SHA1 Message Date
Erik de Vries 889e7e92af lemma_writer: updated to provide support for writing raw documents to individual files using utf-8 encoding 6 years ago
Erik de Vries 6f5ace8c52 actor_fetcher: elasticizer batch function to fetch actorsDetail fields from all relevant documents 6 years ago
Erik de Vries 0d81d6fc7a added aggregator and aggregator_elastic functions for aggregating and storing article level actor aggregations 6 years ago
Erik de Vries e3b26c0be3 actor_aggregation: Added function to generate aggregate actor measures at daily, weekly, monthly and yearly level 6 years ago
Erik de Vries 8051a81b66 actorizer, dfm_gen, modelizer, out_parser: replaced all instances of detectCores by cores parameter (which defaults to detectCores) 6 years ago
Erik de Vries a1b6c6a7cb actorizer, query_gen_actors: revamped actor searches entirely 6 years ago
Erik de Vries 5b9793cd8c actorizer: removed nested mclapply 6 years ago
Erik de Vries 9b0ac775af class_update: add ver variable to set version for class updated articles 6 years ago
Erik de Vries 85306007f4 class_update: added words and clean parameters, in addition to text parameter, to be able to set data preprocessing exactly the same as in the trained model 6 years ago
Erik de Vries ce5f812252 dfm_gen, merger: Added option for generating lemma_upos hybrids for merged field 6 years ago
Erik de Vries 386ac42aee lemma_writer: new function to write raw lemmas (without punctuation) to text file. Structured as an elasticizer update function (despite not updating anything on the server) 6 years ago
Erik de Vries 1955692346 dfm_gen, out_parser: updated documentation 6 years ago
Erik de Vries 34531b0da8 out_parser: added option to clean output using regex to remove numbers and non-words 6 years ago
Erik de Vries 4f8b1f2024 elasticizer: renamed size parameter to batch_size, created max_batch parameter to limit the number of results returned 6 years ago
Erik de Vries 0a3bdb630b actorizer, dfm_gen, ud_update: unified output parsing from _source and highlight fields into a single function (out_parser) 6 years ago
Erik de Vries 8ffbddc073 actorizer, ud_update: implemented 'ver' variable for keeping track of updates 6 years ago
Erik de Vries ae23456736 actorizer, ud_update: Updated merging of document fields to properly deal with missing punctuation at the end of fields (e.g. a title without punctuation at the end of the string) 6 years ago
Erik de Vries 9f3418ef37 class_update; dfm_gen; merger: updated functions to accept text parameter for both old style 'lemmas' and new style 'ud' 6 years ago
Erik de Vries 39005c7518 elasticizer: Updated bulk size to 1024 (a power of 2) and set a timeout of 900s every 500000 updates 6 years ago
Erik de Vries 061da17c2a ud_update: Added function to lemmatize documents 6 years ago
Erik de Vries ef51ce60a9 Fixed dupe_detect error on documents with one sentence or less, and a maximum # of words in dfm_gen 6 years ago
Erik de Vries 0e8c127b86 bulk_writer: fixes for JSON generation and added exception for use of 'tokens' varname 7 years ago
Erik de Vries 085252abda documentation: updated dupe_detect and merger 7 years ago
Erik de Vries f543d658bd Major overhaul to ES bulk update integration. Added support for both setting and appending to variables 7 years ago
Erik de Vries 4cd46d1a5e dupe_detect: added support for both lower and upper cutoff point 7 years ago
Erik de Vries 11d8b31c60 Added generic actor search query generator. Updated elasticizer and elastic_update to connect either to the remote server, or a local ES instance 7 years ago
Erik de Vries adc4b3c639 Updated feature selection in modelizer function (see comment on lines 166/167) 7 years ago
Erik de Vries 65f8c26ec6 Renamed dupe_detect, and added return output 7 years ago
Erik de Vries db418d7396 Add query_string function for generating query_string queries 7 years ago
Erik de Vries d203de0b2a Updated elasticizer docs, created modelizer and class_update functions 7 years ago
Erik de Vries c815dc7f2b Duplicate detection first commit 7 years ago
Erik de Vries 217ee76568 V 0.1 for elasticizer function with updater support 7 years ago
Erik de Vries a273524105 Added support for custom update function to elasticizer 7 years ago
Erik de Vries 0e45c0f2d1 Added option for fulltext vs lemmas merged field 7 years ago
Erik de Vries 4bbe84ab83 First release of mamlr package 7 years ago