mamlr/R
Erik de Vries 4ad5357e15
elasticizer: Added 900s timeout after every batch of 200000 articles when updating, to allow ES to do some segment merges (and clean up disk space)
6 years ago
bulk_writer.R changed udpipe output variable from tokens to ud 6 years ago
class_update.R bulk_writer: fixes for JSON generation and added exception for use of 'tokens' varname 6 years ago
dfm_gen.R dfm_gen: word cutoff now as final step in script, caused bugs with mutating code variables 6 years ago
dupe_detect.R Fixed dupe_detect error on documents with one sentence or less, and a maximum # of words in dfm_gen 6 years ago
elastic_update.R bulk_writer: fixes for JSON generation and added exception for use of 'tokens' varname 6 years ago
elasticizer.R elasticizer: Added 900s timeout after every batch of 200000 articles when updating, to allow ES to do some segment merges (and clean up disk space) 6 years ago
merger.R dfm_gen & merger: Changed word cutoff point to be a general setting in dfm_gen. Cuts off at the last [.?!] before the cutoff point (so returns documents at a sentence, shorter than cutoff). 6 years ago
modelizer.R modelizer: fixed error when only one class is predicted for junk classification (borderline case) 6 years ago
query_gen_actors.R Added generic actor search query generator. Updated elasticizer and elastic_update to connect either to the remote server, or a local ES instance 6 years ago
query_string.R Add query_string function for generating query_string queries 6 years ago
ud_update.R changed udpipe output variable from tokens to ud 6 years ago
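The merger.R entry describes cutting documents at the last sentence terminator ([.?!]) before a word cutoff, so a returned document ends on a full sentence and is shorter than the cutoff. Below is a minimal R sketch of that idea, assuming the cutoff counts whitespace-separated words. The function name truncate_at_sentence and its max_words parameter are hypothetical. This is not the mamlr implementation, which is not shown in this listing.

```r
# Hypothetical helper illustrating a sentence-boundary word cutoff;
# not taken from mamlr's dfm_gen.R or merger.R.
truncate_at_sentence <- function(text, max_words) {
  # Split on runs of whitespace to count words
  words <- strsplit(text, "\\s+")[[1]]
  # Documents at or under the cutoff are returned unchanged
  if (length(words) <= max_words) return(text)
  # Keep only the first max_words words
  head_text <- paste(words[seq_len(max_words)], collapse = " ")
  # Character positions of every sentence terminator in the kept text
  ends <- gregexpr("[.?!]", head_text)[[1]]
  # gregexpr() returns -1 when there is no match: keep the word-truncated text
  if (ends[1] == -1) return(head_text)
  # Cut right after the last terminator before the cutoff
  substr(head_text, 1, max(ends))
}

truncate_at_sentence("One two. Three four five. Six seven", 4)
```

Falling back to the plain word cutoff when no terminator is found is a design choice of this sketch; an implementation could instead drop such documents or keep them whole.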