mamlr/R
Erik de Vries a3c3651c79
elasticizer: updated scroll time to be longer than the timeouts every 200000 articles (so 20m scroll time, 900s (15m) timeout)
6 years ago
bulk_writer.R changed udpipe output variable from tokens to ud 6 years ago
class_update.R bulk_writer: fixes for JSON generation and added exception for use of 'tokens' varname 6 years ago
dfm_gen.R dfm_gen: word cutoff now as final step in script, caused bugs with mutating code variables 6 years ago
dupe_detect.R Fixed dupe_detect error on documents with one sentence or less, and a maximum # of words in dfm_gen 6 years ago
elastic_update.R bulk_writer: fixes for JSON generation and added exception for use of 'tokens' varname 6 years ago
elasticizer.R elasticizer: updated scroll time to be longer than the timeouts every 200000 articles (so 20m scroll time, 900s (15m) timeout) 6 years ago
merger.R dfm_gen & merger: Changed word cutoff point to be a general setting in dfm_gen. Cuts off at the last [.?!] before the cutoff point (so returns documents at a sentence, shorter than cutoff). 6 years ago
modelizer.R modelizer: fixed error when only one class is predicted for junk classification (borderline case) 6 years ago
query_gen_actors.R Added generic actor search query generator. Updated elasticizer and elastic_update to connect either to the remote server, or a local ES instance 6 years ago
query_string.R Add query_string function for generating query_string queries 6 years ago
ud_update.R changed udpipe output variable from tokens to ud 6 years ago