mamlr/R
Erik de Vries 85aab558e0
bulk_writer: added clause to varname==ud update to also remove the tokens variable from source
6 years ago
actorizer.R actorizer: major fix to ud parsing, changed regex to remove html tags to only include tags with a maximum of 20 characters in them 6 years ago
bulk_writer.R bulk_writer: added clause to varname==ud update to also remove the tokens variable from source 6 years ago
class_update.R bulk_writer: fixes for JSON generation and added exception for use of 'tokens' varname 6 years ago
dfm_gen.R dfm_gen: word cutoff now as final step in script, caused bugs with mutating code variables 6 years ago
dupe_detect.R Fixed dupe_detect error on documents with one sentence or less, and a maximum # of words in dfm_gen 6 years ago
elastic_update.R actorizer: major fix to ud parsing, changed regex to remove html tags to only include tags with a maximum of 20 characters in them 6 years ago
elasticizer.R actorizer: major fix to ud parsing, changed regex to remove html tags to only include tags with a maximum of 20 characters in them 6 years ago
merger.R dfm_gen & merger: Changed word cutoff point to be a general setting in dfm_gen. Cuts off at the last [.?!] before the cutoff point (so returns documents at a sentence, shorter than cutoff). 6 years ago
modelizer.R modelizer: fixed error when only one class is predicted for junk classification (borderline case) 6 years ago
query_gen_actors.R elasticizer: Updated bulk size to 1024 (a power of 2) and set a timeout of 900s every 500000 updates 6 years ago
query_string.R Add query_string function for generating query_string queries 6 years ago
ud_update.R actorizer: major fix to ud parsing, changed regex to remove html tags to only include tags with a maximum of 20 characters in them 6 years ago