mamlr/R
Erik de Vries 835d2332bc
actorizer: now uses the original udpipe output for sentence and token ids. When the actorized and original udpipe output do not have the same number of rows, it prints an error and sets err to TRUE in actorDetails
6 years ago
actorizer.R actorizer: now uses the original udpipe output for sentence and token ids. When the actorized and original udpipe output do not have the same number of rows, it prints an error and sets err to TRUE in actorDetails 6 years ago
bulk_writer.R actorizer, ud_update: Updated merging of document fields to properly deal with missing punctuation at the end of fields (e.g. a title without punctuation at the end of the string) 6 years ago
class_update.R class_update: add ver variable to set version for class updated articles 6 years ago
dfm_gen.R dfm_gen, merger: Added option for generating lemma_upos hybrids for merged field 6 years ago
dupe_detect.R dupe_detect: fixed error on no duplicates 6 years ago
elastic_update.R actorizer: major fix to ud parsing; the regex that removes HTML tags now only matches tags containing at most 20 characters 6 years ago
elasticizer.R elasticizer: renamed size parameter to batch_size, created max_batch parameter to limit the number of results returned 6 years ago
lemma_writer.R lemma_writer: new function to write raw lemmas (without punctuation) to a text file. It is structured as an elasticizer update function (despite not updating anything on the server) 6 years ago
merger.R merger: idiotic fix for a non-problem, see comment on line 32 6 years ago
modelizer.R actorizer, ud_update: Updated merging of document fields to properly deal with missing punctuation at the end of fields (e.g. a title without punctuation at the end of the string) 6 years ago
out_parser.R actorizer: fixed sentence_count and out_parser calls 6 years ago
query_gen_actors.R elasticizer: Updated bulk size to 1024 (a power of 2) and set a timeout of 900s every 500000 updates 6 years ago
query_string.R query_string: updated check for fields value 6 years ago
ud_update.R out_parser: added option to clean output using regex to remove numbers and non-words 6 years ago