17 Commits (fc16cc5833cfd187f4d5251ab1c07847328967b7)

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Your Name | aa6587b204 | dupe_detect: fix for quotation marks | 5 years ago |
| Your Name | 2a220ded5d | dupe_detect: fix to query string for multi-word doctype names | 5 years ago |
| Your Name | 5bd36dcb44 | dupe_detect: Changed query from json to query_string style, and added filter for already detected duplicates | 5 years ago |
| Erik de Vries | 7218f6b8d0 | dupe_detect: fixed error on no duplicates | 7 years ago |
| Erik de Vries | b9be372543 | dupe_detect: fix to get correct colnames from simil (disable stringsAsFactors and convert col values to numeric) | 7 years ago |
| Erik de Vries | 1955692346 | dfm_gen, out_parser: updated documentation | 7 years ago |
| Erik de Vries | d0e9bf565b | dupe_detect: Reset the _delete value to 1 | 7 years ago |
| Erik de Vries | ea8cfb071f | dupe_detect: updated _delete var to be 2 when delete is true | 7 years ago |
| Erik de Vries | 0a3bdb630b | actorizer, dfm_gen, ud_update: unified output parsing from _source and highlight fields into a single function (out_parser) | 7 years ago |
| Erik de Vries | ef51ce60a9 | Fixed dupe_detect error on documents with one sentence or less, and a maximum # of words in dfm_gen | 7 years ago |
| Erik de Vries | 0e8c127b86 | bulk_writer: fixes for JSON generation and added exception for use of 'tokens' varname | 7 years ago |
| Erik de Vries | 755a58d84d | dupe_detect: fix to prevent errors when a query returns no results | 7 years ago |
| Erik de Vries | 887f1aa774 | dupe_detect: fix for empty results dataframe (no duplicates for given date and newspaper) | 7 years ago |
| Erik de Vries | 02b8a8c1da | dfm_gen & merger: Changed word cutoff point to be a general setting in dfm_gen. Cuts off at the last [.?!] before the cutoff point (so returns documents at a sentence, shorter than cutoff). | 7 years ago |
| Erik de Vries | 4adae2bbc6 | Fixed bug in dupe_detect caused by switch from cutoff to cutoff_lower/upper | 7 years ago |
| Erik de Vries | 4cd46d1a5e | dupe_detect: added support for both lower and upper cutoff point | 7 years ago |
| Erik de Vries | 65f8c26ec6 | Renamed dupe_detect, and added return output | 7 years ago |
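
Commit 02b8a8c1da describes cutting a document at the last [.?!] before a word cutoff point, so that the result ends on a sentence boundary and is shorter than the cutoff. The repository's actual code is not shown in this log, so the following is only a minimal Python sketch of that idea. The function name `cut_at_sentence`, its parameters, and the fallback when no sentence-ending mark is found are all assumptions made for illustration, not the project's implementation.

```python
import re


def cut_at_sentence(text: str, max_words: int) -> str:
    """Truncate text to at most max_words words, ending at the last [.?!].

    Hypothetical helper: name, signature, and fallback are assumptions.
    """
    words = text.split()
    if len(words) <= max_words:
        # Already within the cutoff: return the text unchanged.
        return text

    # Hard cut at the word limit (whitespace is normalised to single spaces).
    truncated = " ".join(words[:max_words])

    # Locate every sentence-ending mark in the truncated text.
    matches = list(re.finditer(r"[.?!]", truncated))
    if not matches:
        # Assumption: with no sentence boundary available, keep the hard cut.
        return truncated

    # Keep everything up to and including the last sentence-ending mark.
    return truncated[: matches[-1].end()]
```

With a cutoff of five words, `cut_at_sentence("One two three. Four five six. Seven eight.", 5)` first truncates to "One two three. Four five" and then backs up to the last full stop, returning "One two three.".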