mamlr

Commit Graph

Author	SHA1	Message	Date
Erik de Vries	7218f6b8d0	dupe_detect: fixed error on no duplicates	6 years ago
Erik de Vries	b9be372543	dupe_detect: fix to get correct colnames from simil (disable stringsAsFactors and convert col values to numeric)	6 years ago
Erik de Vries	1955692346	dfm_gen, out_parser: updated documentation; dupe_detect: major fix to function, no longer using rownames for article ids	6 years ago
Erik de Vries	d0e9bf565b	dupe_detect: Reset the _delete value to 1; out_parser: fix to sentence parsing, add additional (empty) string at end of merged field, to make merged field end on .	6 years ago
Erik de Vries	ea8cfb071f	dupe_detect: updated _delete var to be 2 when delete is true	6 years ago
Erik de Vries	0a3bdb630b	actorizer, dfm_gen, ud_update: unified output parsing from _source and highlight fields into a single function (out_parser); out_parser: function to parse raw text output into a single field, either from _source or highlight fields; dupe_detect: updated function to use 'ver' parameter for versioning	6 years ago
Erik de Vries	ef51ce60a9	Fixed dupe_detect error on documents with one sentence or less, and a maximum # of words in dfm_gen	6 years ago
Erik de Vries	0e8c127b86	bulk_writer: fixes for JSON generation and added exception for use of 'tokens' varname; class_update/elastic_update: Moved response checking to elastic_update; dupe_detect: Finalized dupe_detect	6 years ago
Erik de Vries	755a58d84d	dupe_detect: fix to prevent errors when a query returns no results	6 years ago
Erik de Vries	887f1aa774	dupe_detect: fix for empty results dataframe (no duplicates for given date and newspaper)	6 years ago
Erik de Vries	02b8a8c1da	dfm_gen & merger: Changed word cutoff point to be a general setting in dfm_gen. Cuts off at the last [.?!] before the cutoff point (so returns documents at a sentence, shorter than cutoff).	6 years ago
Erik de Vries	4adae2bbc6	Fixed bug in dupe_detect caused by switch from cutoff to cutoff_lower/upper	6 years ago
Erik de Vries	4cd46d1a5e	dupe_detect: added support for both lower and upper cutoff point	6 years ago
Erik de Vries	65f8c26ec6	Renamed dupe_detect, and added return output	6 years ago

14 Commits (8eedec8bb5d0647f6f337de264443b5abd8ffacb)