Erik de Vries | 887f1aa774 | dupe_detect: fix for empty results dataframe (no duplicates for given date and newspaper) | 6 years ago
Erik de Vries | 02b8a8c1da | dfm_gen & merger: Changed word cutoff point to be a general setting in dfm_gen. Cuts off at the last [.?!] before the cutoff point (so returns documents at a sentence, shorter than cutoff). | 6 years ago
Erik de Vries | 4adae2bbc6 | Fixed bug in dupe_detect caused by switch from cutoff to cutoff_lower/upper | 6 years ago
Erik de Vries | 4cd46d1a5e | dupe_detect: added support for both lower and upper cutoff point | 6 years ago
Erik de Vries | 65f8c26ec6 | Renamed dupe_detect, and added return output | 6 years ago
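
The sentence-boundary cutoff described in commit 02b8a8c1da can be illustrated with a minimal sketch. This is not the repository's code: the function name `truncate_at_sentence`, the whitespace tokenization, and the fallback when no `[.?!]` is found are all assumptions made for illustration, and the sketch is written in Python rather than the project's own language.

```python
import re


def truncate_at_sentence(text: str, cutoff: int) -> str:
    """Truncate `text` to at most `cutoff` words, ending at a sentence boundary.

    Hypothetical helper: words are split on whitespace (an assumption),
    and the text is cut at the last '.', '?' or '!' found within the
    first `cutoff` words.
    """
    words = text.split()
    if len(words) <= cutoff:
        # Document is already within the cutoff; return it unchanged.
        return text

    # Keep only the first `cutoff` words, re-joined with single spaces.
    head = " ".join(words[:cutoff])

    # Locate every sentence-ending character in the truncated text.
    matches = list(re.finditer(r"[.?!]", head))
    if not matches:
        # Assumption: with no sentence boundary available, fall back
        # to the plain word cutoff.
        return head

    # Cut just after the last sentence-ending character.
    return head[: matches[-1].end()]
```

For example, `truncate_at_sentence("One two three. Four five six. Seven eight.", 5)` keeps the first five words and then backs up to the end of the first sentence, so the result is shorter than the cutoff, as the commit message describes.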