Erik de Vries
ae23456736
actorizer, ud_update: Updated merging of document fields to properly deal with missing punctuation at the end of fields (e.g. a title without punctuation at the end of the string)
modelizer: Minor update to feature keyness, now using absolute values to determine the most informative features for a class (so features that are either strongly positively or negatively related to the class)
bulk_writer: Added the 'ver' parameter to include a short version string with each update. Mostly to deal with updates that do not complete successfully on all data
6 years ago
Erik de Vries
9f3418ef37
class_update; dfm_gen; merger: updated functions to accept text parameter for both old style 'lemmas' and new style 'ud'
6 years ago
Erik de Vries
39005c7518
elasticizer: Updated bulk size to 1024 (a power of 2) and set a timeout of 900s every 500000 updates
query_gen_actors: Added an additional generator for the "Institution" type (for EU support)
actorizer: Created an updater function to search for actors and use UDPipe to parse the results
6 years ago
Erik de Vries
061da17c2a
ud_update: Added function to lemmatize documents
6 years ago
Erik de Vries
ef51ce60a9
Fixed dupe_detect error on documents with one sentence or fewer, and a maximum # of words in dfm_gen
6 years ago
Erik de Vries
0e8c127b86
bulk_writer: fixes for JSON generation and added exception for use of 'tokens' varname
class_update/elastic_update: Moved response checking to elastic_update
dupe_detect: Finalized dupe_detect
6 years ago
Erik de Vries
085252abda
documentation: updated dupe_detect and merger
6 years ago
Erik de Vries
f543d658bd
Major overhaul to ES bulk update integration. Added support for both setting and appending to variables
6 years ago
Erik de Vries
4cd46d1a5e
dupe_detect: added support for both lower and upper cutoff point
6 years ago
Erik de Vries
11d8b31c60
Added generic actor search query generator. Updated elasticizer and elastic_update to connect either to the remote server, or a local ES instance
6 years ago
Erik de Vries
adc4b3c639
Updated feature selection in modelizer function (see comment on lines 166/167)
6 years ago
Erik de Vries
65f8c26ec6
Renamed dupe_detect, and added return output
6 years ago
Erik de Vries
db418d7396
Add query_string function for generating query_string queries
6 years ago
Erik de Vries
d203de0b2a
Updated elasticizer docs, created modelizer and class_update functions
6 years ago
Erik de Vries
c815dc7f2b
Duplicate detection first commit
6 years ago
Erik de Vries
217ee76568
V 0.1 for elasticizer function with updater support
6 years ago
Erik de Vries
a273524105
Added support for custom update function to elasticizer
6 years ago
Erik de Vries
0e45c0f2d1
Added option for fulltext vs lemmas merged field
6 years ago
Erik de Vries
4bbe84ab83
First release of mamlr package
6 years ago