Erik de Vries
8e920f5f37
elasticizer: removed idiotic 15min sleep time after 500 batches
6 years ago
Erik de Vries
ac37d836f5
elasticizer: added scroll_clear to null hits as well
6 years ago
Erik de Vries
75623856f7
elasticizer: updated scroll_clear to use conn object
6 years ago
Erik de Vries
c2d666c81d
bogus commit
6 years ago
Erik de Vries
e34460bf0f
elasticizer: clear scroll context when finishing query
6 years ago
Erik de Vries
9bd526fee0
elasticizer: fixed compatibility issues with elastic v1.0.0
6 years ago
Erik de Vries
f2312f65d5
elasticizer: update to account for syntax change in newer package versions
6 years ago
Erik de Vries
a1b6c6a7cb
actorizer, query_gen_actors: revamped actor searches entirely
...
elasticizer: updated script for use with ES 7.x
6 years ago
Erik de Vries
4f8b1f2024
elasticizer: renamed size parameter to batch_size, created max_batch parameter to limit the number of results returned
...
query_string: renamed x parameter to query, added fields parameter to select which fields to return, and added random boolean parameter to control whether the returned results are randomized
6 years ago
Erik de Vries
54dfb6a8ca
actorizer: major fix to ud parsing; the regex that strips HTML tags now matches only tags of at most 20 characters
...
ud_update: major fix to ud parsing; the regex that strips HTML tags now matches only tags of at most 20 characters
elastic_update: set the minimum break between retries from 10 to 30 seconds
elasticizer: implementation of retries for elasticizer function, 10 retries with a break of 30 seconds in between
6 years ago
Erik de Vries
39005c7518
elasticizer: Updated bulk size to 1024 (a power of 2) and set a timeout of 900s every 500000 updates
...
query_gen_actors: Added an additional generator for the "Institution" type (for EU support)
actorizer: Created an updater function to search for actors and use UDPipe to parse the results
6 years ago
Erik de Vries
a3c3651c79
elasticizer: updated scroll time to be longer than the timeouts every 200000 articles (so 20m scroll time, 900s (15m) timeout)
6 years ago
Erik de Vries
4ad5357e15
elasticizer: Added 900s timeout after every batch of 200000 articles when updating, to allow ES to do some segment merges (and clean up disk space)
6 years ago
Erik de Vries
f543d658bd
Major overhaul to ES bulk update integration. Added support for both setting and appending to variables
6 years ago
Erik de Vries
11d8b31c60
Added generic actor search query generator. Updated elasticizer and elastic_update to connect either to the remote server, or a local ES instance
6 years ago
Erik de Vries
d203de0b2a
Updated elasticizer docs, created modelizer and class_update functions
6 years ago
Erik de Vries
217ee76568
V 0.1 for elasticizer function with updater support
6 years ago
Erik de Vries
a273524105
Added support for custom update function to elasticizer
6 years ago
Erik de Vries
4bbe84ab83
First release of mamlr package
6 years ago