Erik de Vries
d420b02c20
elasticizer: Added more verbosity to investigate error handling
6 years ago
Erik de Vries
48b589dda0
query_gen_actors: reset to original state
6 years ago
Erik de Vries
7a01a7f18d
query_gen_actors: temporary update for fixing broken shit
6 years ago
Erik de Vries
45da9dd929
aggregator_elastic: revert to single-core lapply, due to sendMaster errors
6 years ago
Erik de Vries
f8e4111e70
aggregator_elastic: correct partyid implementation
6 years ago
Erik de Vries
c047a4a1db
aggregator_elastic: explicit reference to aggregator function
6 years ago
Erik de Vries
0d81d6fc7a
added aggregator and aggregator_elastic functions for aggregating and storing article level actor aggregations
6 years ago
Erik de Vries
2281d11a68
actor_aggregation: fixed filenaming of .Rds files
6 years ago
Erik de Vries
d9f28a46d8
actor_aggregation: small fixes to code
6 years ago
Erik de Vries
a29d04dacd
actorizer: fixed handling of empty results due to regex filtering
6 years ago
Erik de Vries
8e920f5f37
elasticizer: removed idiotic 15min sleep time after 500 batches
6 years ago
Erik de Vries
a11d7728ea
actor_aggregation: only aggregate scores on non-junk articles
6 years ago
Erik de Vries
54a70c47a0
actor_aggregation: removed timeout for parallel processing, requires fix in elasticizer (cannot recycle the same connection)
6 years ago
Erik de Vries
58fce4d560
actor_aggregation: added randomized short sleep, to allow for parallel execution
6 years ago
Erik de Vries
e3b26c0be3
actor_aggregation: Added function to generate aggregate actor measures at daily, weekly, monthly and yearly level
...
query_string: Added default_operator parameter, to define whether whitespace should be interpreted as AND or OR, defaults to AND
6 years ago
Erik de Vries
28989f2bc4
dfm_gen: yet another fix for codes
6 years ago
Erik de Vries
0757b6bf8b
dfm_gen: re-added codes variable
6 years ago
Erik de Vries
2fc48cc2f7
dfm_gen: fixed absence of out$codes field
6 years ago
Erik de Vries
b249ff22de
dfm_gen.R: fixed junk mutation
6 years ago
Erik de Vries
0d05765ca7
dfm_gen: removed last remains of summer sample exceptions
6 years ago
Erik de Vries
e199b23227
dfm_gen: removed exceptions for NO summer codes
...
modelizer: created exception for outer_folds = 1
query_string: added parameter for default_operator
6 years ago
Erik de Vries
fbd525dc2e
modelizer: updated outer cross validation procedure to output raw prediction and true values, instead of processed and aggregated confusion matrix results
6 years ago
Erik de Vries
6a94bc3ed8
query_gen_actors: removed quotation marks from Minister search part
6 years ago
Erik de Vries
8d19333e59
query_gen_actors: changed script order for belgium exceptions
6 years ago
Erik de Vries
3bfe61e425
query_gen_actors: fixed implementation of Belgian exceptions
6 years ago
Erik de Vries
81697345cb
modelizer: removed breaking code
6 years ago
Erik de Vries
9ca952ca89
elastic_update: removed wait_for from url
6 years ago
Erik de Vries
8051a81b66
actorizer, dfm_gen, modelizer, out_parser: replaced all instances of detectCores by cores parameter (which defaults to detectCores)
6 years ago
Erik de Vries
ac37d836f5
elasticizer: added scroll_clear to null hits as well
6 years ago
Erik de Vries
75623856f7
elasticizer: updated scroll_clear to use conn object
6 years ago
Erik de Vries
c2d666c81d
bogus commit
6 years ago
Erik de Vries
e34460bf0f
elasticizer: clear scroll context when finishing query
6 years ago
Erik de Vries
9bd526fee0
elasticizer: fixed compatibility issues with elastic v1.0.0
6 years ago
Erik de Vries
f2312f65d5
elasticizer: update to account for syntax change in newer package versions
6 years ago
Erik de Vries
f6006eb9ba
actorizer: simplified pre/postfix check, only for NA, replace empty strings by NA beforehand
6 years ago
Erik de Vries
298099a4e6
actorizer: fix to deal with empty updates (i.e. don't do an update)
6 years ago
Erik de Vries
6961c0b866
query_gen_actors: updated actorid filter to use the keyword subfield
6 years ago
Erik de Vries
703b5e59a4
actorizer: fixed exceptionizer by adding whitespace before and after the sentence; this is necessary because a negative regex (match anything before or after the highlight string that is NOT x) requires at least one character to be present before or after the highlight
6 years ago
Erik de Vries
593d2de6e2
actorizer: add pre_tags and post_tags to argument list
...
bulk_writer: updated to use _doc doctype
query_gen_actors: added NA for all searches that don't have pre- or postfixes
6 years ago
Erik de Vries
a1b6c6a7cb
actorizer, query_gen_actors: revamped actor searches entirely
...
elasticizer: updated script for use with ES 7.x
6 years ago
Erik de Vries
88fc4ec53c
dfm_gen: changed out_parser call to mamlr:::out_parser
6 years ago
Erik de Vries
90fdbcc982
out_parser: parallelized when not on Windows
6 years ago
Erik de Vries
6414f759bd
actorizer: parallelized calculation of marker positions
6 years ago
Erik de Vries
522c872dba
out_parser: moved cleaning regex to end of pipeline, to prevent collisions with other (mandatory) regex cleaning
6 years ago
Erik de Vries
5b9793cd8c
actorizer: removed nested mclapply
6 years ago
Erik de Vries
1a4ba19546
actorizer: Removed udmodel dependencies, commented code, changed nested lists to flat lists
...
bulk_writer: changed handling of single-row dataframe parsing to JSON
elastic_update: changed function to return instead of print appData on error
ud_update: Changed nested lists to flat lists, and added start and end character positions
6 years ago
Erik de Vries
3abc3056e0
actorizer: fix to columns selected for actors variable, removed udmodel requirement
6 years ago
Erik de Vries
41c86ea116
actorizer, ud_update: Updated ud parsing and actorizer to work based on character positions. This code is used for local testing
6 years ago
Erik de Vries
eae1a22609
actorizer: update to use '|||' as highlight indicator, and set up ud output merging accordingly
6 years ago
Erik de Vries
5665b6d622
actorizer: more fixes to punctuation
6 years ago