217 Commits (0f7b1ee5377d7e5161104409866a23f5d5fd407b)

Author SHA1 Message Date
Erik de Vries 0f7b1ee537 Add single_party param
2 years ago
Erik de Vries 5c80d82828 reintroduced certificate checks, linux01 certs work again
2 years ago
Erik de Vries fcdffb6f58 removed default_field, so that all text fields are queried by default (this also includes any coder comments!)
2 years ago
Erik de Vries 9ae2866c41 remove default user
2 years ago
Erik de Vries b130f9c313 added es_user parameter
2 years ago
Erik de Vries 3f268bbf06 Temporarily disable SSL verification
2 years ago
Erik de Vries 2944039f73 test
3 years ago
Erik de Vries 0b17555d99 sent_merger: Correctly add party metadata for _mfsa aggregations
3 years ago
Erik de Vries 108372452c sent_merger: Correctly add party metadata for _mfsa aggregations
3 years ago
Erik de Vries 16d02a055d sent_merger: Updated sentiment aggregation procedure. Now a dedicated actors_final.csv file is used as source of partyIds for individual actors, instead of the (deprecated) [partyId]_a ids that were previously provided as a result of the actor searches, or the (also deprecated) actor metadata provided in the ES actors database.
3 years ago
Erik de Vries 8875630235 fixed actor metadata generation as well, because the same actorId might occur multiple times in a sentence, if that actor has multiple functions during the same period.
4 years ago
Erik de Vries 9419d6dc08 Fixed incorrect mfs and mfsa aggregations. Previously multiple party/actor mentions in the same sentence (e.g. both a *_f and *_s mention) would all be taken into account separately, while the sentence should only be considered once
4 years ago
Erik de Vries 7703a8cd5b query_gen_actors: removed country argument, now reading country directly from actor data
4 years ago
Erik de Vries 64a48e5977 sent_merger: fixed bug with publication_date and grouper()
4 years ago
Erik de Vries f6dfc6711b minor fix
4 years ago
Erik de Vries 09fd8d0cb2 removed some unused aggregations
4 years ago
Erik de Vries 8ff4097304 renamed actor_merger to sent_merger and implemented fixes to work with sentiment data frames without actor ids
4 years ago
Erik de Vries a37fc0410d removed sent_sum_pos/neg
4 years ago
Erik de Vries 153c54b376 reintroduced arousal; note that arousal performance is not directly evaluated
4 years ago
Erik de Vries cdc78039ed removing text-level output from sentencizer, and optimizing storage by using factors
4 years ago
Erik de Vries 523d86799c removed arousal measures
4 years ago
Erik de Vries 4a0f2206fd removed multicore support, added parameters for dfm_gen
4 years ago
Your Name 274c9179cb remove meta_file argument
4 years ago
Your Name 6e0e693d4e lemma_writer: removed meta csv code
4 years ago
Your Name 4fd9222a2d lemma_writer: updated to write metadata csv when dumping documents in ud format
4 years ago
Your Name 955f034e6a actor_merger: changed computation of arousal, and removed uninformative variables
4 years ago
Your Name 3cdb68b196 out_parser: updated fncols function
4 years ago
Your Name dc40fbbb19 elasticizer: update rbindlist implementation
4 years ago
Your Name 18d47762d2 actor_merger: overhaul to include cutoffs at sentence level as intended, also included options to generate sentiment for text only (don't provide actors_meta or actor_groups)
4 years ago
Your Name 74909ca3a0 sentencizer: removed text sentiment computation from script, because of incorrect implementation
4 years ago
Your Name c99ac23bb5 actor_merger: fixed absence of publication_date in some cases
4 years ago
Your Name cc7fa5bffa actor_merger: added aggregations of all individual actors and all party mentions in an article
4 years ago
Your Name d9d578c06a actor_merger: mult fix
4 years ago
Your Name 771145faf7 actor_merger: added mult='first' to metadata join for parties_actors to deal with duplicate partyIds (see 50Plus, Conservatives and Labour)
4 years ago
Your Name 1c14646e8f actor_merger: don't deselect sent_words and sent_sum columns
4 years ago
Your Name 9bd382f955 actor_merger: fix to generate bogus sentiment columns
4 years ago
Your Name b7f1afddd1 actor_merger: total rewrite based on data.table for performance reasons. Added some exceptions due to non-existing partyIds that some individual actors have in the actor database
4 years ago
Your Name 2c8a88f9a0 elasticizer: switched from bind_rows to rbindlist for composing result
4 years ago
Your Name 559199bb97 sentencizer: totally removed sent_lemmas field
4 years ago
Your Name 36f2b341a8 sentencizer: removed derived output from function
4 years ago
Your Name 80ec0be1f8 actorizer: updated to account for token start offset in udpipe output. Sometimes, the first token in an article doesn't start at character position 1 (or 2 if the article starts with a whitespace), but at position 16 and possibly other positions.
4 years ago
Your Name 336567732c elastic_update: added more debug output
4 years ago
Your Name df7631b9f1 sentencizer: Changed output, removed lemma list and added separate positive and negative sentiment sums
4 years ago
Your Name ecdb5be3b4 actorizer: moved some code
4 years ago
Your Name 69d4b6f5b0 actorizer: updated to data.table for conditional joins
4 years ago
Your Name 085855908c query_gen_actors: switched from Minister to Min
4 years ago
Your Name b406304c80 actorizer: Removed nested parallelization function
4 years ago
Your Name 5de4e1488c estimator, modelizer, preproc: Removed experimental we-vector support, and disabled inefficiently implemented preproc.R
4 years ago
Your Name 77eb51a1bf actorizer: totally revamped way of finding actors
4 years ago
Your Name 0e593075ee query_gen_actors: only retrieve ud field from source
5 years ago