0f7b1ee537 Add single_party param. Fix actor.first to use min() instead of first()
Erik de Vries
2023-03-27 17:28:24 +0200
5c80d82828 reintroduced certificate checks, linux01 certs work again
Erik de Vries
2022-11-24 13:30:21 +0100
fcdffb6f58 removed default_field, so that all text fields are queried by default (this also includes any coder comments!)
Erik de Vries
2022-11-22 16:45:29 +0100
0b17555d99 sent_merger: Correctly add party metadata for _mfsa aggregations
Erik de Vries
2022-01-25 18:39:27 +0100
108372452c sent_merger: Correctly add party metadata for _mfsa aggregations
Erik de Vries
2022-01-25 18:39:27 +0100
16d02a055d sent_merger: Updated sentiment aggregation procedure. Now a dedicated actors_final.csv file is used as source of partyIds for individual actors, instead of the (deprecated) [partyId]_a ids that were previously provided as a result of the actor searches, or the (also deprecated) actor metadata provided in the ES actors database.
Erik de Vries
2022-01-25 17:57:53 +0100
8875630235 fixed actor metadata generation as well, because the same actorId might occur multiple times in a sentence, if that actor has multiple functions during the same period.
Erik de Vries
2021-05-08 11:20:20 +0200
9419d6dc08 Fixed incorrect mfs and mfsa aggregations. Previously multiple party/actor mentions in the same sentence (e.g. both a *_f and *_s mention) would all be taken into account separately, while the sentence should only be considered once.
Erik de Vries
2021-05-07 15:34:59 +0200
7703a8cd5b query_gen_actors: removed country argument, now reading country directly from actor data
Erik de Vries
2021-01-22 19:35:17 +0100
64a48e5977 sent_merger: fixed bug with publication_date and grouper()
Erik de Vries
2021-01-20 18:17:53 +0100
8ff4097304 renamed actor_merger to sent_merger and implemented fixes to work with sentiment data frames without actor ids
Erik de Vries
2020-10-21 13:50:15 +0200
4a0f2206fd removed multicore support, added parameters for dfm_gen
Erik de Vries
2020-10-15 16:53:49 +0200
274c9179cb remove meta_file argument
Your Name
2020-08-24 16:10:52 +0200
6e0e693d4e lemma_writer: removed meta csv code
Your Name
2020-08-24 16:08:51 +0200
4fd9222a2d lemma_writer: updated to write metadata csv when dumping documents in ud format; out_parser: fix for generating empty columns
Your Name
2020-08-24 15:50:10 +0200
955f034e6a actor_merger: changed computation of arousal, and removed uninformative variables
Your Name
2020-07-24 16:09:20 +0200
3cdb68b196 out_parser: updated fncols function
Your Name
2020-07-23 13:14:31 +0200
dc40fbbb19 elasticizer: update rbindlist implementation
Your Name
2020-07-23 13:04:29 +0200
18d47762d2 actor_merger: overhaul to include cutoffs at sentence level as intended, also included options to generate sentiment for text only (don't provide actors_meta or actor_groups)
Your Name
2020-07-22 11:36:12 +0200
74909ca3a0 sentencizer: removed text sentiment computation from script, because of incorrect implementation
Your Name
2020-07-22 10:12:01 +0200
c99ac23bb5 actor_merger: fixed absence of publication_date in some cases
Your Name
2020-07-21 16:19:28 +0200
cc7fa5bffa actor_merger: added aggregations of all individual actors and all party mentions in an article
Your Name
2020-07-20 15:27:32 +0200
d9d578c06a actor_merger: mult fix
Your Name
2020-07-19 19:11:55 +0200
771145faf7 actor_merger: added mult='first' to metadata join for parties_actors to deal with duplicate partyIds (see 50Plus, Conservatives and Labour)
Your Name
2020-07-19 19:08:16 +0200
1c14646e8f actor_merger: don't deselect sent_words and sent_sum columns
Your Name
2020-07-19 18:42:47 +0200
9bd382f955 actor_merger: fix to generate bogus sentiment columns
Your Name
2020-07-19 18:40:10 +0200
b7f1afddd1 actor_merger: total rewrite based on data.table for performance reasons. Added some exceptions due to non-existing partyIds that some individual actors have in the actor database
Your Name
2020-07-19 18:22:35 +0200
2c8a88f9a0 elasticizer: switched from bind_rows to rbindlist for composing result; actor_merger: added noactor.* sentiment columns, and switched to data.table for matching actor metadata with articles
Your Name
2020-07-17 13:46:31 +0200
559199bb97 sentencizer: totally removed sent_lemmas field
Your Name
2020-07-08 16:13:07 +0200
36f2b341a8 sentencizer: removed derived output from function
Your Name
2020-07-08 16:09:04 +0200
80ec0be1f8 actorizer: updated to account for token start offset in udpipe output. Sometimes, the first token in an article doesn't start at character position 1 (or 2 if the article starts with a whitespace), but at position 16 and possibly other positions.
Your Name
2020-07-06 17:50:04 +0200
336567732c elastic_update: added more debug output
Your Name
2020-07-06 11:17:53 +0200
df7631b9f1 sentencizer: Changed output, removed lemma list and added separate positive and negative sentiment sums
Your Name
2020-07-05 13:15:02 +0200
ecdb5be3b4 actorizer: moved some code
Your Name
2020-07-03 14:06:18 +0200
50f33e78d7 DESCRIPTION: updated
Your Name
2020-07-03 14:03:52 +0200
69d4b6f5b0 actorizer: updated to data.table for conditional joins; DESCRIPTION: added data.table dependency
Your Name
2020-07-03 14:00:43 +0200
085855908c query_gen_actors: switched from Minister to Min
Your Name
2020-07-02 10:07:58 +0200
b406304c80 actorizer: Removed nested parallelization function; query_gen_actors: Integrated startDate and endDate for parties, changed party exception method from abbreviation only to both full names and abbreviations for NL and BE
Your Name
2020-07-01 19:25:50 +0200
5de4e1488c estimator, modelizer, preproc: Removed experimental we-vector support, and disabled inefficiently implemented preproc.R
Your Name
2020-06-22 15:07:46 +0200
77eb51a1bf actorizer: totally revamped way of finding actors; elasticizer: updated dump handling to create a dump for every batch, instead of one big file at the end; out_parser: streamlined code; query_gen_actors: only include relevant fields; ud_update: changed function parameters to work with elasticizer dump function
Your Name
2020-06-19 11:34:18 +0200
0e593075ee query_gen_actors: only retrieve ud field from source
Your Name
2020-06-15 19:04:26 +0200
6eb405f8bd merger: selecting only relevant columns
Your Name
2020-06-15 18:30:03 +0200
38ff4dcbf0 ud_update: small fix to file naming
Your Name
2020-06-15 18:26:26 +0200
4b4d860235 class_update: remove dfm_gen multicore option; dfm_gen: remove multicore, update merger() code; elasticizer: changed filenaming scheme for dump option; merger: Fixed bug where an NA lemma would cause the entire document to become NA. Now the NA lemmas are filtered out before merging; ud_update: removed parallel processing, changed script to save bulk updates in .Rds files instead of sending them straight away
Your Name
2020-06-15 18:25:16 +0200
5d99ec9509 elasticizer: added option to dump data frames to rds files; out_parser: changed to single core, due to performance increase; sentencizer: corrected documentation for sent_dict parameter
Your Name
2020-06-10 17:58:12 +0200
aa6587b204 dupe_detect: fix for quotation marks
Your Name
2020-06-10 15:22:41 +0200
2a220ded5d dupe_detect: fix to query string for multi-word doctype names
Your Name
2020-06-10 15:06:35 +0200
5bd36dcb44 dupe_detect: Changed query from json to query_string style, and added filter for already detected duplicates; cv_generator: Changed code to use a generic vector of true values to draw the conditional random sample, instead of dfm/docvars specifically
Your Name
2020-06-09 12:13:37 +0200
e499d70671 actor_merger: added ungroup() calls at the start and end of function, to speed up processing; sentencizer: added ungroup() call at the end of the function to speed up processing
Your Name
2020-05-27 13:13:21 +0200
8634d549a3 sentencizer: updates to collect sentence word counts and number of sentences also when no sent_dict is provided
Your Name
2020-05-26 18:37:26 +0200
61e0581595 actor_merger: removed debug line
Your Name
2020-05-26 17:48:10 +0200
11bf71c7dd fixes for removal of actor_fetcher function
Your Name
2020-05-26 17:15:14 +0200
f022312485 actor_merger: added function for generating actor-document data frames; actor_fetcher: removed from package; other: major update to documentation
Your Name
2020-05-26 17:12:22 +0200
4e867214dd sentencizer: commented code
Your Name
2020-05-26 15:33:28 +0200
ec8afc4990 sentencizer: fixed actorsDetail coding error
Your Name
2020-05-25 16:16:42 +0200
9ccfd2952e sentencizer: minor updates
Your Name
2020-05-25 15:48:46 +0200
98325bde8f sentencizer: added new function for sentiment coding and actor collection
Your Name
2020-05-22 21:43:27 +0200
7f958bbc11 actor_fetcher: small fixes
Your Name
2020-05-20 13:56:42 +0200
8eedec8bb5 actor_fetcher: added option for using dictionaries with just lemmas, besides the option of using lemma_upos dictionaries
Your Name
2020-05-20 12:44:09 +0200
057d225a7a actor_fetcher: Allow generation of actor df containing only specified actor ids and aggregations
Your Name
2020-05-20 12:29:26 +0200
9eae486a80 separated data preprocessing routines; class_update: check if there are idf values associated with model, before applying weights; estimator: make use of preproc() function for data preprocessing; preproc: function containing all logic with regards to text data preprocessing and weighting
Your Name
2020-04-09 15:32:07 +0200
a3b6e19646 revised modeling pipeline; cv_generator: generate folds for nested cv; dfm_gen: added optional lowercasing parameter; estimator: estimate model and performance based on parameters; feat_select: select features based on textstat_keyness; metric_gen: convert output from estimator to model performance metrics; modelizer: updated for new pipeline; modelizer_old: old model pipeline; out_parser: now correctly exported
Your Name
2020-04-09 14:02:50 +0200
e76a914dd2 actor_fetcher: Updated to tidyr 1.0.0, no longer using preserve, slightly different approach to keeping ids_list, and not removing actorsDetail anymore because it does not exist
Your Name
2020-03-18 14:10:01 +0100
a01a53f105 class_update: added cores parameter for multicore processing of sources when using lemmas
Your Name
2020-03-11 15:44:52 +0100
d9f936c566 modelizer: tf-idf application updated, final model now also includes idf values from training set, explicitly setting positive category in binary classification for confusion matrices, minor code fixes; dfm_gen: added old junk codes for recoding, and removed deprecated ngrams parameter from dfm function; class_update: removed dfm_words parameter, which is replaced by the force = T parameter in predict(), training/model idf is now applied to unseen data; DESCRIPTION: added quanteda.textmodels as new dependency, since these have been separated from base quanteda 2.0.0 onwards
Your Name
2020-03-11 15:35:04 +0100
889e7e92af lemma_writer: updated to provide support for writing raw documents to individual files using utf-8 encoding
Erik de Vries
2019-08-28 15:52:52 +0200
115297f597 actor_aggregation, aggregator, aggregator_elastic: moved out of package directory to Old; actor_fetcher: moved sentiment validation code block
Erik de Vries
2019-08-12 13:50:31 +0200
3fcbbd1f1f actor_fetch: fixed error where source.ud would not exist
Erik de Vries
2019-07-06 18:34:25 +0200
674ef09e10 query_gen_actors: added junior minister check to if statement
Erik de Vries
2019-07-06 14:47:58 +0200
853c117daf actor_fetcher: change in code to keep original actorid lists in output; query_gen_actors: added code for junior ministers in BE and NL
Erik de Vries
2019-07-05 14:43:15 +0200
bf3d11ffe0 query_gen_actors: various bugfixes and changes
Erik de Vries
2019-07-04 17:11:58 +0200
99af1427f0 query_gen_actors: fixed Scandinavian query generation
Erik de Vries
2019-07-03 11:48:04 +0200
e49a4ae93e query_gen_actors: fixed problem with too many brackets in query
Erik de Vries
2019-07-03 11:24:33 +0200
060751237b actorizer, out_parser: switched from mclapply to future_lapply and removed Windows-specific code from out_parser; query_gen_actors: rewritten minister queries to only use proximity queries
Erik de Vries
2019-07-02 15:29:31 +0200
d0601d2aa7 actor_fetcher: added minimum verbosity to identify cases in which an actor is present without a party mention
Erik de Vries
2019-06-25 19:43:35 +0200
9e433ecf9e actor_fetcher: added handling of exception where all actorsids related to a party are individual actors
Erik de Vries
2019-06-25 19:08:12 +0200
526270900c actor_fetcher: integrated party merging into actor_fetcher in what hopefully is the most efficient way
Erik de Vries
2019-06-25 18:53:26 +0200
84df9658ff actor_fetcher: added lemma output when validating, to detect most problematic lemmas
Erik de Vries
2019-06-25 15:28:23 +0200
a3e8dcf96e actor_fetcher: switched from binary word sentiment scores to proximity scores (cosine similarity)
Erik de Vries
2019-06-21 16:23:28 +0200
6f5ace8c52 actor_fetcher: elasticizer batch function to fetch actorsDetail fields from all relevant documents
Erik de Vries
2019-06-21 15:35:04 +0200
edd4b785a5 actor_aggregation: updated to use future package for parallel processing as beta test for switching all parallel processing to future. Also disabled some of the aggregator output to save computation time
Erik de Vries
2019-06-20 12:54:14 +0200
f8bc53006d actor_aggregation: added sentiment analysis support for generating aggregations
Erik de Vries
2019-06-19 19:33:48 +0200