191 Commits (b7f1afddd1b4c98630bfa0660e430e9046e21834)
 

Author SHA1 Message Date
Your Name b7f1afddd1 actor_merger: total rewrite based on data.table for performance reasons. Added some exceptions due to non-existing partyIds that some individual actors have in the actor database
4 years ago
Your Name 2c8a88f9a0 elasticizer: switched from bind_rows to rbindlist for composing result
4 years ago
Your Name 559199bb97 sentencizer: totally removed sent_lemmas field
4 years ago
Your Name 36f2b341a8 sentencizer: removed derived output from function
4 years ago
Your Name 80ec0be1f8 actorizer: updated to account for token start offset in udpipe output. Sometimes the first token in an article doesn't start at character position 1 (or 2 if the article starts with whitespace), but at position 16 and possibly other positions.
4 years ago
Your Name 336567732c elastic_update: added more debug output
4 years ago
Your Name df7631b9f1 sentencizer: Changed output, removed lemma list and added separate positive and negative sentiment sums
4 years ago
Your Name ecdb5be3b4 actorizer: moved some code
4 years ago
Your Name 50f33e78d7 DESCRIPTION: updated
4 years ago
Your Name 69d4b6f5b0 actorizer: updated to data.table for conditional joins
4 years ago
Your Name 085855908c query_gen_actors: switched from Minister to Min
4 years ago
Your Name b406304c80 actorizer: Removed nested parallelization function
4 years ago
Your Name 5de4e1488c estimator, modelizer, preproc: Removed experimental we-vector support, and disabled inefficiently implemented preproc.R
4 years ago
Your Name 77eb51a1bf actorizer: totally revamped way of finding actors
4 years ago
Your Name 0e593075ee query_gen_actors: only retrieve ud field from source
5 years ago
Your Name 6eb405f8bd merger: selecting only relevant columns
5 years ago
Your Name 38ff4dcbf0 ud_update: small fix to file naming
5 years ago
Your Name 4b4d860235 class_update: remove dfm_gen multicore option
5 years ago
Your Name 5d99ec9509 elasticizer: added option to dump data frames to rds files
5 years ago
Your Name aa6587b204 dupe_detect: fix for quotation marks
5 years ago
Your Name 2a220ded5d dupe_detect: fix to query string for multi-word doctype names
5 years ago
Your Name 5bd36dcb44 dupe_detect: Changed query from json to query_string style, and added filter for already detected duplicates
5 years ago
Your Name e499d70671 actor_merger: added ungroup() calls at the start and end of function, to speed up processing
5 years ago
Your Name 8634d549a3 sentencizer: updates to collect sentence word counts and number of sentences also when no sent_dict is provided
5 years ago
Your Name 61e0581595 actor_merger: removed debug line
5 years ago
Your Name 11bf71c7dd fixes for removal of actor_fetcher function
5 years ago
Your Name f022312485 actor_merger: added function for generating actor-document data frames
5 years ago
Your Name 4e867214dd sentencizer: commented code
5 years ago
Your Name ec8afc4990 sentencizer: fixed actorsDetail coding error
5 years ago
Your Name 9ccfd2952e sentencizer: minor updates
5 years ago
Your Name 98325bde8f sentencizer: added new function for sentiment coding and actor collection
5 years ago
Your Name 7f958bbc11 actor_fetcher: small fixes
5 years ago
Your Name 8eedec8bb5 actor_fetcher: added option for using dictionaries with just lemmas, besides the option of using lemma_upos dictionaries
5 years ago
Your Name 057d225a7a actor_fetcher: Allow generation of actor df containing only specified actor ids and aggregations
5 years ago
Your Name 9eae486a80 separated data preprocessing routines
5 years ago
Your Name a3b6e19646 revised modeling pipeline:
5 years ago
Your Name e76a914dd2 actor_fetcher: Updated to tidyr 1.0.0, no longer using preserve, slightly different approach to keeping ids_list, and not removing actorsDetail anymore because it does not exist
5 years ago
Your Name a01a53f105 class_update: added cores parameter for multicore processing of sources when using lemmas
5 years ago
Your Name d9f936c566 modelizer: tf-idf application updated, final model now also includes idf values from training set, explicitly setting positive category in binary classification for confusion matrices, minor code fixes
5 years ago
Erik de Vries 06bfec71bc lemma_writer: unlist lemmas before writing
5 years ago
Erik de Vries a83ee5dfd0 lemma_writer: update to write lemma instead of full document text
5 years ago
Erik de Vries e594185719 dfm_gen: set default cores to 1
5 years ago
Erik de Vries 889e7e92af lemma_writer: updated to provide support for writing raw documents to individual files using utf-8 encoding
5 years ago
Erik de Vries 115297f597 actor_aggregation,aggregator,aggregator_elastic: moved out of package directory to Old
5 years ago
Erik de Vries 3fcbbd1f1f actor_fetch: fixed error where source.ud would not exist
5 years ago
Erik de Vries 674ef09e10 query_gen_actors: added junior minister check to if statement
5 years ago
Erik de Vries 853c117daf actor_fetcher: change in code to keep original actorid lists in output
5 years ago
Erik de Vries bf3d11ffe0 query_gen_actors: various bugfixes and changes
5 years ago
Erik de Vries 99af1427f0 query_gen_actors: fixed Scandinavian query generation
5 years ago
Erik de Vries e49a4ae93e query_gen_actors: fixed problem with too many brackets in query
5 years ago