160 Commits (e499d70671fbcc73102ce81907f02e03877cf0bb)

Author SHA1 Message Date
Your Name e499d70671 actor_merger: added ungroup() calls at the start and end of function, to speed up processing 5 years ago
Your Name 8634d549a3 sentencizer: updates to collect sentence word counts and number of sentences also when no sent_dict is provided 5 years ago
Your Name 61e0581595 actor_merger: removed debug line 5 years ago
Your Name f022312485 actor_merger: added function for generating actor-document data frames 5 years ago
Your Name 4e867214dd sentencizer: commented code 5 years ago
Your Name ec8afc4990 sentencizer: fixed actorsDetail coding error 5 years ago
Your Name 9ccfd2952e sentencizer: minor updates 5 years ago
Your Name 98325bde8f sentencizer: added new function for sentiment coding and actor collection 5 years ago
Your Name 7f958bbc11 actor_fetcher: small fixes 5 years ago
Your Name 8eedec8bb5 actor_fetcher: added option for using dictionaries with just lemmas, besides the option of using lemma_upos dictionaries 5 years ago
Your Name 057d225a7a actor_fetcher: Allow generation of actor df containing only specified actor ids and aggregations 5 years ago
Your Name 9eae486a80 separated data preprocessing routines 5 years ago
Your Name a3b6e19646 revised modeling pipeline: 5 years ago
Your Name e76a914dd2 actor_fetcher: Updated to tidyr 1.0.0, no longer using preserve, slightly different approach to keeping ids_list, and not removing actorsDetail anymore because it does not exist 5 years ago
Your Name a01a53f105 class_update: added cores parameter for multicore processing of sources when using lemmas 5 years ago
Your Name d9f936c566 modelizer: tf-idf application updated, final model now also includes idf values from training set, explicitly setting positive category in binary classification for confusion matrices, minor code fixes 5 years ago
Erik de Vries 06bfec71bc lemma_writer: unlist lemmas before writing 6 years ago
Erik de Vries a83ee5dfd0 lemma_writer: update to write lemma instead of full document text 6 years ago
Erik de Vries e594185719 dfm_gen: set default cores to 1 6 years ago
Erik de Vries 889e7e92af lemma_writer: updated to provide support for writing raw documents to individual files using utf-8 encoding 6 years ago
Erik de Vries 115297f597 actor_aggregation,aggregator,aggregator_elastic: moved out of package directory to Old 6 years ago
Erik de Vries 3fcbbd1f1f actor_fetch: fixed error where source.ud would not exist 6 years ago
Erik de Vries 674ef09e10 query_gen_actors: added junior minister check to if statement 6 years ago
Erik de Vries 853c117daf actor_fetcher: change in code to keep original actorid lists in output 6 years ago
Erik de Vries bf3d11ffe0 query_gen_actors: various bugfixes and changes 6 years ago
Erik de Vries 99af1427f0 query_gen_actors: fixed scandinavian query generation 6 years ago
Erik de Vries e49a4ae93e query_gen_actors: fixed problem with too many brackets in query 6 years ago
Erik de Vries 060751237b actorizer, out_parser: switched from mclapply to future_lapply and removed windows-specific code from out_parser 6 years ago
Erik de Vries d0601d2aa7 actor_fetcher: added minimum verbosity to identify cases in which an actor is present without a party mention 6 years ago
Erik de Vries 82ef165c5f actor_fetcher: quick fix 6 years ago
Erik de Vries 9e433ecf9e actor_fetcher: added handling of exception where all actorsids related to a party are individual actors 6 years ago
Erik de Vries 526270900c actor_fetcher: integrated party merging into actor_fetcher in what hopefully is the most efficient way 6 years ago
Erik de Vries 84df9658ff actor_fetcher: added lemma output when validating, to detect most problematic lemmas 6 years ago
Erik de Vries 499ee74f0d actor_fetcher: fixed code error 6 years ago
Erik de Vries a3e8dcf96e actor_fetcher: switched from binary word sentiment scores to proximity scores (cosine similarity) 6 years ago
Erik de Vries 6f5ace8c52 actor_fetcher: elasticizer batch function to fetch actorsDetail fields from all relevant documents 6 years ago
Erik de Vries edd4b785a5 actor_aggregation: updated to use future package for parallel processing as beta test for switching all parallel processing to future. Also disabled some of the aggregator output to save computation time 6 years ago
Erik de Vries f8bc53006d actor_aggregation: added sentiment analysis support for generating aggregations 6 years ago
Erik de Vries d3d4045f1c actor_aggregation: added sentence count to output, and changed occurences to count instead of mean, changed prom and rel_first to prom_art and rel_first_art, changed output filename to include function 6 years ago
Erik de Vries 176a8f6de4 elasticizer: added additional verbosity on errors 6 years ago
Erik de Vries d420b02c20 elasticizer: Added more verbosity to investigate error handling 6 years ago
Erik de Vries 48b589dda0 query_gen_actors: reset to original state 6 years ago
Erik de Vries 7a01a7f18d query_gen_actors: temporary update for fixing broken shit 6 years ago
Erik de Vries 45da9dd929 aggregator_elastic: revert to single-core lapply, due to sendMaster errors 6 years ago
Erik de Vries f8e4111e70 aggregator_elastic: correct partyid implementation 6 years ago
Erik de Vries c047a4a1db aggregator_elastic: explicit reference to aggregator function 6 years ago
Erik de Vries 0d81d6fc7a added aggregator and aggregator_elastic functions for aggregating and storing article level actor aggregations 6 years ago
Erik de Vries 2281d11a68 actor_aggregation: fixed filenaming of .Rds files 6 years ago
Erik de Vries d9f28a46d8 actor_aggregation: small fixes to code 6 years ago
Erik de Vries a29d04dacd actorizer: fixed handling of empty results due to regex filtering 6 years ago