mamlr

edevries

Archived

Author	SHA1	Message	Date
Your Name	b7f1afddd1	actor_merger: total rewrite based on data.table for performance reasons. Added some exceptions due to non-existing partyIds that some individual actors have in the actor database	5 years ago
Your Name	2c8a88f9a0	elasticizer: switched from bind_rows to rbindlist for composing result; actor_merger: added noactor.* sentiment columns, and switched to data.table for matching actor metadata with articles	5 years ago
Your Name	559199bb97	sentencizer: totally removed sent_lemmas field	5 years ago
Your Name	36f2b341a8	sentencizer: removed derived output from function	5 years ago
Your Name	80ec0be1f8	actorizer: updated to account for token start offset in udpipe output. Sometimes, the first token in an article doesn't start at character position 1 (or 2 if the article starts with a whitespace), but at position 16 and possibly other positions.	5 years ago
Your Name	336567732c	elastic_update: added more debug output	5 years ago
Your Name	df7631b9f1	sentencizer: Changed output, removed lemma list and added separate positive and negative sentiment sums	5 years ago
Your Name	ecdb5be3b4	actorizer: moved some code	5 years ago
Your Name	50f33e78d7	DESCRIPTION: updated	5 years ago
Your Name	69d4b6f5b0	actorizer: updated to data.table for conditional joins; DESCRIPTION: added data.table dependency	5 years ago
Your Name	085855908c	query_gen_actors: switched from Minister to Min	5 years ago
Your Name	b406304c80	actorizer: Removed nested parallelization function; query_gen_actors: Integrated startDate and endDate for parties, changed party exception method from abbreviation only to both full names and abbreviations for NL and BE	5 years ago
Your Name	5de4e1488c	estimator, modelizer, preproc: Removed experimental we-vector support, and disabled inefficiently implemented preproc.R	5 years ago
Your Name	77eb51a1bf	actorizer: totally revamped way of finding actors; elasticizer: updated dump handling to create a dump for every batch, instead of one big file at the end; out_parser: streamlined code; query_gen_actors: only include relevant fields; ud_update: changed function parameters to work with elasticizer dump function	5 years ago
Your Name	0e593075ee	query_gen_actors: only retrieve ud field from source	5 years ago
Your Name	6eb405f8bd	merger: selecting only relevant columns	5 years ago
Your Name	38ff4dcbf0	ud_update: small fix to file naming	5 years ago
Your Name	4b4d860235	class_update: remove dfm_gen multicore option; dfm_gen: remove multicore, update merger() code; elasticizer: changed filenaming scheme for dump option; merger: Fixed bug where an NA lemma would cause the entire document to become NA. Now the NA lemmas are filtered out before merging; ud_update: removed parallel processing, changed script to save bulk updates in .Rds files instead of sending them straight away	5 years ago
Your Name	5d99ec9509	elasticizer: added option to dump data frames to rds files; out_parser: changed to single core, due to performance increase; sentencizer: corrected documentation for sent_dict parameter	5 years ago
Your Name	aa6587b204	dupe_detect: fix for quotation marks	5 years ago
Your Name	2a220ded5d	dupe_detect: fix to query string for multi-word doctype names	5 years ago
Your Name	5bd36dcb44	dupe_detect: Changed query from json to query_string style, and added filter for already detected duplicates; cv_generator: Changed code to use a generic vector of true values to draw the conditional random sample, instead of dfm/docvars specifically	5 years ago
Your Name	e499d70671	actor_merger: added ungroup() calls at the start and end of function, to speed up processing; sentencizer: added ungroup() call at the end of the function to speed up processing	5 years ago
Your Name	8634d549a3	sentencizer: updates to collect sentence word counts and number of sentences also when no sent_dict is provided	5 years ago
Your Name	61e0581595	actor_merger: removed debug line	5 years ago
Your Name	11bf71c7dd	fixes for removal of actor_fetcher function	5 years ago
Your Name	f022312485	actor_merger: added function for generating actor-document data frames; actor_fetcher: removed from package; other: major update to documentation	5 years ago
Your Name	4e867214dd	sentencizer: commented code	5 years ago
Your Name	ec8afc4990	sentencizer: fixed actorsDetail coding error	5 years ago
Your Name	9ccfd2952e	sentencizer: minor updates	5 years ago
Your Name	98325bde8f	sentencizer: added new function for sentiment coding and actor collection	5 years ago
Your Name	7f958bbc11	actor_fetcher: small fixes	5 years ago
Your Name	8eedec8bb5	actor_fetcher: added option for using dictionaries with just lemmas, besides the option of using lemma_upos dictionaries	5 years ago
Your Name	057d225a7a	actor_fetcher: Allow generation of actor df containing only specified actor ids and aggregations	5 years ago
Your Name	9eae486a80	separated data preprocessing routines; class_update: check if there are idf values associated with model, before applying weights; estimator: make use of preproc() function for data preprocessing; preproc: function containing all logic with regards to text data preprocessing and weighting	5 years ago
Your Name	a3b6e19646	revised modeling pipeline: cv_generator: generate folds for nested cv; dfm_gen: added optional lowercasing parameter; estimator: estimate model and performance based on parameters; feat_select: select features based on textstat_keyness; metric_gen: convert output from estimator to model performance metrics; modelizer: updated for new pipeline; modelizer_old: old model pipeline; out_parser: now correctly exported	5 years ago
Your Name	e76a914dd2	actor_fetcher: Updated to tidyr 1.0.0, no longer using preserve, slightly different approach to keeping ids_list, and not removing actorsDetail anymore because it does not exist	5 years ago
Your Name	a01a53f105	class_update: added cores parameter for multicore processing of sources when using lemmas	5 years ago
Your Name	d9f936c566	modelizer: tf-idf application updated, final model now also includes idf values from training set, explicitly setting positive category in binary classification for confusion matrices, minor code fixes; dfm_gen: added old junk codes for recoding, and removed deprecated ngrams parameter from dfm function; class_update: removed dfm_words parameter, which is replaced by the force = T parameter in predict(), training/model idf is now applied to unseen data; DESCRIPTION: added quanteda.textmodels as new dependency, since these have been separated from base quanteda 2.0.0 onwards	5 years ago
Erik de Vries	06bfec71bc	lemma_writer: unlist lemmas before writing	6 years ago
Erik de Vries	a83ee5dfd0	lemma_writer: update to write lemma instead of full document text	6 years ago
Erik de Vries	e594185719	dfm_gen: set default cores to 1	6 years ago
Erik de Vries	889e7e92af	lemma_writer: updated to provide support for writing raw documents to individual files using utf-8 encoding	6 years ago
Erik de Vries	115297f597	actor_aggregation, aggregator, aggregator_elastic: moved out of package directory to Old; actor_fetcher: moved sentiment validation code block	6 years ago
Erik de Vries	3fcbbd1f1f	actor_fetch: fixed error where source.ud would not exist	6 years ago
Erik de Vries	674ef09e10	query_gen_actors: added junior minister check to if statement	6 years ago
Erik de Vries	853c117daf	actor_fetcher: change in code to keep original actorid lists in output; query_gen_actors: added code for junior ministers in BE and NL	6 years ago
Erik de Vries	bf3d11ffe0	query_gen_actors: various bugfixes and changes	6 years ago
Erik de Vries	99af1427f0	query_gen_actors: fixed scandinavian query generation	6 years ago
Erik de Vries	e49a4ae93e	query_gen_actors: fixed problem with too many brackets in query	6 years ago

191 Commits (b7f1afddd1b4c98630bfa0660e430e9046e21834)
