query_gen_actors: only retrieve ud field from source
5 years ago
merger: selecting only relevant columns
5 years ago
ud_update: small fix to file naming
5 years ago
class_update: remove dfm_gen multicore option
dfm_gen: remove multicore, update merger() code
elasticizer: changed filenaming scheme for dump option
merger: Fixed bug where an NA lemma would cause the entire document to become NA. Now the NA lemmas are filtered out before merging
ud_update: removed parallel processing, changed script to save bulk updates in .Rds files instead of sending them straight away
5 years ago
elasticizer: added option to dump data frames to .Rds files
out_parser: changed to single core, due to performance increase
sentencizer: corrected documentation for sent_dict parameter
5 years ago
dupe_detect: fix for quotation marks
5 years ago
dupe_detect: fix to query string for multi-word doctype names
5 years ago
dupe_detect: Changed query from json to query_string style, and added filter for already detected duplicates
cv_generator: Changed code to use a generic vector of true values to draw the conditional random sample, instead of dfm/docvars specifically
5 years ago
actor_merger: added ungroup() calls at the start and end of function, to speed up processing
sentencizer: added ungroup() call at the end of the function to speed up processing
5 years ago
sentencizer: updates to collect sentence word counts and number of sentences also when no sent_dict is provided
5 years ago
actor_merger: removed debug line
5 years ago
fixes for removal of actor_fetcher function
5 years ago
actor_merger: added function for generating actor-document data frames
actor_fetcher: removed from package
other: major update to documentation
5 years ago
sentencizer: commented code
5 years ago
sentencizer: fixed actorsDetail coding error
5 years ago
sentencizer: minor updates
5 years ago
sentencizer: added new function for sentiment coding and actor collection
5 years ago
actor_fetcher: small fixes
5 years ago
actor_fetcher: added option for using dictionaries with just lemmas, besides the option of using lemma_upos dictionaries
5 years ago
actor_fetcher: Allow generation of actor df containing only specified actor ids and aggregations
5 years ago
separated data preprocessing routines
class_update: check if there are idf values associated with model, before applying weights
estimator: make use of preproc() function for data preprocessing
preproc: function containing all logic with regards to text data preprocessing and weighting
5 years ago
revised modeling pipeline:
cv_generator: generate folds for nested cv
dfm_gen: added optional lowercasing parameter
estimator: estimate model and performance based on parameters
feat_select: select features based on textstat_keyness
metric_gen: convert output from estimator to model performance metrics
modelizer: updated for new pipeline
modelizer_old: old model pipeline
out_parser: now correctly exported
5 years ago
actor_fetcher: Updated to tidyr 1.0.0, no longer using preserve, slightly different approach to keeping ids_list, and not removing actorsDetail anymore because it does not exist
5 years ago
class_update: added cores parameter for multicore processing of sources when using lemmas
5 years ago
Your Name
modelizer: tf-idf application updated, final model now also includes idf values from training set, explicitly setting positive category in binary classification for confusion matrices, minor code fixes
dfm_gen: added old junk codes for recoding, and removed deprecated ngrams parameter from dfm function
class_update: removed dfm_words parameter, which is replaced by the force = T parameter in predict(), training/model idf is now applied to unseen data
DESCRIPTION: added quanteda.textmodels as new dependency, since these have been separated from base quanteda 2.0.0 onwards
5 years ago
Erik de Vries
lemma_writer: unlist lemmas before writing
6 years ago
Erik de Vries
lemma_writer: update to write lemma instead of full document text
6 years ago
Erik de Vries
dfm_gen: set default cores to 1
6 years ago
Erik de Vries
lemma_writer: updated to provide support for writing raw documents to individual files using UTF-8 encoding
6 years ago
Erik de Vries
actor_aggregation,aggregator,aggregator_elastic: moved out of package directory to Old
actor_fetcher: moved sentiment validation code block
6 years ago
Erik de Vries
actor_fetch: fixed error where source.ud would not exist
6 years ago
Erik de Vries
query_gen_actors: added junior minister check to if statement
6 years ago
Erik de Vries
actor_fetcher: change in code to keep original actorid lists in output
query_gen_actors: added code for junior ministers in BE and NL
6 years ago
Erik de Vries
query_gen_actors: various bugfixes and changes
6 years ago
Erik de Vries
query_gen_actors: fixed Scandinavian query generation
6 years ago
Erik de Vries
query_gen_actors: fixed problem with too many brackets in query
6 years ago
Erik de Vries
actorizer, out_parser: switched from mclapply to future_lapply and removed Windows-specific code from out_parser
query_gen_actors: rewritten minister queries to only use proximity queries
6 years ago
Erik de Vries
actor_fetcher: added minimum verbosity to identify cases in which an actor is present without a party mention
6 years ago
Erik de Vries
actor_fetcher: quick fix
6 years ago
Erik de Vries
actor_fetcher: added handling of exception where all actor ids related to a party are individual actors
6 years ago
Erik de Vries
actor_fetcher: integrated party merging into actor_fetcher in what hopefully is the most efficient way
6 years ago
Erik de Vries
actor_fetcher: added lemma output when validating, to detect most problematic lemmas
6 years ago
Erik de Vries
actor_fetcher: fixed code error
6 years ago
Erik de Vries
actor_fetcher: switched from binary word sentiment scores to proximity scores (cosine similarity)
6 years ago
Erik de Vries
actor_fetcher: elasticizer batch function to fetch actorsDetail fields from all relevant documents
6 years ago
Erik de Vries
actor_aggregation: updated to use future package for parallel processing as beta test for switching all parallel processing to future. Also disabled some of the aggregator output to save computation time
6 years ago
Erik de Vries
actor_aggregation: added sentiment analysis support for generating aggregations
6 years ago
Erik de Vries
actor_aggregation: added sentence count to output, changed occurrences to count instead of mean, changed prom and rel_first to prom_art and rel_first_art, changed output filename to include function
6 years ago
Erik de Vries
elasticizer: added additional verbosity on errors
6 years ago
Erik de Vries
elasticizer: Added more verbosity to investigate error handling
6 years ago