edevries/mamlr (Archived)

Author	SHA1	Message	Date
Your Name	0e593075ee	query_gen_actors: only retrieve ud field from source	5 years ago
Your Name	6eb405f8bd	merger: selecting only relevant columns	5 years ago
Your Name	38ff4dcbf0	ud_update: small fix to file naming	5 years ago
Your Name	4b4d860235	class_update: remove dfm_gen multicore option; dfm_gen: remove multicore, update merger() code; elasticizer: changed filenaming scheme for dump option; merger: Fixed bug where an NA lemma would cause the entire document to become NA. Now the NA lemmas are filtered out before merging; ud_update: removed parallel processing, changed script to save bulk updates in .Rds files instead of sending them straight away	5 years ago
Your Name	5d99ec9509	elasticizer: added option to dump data frames to rds files; out_parser: changed to single core, due to performance increase; sentencizer: corrected documentation for sent_dict parameter	5 years ago
Your Name	aa6587b204	dupe_detect: fix for quotation marks	5 years ago
Your Name	2a220ded5d	dupe_detect: fix to query string for multi-word doctype names	5 years ago
Your Name	5bd36dcb44	dupe_detect: Changed query from json to query_string style, and added filter for already detected duplicates; cv_generator: Changed code to use a generic vector of true values to draw the conditional random sample, instead of dfm/docvars specifically	5 years ago
Your Name	e499d70671	actor_merger: added ungroup() calls at the start and end of function, to speed up processing; sentencizer: added ungroup() call at the end of the function to speed up processing	5 years ago
Your Name	8634d549a3	sentencizer: updates to collect sentence word counts and number of sentences also when no sent_dict is provided	5 years ago
Your Name	61e0581595	actor_merger: removed debug line	5 years ago
Your Name	f022312485	actor_merger: added function for generating actor-document data frames; actor_fetcher: removed from package; other: major update to documentation	5 years ago
Your Name	4e867214dd	sentencizer: commented code	5 years ago
Your Name	ec8afc4990	sentencizer: fixed actorsDetail coding error	5 years ago
Your Name	9ccfd2952e	sentencizer: minor updates	5 years ago
Your Name	98325bde8f	sentencizer: added new function for sentiment coding and actor collection	5 years ago
Your Name	7f958bbc11	actor_fetcher: small fixes	5 years ago
Your Name	8eedec8bb5	actor_fetcher: added option for using dictionaries with just lemmas, besides the option of using lemma_upos dictionaries	5 years ago
Your Name	057d225a7a	actor_fetcher: Allow generation of actor df containing only specified actor ids and aggregations	5 years ago
Your Name	9eae486a80	separated data preprocessing routines; class_update: check if there are idf values associated with model, before applying weights; estimator: make use of preproc() function for data preprocessing; preproc: function containing all logic with regards to text data preprocessing and weighting	5 years ago
Your Name	a3b6e19646	revised modeling pipeline; cv_generator: generate folds for nested cv; dfm_gen: added optional lowercasing parameter; estimator: estimate model and performance based on parameters; feat_select: select features based on textstat_keyness; metric_gen: convert output from estimator to model performance metrics; modelizer: updated for new pipeline; modelizer_old: old model pipeline; out_parser: now correctly exported	5 years ago
Your Name	e76a914dd2	actor_fetcher: Updated to tidyr 1.0.0, no longer using preserve, slightly different approach to keeping ids_list, and not removing actorsDetail anymore because it does not exist	5 years ago
Your Name	a01a53f105	class_update: added cores parameter for multicore processing of sources when using lemmas	5 years ago
Your Name	d9f936c566	modelizer: tf-idf application updated, final model now also includes idf values from training set, explicitly setting positive category in binary classification for confusion matrices, minor code fixes; dfm_gen: added old junk codes for recoding, and removed deprecated ngrams parameter from dfm function; class_update: removed dfm_words parameter, which is replaced by the force = T parameter in predict(), training/model idf is now applied to unseen data; DESCRIPTION: added quanteda.textmodels as new dependency, since these have been separated from base quanteda 2.0.0 onwards	5 years ago
Erik de Vries	06bfec71bc	lemma_writer: unlist lemmas before writing	6 years ago
Erik de Vries	a83ee5dfd0	lemma_writer: update to write lemma instead of full document text	6 years ago
Erik de Vries	e594185719	dfm_gen: set default cores to 1	6 years ago
Erik de Vries	889e7e92af	lemma_writer: updated to provide support for writing raw documents to individual files using utf-8 encoding	6 years ago
Erik de Vries	115297f597	actor_aggregation, aggregator, aggregator_elastic: moved out of package directory to Old; actor_fetcher: moved sentiment validation code block	6 years ago
Erik de Vries	3fcbbd1f1f	actor_fetch: fixed error where source.ud would not exist	6 years ago
Erik de Vries	674ef09e10	query_gen_actors: added junior minister check to if statement	6 years ago
Erik de Vries	853c117daf	actor_fetcher: change in code to keep original actorid lists in output; query_gen_actors: added code for junior ministers in BE and NL	6 years ago
Erik de Vries	bf3d11ffe0	query_gen_actors: various bugfixes and changes	6 years ago
Erik de Vries	99af1427f0	query_gen_actors: fixed scandinavian query generation	6 years ago
Erik de Vries	e49a4ae93e	query_gen_actors: fixed problem with too many brackets in query	6 years ago
Erik de Vries	060751237b	actorizer, out_parser: switched from mclapply to future_lapply and removed windows-specific code from out_parser; query_gen_actors: rewritten minister queries to only use proximity queries	6 years ago
Erik de Vries	d0601d2aa7	actor_fetcher: added minimum verbosity to identify cases in which an actor is present without a party mention	6 years ago
Erik de Vries	82ef165c5f	actor_fetcher: quick fix	6 years ago
Erik de Vries	9e433ecf9e	actor_fetcher: added handling of exception where all actorsids related to a party are individual actors	6 years ago
Erik de Vries	526270900c	actor_fetcher: integrated party merging into actor_fetcher in what hopefully is the most efficient way	6 years ago
Erik de Vries	84df9658ff	actor_fetcher: added lemma output when validating, to detect most problematic lemmas	6 years ago
Erik de Vries	499ee74f0d	actor_fetcher: fixed code error	6 years ago
Erik de Vries	a3e8dcf96e	actor_fetcher: switched from binary word sentiment scores to proximity scores (cosine similarity)	6 years ago
Erik de Vries	6f5ace8c52	actor_fetcher: elasticizer batch function to fetch actorsDetail fields from all relevant documents	6 years ago
Erik de Vries	edd4b785a5	actor_aggregation: updated to use future package for parallel processing as beta test for switching all parallel processing to future. Also disabled some of the aggregator output to save computation time	6 years ago
Erik de Vries	f8bc53006d	actor_aggregation: added sentiment analysis support for generating aggregations	6 years ago
Erik de Vries	d3d4045f1c	actor_aggregation: added sentence count to output, and changed occurences to count instead of mean, changed prom and rel_first to prom_art and rel_first_art, changed output filename to include function	6 years ago
Erik de Vries	176a8f6de4	elasticizer: added additional verbosity on errors	6 years ago
Erik de Vries	d420b02c20	elasticizer: Added more verbosity to investigate error handling	6 years ago
Erik de Vries	48b589dda0	query_gen_actors: reset to original state	6 years ago
Erik de Vries	7a01a7f18d	query_gen_actors: temporary update for fixing broken shit	6 years ago
Erik de Vries	45da9dd929	aggregator_elastic: revert to single-core lapply, due to sendMaster errors	6 years ago
Erik de Vries	f8e4111e70	aggregator_elastic: correct partyid implementation	6 years ago
Erik de Vries	c047a4a1db	aggregator_elastic: explicit reference to aggregator function	6 years ago
Erik de Vries	0d81d6fc7a	added aggregator and aggregator_elastic functions for aggregating and storing article level actor aggregations	6 years ago
Erik de Vries	2281d11a68	actor_aggregation: fixed filenaming of .Rds files	6 years ago
Erik de Vries	d9f28a46d8	actor_aggregation: small fixes to code	6 years ago
Erik de Vries	a29d04dacd	actorizer: fixed handling of empty results due to regex filtering	6 years ago
Erik de Vries	8e920f5f37	elasticizer: removed idiotic 15min sleep time after 500 batches	6 years ago
Erik de Vries	a11d7728ea	actor_aggregation: only aggregate scores on non-junk articles	6 years ago
Erik de Vries	54a70c47a0	actor_aggregation: removed timeout for parallel processing, requires fix in elasticizer (cannot recycle the same connection)	6 years ago
Erik de Vries	58fce4d560	actor_aggregation: added randomized short sleep, to allow for parallel execution	6 years ago
Erik de Vries	e3b26c0be3	actor_aggregation: Added function to generate aggregate actor measures at daily, weekly, monthly and yearly level; query_string: Added default_operator parameter, to define whether whitespaces should be interpreted as AND or OR, defaults to AND	6 years ago
Erik de Vries	28989f2bc4	dfm_gen: yet another fix for codes	6 years ago
Erik de Vries	0757b6bf8b	dfm_gen: re-added codes variable	6 years ago
Erik de Vries	2fc48cc2f7	dfm_gen: fixed absence of out$codes field	6 years ago
Erik de Vries	b249ff22de	dfm_gen.R: fixed junk mutation	6 years ago
Erik de Vries	0d05765ca7	dfm_gen: removed last remains of summer sample exceptions	6 years ago
Erik de Vries	e199b23227	dfm_gen: removed exceptions for NO summer codes; modelizer: created exception for outer_folds = 1; query_string: added parameter for default_operator	6 years ago
Erik de Vries	fbd525dc2e	modelizer: updated outer cross validation procedure to output raw prediction and true values, instead of processed and aggregated confusion matrix results	6 years ago
Erik de Vries	6a94bc3ed8	query_gen_actors: removed quotation marks from Minister search part	6 years ago
Erik de Vries	8d19333e59	query_gen_actors: changed script order for belgium exceptions	6 years ago
Erik de Vries	3bfe61e425	query_gen_actors: fixed implementation of Belgian exceptions	6 years ago
Erik de Vries	81697345cb	modelizer: removed breaking code	6 years ago
Erik de Vries	9ca952ca89	elastic_update: removed wait_for from url	6 years ago
Erik de Vries	8051a81b66	actorizer, dfm_gen, modelizer, out_parser: replaced all instances of detectCores by cores parameter (which defaults to detectCores)	6 years ago
Erik de Vries	ac37d836f5	elasticizer: added scroll_clear to null hits as well	6 years ago
Erik de Vries	75623856f7	elasticizer: updated scroll_clear to use conn object	6 years ago
Erik de Vries	c2d666c81d	bogus commit	6 years ago
Erik de Vries	e34460bf0f	elasticizer: clear scroll context when finishing query	6 years ago
Erik de Vries	9bd526fee0	elasticizer: fixed compatibility issues with elastic v1.0.0	6 years ago
Erik de Vries	f2312f65d5	elasticizer: update to account for syntax change in newer package versions	6 years ago
Erik de Vries	f6006eb9ba	actorizer: simplified pre/postfix check, only for NA, replace empty strings by NA beforehand	6 years ago
Erik de Vries	298099a4e6	actorizer: fix to deal with empty updates (ie dont do an update)	6 years ago
Erik de Vries	6961c0b866	query_gen_actors: updated actorid filter to use the keyword subfield	6 years ago
Erik de Vries	703b5e59a4	actorizer: fixed exceptionizer by adding whitespace before and after sentence, which is necessary because of negative regex (match anything before or after the highlight string that is NOT x actually requires something to be in front or after)	6 years ago
Erik de Vries	593d2de6e2	actorizer: add pre_tags and post_tags to argument list; bulk_writer: updated to use _doc doctype; query_gen_actors: added NA for all searches that don't have pre- or postfixes	6 years ago
Erik de Vries	a1b6c6a7cb	actorizer, query_gen_actors: revamped actor searches entirely; elasticizer: updated script for use with ES 7.x	6 years ago
Erik de Vries	88fc4ec53c	dfm_gen: changed out_parser call to mamlr:::out_parser	6 years ago
Erik de Vries	90fdbcc982	out_parser: parallelized when not in windoze	6 years ago
Erik de Vries	6414f759bd	actorizer: parallelized calculation of marker positions	6 years ago
Erik de Vries	522c872dba	out_parser: moved cleaning regex to end of pipeline, to prevent collisions with other (mandatory) regex cleaning	6 years ago
Erik de Vries	5b9793cd8c	actorizer: removed nested mclapply	6 years ago
Erik de Vries	1a4ba19546	actorizer: Removed udmodel dependencies, commented code, changed nested lists to flat lists; bulk_writer: changed handling of single-row dataframe parsing to JSON; elastic_update: changed function to return instead of print appData on error; ud_update: Changed nested lists to flat lists, and added start and end character positions	6 years ago
Erik de Vries	3abc3056e0	actorizer: fix to columns selected for actors variable, removed udmodel requirement	6 years ago
Erik de Vries	41c86ea116	actorizer, ud_update: Updated ud parsing and actorizer to work based on character positions. This code is used for local testing	6 years ago
Erik de Vries	eae1a22609	actorizer: update to use '|||' as highlight indicator, and set up ud output merging accordingly	6 years ago
Erik de Vries	5665b6d622	actorizer: more fixes to punctuation	7 years ago
Erik de Vries	cd05733648	actorizer: Additional fix for missing punctuation (see previous commit)	7 years ago
Erik de Vries	09732a1b5a	actorizer: quick fix for problem where original UK UD output does not have a dot at the end of the document, but the actor output does (old vs new parsing)	7 years ago

218 Commits (bbec8f55476a5494f49bc03a803e56c1177214b1)