mamlr

edevries

Archived

Author	SHA1	Message	Date
Your Name	2a220ded5d	dupe_detect: fix to query string for multi-word doctype names	5 years ago
Your Name	5bd36dcb44	dupe_detect: Changed query from json to query_string style, and added filter for already detected duplicates; cv_generator: Changed code to use a generic vector of true values to draw the conditional random sample, instead of dfm/docvars specifically	5 years ago
Your Name	e499d70671	actor_merger: added ungroup() calls at the start and end of function, to speed up processing; sentencizer: added ungroup() call at the end of the function to speed up processing	5 years ago
Your Name	8634d549a3	sentencizer: updates to collect sentence word counts and number of sentences also when no sent_dict is provided	5 years ago
Your Name	61e0581595	actor_merger: removed debug line	5 years ago
Your Name	f022312485	actor_merger: added function for generating actor-document data frames; actor_fetcher: removed from package; other: major update to documentation	5 years ago
Your Name	4e867214dd	sentencizer: commented code	5 years ago
Your Name	ec8afc4990	sentencizer: fixed actorsDetail coding error	5 years ago
Your Name	9ccfd2952e	sentencizer: minor updates	5 years ago
Your Name	98325bde8f	sentencizer: added new function for sentiment coding and actor collection	5 years ago
Your Name	7f958bbc11	actor_fetcher: small fixes	5 years ago
Your Name	8eedec8bb5	actor_fetcher: added option for using dictionaries with just lemmas, besides the option of using lemma_upos dictionaries	5 years ago
Your Name	057d225a7a	actor_fetcher: Allow generation of actor df containing only specified actor ids and aggregations	5 years ago
Your Name	9eae486a80	separated data preprocessing routines; class_update: check if there are idf values associated with model, before applying weights; estimator: make use of preproc() function for data preprocessing; preproc: function containing all logic with regards to text data preprocessing and weighting	5 years ago
Your Name	a3b6e19646	revised modeling pipeline: cv_generator: generate folds for nested cv; dfm_gen: added optional lowercasing parameter; estimator: estimate model and performance based on parameters; feat_select: select features based on textstat_keyness; metric_gen: convert output from estimator to model performance metrics; modelizer: updated for new pipeline; modelizer_old: old model pipeline; out_parser: now correctly exported	5 years ago
Your Name	e76a914dd2	actor_fetcher: Updated to tidyr 1.0.0, no longer using preserve, slightly different approach to keeping ids_list, and not removing actorsDetail anymore because it does not exist	5 years ago
Your Name	a01a53f105	class_update: added cores parameter for multicore processing of sources when using lemmas	5 years ago
Your Name	d9f936c566	modelizer: tf-idf application updated, final model now also includes idf values from training set, explicitly setting positive category in binary classification for confusion matrices, minor code fixes; dfm_gen: added old junk codes for recoding, and removed deprecated ngrams parameter from dfm function; class_update: removed dfm_words parameter, which is replaced by the force = T parameter in predict(), training/model idf is now applied to unseen data; DESCRIPTION: added quanteda.textmodels as new dependency, since these have been separated from base quanteda 2.0.0 onwards	5 years ago
Erik de Vries	06bfec71bc	lemma_writer: unlist lemmas before writing	6 years ago
Erik de Vries	a83ee5dfd0	lemma_writer: update to write lemma instead of full document text	6 years ago
Erik de Vries	e594185719	dfm_gen: set default cores to 1	6 years ago
Erik de Vries	889e7e92af	lemma_writer: updated to provide support for writing raw documents to individual files using utf-8 encoding	6 years ago
Erik de Vries	115297f597	actor_aggregation,aggregator,aggregator_elastic: moved out of package directory to Old; actor_fetcher: moved sentiment validation code block	6 years ago
Erik de Vries	3fcbbd1f1f	actor_fetch: fixed error where source.ud would not exist	6 years ago
Erik de Vries	674ef09e10	query_gen_actors: added junior minister check to if statement	6 years ago
Erik de Vries	853c117daf	actor_fetcher: change in code to keep original actorid lists in output; query_gen_actors: added code for junior ministers in BE and NL	6 years ago
Erik de Vries	bf3d11ffe0	query_gen_actors: various bugfixes and changes	6 years ago
Erik de Vries	99af1427f0	query_gen_actors: fixed scandinavian query generation	6 years ago
Erik de Vries	e49a4ae93e	query_gen_actors: fixed problem with too many brackets in query	6 years ago
Erik de Vries	060751237b	actorizer, out_parser: switched from mclapply to future_lapply and removed windows-specific code from out_parser; query_gen_actors: rewritten minister queries to only use proximity queries	6 years ago
Erik de Vries	d0601d2aa7	actor_fetcher: added minimum verbosity to identify cases in which an actor is present without a party mention	6 years ago
Erik de Vries	82ef165c5f	actor_fetcher: quick fix	6 years ago
Erik de Vries	9e433ecf9e	actor_fetcher: added handling of exception where all actorsids related to a party are individual actors	6 years ago
Erik de Vries	526270900c	actor_fetcher: integrated party merging into actor_fetcher in what hopefully is the most efficient way	6 years ago
Erik de Vries	84df9658ff	actor_fetcher: added lemma output when validating, to detect most problematic lemmas	6 years ago
Erik de Vries	499ee74f0d	actor_fetcher: fixed code error	6 years ago
Erik de Vries	a3e8dcf96e	actor_fetcher: switched from binary word sentiment scores to proximity scores (cosine similarity)	6 years ago
Erik de Vries	6f5ace8c52	actor_fetcher: elasticizer batch function to fetch actorsDetail fields from all relevant documents	6 years ago
Erik de Vries	edd4b785a5	actor_aggregation: updated to use future package for parallel processing as beta test for switching all parallel processing to future. Also disabled some of the aggregator output to save computation time	6 years ago
Erik de Vries	f8bc53006d	actor_aggregation: added sentiment analysis support for generating aggregations	6 years ago
Erik de Vries	d3d4045f1c	actor_aggregation: added sentence count to output, and changed occurrences to count instead of mean, changed prom and rel_first to prom_art and rel_first_art, changed output filename to include function	6 years ago
Erik de Vries	176a8f6de4	elasticizer: added additional verbosity on errors	6 years ago
Erik de Vries	d420b02c20	elasticizer: Added more verbosity to investigate error handling	6 years ago
Erik de Vries	48b589dda0	query_gen_actors: reset to original state	6 years ago
Erik de Vries	7a01a7f18d	query_gen_actors: temporary update for fixing broken shit	6 years ago
Erik de Vries	45da9dd929	aggregator_elastic: revert to single-core lapply, due to sendMaster errors	6 years ago
Erik de Vries	f8e4111e70	aggregator_elastic: correct partyid implementation	6 years ago
Erik de Vries	c047a4a1db	aggregator_elastic: explicit reference to aggregator function	6 years ago
Erik de Vries	0d81d6fc7a	added aggregator and aggregator_elastic functions for aggregating and storing article level actor aggregations	6 years ago
Erik de Vries	2281d11a68	actor_aggregation: fixed filenaming of .Rds files	6 years ago
Erik de Vries	d9f28a46d8	actor_aggregation: small fixes to code	6 years ago
Erik de Vries	a29d04dacd	actorizer: fixed handling of empty results due to regex filtering	6 years ago
Erik de Vries	8e920f5f37	elasticizer: removed idiotic 15min sleep time after 500 batches	6 years ago
Erik de Vries	a11d7728ea	actor_aggregation: only aggregate scores on non-junk articles	6 years ago
Erik de Vries	54a70c47a0	actor_aggregation: removed timeout for parallel processing, requires fix in elasticizer (cannot recycle the same connection)	6 years ago
Erik de Vries	58fce4d560	actor_aggregation: added randomized short sleep, to allow for parallel execution	6 years ago
Erik de Vries	e3b26c0be3	actor_aggregation: Added function to generate aggregate actor measures at daily, weekly, monthly and yearly level; query_string: Added default_operator parameter, to define whether whitespaces should be interpreted as AND or OR, defaults to AND	6 years ago
Erik de Vries	28989f2bc4	dfm_gen: yet another fix for codes	6 years ago
Erik de Vries	0757b6bf8b	dfm_gen: re-added codes variable	6 years ago
Erik de Vries	2fc48cc2f7	dfm_gen: fixed absence of out$codes field	6 years ago
Erik de Vries	b249ff22de	dfm_gen.R: fixed junk mutation	6 years ago
Erik de Vries	0d05765ca7	dfm_gen: removed last remains of summer sample exceptions	6 years ago
Erik de Vries	e199b23227	dfm_gen: removed exceptions for NO summer codes; modelizer: created exception for outer_folds = 1; query_string: added parameter for default_operator	6 years ago
Erik de Vries	fbd525dc2e	modelizer: updated outer cross validation procedure to output raw prediction and true values, instead of processed and aggregated confusion matrix results	6 years ago
Erik de Vries	6a94bc3ed8	query_gen_actors: removed quotation marks from Minister search part	6 years ago
Erik de Vries	8d19333e59	query_gen_actors: changed script order for belgium exceptions	6 years ago
Erik de Vries	3bfe61e425	query_gen_actors: fixed implementation of Belgian exceptions	6 years ago
Erik de Vries	81697345cb	modelizer: removed breaking code	6 years ago
Erik de Vries	9ca952ca89	elastic_update: removed wait_for from url	6 years ago
Erik de Vries	8051a81b66	actorizer, dfm_gen, modelizer, out_parser: replaced all instances of detectCores by cores parameter (which defaults to detectCores)	6 years ago
Erik de Vries	ac37d836f5	elasticizer: added scroll_clear to null hits as well	6 years ago
Erik de Vries	75623856f7	elasticizer: updated scroll_clear to use conn object	6 years ago
Erik de Vries	c2d666c81d	bogus commit	6 years ago
Erik de Vries	e34460bf0f	elasticizer: clear scroll context when finishing query	6 years ago
Erik de Vries	9bd526fee0	elasticizer: fixed compatibility issues with elastic v1.0.0	6 years ago
Erik de Vries	f2312f65d5	elasticizer: update to account for syntax change in newer package versions	6 years ago
Erik de Vries	f6006eb9ba	actorizer: simplified pre/postfix check, only for NA, replace empty strings by NA beforehand	6 years ago
Erik de Vries	298099a4e6	actorizer: fix to deal with empty updates (ie dont do an update)	6 years ago
Erik de Vries	6961c0b866	query_gen_actors: updated actorid filter to use the keyword subfield	6 years ago
Erik de Vries	703b5e59a4	actorizer: fixed exceptionizer by adding whitespace before and after sentence, which is necessary because of negative regex (match anything before or after the highlight string that is NOT x actually requires something to be in front or after)	6 years ago
Erik de Vries	593d2de6e2	actorizer: add pre_tags and post_tags to argument list; bulk_writer: updated to use _doc doctype; query_gen_actors: added NA for all searches that don't have pre- or postfixes	6 years ago
Erik de Vries	a1b6c6a7cb	actorizer, query_gen_actors: revamped actor searches entirely; elasticizer: updated script for use with ES 7.x	6 years ago
Erik de Vries	88fc4ec53c	dfm_gen: changed out_parser call to mamlr:::out_parser	6 years ago
Erik de Vries	90fdbcc982	out_parser: parallelized when not in windoze	6 years ago
Erik de Vries	6414f759bd	actorizer: parallelized calculation of marker positions	6 years ago
Erik de Vries	522c872dba	out_parser: moved cleaning regex to end of pipeline, to prevent collisions with other (mandatory) regex cleaning	6 years ago
Erik de Vries	5b9793cd8c	actorizer: removed nested mclapply	6 years ago
Erik de Vries	1a4ba19546	actorizer: Removed udmodel dependencies, commented code, changed nested lists to flat lists; bulk_writer: changed handling of single-row dataframe parsing to JSON; elastic_update: changed function to return instead of print appData on error; ud_update: Changed nested lists to flat lists, and added start and end character positions	6 years ago
Erik de Vries	3abc3056e0	actorizer: fix to columns selected for actors variable, removed udmodel requirement	6 years ago
Erik de Vries	41c86ea116	actorizer, ud_update: Updated ud parsing and actorizer to work based on character positions. This code is used for local testing	6 years ago
Erik de Vries	eae1a22609	actorizer: update to use '\|\|\|' as highlight indicator, and set up ud output merging accordingly	6 years ago
Erik de Vries	5665b6d622	actorizer: more fixes to punctuation	6 years ago
Erik de Vries	cd05733648	actorizer: Additional fix for missing punctuation (see previous commit)	6 years ago
Erik de Vries	09732a1b5a	actorizer: quick fix for problem where original UK UD output does not have a dot at the end of the document, but the actor output does (old vs new parsing)	6 years ago
Erik de Vries	835d2332bc	actorizer: now uses the original udpipe output for sentence and token ids. When the actorized and original udpipe output do not have the same number of rows, it prints an error and sets err to TRUE in actorDetails	6 years ago
Erik de Vries	e70b6ccf7a	actorizer: fixed sentence_count and out_parser calls; out_parser: Added comment with old regex	6 years ago
Erik de Vries	9b0ac775af	class_update: add ver variable to set version for class updated articles	6 years ago
Erik de Vries	85306007f4	class_update: added words and clean parameters, in addition to text parameter, to be able to set data preprocessing exactly the same as in the trained model	6 years ago
Erik de Vries	e110780ad5	merger: idiotic fix for a non-problem, see comment on line 32	6 years ago
Erik de Vries	ce5f812252	dfm_gen, merger: Added option for generating lemma_upos hybrids for merged field; merger: Added custom clean option (sometimes not cleaning is preferred, even with lemmas); merger, out_parser: Updated regex for filtering out non-words to also include email addresses (containing both @ and .)	6 years ago

212 Commits (3f268bbf06fe6d097fb14be234ea6702caa8a53c)