mamlr

edevries

Archived

Author	SHA1	Message	Date
Erik de Vries	17d49f07c0	updated namespace and docs	5 years ago
Erik de Vries	4a0f2206fd	removed multicore support, added parameters for dfm_gen	5 years ago
Your Name	b7f1afddd1	actor_merger: total rewrite based on data.table for performance reasons. Added some exceptions due to non-existing partyIds that some individual actors have in the actor database	5 years ago
Your Name	77eb51a1bf	actorizer: totally revamped way of finding actors; elasticizer: updated dump handling to create a dump for every batch, instead of one big file at the end; out_parser: streamlined code; query_gen_actors: only include relevant fields; ud_update: changed function parameters to work with elasticizer dump function	5 years ago
Your Name	4b4d860235	class_update: remove dfm_gen multicore option; dfm_gen: remove multicore, update merger() code; elasticizer: changed filenaming scheme for dump option; merger: Fixed bug where an NA lemma would cause the entire document to become NA. Now the NA lemmas are filtered out before merging; ud_update: removed parallel processing, changed script to save bulk updates in .Rds files instead of sending them straight away	5 years ago
Your Name	5d99ec9509	elasticizer: added option to dump data frames to rds files; out_parser: changed to single core, due to performance increase; sentencizer: corrected documentation for sent_dict parameter	5 years ago
Your Name	11bf71c7dd	fixes for removal of actor_fetcher function	5 years ago
Your Name	f022312485	actor_merger: added function for generating actor-document data frames; actor_fetcher: removed from package; other: major update to documentation	5 years ago
Your Name	98325bde8f	sentencizer: added new function for sentiment coding and actor collection	5 years ago
Your Name	9eae486a80	separated data preprocessing routines; class_update: check if there are idf values associated with model, before applying weights; estimator: make use of preproc() function for data preprocessing; preproc: function containing all logic with regards to text data preprocessing and weighting	5 years ago
Your Name	a3b6e19646	revised modeling pipeline: cv_generator: generate folds for nested cv; dfm_gen: added optional lowercasing parameter; estimator: estimate model and performance based on parameters; feat_select: select features based on textstat_keyness; metric_gen: convert output from estimator to model performance metrics; modelizer: updated for new pipeline; modelizer_old: old model pipeline; out_parser: now correctly exported	5 years ago
Erik de Vries	889e7e92af	lemma_writer: updated to provide support for writing raw documents to individual files using utf-8 encoding	6 years ago
Erik de Vries	6f5ace8c52	actor_fetcher: elasticizer batch function to fetch actorsDetail fields from all relevant documents	6 years ago
Erik de Vries	0d81d6fc7a	added aggregator and aggregator_elastic functions for aggregating and storing article level actor aggregations	6 years ago
Erik de Vries	e3b26c0be3	actor_aggregation: Added function to generate aggregate actor measures at daily, weekly, monthly and yearly level; query_string: Added default_operator parameter, to define whether whitespaces should be interpreted as AND or OR, defaults to AND	6 years ago
Erik de Vries	8051a81b66	actorizer, dfm_gen, modelizer, out_parser: replaced all instances of detectCores by cores parameter (which defaults to detectCores)	6 years ago
Erik de Vries	a1b6c6a7cb	actorizer, query_gen_actors: revamped actor searches entirely; elasticizer: updated script for use with ES 7.x	6 years ago
Erik de Vries	5b9793cd8c	actorizer: removed nested mclapply	6 years ago
Erik de Vries	9b0ac775af	class_update: add ver variable to set version for class updated articles	7 years ago
Erik de Vries	85306007f4	class_update: added words and clean parameters, in addition to text parameter, to be able to set data preprocessing exactly the same as in the trained model	7 years ago
Erik de Vries	ce5f812252	dfm_gen, merger: Added option for generating lemma_upos hybrids for merged field; merger: Added custom clean option (sometimes not cleaning is preferred, even with lemmas); merger, out_parser: Updated regex for filtering out non-words to also include email addresses (containing both @ and .)	7 years ago
Erik de Vries	386ac42aee	lemma_writer: new function to write raw lemmas (without punctuation) to text file. Is structured as an elasticizer update function (despite not updating anything on the server)	7 years ago
Erik de Vries	1955692346	dfm_gen, out_parser: updated documentation; dupe_detect: major fix to function, no longer using rownames for article ids	7 years ago
Erik de Vries	34531b0da8	out_parser: added option to clean output using regex to remove numbers and non-words; dfm_gen, ud_update: updated functions to make use of out_parser cleaning option; merger: updated regex for cleaning lemmatized output	7 years ago
Erik de Vries	4f8b1f2024	elasticizer: renamed size parameter to batch_size, created max_batch parameter to limit the number of results returned; query_string: renamed x parameter to query, added fields parameter to select what fields to return and random boolean parameter to define whether the returned results should be randomized	7 years ago
Erik de Vries	0a3bdb630b	actorizer, dfm_gen, ud_update: unified output parsing from _source and highlight fields into a single function (out_parser); out_parser: function to parse raw text output into a single field, either from _source or highlight fields; dupe_detect: updated function to use 'ver' parameter for versioning	7 years ago
Erik de Vries	8ffbddc073	actorizer, ud_update: implemented 'ver' variable for keeping track of updates	7 years ago
Erik de Vries	ae23456736	actorizer, ud_update: Updated merging of document fields to properly deal with missing punctuation at the end of fields (e.g. a title without punctuation at the end of the string); modelizer: Minor update to feature keyness, using absolute values now to determine the most informative features for a class (so features that are either strongly positively or negatively related to the class); bulk_writer: Added the 'ver' parameter to include a short version string with each update. Mostly to deal with updates that do not complete successfully on all data	7 years ago
Erik de Vries	9f3418ef37	class_update; dfm_gen; merger: updated functions to accept text parameter for both old style 'lemmas' and new style 'ud'	7 years ago
Erik de Vries	39005c7518	elasticizer: Updated bulk size to 1024 (a power of 2) and set a timeout of 900s every 500000 updates; query_gen_actors: Added an additional generator for the "Institution" type (for EU support); actorizer: Created an updater function to search for actors and use UDPipe to parse the results	7 years ago
Erik de Vries	061da17c2a	ud_update: Added function to lemmatize documents	7 years ago
Erik de Vries	ef51ce60a9	Fixed dupe_detect error on documents with one sentence or less, and a maximum # of words in dfm_gen	7 years ago
Erik de Vries	0e8c127b86	bulk_writer: fixes for JSON generation and added exception for use of 'tokens' varname; class_update/elastic_update: Moved response checking to elastic_update; dupe_detect: Finalized dupe_detect	7 years ago
Erik de Vries	085252abda	documentation: updated dupe_detect and merger	7 years ago
Erik de Vries	f543d658bd	Major overhaul to ES bulk update integration. Added support for both setting and appending to variables	7 years ago
Erik de Vries	4cd46d1a5e	dupe_detect: added support for both lower and upper cutoff point	7 years ago
Erik de Vries	11d8b31c60	Added generic actor search query generator. Updated elasticizer and elastic_update to connect either to the remote server, or a local ES instance	7 years ago
Erik de Vries	adc4b3c639	Updated feature selection in modelizer function (see comment on lines 166/167)	7 years ago
Erik de Vries	65f8c26ec6	Renamed dupe_detect, and added return output	7 years ago
Erik de Vries	db418d7396	Add query_string function for generating query_string queries	7 years ago
Erik de Vries	d203de0b2a	Updated elasticizer docs, created modelizer and class_update functions	7 years ago
Erik de Vries	c815dc7f2b	Duplicate detection first commit	7 years ago
Erik de Vries	217ee76568	V 0.1 for elasticizer function with updater support	7 years ago
Erik de Vries	a273524105	Added support for custom update function to elasticizer	7 years ago
Erik de Vries	0e45c0f2d1	Added option for fulltext vs lemmas merged field	7 years ago
Erik de Vries	4bbe84ab83	First release of mamlr package	7 years ago

46 Commits (17d49f07c0915debe6dced7786037a9076e9e608)