edevries/mamlr (archived)

Author	SHA1	Message	Date
Erik de Vries	17cf6d04e9	modelizer: debug update	7 years ago
Erik de Vries	7544e5323f	modelizer: update to allow tf both as count (for naive bayes), and as proportion (for other machine learning algorithms)	7 years ago
Erik de Vries	5f5e4a03c8	modelizer: Changed tf-idf weighting from absolute tf count to proportional (normalized) tf! Also added initial support for neural networks	7 years ago
Erik de Vries	34a6adf64e	changed udpipe output variable from tokens to ud	7 years ago
Erik de Vries	061da17c2a	ud_update: Added function to lemmatize documents	7 years ago
Erik de Vries	ef51ce60a9	Fixed dupe_detect error on documents with one sentence or less, and a maximum # of words in dfm_gen	7 years ago
Erik de Vries	0e8c127b86	bulk_writer: fixes for JSON generation and added exception for use of 'tokens' varname; class_update/elastic_update: Moved response checking to elastic_update; dupe_detect: Finalized dupe_detect	7 years ago
Erik de Vries	755a58d84d	dupe_detect: fix to prevent errors when a query returns no results	7 years ago
Erik de Vries	887f1aa774	dupe_detect: fix for empty results dataframe (no duplicates for given date and newspaper)	7 years ago
Erik de Vries	993f39957a	dfm_gen: word cutoff now as final step in script, caused bugs with mutating code variables	7 years ago
Erik de Vries	02b8a8c1da	dfm_gen & merger: Changed word cutoff point to be a general setting in dfm_gen. Cuts off at the last [.?!] before the cutoff point (so returns documents at a sentence, shorter than cutoff).	7 years ago
Erik de Vries	4a713ddc23	bulk_writer: setting names(x) <- NULL when there is only one value (list or otherwise) to be updated. This is because R apply treats rows of single values as a matrix, while it treats rows containing lists as (named) list. This has the nasty result of getting subvalues when using to JSON. i.e. computerCodes.actors = [list, of, ids] becomes computerCodes.actors.ids = [list, of, ids].	7 years ago
Erik de Vries	6bb8f9b635	class_update: added explicit httr::: references	7 years ago
Erik de Vries	f543d658bd	Major overhaul to ES bulk update integration. Added support for both setting and appending to variables	7 years ago
Erik de Vries	4adae2bbc6	Fixed bug in dupe_detect caused by switch from cutoff to cutoff_lower/upper	7 years ago
Erik de Vries	4cd46d1a5e	dupe_detect: added support for both lower and upper cutoff point	7 years ago
Erik de Vries	11d8b31c60	Added generic actor search query generator. Updated elasticizer and elastic_update to connect either to the remote server, or a local ES instance	7 years ago
Erik de Vries	3e66c7e1cd	Updated dfm_gen to have all topic vectors as numeric variables	7 years ago
Erik de Vries	adc4b3c639	Updated feature selection in modelizer function (see comment on lines 166/167)	7 years ago
Erik de Vries	65f8c26ec6	Renamed dupe_detect, and added return output	7 years ago
Erik de Vries	db418d7396	Add query_string function for generating query_string queries	7 years ago
Erik de Vries	d203de0b2a	Updated elasticizer docs, created modelizer and class_update functions	7 years ago
Erik de Vries	c815dc7f2b	Duplicate detection first commit	7 years ago
Erik de Vries	015411feaf	Added refresh=wait_for to bulk update url. This should make update scripts less demanding on the server side, because the server only replies after refreshing (happens every second)	7 years ago
Erik de Vries	413ad02a87	Set default to "lemmas" for dfm_gen	7 years ago
Erik de Vries	217ee76568	V 0.1 for elasticizer function with updater support	7 years ago
Erik de Vries	a273524105	Added support for custom update function to elasticizer	7 years ago
Erik de Vries	311838b34b	Updated dfm_gen to only create derivative codes if majorTopic actually exists, and set docvars to NULL when no majorTopic codes	7 years ago
Erik de Vries	dc4daf9de4	Added line to replace multiple whitespace characters in full text by a single regular whitespace	7 years ago
Erik de Vries	0e45c0f2d1	Added option for fulltext vs lemmas merged field	7 years ago
Erik de Vries	4bbe84ab83	First release of mamlr package	7 years ago

81 Commits (a1b6c6a7cb55403c10323982b215b41085ea1e9e)