Erik de Vries
f6dfc6711b
minor fix
4 years ago
Erik de Vries
09fd8d0cb2
removed some unused aggregations
4 years ago
Erik de Vries
8ff4097304
renamed actor_merger to sent_merger and implemented fixes to work with sentiment data frames without actor ids
4 years ago
Erik de Vries
a37fc0410d
removed sent_sum_pos/neg
4 years ago
Erik de Vries
153c54b376
reintroduced arousal; note that arousal performance is not directly evaluated
4 years ago
Erik de Vries
cdc78039ed
removing text-level output from sentencizer, and optimizing storage by using factors
4 years ago
Erik de Vries
523d86799c
removed arousal measures
4 years ago
Erik de Vries
4a0f2206fd
removed multicore support, added parameters for dfm_gen
4 years ago
Your Name
274c9179cb
remove meta_file argument
4 years ago
Your Name
6e0e693d4e
lemma_writer: removed meta csv code
4 years ago
Your Name
4fd9222a2d
lemma_writer: updated to write metadata csv when dumping documents in ud format
out_parser: fix for generating empty columns
4 years ago
Your Name
955f034e6a
actor_merger: changed computation of arousal, and removed uninformative variables
4 years ago
Your Name
3cdb68b196
out_parser: updated fncols function
4 years ago
Your Name
dc40fbbb19
elasticizer: update rbindlist implementation
4 years ago
Your Name
18d47762d2
actor_merger: overhaul to include cutoffs at sentence level as intended, also included options to generate sentiment for text only (don't provide actors_meta or actor_groups)
4 years ago
Your Name
74909ca3a0
sentencizer: removed text sentiment computation from script, because of incorrect implementation
4 years ago
Your Name
c99ac23bb5
actor_merger: fixed absence of publication_date in some cases
4 years ago
Your Name
cc7fa5bffa
actor_merger: added aggregations of all individual actors and all party mentions in an article
4 years ago
Your Name
d9d578c06a
actor_merger: mult fix
4 years ago
Your Name
771145faf7
actor_merger: added mult='first' to metadata join for parties_actors to deal with duplicate partyIds (see 50Plus, Conservatives and Labour)
4 years ago
Your Name
1c14646e8f
actor_merger: don't deselect sent_words and sent_sum columns
4 years ago
Your Name
9bd382f955
actor_merger: fix to generate bogus sentiment columns
4 years ago
Your Name
b7f1afddd1
actor_merger: total rewrite based on data.table for performance reasons. Added some exceptions due to non-existing partyIds that some individual actors have in the actor database
4 years ago
Your Name
2c8a88f9a0
elasticizer: switched from bind_rows to rbindlist for composing result
actor_merger: added noactor.* sentiment columns, and switched to data.table for matching actor metadata with articles
4 years ago
Your Name
559199bb97
sentencizer: totally removed sent_lemmas field
4 years ago
Your Name
36f2b341a8
sentencizer: removed derived output from function
4 years ago
Your Name
80ec0be1f8
actorizer: updated to account for token start offset in udpipe output. Sometimes, the first token in an article doesn't start at character position 1 (or 2 if the article starts with a whitespace), but at position 16 and possibly other positions.
4 years ago
Your Name
336567732c
elastic_update: added more debug output
4 years ago
Your Name
df7631b9f1
sentencizer: Changed output, removed lemma list and added separate positive and negative sentiment sums
4 years ago
Your Name
ecdb5be3b4
actorizer: moved some code
4 years ago
Your Name
69d4b6f5b0
actorizer: updated to data.table for conditional joins
DESCRIPTION: added data.table dependency
4 years ago
Your Name
085855908c
query_gen_actors: switched from Minister to Min
4 years ago
Your Name
b406304c80
actorizer: Removed nested parallelization function
query_gen_actors: Integrated startDate and endDate for parties, changed party exception method from abbreviation only to both full names and abbreviations for NL and BE
4 years ago
Your Name
5de4e1488c
estimator, modelizer, preproc: Removed experimental we-vector support, and disabled inefficiently implemented preproc.R
4 years ago
Your Name
77eb51a1bf
actorizer: totally revamped way of finding actors
elasticizer: updated dump handling to create a dump for every batch, instead of one big file at the end
out_parser: streamlined code
query_gen_actors: only include relevant fields
ud_update: changed function parameters to work with elasticizer dump function
4 years ago
Your Name
0e593075ee
query_gen_actors: only retrieve ud field from source
5 years ago
Your Name
6eb405f8bd
merger: selecting only relevant columns
5 years ago
Your Name
38ff4dcbf0
ud_update: small fix to file naming
5 years ago
Your Name
4b4d860235
class_update: remove dfm_gen multicore option
dfm_gen: remove multicore, update merger() code
elasticizer: changed filenaming scheme for dump option
merger: Fixed bug where an NA lemma would cause the entire document to become NA. Now the NA lemmas are filtered out before merging
ud_update: removed parallel processing, changed script to save bulk updates in .Rds files instead of sending them straight away
5 years ago
Your Name
5d99ec9509
elasticizer: added option to dump data frames to rds files
out_parser: changed to single core, due to performance increase
sentencizer: corrected documentation for sent_dict parameter
5 years ago
Your Name
aa6587b204
dupe_detect: fix for quotation marks
5 years ago
Your Name
2a220ded5d
dupe_detect: fix to query string for multi-word doctype names
5 years ago
Your Name
5bd36dcb44
dupe_detect: Changed query from json to query_string style, and added filter for already detected duplicates
cv_generator: Changed code to use a generic vector of true values to draw the conditional random sample, instead of dfm/docvars specifically
5 years ago
Your Name
e499d70671
actor_merger: added ungroup() calls at the start and end of function, to speed up processing
sentencizer: added ungroup() call at the end of the function to speed up processing
5 years ago
Your Name
8634d549a3
sentencizer: updates to collect sentence word counts and number of sentences also when no sent_dict is provided
5 years ago
Your Name
61e0581595
actor_merger: removed debug line
5 years ago
Your Name
f022312485
actor_merger: added function for generating actor-document data frames
actor_fetcher: removed from package
other: major update to documentation
5 years ago
Your Name
4e867214dd
sentencizer: commented code
5 years ago
Your Name
ec8afc4990
sentencizer: fixed actorsDetail coding error
5 years ago
Your Name
9ccfd2952e
sentencizer: minor updates
5 years ago