% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/modelizer.R
\name{modelizer}
\alias{modelizer}
\title{Generate a classification model}
\usage{
modelizer(
  dfm,
  outer_k,
  inner_k,
  class_type,
  opt_measure,
  country,
  grid,
  seed,
  model,
  we_vectors,
  cores = 1
)
}
\arguments{
\item{dfm}{A quanteda dfm used to train and evaluate the model; its docvars should contain the vector of class labels}

\item{outer_k}{Number of outer cross-validation folds (for performance estimation)}

\item{inner_k}{Number of inner cross-validation folds (for hyperparameter optimization and feature selection)}

\item{class_type}{Type of classification to model ("junk", "aggregate", or "codes")}

\item{opt_measure}{Label of the confusion-matrix measure to use as the performance indicator}

\item{country}{Two-letter abbreviation of the country the model is estimated for (used for the filename)}

\item{grid}{Data frame providing all possible combinations of hyperparameters and feature selection parameters for a given model (grid search)}

\item{seed}{Integer used as the seed for random number generation, to ensure replicability}

\item{model}{Classification algorithm to use (currently only "nb" for Naïve Bayes using textmodel_nb)}

\item{we_vectors}{Matrix with word embedding vectors}

\item{cores}{Number of threads used for parallel processing with future_lapply; defaults to 1}
}
\value{
A list containing all relevant output
}
\description{
Generate a nested cross-validated classification model based on a dfm with class labels as docvars.
Currently only supports Naïve Bayes using quanteda's textmodel_nb.
Hyperparameter optimization is enabled through the grid parameter.
A grid should be generated from vectors with the labels as described for each model, using the crossing() command.
For Naïve Bayes, the following parameters can be used:
\itemize{
\item percentiles (cutoff point for tf-idf feature selection)
\item measures (which measure to use for determining feature importance; see textstat_keyness for options)
}
}
\examples{
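# A minimal sketch of building the hyperparameter grid for Naive Bayes,
# assuming crossing() refers to tidyr::crossing(). The column names
# (percentiles, measures) follow the description above; the specific values
# are illustrative assumptions, not recommended settings.
grid <- tidyr::crossing(
  percentiles = c(0.8, 0.9, 0.99),  # hypothetical feature-selection cutoffs
  measures = c("chi2", "lr")        # two of the measures offered by textstat_keyness
)
# grid now holds one row per combination of the two vectors.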
modelizer(dfm, outer_k, inner_k, class_type, opt_measure, country, grid, seed, model, we_vectors, cores = 1)
}