seetrees

An R package that extends some stylo capabilities and aids the interpretation of (unsupervised) clustering analyses. It also includes a few convenience functions for teaching and demonstration purposes.

Installation

Install from GitHub (make sure you have the devtools package installed):

devtools::install_github("perechen/seetrees")

view_tree()

library(stylo)
library(seetrees)

data(lee) ## load one of the stylo datasets

stylo_res <- stylo(frequencies = lee, gui = FALSE)
## redraw a dendrogram from the distance matrix, cut it into k groups, show associated features
view_tree(stylo_res, k = 2, right_margin = 12)

The call should produce two plots: a dendrogram cut into groups, and lists of words associated with each group.

Check ?view_tree() for more details.

Note: the words associated with each cluster are determined by calculating the correlation ratio $\eta^2$ of word frequency ($f$) across clusters ($c$) and documents ($d$). Results are then filtered by p-value (which might not make sense at all). The notation is adopted from Cafiero & Camps 2020; the computation is done by catdes() from FactoMineR.

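$$ \eta^2 = \frac{\sum\nolimits_{c} \sum\nolimits_{d}(\bar{f}_c-\bar{f})^2}{\sum\nolimits_{c} \sum\nolimits_{d}(f_{d,c} - \bar{f})^2} $$

Here $\bar{f}_c$ is the mean frequency of the word in cluster $c$ and $\bar{f}$ is its mean over all documents, so $\eta^2$ is the share of the total variance that is explained by the clustering.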
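
As a rough illustration, the correlation ratio of a single word can be computed by hand in base R as the between-cluster sum of squares divided by the total sum of squares. This is a minimal sketch on made-up data, not the code seetrees or catdes() runs: the `freq` and `cluster` vectors and the `eta_squared()` helper are hypothetical names introduced only for this example.

```r
# Hypothetical toy data: relative frequencies of one word in six documents,
# with each document already assigned to one of two clusters.
freq    <- c(2.1, 2.4, 2.2, 3.6, 3.9, 3.5)
cluster <- factor(c("a", "a", "a", "b", "b", "b"))

# Hypothetical helper: correlation ratio of a numeric variable across groups.
eta_squared <- function(f, groups) {
  grand_mean    <- mean(f)
  cluster_means <- ave(f, groups)  # each document gets its cluster's mean
  ss_between    <- sum((cluster_means - grand_mean)^2)
  ss_total      <- sum((f - grand_mean)^2)
  ss_between / ss_total
}

eta2 <- eta_squared(freq, cluster)

# Cross-check: eta^2 coincides with R^2 of a one-way linear model.
r2 <- summary(lm(freq ~ cluster))$r.squared

print(c(eta2 = eta2, r2 = r2))
```

A value close to 1 means the word's frequency is almost entirely determined by cluster membership; a value close to 0 means the clusters barely differ on that word.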

view_scores()

A simple visualisation of the most distinctive features of a text or a class (e.g. an author). It uses feature frequencies scaled across the whole corpus (z-scores) and returns the top $n$ features that deviate most from the corpus mean, in both directions. A quick and dirty way to ask "What is going on in here?".

library(stylo)
library(seetrees)

data(lee) ## load one of the stylo datasets

stylo_res <- stylo(frequencies = lee, gui = FALSE)
# ask for the 20 features (positive and negative) in Faulkner's "Absalom, Absalom!" that deviate most from the corpus mean
view_scores(stylo_res, target_text = "Faulkner_Absalom_1936", top = 20)

Returns a column plot that shows preferred (pink) and avoided (light blue) words. NB: the numbers on the columns indicate each feature's corpus-wide frequency rank. Dashed lines mark the mean, ±1 SD and ±2 SD.

Check ?view_scores() for more details.
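
The idea of z-scoring described above can be sketched in base R without stylo. This is only an illustration under stated assumptions: the `freqs` table, its document and word names, and the `top_deviations()` helper are hypothetical, and the actual selection logic inside view_scores() may differ.

```r
# Hypothetical toy frequency table: rows are documents, columns are words.
freqs <- matrix(
  c(5.0, 3.0, 1.0,
    4.0, 3.5, 2.0,
    6.0, 2.5, 0.5,
    5.0, 3.0, 4.5),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("doc_a", "doc_b", "doc_c", "doc_d"),
                  c("the", "and", "whom"))
)

# Scale every word (column) to mean 0 and SD 1 across the corpus.
z <- scale(freqs)

# Hypothetical helper: the features of one document that lie farthest
# from the corpus mean, in either direction.
top_deviations <- function(z, target_text, top = 2) {
  scores <- z[target_text, ]
  scores[order(abs(scores), decreasing = TRUE)][seq_len(top)]
}

top_dev <- top_deviations(z, "doc_d", top = 2)
print(top_dev)
```

Positive scores correspond to words the document uses more than the corpus average, negative scores to words it uses less.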

compare_scores()

Compares two documents based on the features used in stylo(). Draws either a profile of z-scores or a profile of differences, with the option to annotate the largest differences.

library(stylo)
library(seetrees)

data(lee) ## load one of the stylo datasets

stylo_res <- stylo(frequencies = lee, gui = FALSE)
# compare "To Kill a Mockingbird" and "In Cold Blood", annotate the 10 features that behave most differently
compare_scores(stylo_res,
               source_text = "HarperLee_Mockingbird_1960",
               target_text = "Capote_Blood_1966",
               top_diff = 10,
               type = "profile")

Also supports the type="diff" flavor of the visualisation (a profile of differences).
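
A difference profile of this kind can be approximated in base R as the per-feature gap between two documents' z-scores. The sketch below uses a hypothetical toy table and assumes the difference is taken as target minus source; the direction and the ranking that compare_scores() actually uses are not confirmed here.

```r
# Hypothetical toy frequency table: rows are documents, columns are words.
freqs <- matrix(
  c(5.0, 3.0, 1.0,
    4.0, 3.5, 4.5,
    6.0, 2.5, 0.5,
    5.0, 3.0, 2.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("source", "target", "other_1", "other_2"),
                  c("the", "and", "whom"))
)

# Corpus-wide z-scores, one column per word.
z <- scale(freqs)

# Assumed definition: difference profile = target z-scores minus source z-scores.
diff_profile <- z["target", ] - z["source", ]

# The two features on which the documents differ most, by absolute difference.
top_diff <- diff_profile[order(abs(diff_profile), decreasing = TRUE)][1:2]
print(top_diff)
```

Features with the largest absolute differences are the natural candidates for annotation in the plot.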
