An R package that extends some stylo capabilities and helps interpret unsupervised clustering analyses. It also includes a few convenience functions for teaching and demonstration purposes.
Install from GitHub (make sure you have the devtools package):
devtools::install_github("perechen/seetrees")
library(stylo)
library(seetrees)
data(lee) ## load one of the stylo datasets
stylo_res <- stylo(frequencies = lee, gui = FALSE)
view_tree(stylo_res, k = 2, right_margin = 12) ## redraws the dendrogram from the distance matrix, cuts it into k groups, shows associated features
Check ?view_tree for more details.
It should produce two plots: a dendrogram cut into groups, and lists of words associated with each group.
Note: words associated with clusters are determined by calculating the correlation ratio with catdes() from FactoMineR.
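To make the note above more concrete, here is a minimal base R sketch of what a correlation ratio (eta squared) measures for each feature once a tree has been cut into groups. It is not the code used by view_tree() or catdes(): the toy frequencies, the plain Euclidean distance, and the helper correlation_ratio() are all assumptions made for illustration.

```r
# Toy frequency table: rows are texts, columns are features (values are made up)
freqs <- matrix(
  c(5.1, 3.0, 0.4,
    5.3, 3.1, 0.5,
    5.0, 2.9, 0.6,
    4.0, 3.0, 1.9,
    3.8, 3.2, 2.1,
    4.1, 2.8, 2.0),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("a1", "a2", "a3", "b1", "b2", "b3"),
                  c("the", "of", "she"))
)

# Cluster the texts and cut the tree into k groups
# (Euclidean distance and Ward linkage are assumptions of this sketch)
tree <- hclust(dist(freqs), method = "ward.D2")
groups <- cutree(tree, k = 2)

# Hypothetical helper: correlation ratio (eta squared) of one feature,
# i.e. between-group sum of squares divided by total sum of squares
correlation_ratio <- function(x, g) {
  group_means <- tapply(x, g, mean)
  group_sizes <- tapply(x, g, length)
  between <- sum(group_sizes * (group_means - mean(x))^2)
  total <- sum((x - mean(x))^2)
  between / total
}

# One value per feature; values near 1 mean the feature separates the groups well
eta2 <- apply(freqs, 2, correlation_ratio, g = groups)
sort(eta2, decreasing = TRUE)
```

In this toy table "she" and "the" differ sharply between the two groups and score close to 1, while "of" has the same mean in both groups and scores close to 0.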
Simple visualisation of the most distinctive features in a text or a class (e.g. an author). It uses corpus-wide scaled (z-scored) feature frequencies and returns the deviations (in both directions) of the top features, with the number set by the top argument.
library(stylo)
library(seetrees)
data(lee) ## load one of the stylo datasets
stylo_res <- stylo(frequencies = lee, gui = FALSE)
# ask for the 20 features (positive and negative) in Faulkner's "Absalom, Absalom!" that deviate most from the corpus mean
view_scores(stylo_res, target_text = "Faulkner_Absalom_1936", top = 20)
Returns a column plot that shows preferred (pink) and avoided (light blue) words. NB: the numbers on the columns indicate each feature's corpus-wide frequency rank. Dashed lines mark the mean, ±1 SD and ±2 SD.
Check ?view_scores for more details.
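The idea behind these scores can be sketched in a few lines of base R: z-score every feature across the corpus, then take the features of one text with the largest absolute deviations. This is only an illustration under stated assumptions. The toy table and the helper top_deviations() are hypothetical and not part of seetrees.

```r
# Toy frequency table: rows are texts, columns are features (values are made up)
freqs <- matrix(
  c(5.0, 3.0, 0.5, 1.0,
    5.2, 3.1, 0.4, 1.1,
    4.8, 2.9, 0.6, 0.9,
    5.0, 3.0, 2.5, 0.2),
  ncol = 4, byrow = TRUE,
  dimnames = list(c("text_a", "text_b", "text_c", "text_d"),
                  c("the", "of", "she", "but"))
)

# z-score each feature (column) across the whole corpus
z <- scale(freqs)

# Hypothetical helper: the features of one text that deviate most
# from the corpus mean, in either direction
top_deviations <- function(z, target_text, top = 2) {
  scores <- z[target_text, ]
  ranked <- scores[order(abs(scores), decreasing = TRUE)]
  ranked[seq_len(min(top, length(ranked)))]
}

top_scores <- top_deviations(z, "text_d", top = 2)
top_scores
```

Here text_d overuses "she" (positive z-score) and underuses "but" (negative z-score), which corresponds to the preferred and avoided words in the plot.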
Compares two documents based on the features used in stylo(). Draws a z-score profile or a difference profile, with the option to annotate the largest differences.
library(stylo)
library(seetrees)
data(lee) ## load one of the stylo datasets
stylo_res <- stylo(frequencies = lee, gui = FALSE)
# compare "To Kill a Mockingbird" and "In Cold Blood", annotate the 10 features that behave most differently
compare_scores(stylo_res,
               source_text = "HarperLee_Mockingbird_1960",
               target_text = "Capote_Blood_1966",
               top_diff = 10,
               type = "profile")
Also supports the type="diff" flavor of visualisation (a profile of differences).
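As a rough illustration of what a difference profile contains, the base R sketch below subtracts the z-scores of one text from those of another and keeps the features with the largest absolute difference. The toy data, the helper score_diff(), and the direction of the subtraction (target minus source) are assumptions of this sketch, not a description of how compare_scores() is implemented.

```r
# Toy frequency table: rows are texts, columns are features (values are made up)
freqs <- matrix(
  c(5.0, 3.0, 0.5, 1.0,
    5.2, 3.1, 2.5, 0.9,
    4.8, 2.9, 1.5, 0.2),
  ncol = 4, byrow = TRUE,
  dimnames = list(c("text_a", "text_b", "text_c"),
                  c("the", "of", "she", "but"))
)

# z-score each feature (column) across the whole corpus
z <- scale(freqs)

# Hypothetical helper: per-feature z-score difference between two texts,
# keeping the top_diff features with the largest absolute difference
score_diff <- function(z, source_text, target_text, top_diff = 2) {
  d <- z[target_text, ] - z[source_text, ]  # assumed direction: target minus source
  ranked <- d[order(abs(d), decreasing = TRUE)]
  ranked[seq_len(min(top_diff, length(ranked)))]
}

res <- score_diff(z, source_text = "text_a", target_text = "text_b", top_diff = 2)
res
```

In this toy corpus the largest gap between text_a and text_b is in "she", so it is the first feature that would be annotated.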