Inf and -Inf from pairwise_cor #7

Open
wendywangwwt opened this issue Mar 29, 2017 · 0 comments

Hi,

I'm using widyr for a text mining homework assignment in which I need to calculate word associations in NY Times articles.

The input data frame has columns for the word (unigram), the document index, and the author name.
I use the following code to calculate pairwise correlations for each author and then pick out the rows for "trump".

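library(dplyr)
library(widyr)
# Initialize the result list and counter before the loop
l.cor <- list()
idx <- 0
for (name in authors) {
  idx <- idx + 1
  # Word-pair correlations across this author's documents
  l.cor[[idx]] <- d.tc %>%
    filter(author == name) %>%
    pairwise_cor(word, document) %>%
    filter(!is.na(correlation))
}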

trump.cor <- rbind(l.cor[[1]] %>%
                     filter(item1 == "trump") %>%
                     mutate(author = authors[1]),
                   l.cor[[2]] %>%
                     filter(item1 == "trump") %>%
                     mutate(author = authors[2]),
                   l.cor[[3]] %>%
                     filter(item1 == "trump") %>%
                     mutate(author = authors[3]),
                   l.cor[[4]] %>%
                     filter(item1 == "trump") %>%
                     mutate(author = authors[4]),
                   l.cor[[5]] %>%
                     filter(item1 == "trump") %>%
                     mutate(author = authors[5]))
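
The five near-identical branches above could probably be collapsed into a single pass over the list. The following is only a sketch in base R (it swaps the dplyr verbs for plain subsetting), and the `authors` and `l.cor` values in it are made-up stand-ins rather than the real data:

```r
# Toy stand-ins (hypothetical values, not the real data)
authors <- c("Author A", "Author B")
l.cor <- list(
  data.frame(item1 = c("trump", "clinton"), item2 = c("ad", "ad"),
             correlation = c(0.1, 0.2), stringsAsFactors = FALSE),
  data.frame(item1 = c("trump", "trump"), item2 = c("bad", "bring"),
             correlation = c(0.3, 0.4), stringsAsFactors = FALSE)
)

# For each author, keep the rows where item1 is "trump" and tag them
# with the author name, then stack the pieces into one data frame.
pieces <- Map(function(tbl, who) {
  out <- tbl[tbl$item1 == "trump", , drop = FALSE]
  out$author <- rep(who, nrow(out))
  out
}, l.cor, authors)
trump.cor.toy <- do.call(rbind, pieces)
print(trump.cor.toy)
```

This form does not hard-code the number of authors, so it would keep working if the list grew or shrank.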

The result contains Inf and -Inf values in the correlation column:

> trump.cor[which(trump.cor$correlation==Inf),]
# A tibble: 38 × 4
    item1    item2 correlation             author
   <fctr>   <fctr>       <dbl>             <fctr>
1   trump       ad         Inf Thomas L. Friedman
2   trump american         Inf Thomas L. Friedman
3   trump      ani         Inf Thomas L. Friedman
4   trump    anoth         Inf Thomas L. Friedman
5   trump      bad         Inf Thomas L. Friedman
6   trump    bring         Inf Thomas L. Friedman
7   trump   candid         Inf Thomas L. Friedman
8   trump   common         Inf Thomas L. Friedman
9   trump  connect         Inf Thomas L. Friedman
10  trump democrat         Inf Thomas L. Friedman
# ... with 28 more rows
> summary(trump.cor)
       item1            item2        correlation                     author    
 trump    :20908   ad      :    5   Min.   :   -Inf   David Brooks      :4710  
 a        :    0   american:    5   1st Qu.:0.02592   Maureen Dowd      :5909  
 aaron    :    0   ani     :    5   Median :0.03043   Nicholas Kristof  :5877  
 aarondmil:    0   anoth   :    5   Mean   :    NaN   Paul Krugman      :4372  
 aarp     :    0   bad     :    5   3rd Qu.:0.04327   Thomas L. Friedman:  40  
 ababa    :    0   bring   :    5   Max.   :    Inf                            
 (Other)  :    0   (Other) :20878                                              
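
The `== Inf` test above only matches positive infinity, while the summary also shows a minimum of `-Inf` (which is why the mean comes out as `NaN`). A small sketch of how both signs could be counted per author with base R's `is.infinite()`, using a made-up data frame that only mimics the column names of `trump.cor`:

```r
# Toy data with the same column names as trump.cor (values are made up)
toy <- data.frame(
  item2 = c("ad", "bad", "bring", "candid", "common"),
  correlation = c(Inf, -Inf, 0.03, Inf, 0.04),
  author = c("A", "A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# is.infinite() is TRUE for both Inf and -Inf (and FALSE for NA/NaN)
bad <- toy[is.infinite(toy$correlation), ]
print(bad)

# Count of infinite correlations per author
inf.by.author <- table(bad$author)
print(inf.by.author)

# Mean restricted to the finite values only
finite.mean <- mean(toy$correlation[is.finite(toy$correlation)])
print(finite.mean)
```

Applied to the real `trump.cor`, the per-author count would show whether the infinite values are confined to one author, as the listing above suggests.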

For anyone who wants to replicate my result, the R data file (readable with readRDS) is attached:
trump clinton.zip
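
One thing that may be worth checking when replicating: a Pearson correlation is undefined when one of the two words appears in every document, because its presence indicator then has zero variance. Whether that is what produces the Inf and -Inf values in pairwise_cor is only a guess and has not been confirmed against the widyr source. The sketch below uses a made-up document-by-word matrix and base R only, to show how such constant words could be spotted:

```r
# Toy document-by-word presence matrix (made up): 3 documents, 3 words.
# matrix() fills column by column, so "trump" is 1 in every document.
m <- matrix(c(1, 1, 1,
              1, 0, 1,
              0, 1, 1),
            nrow = 3,
            dimnames = list(paste0("doc", 1:3), c("trump", "ad", "bad")))

# Per-word variance across documents; zero means the word never varies
word.var <- apply(m, 2, var)
print(word.var)

# Words whose correlation with any other word is undefined
constant.words <- names(word.var)[word.var == 0]
print(constant.words)

# Base R reports the undefined correlation as NA (with a warning)
r <- suppressWarnings(cor(m[, "trump"], m[, "ad"]))
print(r)
```

If "trump" turns out to be constant across the documents of the author with the infinite values, that would at least be consistent with this hypothesis.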