Wednesday, October 12, 2011

Wordclouds of Chemistry Texts

After stumbling across a post on One R Tip A Day about building Wordclouds using R, I thought I would try it out myself.  I have been subjecting my undergrad chemistry students to R, so this seemed like a good opportunity to use R for a little fun.  I decided that it would be interesting to look at some old chemistry books.  I picked three works from Project Gutenberg:
  • The Sceptical Chymist (1661) by Robert Boyle
  • Elements of Chemistry (1789) by Antoine Lavoisier
  • An Elementary Study of Chemistry (1905) by William McPherson and William Edwards Henderson

[Wordcloud: The Sceptical Chymist (1661), Robert Boyle]
[Wordcloud: Elements of Chemistry (1789), Antoine Lavoisier]
[Wordcloud: An Elementary Study of Chemistry (1905), McPherson and Henderson]

It's no surprise that Boyle's wordcloud is so different from the other two.  The influence of Alchemy was still quite strong in 1661, and Boyle's vocabulary reflects this.  It is perhaps more interesting to see how similar Lavoisier is to McPherson and Henderson despite their being 116 years apart.
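One way to put a number on "how similar" two wordclouds are is to compare their most frequent words directly. The sketch below is not part of the original analysis: `top_overlap` is a hypothetical helper, and the counts are toy values invented for illustration. It computes the Jaccard overlap (shared words divided by all distinct words) of the top-n words in two named frequency vectors.

```r
# Hypothetical helper: Jaccard overlap of the n most frequent words in two books.
# freq_a and freq_b are named numeric vectors (word -> count).
top_overlap <- function(freq_a, freq_b, n = 60) {
  top_a <- names(sort(freq_a, decreasing = TRUE))[seq_len(min(n, length(freq_a)))]
  top_b <- names(sort(freq_b, decreasing = TRUE))[seq_len(min(n, length(freq_b)))]
  length(intersect(top_a, top_b)) / length(union(top_a, top_b))
}

# Toy counts, invented purely for illustration (not taken from the books)
lavoisier <- c(acid = 40, gas = 35, oxygen = 30, caloric = 20)
mcpherson <- c(acid = 50, oxygen = 45, hydrogen = 30, gas = 25)

# 3 shared words out of 5 distinct words
top_overlap(lavoisier, mcpherson, n = 4)
```

With real data, a sorted frequency vector such as `book.v` from the code further down could be computed for each book and passed in.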

Robert Boyle and Antoine Lavoisier were both instrumental in developing modern chemistry. William McPherson was a chemistry professor at Ohio State University, and one of the chemistry buildings there is named for him.

I adapted Paolo's code from Wordclouds using R. You will need the R packages tm, wordcloud and RColorBrewer. I downloaded the books as plain text from Project Gutenberg and saved them in a directory called chemtxt.
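Plain-text files from Project Gutenberg usually carry a license header and footer, which can add words like "project" and "gutenberg" to the counts. The following is a minimal sketch for trimming them, assuming the file uses marker lines that begin with `*** START OF` and `*** END OF`; the exact wording varies between files, so check yours first. `strip_gutenberg` is a hypothetical helper and is not used in the code below.

```r
# Hypothetical helper: keep only the lines between the Gutenberg start and end markers.
# Assumes marker lines beginning with "*** START OF" and "*** END OF".
strip_gutenberg <- function(lines) {
  start <- grep("^\\*\\*\\* ?START OF", lines)
  end   <- grep("^\\*\\*\\* ?END OF", lines)
  # If either marker is missing, return the text untouched
  if (length(start) == 0 || length(end) == 0) return(lines)
  from <- start[1] + 1
  to   <- end[1] - 1
  if (from > to) return(character(0))
  lines[from:to]
}

# Toy input, invented for illustration
sample_lines <- c("License header",
                  "*** START OF THIS PROJECT GUTENBERG EBOOK ***",
                  "Chapter 1",
                  "Of fire and salt",
                  "*** END OF THIS PROJECT GUTENBERG EBOOK ***",
                  "License footer")
strip_gutenberg(sample_lines)
```

For a real file, the lines could be read with `readLines()` and written back with `writeLines()` before building the corpus.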


library(tm)
library(wordcloud)
library(RColorBrewer)

# read all files in the directory chemtxt
chemtexts <- Corpus(DirSource("chemtxt/"))

book <- Corpus(VectorSource(chemtexts[["boyle.txt"]])) 
book <- tm_map(book, removePunctuation) 
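# with tm 0.6 or later, base functions such as tolower need the content_transformer() wrapper
book <- tm_map(book, content_transformer(tolower))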
book <- tm_map(book, stripWhitespace) 
book <- tm_map(book, function(x) removeWords(x, stopwords("english"))) 

# format as a dataframe with words and their frequencies 
book.tdm <- TermDocumentMatrix(book) 
book.m <- as.matrix(book.tdm) 
book.v <- sort(rowSums(book.m),decreasing=TRUE) 
book.d <- data.frame(word = names(book.v),freq=book.v) 

#color scheme 
pal2 <- brewer.pal(8,"Dark2") 

# uncomment this line to save wordcloud as an image file
# (then call dev.off() after wordcloud() to finish writing the file)
#png("wordcloud_boyle.png", width=600,height=600)

# I picked just the 60 most frequent words
# to show ALL the words in the wordcloud use max.words=Inf
wordcloud(book.d$word, book.d$freq, scale=c(8,.2), min.freq=3, max.words=60, random.order=FALSE, rot.per=.15, colors=pal2)
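The code above handles one book at a time. To produce a wordcloud for every file in chemtxt, the same steps could be wrapped in a function and looped, roughly as sketched here. `word_freqs` is a hypothetical name, the output file names are an assumption, and the sketch assumes tm 0.6 or later (for `VCorpus` and `content_transformer`); it reads each file directly with `readLines()` rather than going through `DirSource`.

```r
library(tm)
library(wordcloud)
library(RColorBrewer)

# Hypothetical wrapper: sorted word frequencies for one plain-text file
word_freqs <- function(path) {
  lines <- readLines(path, warn = FALSE)
  book <- VCorpus(VectorSource(paste(lines, collapse = " ")))
  book <- tm_map(book, removePunctuation)
  book <- tm_map(book, content_transformer(tolower))
  book <- tm_map(book, removeWords, stopwords("english"))
  book <- tm_map(book, stripWhitespace)
  m <- as.matrix(TermDocumentMatrix(book))
  sort(rowSums(m), decreasing = TRUE)
}

pal2 <- brewer.pal(8, "Dark2")

# One PNG per .txt file found in chemtxt (the loop does nothing if there are none)
for (path in list.files("chemtxt", pattern = "\\.txt$", full.names = TRUE)) {
  v <- word_freqs(path)
  png(paste0("wordcloud_", sub("\\.txt$", ".png", basename(path))),
      width = 600, height = 600)
  wordcloud(names(v), v, scale = c(8, .2), min.freq = 3, max.words = 60,
            random.order = FALSE, rot.per = .15, colors = pal2)
  dev.off()
}
```

Returning the sorted frequency vector from the function keeps the counting separate from the plotting, so the same numbers can be reused for other comparisons between the books.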
