R Dataset / Package DAAG / spam7
This R-data statistics page describes the spam7 data set, which contains Spam E-mail Data. The spam7 data set is part of the DAAG R package. To load it, first attach the package with library(DAAG) and then run data("spam7") at the console; alternatively, run data("spam7", package = "DAAG") without attaching the package. Either way, the data are loaded into a variable called spam7. If R reports that the package or the data set cannot be found, install the package with install.packages("DAAG") and then load the data again. If you need to download R, you can go to the R project website. A CSV (comma-separated values) version of the spam7 data set is also available for download; the file is about 101,669 bytes.
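The loading steps described above can be sketched as follows. This is a minimal sketch that assumes a CRAN mirror is reachable if the package still has to be installed; the row and column counts in the final comment come from the Description and Format sections of this page.

```r
# Install DAAG only if it is missing (assumes network access to a CRAN mirror).
if (!requireNamespace("DAAG", quietly = TRUE)) {
  install.packages("DAAG")
}

# Load the data set from the package without attaching the whole package.
data("spam7", package = "DAAG")

# Sanity check: the page documents 4601 email items and 7 columns.
dim(spam7)
```

Using the package argument of data() avoids attaching DAAG to the search path, which can be convenient when only this one data set is needed.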
Spam E-mail Data
Description
The data consist of 4601 email items, of which 1813 items were identified as spam.
Usage
spam7
Format
This data frame contains the following columns:
- crl.tot: total length of words in capitals
- dollar: number of occurrences of the $ symbol
- bang: number of occurrences of the ! symbol
- money: number of occurrences of the word ‘money’
- n000: number of occurrences of the string ‘000’
- make: number of occurrences of the word ‘make’
- yesno: outcome variable, a factor with levels n (not spam) and y (spam)
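To check the columns listed above against the loaded data, something like the following may help. It is a sketch that assumes the DAAG package is already installed; it uses only base R functions.

```r
# Load the data set (assumes the DAAG package is installed).
data("spam7", package = "DAAG")

# Show the column names and types; yesno should be a factor, the rest numeric.
str(spam7)

# Count non-spam (n) and spam (y) items; the page states 1813 spam items.
class_counts <- table(spam7$yesno)
class_counts

# The same counts expressed as proportions of all items.
prop.table(class_counts)
```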
Source
George Forman, Hewlett-Packard Laboratories
These data are available from the University of California at Irvine Repository of Machine Learning Databases and Domain Theories. The address is: http://www.ics.uci.edu/~Here
Examples
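library(rpart)                    # recursive partitioning (classification trees)
data("spam7", package = "DAAG")   # load the data set
# Fit a classification tree that predicts yesno from the six count variables
spam.rpart <- rpart(formula = yesno ~ crl.tot + dollar + bang + money + n000 + make, data = spam7)
plot(spam.rpart)                  # draw the tree
text(spam.rpart)                  # label the splits and leaves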
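One possible way to follow up on the fitted tree is to compare its predictions with the observed labels. The sketch below is not part of the original example: the names pred, conf and accuracy are illustrative, and the accuracy is computed on the same data used for fitting, so it is likely to be optimistic compared with performance on new email.

```r
library(rpart)
data("spam7", package = "DAAG")   # assumes the DAAG package is installed

# Refit the tree from the example, stating the classification method explicitly.
spam.rpart <- rpart(yesno ~ crl.tot + dollar + bang + money + n000 + make,
                    data = spam7, method = "class")

# Predicted class (n or y) for every email in the training data.
pred <- predict(spam.rpart, newdata = spam7, type = "class")

# Confusion matrix: rows are observed labels, columns are predicted labels.
conf <- table(observed = spam7$yesno, predicted = pred)
conf

# Resubstitution accuracy: the share of emails on the diagonal of the matrix.
accuracy <- sum(diag(conf)) / sum(conf)
accuracy

# Cross-validated error by tree size, a less optimistic guide than the above.
printcp(spam.rpart)
```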
Dataset imported from https://www.r-project.org.