1. str()
In many languages, str() converts a value of another type into a string. In R, str() instead displays the structure of an object: its class, its size, and the type of each component.
data(iris)
str(iris)
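Returns:
'data.frame': 100 obs. of 5 variables:
 $ Sepal.Length: num 7 6.4 6.9 5.5 6.5 5.7 6.3 4.9 6.6 5.2 ...
 $ Sepal.Width : num 3.2 3.2 3.1 2.3 2.8 2.8 3.3 2.4 2.9 2.7 ...
 $ Petal.Length: num 4.7 4.5 4.9 4 4.6 4.5 4.7 3.3 4.6 3.9 ...
 $ Petal.Width : num 1.4 1.5 1.5 1.3 1.5 1.3 1.6 1 1.3 1.4 ...
 $ Species     : Factor w/ 2 levels "versicolor","virginica": 1 1 1 1 1 1 1 1 1 1 ...
Note: this output appears to come from a 100-row subset of iris that contains only versicolor and virginica. The full built-in iris dataset has 150 observations and three Species levels (setosa, versicolor, virginica), so running the code above on it reports 150 obs. and 3 levels.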
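To make the contrast concrete, here is a minimal sketch. It assumes as.character() as the R counterpart of string conversion (the text above does not name one), and it uses capture.output() only to hold the printed structure in a variable so it can be inspected.

```r
# Converting to strings: as.character() is the usual R tool for this
x <- c(1.5, 2, 3)
as_text <- as.character(x)
class(as_text)   # "character"

# Inspecting structure: str() prints a compact summary and returns nothing useful
str(x)

# capture.output() collects what str() prints, one element per printed line
str_lines <- capture.output(str(iris))
length(str_lines)   # one header line plus one line per column
```

For a quick check of the type alone, class(x) or typeof(x) is often enough; str() is more useful for nested objects such as data frames and lists.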
2. Read data from a URL
site <- "http://random.org/integers/"  # a website that generates random numbers
# Request 10 random integers between 100 and 200, arranged in 2 columns
query <- "num=10&min=100&max=200&col=2&base=10&format=plain&rnd=new"
txt <- paste(site, query, sep="?")  # full URL
nums <- read.table(file=txt)  # read the response as a table
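Because the request above needs network access, the parsing step can also be tried offline. The sketch below builds the same URL and then parses a hypothetical response body; the numbers in sample_response are made up for illustration and were not returned by the site, and the tab-separated layout is an assumption about the plain format.

```r
site  <- "http://random.org/integers/"
query <- "num=10&min=100&max=200&col=2&base=10&format=plain&rnd=new"
url   <- paste(site, query, sep = "?")   # same URL as in the text

# Hypothetical response: 10 integers laid out in 2 columns (made-up values)
sample_response <- "153\t187\n101\t142\n199\t120\n176\t165\n110\t134\n"

# read.table() accepts literal text through its `text` argument
nums <- read.table(text = sample_response)

dim(nums)     # rows and columns of the parsed table
names(nums)   # read.table() assigns default names V1, V2
```

With a live connection, read.table(file = url) would return a data frame of the same shape.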
3. Backticks
df <- data.frame(x=rnorm(5), y=runif(5))
names(df) <- 1:2
Trying to take the first column like this raises an error:
df$1
R reports: Error: unexpected numeric constant in "df$1"
But this works:
df$`1`
Pressing the Tab key after df$ also suggests the column names wrapped in backticks.
Backticked names also work in a model formula, for example in a linear regression:
lm(`2`~`1`,data=df)
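As a sketch of the alternatives, a column with a non-syntactic name can also be reached with [[ ]] or [ , ] and a quoted string, and make.names() can turn such names into syntactic ones. The set.seed() call is an assumption added only so the random data is reproducible.

```r
set.seed(1)   # assumption: fixed seed for reproducibility only
df <- data.frame(x = rnorm(5), y = runif(5))
names(df) <- 1:2

col_a <- df$`1`      # backticks, as in the text
col_b <- df[["1"]]   # quoted name inside [[ ]]
col_c <- df[, "1"]   # quoted name inside [ , ]
identical(col_a, col_b) && identical(col_a, col_c)

# make.names() prefixes names that start with a digit
syntactic <- make.names(names(df))   # "X1" "X2"

fit <- lm(`2` ~ `1`, data = df)
length(coef(fit))    # intercept plus one slope
```

Renaming the columns with make.names() up front avoids the need for backticks in later code.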
4. Operations on the data frame
Create some data:
sales <- expand.grid(country = c('USA','UK','FR'), product = c(1, 2, 3))
sales$revenue <- rnorm(dim(sales)[1], mean=100, sd=10)
Use transform() to add a column:
usd2eur <- 1.5
transform(sales, euro = revenue * usd2eur)
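One detail worth seeing in code: transform() returns a new data frame and leaves the original untouched, so the result has to be assigned to be kept. The sketch below assumes a fixed seed for reproducibility; sales_eur and sales2 are hypothetical names used only for illustration.

```r
set.seed(42)   # assumption: fixed seed for reproducibility only
sales <- expand.grid(country = c("USA", "UK", "FR"), product = c(1, 2, 3))
sales$revenue <- rnorm(nrow(sales), mean = 100, sd = 10)

usd2eur <- 1.5   # illustrative rate taken from the text

# transform() returns a modified copy
sales_eur <- transform(sales, euro = revenue * usd2eur)
"euro" %in% names(sales)       # FALSE: the original is unchanged
"euro" %in% names(sales_eur)   # TRUE

# Equivalent direct assignment on a copy
sales2 <- sales
sales2$euro <- sales2$revenue * usd2eur
identical(sales_eur$euro, sales2$euro)
```

expand.grid() produces every combination of its arguments, so three countries and three products give nine rows.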
5. cut() and table()
cut() divides numeric data into intervals:
irisSL <- iris$Sepal.Length
# Divide into five bins
cut(irisSL, 5)
# It can also be divided according to the range we want
cut(irisSL, breaks = seq(1,8,1))
Use table() to count the number of observations in each interval:
table(cut(irisSL, 5))
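Returns:
(4.9,5.5] (5.5,6.1] (6.1,6.7] (6.7,7.3] (7.3,7.9]
       12        33        35        13         7
These counts sum to 100, which matches the 100-row subset of iris shown in section 1; the full 150-row iris dataset gives different intervals and counts.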
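The intervals can also be given readable labels. The sketch below uses the full built-in iris dataset (150 rows), so its counts will differ from the output above; the break points and label strings are assumptions chosen for illustration.

```r
irisSL <- iris$Sepal.Length

# Explicit breaks with custom labels; intervals are right-closed by default,
# so "4-5" means 4 < x <= 5
bins <- cut(irisSL,
            breaks = c(4, 5, 6, 7, 8),
            labels = c("4-5", "5-6", "6-7", "7-8"))
counts <- table(bins)
sum(counts) == length(irisSL)   # every value falls inside some interval

# Values outside the breaks become NA rather than raising an error
outside <- cut(c(3.5, 9), breaks = 4:8)
is.na(outside)
```

Because out-of-range values silently turn into NA, it is worth checking that the breaks cover range(irisSL) before counting.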