Data frame
A data frame in R stores tabular (two-dimensional) data. Unlike a matrix, the columns of a data frame can have different data types (numeric, character, factor, etc.).
Create an empty data frame
> df <- data.frame("GeneID"=character(), "Species"=character(),
+                  "TaxonID"=character(), stringsAsFactors=FALSE)
and use a for loop to add values to it:
> GeneID = c("g01","g02","g03")
> Species = c("Human","Mouse","Cat")
> TaxonID = c(9606,10090,9685)
> for(i in seq_along(GeneID)){
+   df[i,] <- c(paste0("GENE_",GeneID[i]),
+               paste("Species",Species[i]),TaxonID[i])
+ }
> df
    GeneID       Species TaxonID
1 GENE_g01 Species Human    9606
2 GENE_g02 Species Mouse   10090
3 GENE_g03   Species Cat    9685
Note the difference between paste0() and paste() above: paste0() joins its arguments with no separator, while paste() inserts a space between them by default.
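A minimal, self-contained sketch of that difference follows. The names with_paste0, with_paste and same_as_paste0 are illustrative only. It also shows that both functions are vectorized, so they work on whole vectors without a loop, and that sep = "" makes paste() behave like paste0().

```r
GeneID  <- c("g01", "g02", "g03")
Species <- c("Human", "Mouse", "Cat")

# paste0() concatenates its arguments with no separator
with_paste0 <- paste0("GENE_", GeneID)

# paste() puts sep = " " between its arguments by default
with_paste <- paste("Species", Species)

# with sep = "" paste() gives the same result as paste0()
same_as_paste0 <- paste("GENE_", GeneID, sep = "")

print(with_paste0)
print(with_paste)
print(identical(with_paste0, same_as_paste0))
```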
Create a data frame with data
> df <- data.frame("GeneID" = c("g01","g02","g03"),
+                  "Species" = c("Human","Mouse","Cat"),
+                  "TaxonID" = c(9606,10090,9685))
> df
  GeneID Species TaxonID
1    g01   Human    9606
2    g02   Mouse   10090
3    g03     Cat    9685
or
> GeneID = c("g01","g02","g03")
> Species = c("Human","Mouse","Cat")
> TaxonID = c(9606,10090,9685)
> df <- data.frame(GeneID,Species,TaxonID)
> df
  GeneID Species TaxonID
1    g01   Human    9606
2    g02   Mouse   10090
3    g03     Cat    9685
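As a quick check that the two forms produce the same data frame, here is a small sketch. The names df_named and df_from_vectors are made up for this illustration. When unnamed vectors are passed, data.frame() takes the column names from the variable names.

```r
GeneID  <- c("g01", "g02", "g03")
Species <- c("Human", "Mouse", "Cat")
TaxonID <- c(9606, 10090, 9685)

# form 1: name each column explicitly in the call
df_named <- data.frame("GeneID"  = c("g01", "g02", "g03"),
                       "Species" = c("Human", "Mouse", "Cat"),
                       "TaxonID" = c(9606, 10090, 9685))

# form 2: pass existing vectors; column names come from the variable names
df_from_vectors <- data.frame(GeneID, Species, TaxonID)

print(names(df_from_vectors))
print(isTRUE(all.equal(df_named, df_from_vectors)))
```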
To check data types in a data frame, use str(dataframe)
> str(df)
'data.frame': 3 obs. of 3 variables:
$ GeneID : Factor w/ 3 levels "g01","g02","g03": 1 2 3
$ Species: Factor w/ 3 levels "Cat","Human",..: 2 3 1
$ TaxonID: num 9606 10090 9685
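The variables GeneID and Species here have the data type factor rather than character. This output comes from an R version before 4.0.0, where data.frame() converted character vectors to factors by default; since R 4.0.0 the default is stringsAsFactors=FALSE. To prevent the conversion explicitly in any version, use stringsAsFactors=FALSE: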
> df <- data.frame("GeneID" = c("g01","g02","g03"),
+                  "Species" = c("Human","Mouse","Cat"),
+                  "TaxonID" = c(9606,10090,9685),
+                  stringsAsFactors=FALSE)
> str(df)
'data.frame': 3 obs. of 3 variables:
$ GeneID : chr "g01" "g02" "g03"
$ Species: chr "Human" "Mouse" "Cat"
$ TaxonID: num 9606 10090 9685
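If a data frame already contains factor columns, for example one created elsewhere without stringsAsFactors=FALSE, they can be converted afterwards. The following is one possible sketch, not the only way to do it. The helper name is_fac is illustrative, and stringsAsFactors = TRUE is set explicitly only so that the example starts from factors in any R version.

```r
# start from a data frame whose character columns became factors
df <- data.frame(GeneID  = c("g01", "g02", "g03"),
                 Species = c("Human", "Mouse", "Cat"),
                 TaxonID = c(9606, 10090, 9685),
                 stringsAsFactors = TRUE)

# logical vector marking which columns are factors
is_fac <- sapply(df, is.factor)

# convert only the factor columns to character; numeric columns are untouched
df[is_fac] <- lapply(df[is_fac], as.character)

# show the class of each column after the conversion
print(sapply(df, class))
```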
How to work with a data frame
In the next post I will write about how to work with a data frame: subsetting it, removing rows and columns, replacing values, etc.