First, obtain your database token. Instructions on where to obtain these are available here.
Make sure the
dplyr packages are installed. If necessary, install them with
install.packages(c("DBI", "RPostgres", "dbplyr", "dplyr")).
bit.io currently does not support GSSAPI, so you will need to disable it by setting the environment variable
Connect to the database as follows, replacing
BITIO_PASS with your database token. We recommend storing the token in your
.RProfile files and making sure it is excluded from version control, shared notebooks, etc.
We'll use the public 2020 Census Reapportionment data as an example in this guide. To do so, we connect to the database
library(RPostgres) library(DBI) library(dbplyr) library(dplyr) library(ggplot2) Sys.setenv(PGGSSENCMODE="disable") con <- dbConnect(RPostgres::Postgres(), dbname = 'dliden/2020_Census_Reapportionment', # database name host = 'db.beta.bit.io', port = 5432, user = 'GettingStarted', password = BITIO_PASS)
Once we've established a connection with
dbConnect, we can query the database.
query <- 'SELECT * FROM "dliden/2020_Census_Reapportionment"."Historical Apportionment"' d <- DBI::dbGetQuery(con, sql(query))
Running this query will return the
Historical Apportionment table from the
dliden/2020_Census_Reapportionment schema of the
dliden/2020_Census_Reapportionment database as an
R data.frame called
d which can then be manipulated with the full range of
R tools that work on data frames.
dbplyr package provides a way to work with databases using
dplyr syntax. After connecting in the same way as described above, we can query a table and then use
dplyr verbs to subset the table as needed.
query <- 'SELECT * FROM "dliden/2020_Census_Reapportionment"."Historical Apportionment"' d <- data.frame(tbl(con, sql(query)))
As with the approach detailed above, this results in a local data frame.
Instead of getting to an R data frame as quickly as possible, we can use the initial query as a starting point and apply a series of
dplyr subsetting, aggregation, or summarization methods to it. We can even pipe the results to
ggplot2 as if we were working with a local
library(scales) query <- 'SELECT * FROM "dliden/2020_Census_Reapportionment"."Historical Apportionment"' con %>% tbl(sql(query)) %>% filter(Year > 1940, Name=="California") %>% select(Name, Year, `Resident Population`) %>% ggplot(., aes(x=as.integer(Year), y=as.integer(`Resident Population`))) + geom_line() + xlab("Year") + ylab("Resident Population") + ggtitle("Population of California, 1950-2020") + scale_y_continuous(labels=comma)
We can also use the
in_schema function (in place of a full SQL query) to access a table using
con %>% tbl(in_schema("dliden/2020_Census_Reapportionment", "Historical Apportionment"))
Updated 7 months ago