First, obtain your database token. Instructions on where to obtain these are available here.
Make sure the
dplyr packages are installed. If necessary, install them with
install.packages(c("DBI", "RPostgres", "dbplyr", "dplyr")).
bit.io currently does not support GSSAPI, so you will need to disable it by setting the environment variable
Connect to the database as follows, replacing
BITIO_PASS with your database token. We recommend storing the token in your
.RProfile files and making sure it is excluded from version control, shared notebooks, etc.
library(RPostgres) library(DBI) library(dbplyr) library(dplyr) library(ggplot2) Sys.setenv(PGGSSENCMODE="disable") con <- dbConnect(RPostgres::Postgres(), dbname = 'bitdotio', host = 'db.bit.io', port = 5432, user = 'bitdotio', password = BITIO_PASS)
Heads Up: RPostgres behavior around listing tables in your repo
In short, before running
dbListTables, you first need to add a repo of interest to your
query <- 'SET search_path to "dliden/2020_Census_Reapportionment"' DBI::dbSendQuery(con, sql(query))
Why? RPostgres dbListTables lists tables for the schemas currently on your
search_path. A Postgres
schemais a logical grouping of tables. On bit.io, each repo is a schema. The
search_pathis a postgres setting that tells postgres where to find tables. So, you need to set your
search_pathto a particular repo (i.e., schema) before you'll see meaningful results.
Once we've established a connection with
dbConnect, we can query the database. We will use the 2020 Census Reapportionment repository as an example in this guide.
query <- 'SELECT * FROM "dliden/2020_Census_Reapportionment"."Historical Apportionment"' d <- DBI::dbGetQuery(con, sql(query))
Running this query will return the
Historical Apportionment table from the
dliden/2020_Census_Reapportionment repository as an
R data.frame called
d which can then be manipulated with the full range of
R tools that work on data frames.
dbplyr package provides a way to work with databases using
dplyr syntax. After connecting in the same way as described above, we can query a table and then use
dplyr verbs to subset the table as needed.
query <- 'SELECT * FROM "dliden/2020_Census_Reapportionment"."Historical Apportionment"' d <- data.frame(tbl(con, sql(query)))
As with the approach detailed above, this results in a local data frame.
Instead of getting to an R data frame as quickly as possible, we can use the initial query as a starting point and apply a series of
dplyr subsetting, aggregation, or summarization methods to it. We can even pipe the results to
ggplot2 as if we were working with a local
library(scales) query <- 'SELECT * FROM "dliden/2020_Census_Reapportionment"."Historical Apportionment"' con %>% tbl(sql(query)) %>% filter(Year > 1940, Name=="California") %>% select(Name, Year, `Resident Population`) %>% ggplot(., aes(x=as.integer(Year), y=as.integer(`Resident Population`))) + geom_line() + xlab("Year") + ylab("Resident Population") + ggtitle("Population of California, 1950-2020") + scale_y_continuous(labels=comma)
We can also use the
in_schema function (in place of a full SQL query) to access a table using
con %>% tbl(in_schema("dliden/2020_Census_Reapportionment", "Historical Apportionment"))
Updated 11 months ago