R—RPostgres
Getting Started
First, obtain your database token; instructions on where to obtain one are available here.
Make sure the RPostgres, DBI, dbplyr, and dplyr packages are installed. If necessary, install them with install.packages(c("DBI", "RPostgres", "dbplyr", "dplyr")).
bit.io currently does not support GSSAPI, so you will need to disable it by setting the environment variable PGGSSENCMODE to "disable".
Connecting to the Database
Connect to the database as follows, replacing BITIO_PASS with your database token. We recommend storing the token in your .Renviron or .Rprofile file and making sure it is excluded from version control, shared notebooks, etc.
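For example (using BITIO_PASS as an illustrative variable name), the token can live in your .Renviron as shown in the commented line below, and you can read it into your session with Sys.getenv():

```r
# A line like the following in ~/.Renviron (keep this file out of version control):
# BITIO_PASS=your_database_token

# Read the token from the environment instead of hard-coding it in scripts:
BITIO_PASS <- Sys.getenv("BITIO_PASS")
```

R reads .Renviron automatically at startup, so the token is available to every session without ever appearing in your code.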
We'll use the public 2020 Census Reapportionment data as an example in this guide. To do so, we connect to the database dliden/2020_Census_Reapportionment.
library(RPostgres)
library(DBI)
library(dbplyr)
library(dplyr)
library(ggplot2)
Sys.setenv(PGGSSENCMODE="disable")
con <- dbConnect(RPostgres::Postgres(),
                 dbname = 'dliden/2020_Census_Reapportionment', # database name
                 host = 'db.beta.bit.io',
                 port = 5432,
                 user = 'GettingStarted',
                 password = BITIO_PASS)
Querying the Database
Once we've established a connection with dbConnect, we can query the database.
query <- 'SELECT * FROM "dliden/2020_Census_Reapportionment"."Historical Apportionment"'
d <- DBI::dbGetQuery(con, sql(query))
Running this query will return the Historical Apportionment table from the dliden/2020_Census_Reapportionment schema of the dliden/2020_Census_Reapportionment database as an R data.frame called d, which can then be manipulated with the full range of R tools that work on data frames.
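dbGetQuery also accepts parameterized queries, which keep values out of the SQL string itself. A sketch, assuming the same connection and table as above (with RPostgres, placeholders are written $1, $2, and so on):

```r
# Hypothetical example: fetch one state's rows with a bound parameter.
# Assumes `con` is the connection established above.
query <- 'SELECT * FROM "dliden/2020_Census_Reapportionment"."Historical Apportionment" WHERE "Name" = $1'
california <- DBI::dbGetQuery(con, query, params = list("California"))
```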
Working with dbplyr
The dbplyr package provides a way to work with databases using dplyr syntax. After connecting in the same way as described above, we can query a table and then use dplyr verbs to subset the table as needed.
query <- 'SELECT * FROM "dliden/2020_Census_Reapportionment"."Historical Apportionment"'
d <- data.frame(tbl(con, sql(query)))
As with the approach detailed above, this results in a local data frame.
Instead of getting to an R data frame as quickly as possible, we can use the initial query as a starting point and apply a series of dplyr subsetting, aggregation, or summarization methods to it. We can even pipe the results to ggplot2 as if we were working with a local data.frame.
library(scales)
query <- 'SELECT * FROM "dliden/2020_Census_Reapportionment"."Historical Apportionment"'
con %>%
  tbl(sql(query)) %>%
  filter(Year > 1940, Name == "California") %>%
  select(Name, Year, `Resident Population`) %>%
  ggplot(aes(x = as.integer(Year), y = as.integer(`Resident Population`))) +
  geom_line() +
  xlab("Year") + ylab("Resident Population") +
  ggtitle("Population of California, 1950-2020") +
  scale_y_continuous(labels = comma)

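A useful property of this lazy approach is that dbplyr only translates the dplyr verbs to SQL; nothing is pulled from the server until the result is needed. You can inspect the SQL a pipeline would generate without any live connection, as a sketch using dbplyr's lazy_frame() and simulate_postgres() (which stand in for a remote Postgres table):

```r
library(dplyr)
library(dbplyr)

# lazy_frame() stands in for a remote table; simulate_postgres() selects
# the Postgres SQL translation without connecting to a database.
lf <- lazy_frame(Name = character(), Year = integer(),
                 con = simulate_postgres())

# show_query() prints the SQL that this pipeline would send to the server.
lf %>%
  filter(Year > 1940, Name == "California") %>%
  show_query()
```

On a real connection, collect() at the end of a pipeline is what actually executes the query and returns a local data frame.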
We can also use the in_schema function (in place of a full SQL query) to access a table using dbplyr.
con %>%
  tbl(in_schema("dliden/2020_Census_Reapportionment",
                "Historical Apportionment"))