Getting Started
First, obtain your database token. Instructions on where to obtain one are available here.
Make sure the RPostgres, DBI, dbplyr, and dplyr packages are installed. If necessary, install them with install.packages(c("DBI", "RPostgres", "dbplyr", "dplyr")).
bit.io currently does not support GSSAPI, so you will need to disable it by setting the environment variable PGGSSENCMODE to "disable".
Connecting to the Database
Connect to the database as follows, replacing BITIO_PASS with your database token. We recommend storing the token in your .Renviron or .Rprofile file and making sure it is excluded from version control, shared notebooks, etc.
library(RPostgres)
library(DBI)
library(dbplyr)
library(dplyr)
library(ggplot2)
Sys.setenv(PGGSSENCMODE="disable")
con <- dbConnect(RPostgres::Postgres(),
                 dbname = 'bitdotio',
                 host = 'db.bit.io',
                 port = 5432,
                 user = 'bitdotio',
                 password = BITIO_PASS) # replace BITIO_PASS with your database token
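To confirm the connection works before going further, a quick sanity check with a trivial query:

```r
# Should return a one-row, one-column data frame if the connection is live
DBI::dbGetQuery(con, "SELECT 1 AS ok")
```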
Heads Up: RPostgres behavior around listing tables in your repo
In short, before running dbListTables, you first need to add a repo of interest to your search_path:
query <- 'SET search_path to "dliden/2020_Census_Reapportionment"'
DBI::dbSendQuery(con, sql(query))
Why? RPostgres' dbListTables lists tables for the schemas currently on your search_path. A Postgres schema is a logical grouping of tables. On bit.io, each repo is a schema. The search_path is a Postgres setting that tells Postgres where to find tables. So, you need to set your search_path to a particular repo (i.e., schema) before you'll see meaningful results.
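Putting the two steps together, a minimal sketch (reusing the connection con from above):

```r
# Point the search_path at the repo (schema) of interest...
DBI::dbSendQuery(con, sql('SET search_path to "dliden/2020_Census_Reapportionment"'))

# ...then dbListTables returns that repo's tables
DBI::dbListTables(con)
```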
Querying the Database
Once we've established a connection with dbConnect, we can query the database. We will use the 2020 Census Reapportionment repository as an example in this guide.
query <- 'SELECT * FROM "dliden/2020_Census_Reapportionment"."Historical Apportionment"'
d <- DBI::dbGetQuery(con, sql(query))
Running this query will return the Historical Apportionment table from the dliden/2020_Census_Reapportionment repository as an R data.frame called d, which can then be manipulated with the full range of R tools that work on data frames.
Working with dbplyr
The dbplyr package provides a way to work with databases using dplyr syntax. After connecting in the same way as described above, we can query a table and then use dplyr verbs to subset the table as needed.
query <- 'SELECT * FROM "dliden/2020_Census_Reapportionment"."Historical Apportionment"'
d <- data.frame(tbl(con, sql(query)))
As with the approach detailed above, this results in a local data frame.
Instead of getting to an R data frame as quickly as possible, we can use the initial query as a starting point and apply a series of dplyr subsetting, aggregation, or summarization methods to it. We can even pipe the results to ggplot2 as if we were working with a local data.frame.
library(scales)
query <- 'SELECT * FROM "dliden/2020_Census_Reapportionment"."Historical Apportionment"'
con %>%
tbl(sql(query)) %>%
filter(Year > 1940, Name=="California") %>%
select(Name, Year, `Resident Population`) %>%
ggplot(., aes(x=as.integer(Year), y=as.integer(`Resident Population`))) +
geom_line() +
xlab("Year") + ylab("Resident Population") +
ggtitle("Population of California, 1950-2020") +
scale_y_continuous(labels=comma)
We can also use the in_schema function (in place of a full SQL query) to access a table using dbplyr.
con %>%
tbl(in_schema("dliden/2020_Census_Reapportionment",
"Historical Apportionment"))
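The in_schema table reference is lazy like the SQL-query version, so the same dplyr verbs apply to it; a sketch, reusing the connection from above:

```r
library(dbplyr)
library(dplyr)

con %>%
  tbl(in_schema("dliden/2020_Census_Reapportionment",
                "Historical Apportionment")) %>%
  filter(Name == "California") %>%
  collect()  # pull the filtered rows into a local data frame
```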