The bit.io Developer Hub

Welcome to the bit.io developer hub. You'll find comprehensive guides and documentation to help you start working with bit.io as quickly as possible, as well as support if you get stuck. Let's jump right in!

Get Started    

Connecting via R with RPostgres

Getting Started

First, obtain your database token. Instructions on where to obtain these are available here.

Make sure the RPostgres, DBI, dbplyr, and dplyr packages are installed. If necessary, install them with install.packages(c("DBI", "RPostgres", "dbplyr", "dplyr")).

bit.io currently does not support GSSAPI, so you will need to disable it by setting the environment variable PGGSSENCMODE to "disable".

Connecting to the Database

Connect to the database as follows, replacing BITIO_PASS with your database token. We recommend storing the token in your .Renviron or .RProfile files and making sure it is excluded from version control, shared notebooks, etc.

library(RPostgres)
library(DBI)
library(dbplyr)
library(dplyr)
library(ggplot2)

Sys.setenv(PGGSSENCMODE="disable")

con <- dbConnect(RPostgres::Postgres(), dbname = 'bitdotio', 
                 host = 'db.bit.io',
                 port = 5432,
                 user = 'bitdotio',
                 password = BITIO_PASS)

📘

Heads Up: RPostgres behavior around listing tables in your repo

In short, before running dbListTables, you first need to add a repo of interest to your search_path

query <- 'SET search_path to "dliden/2020_Census_Reapportionment"'
DBI::dbSendQuery(con, sql(query))

Why? RPostgres dbListTables lists tables for the schemas currently on your search_path. A Postgres schema is a logical grouping of tables. On bit.io, each repo is a schema. The search_path is a postgres setting that tells postgres where to find tables. So, you need to set your search_path to a particular repo (i.e., schema) before you'll see meaningful results.

Querying the Database

Once we've established a connection with dbConnect, we can query the database. We will use the 2020 Census Reapportionment repository as an example in this guide.

query <- 'SELECT * FROM "dliden/2020_Census_Reapportionment"."Historical Apportionment"'
d <- DBI::dbGetQuery(con, sql(query))

Running this query will return the Historical Apportionment table from the dliden/2020_Census_Reapportionment repository as an R data.frame called d which can then be manipulated with the full range of R tools that work on data frames.

Working with dbplyr

The dbplyr package provides a way to work with databases using dplyr syntax. After connecting in the same way as described above, we can query a table and then use dplyr verbs to subset the table as needed.

query <- 'SELECT * FROM "dliden/2020_Census_Reapportionment"."Historical Apportionment"'
d <- data.frame(tbl(con, sql(query)))

As with the approach detailed above, this results in a local data frame.

Instead of getting to an R data frame as quickly as possible, we can use the initial query as a starting point and apply a series of dplyr subsetting, aggregation, or summarization methods to it. We can even pipe the results to ggplot2 as if we were working with a local data.frame.

library(scales)
query <- 'SELECT * FROM "dliden/2020_Census_Reapportionment"."Historical Apportionment"'
con %>%
  tbl(sql(query)) %>%
  filter(Year > 1940, Name=="California") %>%
  select(Name, Year, `Resident Population`) %>%
  ggplot(., aes(x=as.integer(Year), y=as.integer(`Resident Population`))) +
  geom_line() +
  xlab("Year") + ylab("Resident Population") +
  ggtitle("Population of California, 1950-2020") +
  scale_y_continuous(labels=comma)

We can also use the in_schema function (in place of a full SQL query) to access a table using dbplyr.

con %>%
  tbl(in_schema("dliden/2020_Census_Reapportionment",
                "Historical Apportionment"))

Updated 12 days ago


Connecting via R with RPostgres


Suggested Edits are limited on API Reference Pages

You can only suggest edits to Markdown body content, but not to the API spec.