Skip to contents

Transforms a date column in a dbplyr table connected to BigQuery by truncating each date to the start of the specified period (week, month, or quarter) and aggregates the data by counting the number of records in each period.

Usage

count_by_period(
  tbl,
  date_col,
  period = "week",
  week_start = "MONDAY",
  date_type = "string"
)

Arguments

tbl

A dbplyr table object connected to BigQuery. Typically obtained using tbl(con, "table_name").

date_col

A quoted or unquoted column name representing the date to be transformed and aggregated. Utilizes tidy evaluation for flexibility.

period

A character string specifying the aggregation period. Acceptable values are "week", "month", and "quarter". Defaults to "week".

week_start

A character string indicating the starting day of the week when period = "week". Acceptable values are "MONDAY" and "SUNDAY". Defaults to "MONDAY". This parameter is ignored when period is "month" or "quarter".

date_type

A character string specifying the data type of the date_col. Acceptable values are "string" and "timestamp". Defaults to "string".

Value

A dbplyr table object containing two columns:

period_start

The start date of the aggregation period (Monday for weeks, the first day of the month for months, and the first day of the quarter for quarters).

num_records

The count of records within each aggregation period.

Details

The count_by_period function is designed to streamline the process of aggregating time-series data by specified periods. It leverages the dplyr and dbplyr packages to translate R code into optimized SQL queries compatible with BigQuery.

Key Features:

  • **Flexible Aggregation:** Supports weekly, monthly, and quarterly aggregations.

  • **Custom Week Start:** Allows specification of the week's starting day (Monday or Sunday) for weekly aggregations.

  • **Tidy Evaluation:** Accepts both quoted and unquoted column names for ease of use.

  • **Performance Optimized:** Designed to work efficiently with large datasets in BigQuery by translating operations into SQL.

Note: Ensure that the date_col contains valid date values to prevent errors during the transformation process. The function automatically filters out NULL dates.

Examples

if (FALSE) { # \dontrun{
library(dplyr)
library(dbplyr)
library(bigrquery)
library(DBI)

# Establish a connection to BigQuery
project_id <- "nih-nci-dceg-connect-stg-5519"
dataset_id <- "FlatConnect"
con <- dbConnect(
  bigrquery::bigquery(),
  project = project_id,
  dataset = dataset_id,
  use_legacy_sql = FALSE
)

# Reference your BigQuery table
your_table <- tbl(con, "FlatConnect.participants_JP")

# Aggregate counts by week
weekly_counts <- count_by_period(
  tbl = your_table,
  date_col = d_471593703,
  period = "week",
  date_type = "string"
)
weekly_counts_local <- weekly_counts %>% collect()
print(weekly_counts_local)

# Aggregate counts by month
monthly_counts <- count_by_period(
  tbl = your_table,
  date_col = d_471593703,
  period = "month",
  date_type = "string"
)
monthly_counts_local <- monthly_counts %>% collect()
print(monthly_counts_local)

# Aggregate counts by quarter
quarterly_counts <- count_by_period(
  tbl = your_table,
  date_col = d_471593703,
  period = "quarter",
  date_type = "string"
)
quarterly_counts_local <- quarterly_counts %>% collect()
print(quarterly_counts_local)

# Aggregate counts by week starting on Sunday
weekly_sunday_counts <- count_by_period(
  tbl = your_table,
  date_col = d_471593703,
  period = "week",
  week_start = "SUNDAY",
  date_type = "string"
)
weekly_sunday_counts_local <- weekly_sunday_counts %>% collect()
print(weekly_sunday_counts_local)
} # }