Retrieves specified fields for OpenAlex work IDs using the OpenAlex API. Processes data in batches to avoid API rate limits.
Usage
get_openalex_fields(
  openalex_ids,
  variables = "publication_year",
  batch_size = 50,
  save_dir = NULL
)
Arguments
- openalex_ids
Character vector of OpenAlex work IDs (format: "W1234567890"), or a data frame/tibble containing a column named "CR" with OpenAlex IDs. Semicolon-separated ID strings are split automatically.
- variables
Character vector of variable names to fetch from OpenAlex. Options include: "publication_year", "doi", "type", "source_display_name", or any valid OpenAlex work field. Default is "publication_year".
- batch_size
Number of IDs to process per API call (default: 50). Smaller batches help avoid API rate limits.
- save_dir
Optional path to a directory in which intermediate results are saved as RDS files. If NULL (default), nothing is saved. The directory is created if it does not exist.
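If intermediate results have been written to save_dir, they can be recombined later. The following is only a sketch: the file names the function uses are not documented here, so it assumes that every .rds file in the directory holds a data frame with the same columns. The helper combine_saved_batches() and the demo file names are hypothetical.

```r
# Hypothetical helper, not part of the package: recombine per-batch RDS files.
# Assumes every .rds file in `dir` holds a data frame with the same columns.
combine_saved_batches <- function(dir) {
  files <- list.files(dir, pattern = "\\.rds$", full.names = TRUE,
                      ignore.case = TRUE)
  if (length(files) == 0) return(NULL)
  # Read each file and stack the data frames row-wise
  do.call(rbind, lapply(files, readRDS))
}

# Self-contained demonstration with made-up file names and values
demo_dir <- file.path(tempdir(), "openalex_demo")
dir.create(demo_dir, showWarnings = FALSE)
saveRDS(data.frame(id = "W1", publication_year = 2019L),
        file.path(demo_dir, "batch_1.rds"))
saveRDS(data.frame(id = "W2", publication_year = 2021L),
        file.path(demo_dir, "batch_2.rds"))
combined <- combine_saved_batches(demo_dir)
```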
Value
A tibble with the following columns:
id: The OpenAlex work ID
One column for each requested variable (e.g., "publication_year", "doi", "type")
Rows with invalid OpenAlex IDs, or for which the API call fails, contain NA values.
Details
This function:
Accepts either a character vector of IDs or a data frame with a "CR" column
Splits semicolon-separated ID strings into individual IDs
Validates IDs against the pattern "^W\d+$"
Fetches specified variables from OpenAlex API in batches
Optionally saves each batch to disk as it's processed
Handles API errors gracefully with informative messages
Includes delays between batches to respect API rate limits
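The splitting and validation steps listed above can be illustrated in base R. This is a minimal sketch, not the function's actual implementation; the helper name split_and_validate_ids() is hypothetical, and only the pattern "^W\d+$" comes from this page.

```r
# Hypothetical helper, not part of the package: split and validate IDs.
split_and_validate_ids <- function(x) {
  # Split each string on semicolons, flatten, and trim surrounding whitespace
  ids <- trimws(unlist(strsplit(x, ";", fixed = TRUE)))
  # Drop missing and empty entries
  ids <- ids[!is.na(ids) & nzchar(ids)]
  # Keep only IDs of the form "W" followed by one or more digits
  valid <- grepl("^W\\d+$", ids)
  list(valid = ids[valid], invalid = ids[!valid])
}

res <- split_and_validate_ids(c("W123;W456", "W789", "X42; W10"))
res$valid    # IDs matching the pattern
res$invalid  # entries that would be rejected
```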
Note
The OpenAlex API has rate limits. This function implements:
Batch processing to reduce the number of API calls
0.5-second delays between batches
Error handling for failed API requests
Progress messages to track execution
Optional disk saving for data persistence
If you encounter rate-limiting errors, consider reducing batch_size or adding longer delays between calls.
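The batching pattern described in this note can be sketched as follows. It is an illustration under stated assumptions rather than the package's code: process_in_batches() and fake_fetch() are hypothetical names, and fake_fetch() stands in for the real API request so that the example runs offline.

```r
# Hypothetical sketch: process IDs in batches with a pause and error handling.
# `fetch_fun` stands in for whatever performs the real API request.
process_in_batches <- function(ids, fetch_fun, batch_size = 50, delay = 0.5) {
  # Assign each ID to a batch number: 1, 1, ..., 2, 2, ...
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    message("Processing batch ", i, " of ", length(batches))
    results[[i]] <- tryCatch(
      fetch_fun(batches[[i]]),
      error = function(e) {
        message("Batch ", i, " failed: ", conditionMessage(e))
        # Fall back to NA so the failed IDs still appear in the output
        data.frame(id = batches[[i]], publication_year = NA_integer_)
      }
    )
    # Pause between batches, but not after the last one
    if (i < length(batches)) Sys.sleep(delay)
  }
  do.call(rbind, results)
}

# Stand-in fetcher that fails on one ID, to exercise the error path
fake_fetch <- function(batch) {
  if ("W3" %in% batch) stop("simulated API error")
  data.frame(id = batch, publication_year = 2020L)
}

out <- process_in_batches(paste0("W", 1:5), fake_fetch,
                          batch_size = 2, delay = 0)
```

In this sketch a failed batch yields NA for every ID it contains, which mirrors the behaviour described under Value; a longer delay can be passed if rate limiting persists.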
Examples
if (FALSE) { # \dontrun{
  # From a character vector
  ids <- c("W2261389918", "W1548650423", "W1504492735")
  result <- get_openalex_fields(ids)

  # Fetch multiple variables
  result <- get_openalex_fields(
    ids,
    variables = c("publication_year", "doi", "type")
  )

  # From a data frame with a CR column
  oa_data <- data.frame(CR = c("W123;W456", "W789"))
  result <- get_openalex_fields(oa_data)

  # Save intermediate results while downloading
  result <- get_openalex_fields(
    ids,
    variables = c("publication_year", "source_display_name"),
    save_dir = tempdir()
  )
} # }
