Parse OpenAlex datasets supplied in one of two forms:
(1) a CSV file exported from the browser, or
(2) a data frame obtained via the {openalexR} API helpers.
The function standardizes fields to common bibliographic tags (e.g., AU, SO, CR, PY, DI) and returns a tidy tibble.
Value
A tibble with standardized bibliographic columns. Typical output includes: id_short, AU, DI, CR, SO, DT, DE, AB, C1, TC, SC, SR, PY, and DB (source flag: "openalex_csv" or "openalex_api"). See Details.
Details
CSV mode (format = "csv"):
If file is a URL, it is downloaded to a temporary file before parsing (a progress message is printed).
Selected fields are mapped to standardized tags: id_short (short OpenAlex ID), SR (= id_short), PY (= publication_year), TI (= title), DI (= doi), DT (= type), DE (= keywords.display_name), AB (= abstract), AU (= authorships.author.display_name), SO (= locations.source.display_name), C1 (= authorships.countries), TC (= cited_by_count), SC (= primary_topic.field.display_name), CR (= referenced_works, with the https://openalex.org/ prefix stripped), and DB = "openalex_csv".
PY is coerced to numeric; a helper column DI2 (uppercase, punctuation-stripped variant of DI) is added; columns with all-caps tags are placed first and DI2 is relocated after DI.
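The CSV-mode normalization described above can be sketched in base R. This is only an illustration under stated assumptions, not the function's actual implementation: the two-row `raw` data frame is a hypothetical stand-in for a real export, the `|` separator inside `referenced_works` is assumed, and the character class used for punctuation stripping in `DI2` is a guess at what "punctuation-stripped" means.

```r
# Hypothetical two-row stand-in for an OpenAlex CSV export; real exports
# contain many more columns.
raw <- data.frame(
  id = c("https://openalex.org/W1", "https://openalex.org/W2"),
  publication_year = c("2020", "2021"),
  doi = c("10.1000/abc-1", "10.1000/XYZ.2"),
  referenced_works = c(
    "https://openalex.org/W10|https://openalex.org/W11", # "|" separator is assumed
    "https://openalex.org/W12"
  ),
  stringsAsFactors = FALSE
)

oa_prefix <- "https://openalex.org/"

out <- data.frame(
  # Short OpenAlex ID: drop the URL prefix from the full identifier.
  id_short = sub(oa_prefix, "", raw$id, fixed = TRUE),
  # Publication year coerced to numeric.
  PY = as.numeric(raw$publication_year),
  DI = raw$doi,
  # Strip the prefix from every reference in the string, whatever the separator.
  CR = gsub(oa_prefix, "", raw$referenced_works, fixed = TRUE),
  DB = "openalex_csv",
  stringsAsFactors = FALSE
)
out$SR <- out$id_short

# Assumed reading of DI2: remove punctuation, then convert to upper case.
out$DI2 <- toupper(gsub("[[:punct:]]", "", out$DI))

print(out)
```

Using `gsub()` with `fixed = TRUE` for CR avoids regular-expression interpretation of the dots in the prefix and removes every occurrence in a multi-reference string.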
API mode (format = "api"):
file must be a data frame containing at least an id column; typically this is returned by openalexR::oa_request() followed by openalexR::oa2df(), or similar.
Records are filtered to type %in% c("article","review") and deduplicated by id.
The function derives:
id_short (= id without the https://openalex.org/ prefix) and SR (= id_short);
CR: concatenated short IDs from referenced_works (semicolon-separated);
DE: concatenated keyword names (lower case) from keywords;
AU: concatenated author names (upper case) from authorships;
plus core fields PY (= publication_year), TC (= cited_by_count), TI (= title), AB (= abstract), DI (= doi), and DB = "openalex_api".
The result keeps one row per id and may include original columns from the input (via a right join), after constructing the standardized fields above.
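The API-mode steps (filter by type, deduplicate by id, collapse list-columns) can be sketched in base R as follows. This is a minimal illustration, not the function's own code. The `works` data frame is a hypothetical, simplified stand-in for {openalexR} output: the `display_name` column inside `keywords` and `authorships` is an assumption (real column names may differ between {openalexR} versions), and using ";" as the separator for DE and AU is also assumed, since the text above only specifies it for CR.

```r
# Hypothetical, simplified stand-in for a works data frame from {openalexR}.
works <- data.frame(
  id = c("https://openalex.org/W1", "https://openalex.org/W1",
         "https://openalex.org/W2", "https://openalex.org/W3"),
  type = c("article", "article", "review", "dataset"),
  stringsAsFactors = FALSE
)
works$referenced_works <- list(
  c("https://openalex.org/W10", "https://openalex.org/W11"),
  c("https://openalex.org/W10", "https://openalex.org/W11"),
  "https://openalex.org/W12",
  character(0)
)
works$keywords <- list(
  data.frame(display_name = c("Bibliometrics", "Open Science")),
  data.frame(display_name = c("Bibliometrics", "Open Science")),
  data.frame(display_name = "Citation Analysis"),
  data.frame(display_name = character(0))
)
works$authorships <- list(
  data.frame(display_name = c("Ada Lovelace", "Alan Turing")),
  data.frame(display_name = c("Ada Lovelace", "Alan Turing")),
  data.frame(display_name = "Grace Hopper"),
  data.frame(display_name = character(0))
)

# Keep articles and reviews only, then keep the first row for each id.
works <- works[works$type %in% c("article", "review"), ]
works <- works[!duplicated(works$id), ]

strip_prefix <- function(x) sub("https://openalex.org/", "", x, fixed = TRUE)

# Collapse one list-column element into a single delimited string.
collapse_names <- function(df, transform) {
  paste(transform(df$display_name), collapse = ";")
}

std <- data.frame(id = works$id, stringsAsFactors = FALSE)
std$id_short <- strip_prefix(std$id)
std$SR <- std$id_short
std$CR <- vapply(works$referenced_works,
                 function(x) paste(strip_prefix(x), collapse = ";"),
                 character(1))
std$DE <- vapply(works$keywords, collapse_names, character(1), transform = tolower)
std$AU <- vapply(works$authorships, collapse_names, character(1), transform = toupper)
std$DB <- "openalex_api"

print(std)
```

`vapply()` with `character(1)` guarantees exactly one string per record, so each derived column has the same length as the deduplicated input.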
Supported inputs
format = "csv": a local path or an HTTP(S) URL to an OpenAlex CSV export.
format = "api": a data frame produced by {openalexR} for the works entity (with the usual OpenAlex columns, including list-columns such as keywords, authorships, and referenced_works).
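For the CSV input, telling a URL from a local path and downloading it to a temporary file might look like the sketch below. The helper names `is_http_url()` and `resolve_csv_path()` are hypothetical and are not part of the package. The message wording is also illustrative.

```r
# Hypothetical helper: TRUE only for a single HTTP(S) URL string.
is_http_url <- function(x) {
  is.character(x) && length(x) == 1L &&
    grepl("^https?://", x, ignore.case = TRUE)
}

# Hypothetical helper: return a local path, downloading first if needed.
resolve_csv_path <- function(file) {
  if (!is_http_url(file)) {
    return(file) # already a local path; use as-is
  }
  tmp <- tempfile(fileext = ".csv")
  message("Downloading ", file, " to a temporary file ...")
  utils::download.file(file, destfile = tmp, mode = "wb", quiet = TRUE)
  tmp
}

is_http_url("https://example.org/openalex-works.csv")
is_http_url("~/Downloads/openalex-works.csv")
resolve_csv_path("~/Downloads/openalex-works.csv")
```

Only the URL branch touches the network. A local path is returned unchanged, so the same parsing code can run on either kind of input.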
See also
OpenAlex R client: oa_request, oa2df.
Importers for Web of Science: read_wos.
Examples
if (FALSE) { # \dontrun{
## CSV export (local path)
x <- read_openalex("~/Downloads/openalex-works.csv", format = "csv")
## CSV export (URL)
x <- read_openalex("http://yoursite/openalex-works-2025-05-28T23-12-11.csv", format = "csv")
## Using the API with openalexR
# install.packages("openalexR")
library(openalexR)
url_api <- "https://api.openalex.org/works?page=1&filter=primary_location.source.id:s121026525"
df_api <- openalexR::oa_request(query_url = url_api) |>
  openalexR::oa2df(entity = "works")
y <- read_openalex(df_api, format = "api")
} # }