PubMed is an online repository of references and abstracts of publications in the fields of medicine and life sciences. Pubmed is a free resource that is developed and maintained by the National Center for Biotechnology Information (NCBI), at the U.S. National Library of Medicine (NLM), located at the National Institutes of Health (NIH). PubMed homepage is located at the following URL: https://www.ncbi.nlm.nih.gov/pubmed/. Other than its web portal, PubMed can be programmatically queried via the NCBI Entrez E-utilities interface.
easyPubMed is an open-source R interface to
the Entrez Programming Utilities aimed at allowing programmatic
access to PubMed in the R environment. The package is suitable
for downloading large number of records, and includes a collection of
functions to perform basic processing of the Entrez/PubMed
query responses. The library supports either XML or
TXT (“medline”) format. This vignette covers the key
functionalities of easyPubMed
and provides some informative
examples to get started.
This vignette covers easyPubMed
version 3.01.
Compatibility with previous versions of easyPubMed
is NOT
implied nor guaranteed.
easyPubMed
is an open-source software, under the
GPL-3 license and comes with ABSOLUTELY NO WARRANTY. For more questions
about the GPL-3 license terms, see www.gnu.org/licenses.
This R library was written based on the information included in the Entrez Programming Utilities Help manual authored by Eric Sayers, PhD and available on the NCBI Bookshelf (NBK25500).
This R library is NOT endorsed, supported, maintained NOR affiliated with NCBI.
Simplified Pipeline. The process of retrieving and analyzing Pubmed records has been updated and simplified. The revised pipeline includes three steps: 1) submit a query; 2) fetch records; 3) extract information. The corresponding functions are discussed below in this vignette.
Automatic Job splitting into Sub-Queries. The
Entrez server imposes a strict n=10,000 limit to the number of records
that can be programmatically retrieved from a single query. Whenever
possible, the easyPubMed
library automatically attempts to
split queries returning large number of records into lists of smaller,
manageable queries.
The easyPubMed S4 Class. Here we introduce a new
S4 class (easyPubMed
) that was designed to better store and
manage query information, retrieved records and associated meta-data.
This S4 class comes with a series of methods and is aimed at improving
data handling and analysis reproducibility. Raw or processed data can be
obtained from an easyPubMed
object via the appropriate
getter functions (as discussed below).
Additional Parsed Fields. The new
epm_parse()
function supports extraction of additional
information from Pubmed records (compared to previous versions of this R
library). Extracted fields now include mesh_codes,
grant_ids, references and conflict of interest
statements (cois) among others. For more information, see the
examples below.
Compact Output. Unlike previous versions of
easyPubMed
, it is now possible to collapse author
information (i.e., author names) into a single string. This
way, the output (parsed data.frame
) only includes one row
per record.
To install the stable version (2.22) of easyPubMed
from
CRAN, you can run the following line of code:
install.packages("easyPubMed")
A dev version (3.01) of the library is hosted on
GitHub. If you are interested in trying it out, you can install
it using the devtools
library as follows.
devtools::install_github("dami82/easyPubMed")
The first section of the tutorial covers how to use
easyPubMed
(version 3.01) for querying PubMed,
retrieving records from the Entrez History Server, and
analyzing results. The second part of the tutorial provides additional
examples and information about special cases and advanced
operations.
The typical easyPubMed
pipeline is a three-step
process.
Submit a PubMed query via
epm_query()
. This function submits a query to
Entrez/PubMed, reads the server response, and depending on the
anticipated number of results it may automatically split the query into
multiple sub-queries (whenever needed and/or possible). The query string
should be built using standard PubMed syntax, i.e. the user can use the
same tags/keywords (e.g., ‘[AU]’ or ‘[PDAT]’) as the PubMed web portal.
This function returns an easyPubMed
object.
Download PubMed records via
epm_fetch()
. This function reads the information
included in an easyPubMed
object and downloads records from
the Entrez/PubMed server. Records can be retrieved in XML (default) or
Medline formats. By default, raw records are stored in the ‘raw’ slot of
the easyPubMed
object, and can be accessed via the
getEPMRaw()
method. Alternatively, records can be written
to local files.
[Optional] Extract Information from raw XML
records via epm_parse()
. This function is used to
identify XML fields of interest in each record, extract the
corresponding information, and cast them into a structured data.frame.
Results are stored in the ‘data’ slot of the easyPubMed
object, and can be accessed via the getEPMData()
method.
PubMed Query and Record Retrieval
The code below illustrates the typical steps of an
easyPubMed
analysis. All data (raw records as well as
processed data) are stored in the resulting easyPubMed
object. In this example, n=1,597 records were retrieved and processed.
This took about 14 min using a 2-vCPUs, 4Gb-memory machine running on
Ubuntu 20.04. Data parsing (epm_parse()
) is the step taking
the longest time to complete.
# Load library
library(easyPubMed)
# Define Query String
my_query <- '"bladder cancer"[Ti] AND "2018"[PDAT]'
# Submit the Query
epm <- epm_query(my_query)
# Retrieve Records (xml format)
epm <- epm_fetch(epm, format = 'xml')
# Extract Information
epm <- epm_parse(epm)
# All results are stored in an easyPubMed object.
epm
## --- easyPubMed object ---
##
## query_string: <"bladder cancer"[Ti] AND "2018"[PDAT]>
## expected_count: n=1597
## PMID_count: n=1597
## fetch_subjobs: n=1
## fetched_records: n=1597
## record_format: xml
## processed_data: n=1597 (records)
## processed_data: n=1597 (rows)
## easyPubMed_ver: 3.1
Get Meta data
Meta data are attached to each easyPubMed
object and
provide information about the record query job (e.g., number of
expected records; date when the query was performed) as well as
type/format of the downloaded data (e.g., format and encoding
of the raw data). A unique identifier (UID) is also included to track
different objects/query jobs. Meta data can be requested from an
easyPubMed
object via the getEPMMeta
method,
which returns a list.
job_meta <- getEPMMeta(x = epm)
head(job_meta)
## $max_records_per_batch
## [1] 9999
##
## $exp_count
## [1] 1597
##
## $exp_num_of_batches
## [1] 1
##
## $all_records_covered
## [1] TRUE
##
## $exp_missed_records
## [1] 0
##
## $query_date
## [1] "2023-11-22 18:47:31"
Get Raw Records
Raw PubMed records can be obtained from an easyPubMed
(after epm_fetch()
has been completed) via the
getEPMRaw()
method, which returns a named list. Each
element includes one PubMed record. The name of each element corresponds
to its PubMed record identifier (PMID).
raw_records <- getEPMRaw(epm)
# elements are named after the corresponding PMIDs
head(names(raw_records))
## [1] "31572460" "31511849" "31411998" "31411974" "31317867" "31297313"
# elements include raw PubMed records
first_record <- raw_records[[1]]
# Show excerpt (from record #1)
cat(substr(first_record, 1, 1200))
## <MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM"><PMID Version="1">31572460</PMID><DateRevised><Year>2022</Year><Month>04</Month><Day>10</Day></DateRevised><Article PubModel="Print-Electronic"><Journal><ISSN IssnType="Print">1734-1922</ISSN><JournalIssue CitedMedium="Print"><Volume>15</Volume><Issue>5</Issue><PubDate><Year>2019</Year><Month>Sep</Month></PubDate></JournalIssue><Title>Archives of medical science : AMS</Title><ISOAbbreviation>Arch Med Sci</ISOAbbreviation></Journal><ArticleTitle>The effect of miR-124-3p on cell proliferation and apoptosis in bladder cancer by targeting EDNRB.</ArticleTitle><Pagination><StartPage>1154</StartPage><EndPage>1162</EndPage><MedlinePgn>1154-1162</MedlinePgn></Pagination><ELocationID EIdType="doi" ValidYN="Y">10.5114/aoms.2018.78743</ELocationID><Abstract><AbstractText Label="INTRODUCTION" NlmCategory="BACKGROUND">Endothelin receptor type B (EDNRB) is a potential target gene of miR-124-3p, but the association between miR-124-3p and EDNRB has not yet been reported. The aim of this study was to investigate the role of miR-124-3p in bladder cancer (BC) and to determine whether miR-124-3p regulates cell proliferation by targeting EDNRB.</
Get Processed Data
Processed data can be obtained from an easyPubMed
(after
epm_parse()
has been executed) via the
getEPMData()
method. Processed data are returned as a
data.frame. By default, each row corresponds to a PubMed record. This
default behavior can be modified by tuning the
compact_output
and max_authors
arguments (see
section below). The columns/fields extracted include record identifiers,
journal name, publication date, title, abstract, MeSH codes, author
names and affiliations.
proc_data <- getEPMData(epm)
# show an excerpt (first 6 records, selected columns)
slctd_fields <- c('pmid', 'doi', 'jabbrv', 'year', 'month', 'day')
head(proc_data[, slctd_fields])
## pmid doi jabbrv year month day
## 1 31572460 10.5114/aoms.2018.78743 Arch Med Sci 2019 9 1
## 2 31511849 10.21037/biotarget.2018.08.02 Biotarget 2018 9 1
## 3 31411998 10.1016/j.euo.2018.07.009 Eur Urol Oncol 2019 9 1
## 4 31411974 10.1016/j.euo.2018.08.032 Eur Urol Oncol 2019 9 1
## 5 31317867 10.1016/j.euo.2018.11.010 Eur Urol Oncol 2020 6 1
## 6 31297313 10.1016/j.ajur.2018.06.006 Asian J Urol 2019 7 1
A comprehensive list of the fields that are extracted from raw XML records and returned as columns of the processed data object (data.frame) is shown below.
pmid: PubMed Record Unique Reference Number (Identifier).
doi: Digital Object Identifier.
pmc: PubMed Central Unique Reference Number (PMCID) (if available).
journal: Journal Name (full-length).
jabbrv: Journal Name (abbreviation).
lang: Language (e.g., ‘eng’).
year: Publication Date (
month: Publication Date (
day: Publication Date (
title: Record Title.
abstract: Record Abstract.
mesh_codes: Medical Subject Heading Codes (e.g., ‘D001749’).
mesh_terms: Medical Subject Heading Codes (e.g., ‘Urinary Bladder Neoplasms’).
grant_ids: Reference Number of Funding Grants Supporting the Publication (if available/provided).
references: Identifiers (PMIDs or DOIs) of Publications
coi: Conflict of Interest Statement (if available/provided).
authors: List of Author Names. This field is
returned when the compact_output
argument is set to
TRUE
(epm_parse()
function). Otherwise, the
following columns are included: ‘lastname’, ‘forename’, ‘address’,
‘email’.
affiliation: Address and/or Affiliation Associated with the First Author of the Study.
Get Record Identifiers (PMIDs)
The identifiers (PMIDs) of records included in an
easyPubMed
object (after epm_fetch()
has been
executed) can be obtained via the getEPMUilist()
method,
which returns a character vector. PMIDs are automatically detected and
extracted from all downloaded records, independently of the raw record
format.
# Get PMIDs
all_pmids <- getEPMUilist(epm)
# Show excerpt
head(all_pmids)
## [1] "31572460" "31511849" "31411998" "31411974" "31317867" "31297313"
This section includes a few examples of less-common
easyPubMed
pipelines and operations. Please, contact the
package maintainer for additional questions.
The easyPubMed
library comes with two special
Query functions that are designed to address specific
goals:
Query Entrez/PubMed by exact match of a full-length
title (article title): epm_query_by_fulltitle()
.
Execute a PubMed Query by providing a list of record identifiers
(PMIDs): epm_query_by_pmid()
.
These special query functions may replace the first step of the
easyPubMed
pipeline. After the query step has been
completed, record retrieval proceeds as outlined above, i.e.,
via the epm_fetch()
function.
Query by Article Title
It is possible to query PubMed for a record of interest by providing
its full-length title as query string and via the
epm_query_by_fulltitle()
function. This function takes a
string (character vector of length 1) as its fulltitle
argument. The string should NOT include new-line characters
(e.g., ) or multi-spaces, as those may prevent the exact-match
search. These special characters are NOT removed automatically (by
design). You can use regular expressions (e.g.,
gsub()
) to clean a fulltitle
string before
performing the query. An example is shown below.
# Article Title (including new-line chars)
my_title <- "Role of gemcitabine and cisplatin as
neoadjuvant chemotherapy in muscle invasive bladder cancer:
Experience over the last decade."
# Unpolished title string
cat(my_title)
## Role of gemcitabine and cisplatin as
## neoadjuvant chemotherapy in muscle invasive bladder cancer:
## Experience over the last decade.
# Clean the title
my_title <- gsub('[[:space:]]+', ' ', my_title)
# Clean title string
cat(my_title)
## Role of gemcitabine and cisplatin as neoadjuvant chemotherapy in muscle invasive bladder cancer: Experience over the last decade.
# Query and fetch
epm_xmpl_01 <- epm_query_by_fulltitle(fulltitle = my_title)
epm_xmpl_01 <- epm_fetch(epm_xmpl_01)
epm_xmpl_01
## --- easyPubMed object ---
##
## query_string: <"Role of gemcitabine and cisplatin as neoadjuvant chemotherapy in muscle invasive bladder cancer: Experience over the last decade."[Title]>
## expected_count: n=1
## PMID_count: n=1
## fetch_subjobs: n=1
## fetched_records: n=1
## record_format: xml
## processed_data: n=0 (records)
## processed_data: n=0 (rows)
## easyPubMed_ver: 3.1
Query Using a List of PMIDs
The epm_query_by_pmid()
takes a character vector as its
pmids
argument. If a long list of PMIDs is provided
(n>50), the function automatically splits the query job into multiple
50-record sub-jobs. The resulting ‘easyPubMed’ object displays
‘<Custom query (epm_query_by_pmid)>’ as value of the
query_string meta data field. An example is shown below.
my_pmids <- c('31572460', '31511849', '31411998')
epm_xmpl_02 <- epm_query_by_pmid(pmids = my_pmids)
epm_xmpl_02 <- epm_fetch(epm_xmpl_02)
epm_xmpl_02
## --- easyPubMed object ---
##
## query_string: <Custom query (epm_query_by_pmid)>
## expected_count: n=3
## PMID_count: n=3
## fetch_subjobs: n=1
## fetched_records: n=3
## record_format: xml
## processed_data: n=0 (records)
## processed_data: n=0 (rows)
## easyPubMed_ver: 3.1
The epm_fetch()
function supports three different
formats. The default format is xml
. Alternatively, the
medline
and uilist
formats are also supported.
Briefly, the medline
option returns records in plain text
format (see example below). On the contrary, the uilist
format simply requests the identifiers (PMIDs) of all records returned
by a query (no additional record content is retrieved from
Entrez/PubMed). Note that non-XML records cannot be used to extract
record information via epm_parse()
.
# Define Query String
my_query <- '"bladder cancer"[Ti] AND "2018"[PDAT]'
# Submit the Query
epm_xmpl_03 <- epm_query(my_query)
# Retrieve Records (request 'medline' format!)
epm_xmpl_03 <- epm_fetch(epm_xmpl_03, format = 'medline')
# Get records
xmpl_03_raw <- getEPMRaw(epm_xmpl_03)
# Elements are named after the corresponding PMIDs
head(names(xmpl_03_raw))
## [1] "31572460" "31511849" "31411998" "31411974" "31317867" "31297313"
# Elements include raw PubMed records
first_record <- xmpl_03_raw[[1]]
# Show an Excerpt (record n. 12, first 18 lines)
cat(head(first_record, n=20), sep = '\n')
## PMID- 31572460
## OWN - NLM
## STAT- PubMed-not-MEDLINE
## LR - 20220410
## IS - 1734-1922 (Print)
## IS - 1896-9151 (Electronic)
## IS - 1734-1922 (Linking)
## VI - 15
## IP - 5
## DP - 2019 Sep
## TI - The effect of miR-124-3p on cell proliferation and apoptosis in bladder cancer by
## targeting EDNRB.
## PG - 1154-1162
## LID - 10.5114/aoms.2018.78743 [doi]
## AB - INTRODUCTION: Endothelin receptor type B (EDNRB) is a potential target gene of
## miR-124-3p, but the association between miR-124-3p and EDNRB has not yet been
## reported. The aim of this study was to investigate the role of miR-124-3p in
## bladder cancer (BC) and to determine whether miR-124-3p regulates cell
## proliferation by targeting EDNRB. MATERIAL AND METHODS: Bladder cancer tissues
## and cell lines were obtained in order to analyze the miR-124-3p and EDNRB
In easyPubMed
(version 3.01 and later) there are no
dedicated functions for downloading large numbers of records. Large
query jobs are still carried out via the epm_query()
and
epm_fetch()
functions, which will attempt to split a single
query into a list of manageable sub-jobs. An example is shown below.
Briefly, we performed a query that returned n=20,825 records. The job
was automatically split in n=4 sub-jobs, records were downloaded and
parsed. The whole operation took about 3h 28m using a 2-vCPUs,
4Gb-memory machine running on Ubuntu 20.04 (i.e., about 0.6s
per record).
# Define Query String
blca_query <- '"bladder cancer"[Ti] AND ("1980"[PDAT]:"2020"[PDAT])'
# Submit the Query
epm_xmpl_04 <- epm_query(blca_query)
# Retrieve Records (medline format)
epm_xmpl_04 <- epm_fetch(epm_xmpl_04)
# Parse all records
epm_xmpl_04 <- epm_parse(epm_xmpl_04)
# Show Object
epm_xmpl_04
## --- easyPubMed object ---
##
## query_string: <"bladder cancer"[Ti] AND ("1980"[PDAT]:"2020"[PDAT])>
## expected_count: n=20825
## PMID_count: n=20815
## fetch_subjobs: n=4
## fetched_records: n=20815
## record_format: xml
## processed_data: n=20815 (records)
## processed_data: n=20815 (rows)
## easyPubMed_ver: 3.1
Unlike previous versions of easyPubMed
, there are no
dedicated functions to write PubMed records to a local disk. Starting
from easyPubMed
version 3.01, this operation is performed
by tuning the arguments of the epm_fetch()
function and by
setting the write_to_file
to TRUE.
Write Files to the Local Disc.
There are 4 arguments that can be adjusted to fine-tune the behavior
of epm_fetch()
and write PubMed records to local files.
write_to_file
: logical (defaults to
FALSE
). If TRUE
, raw PubMed records are
written to a local disk.
outfile_path
: string pointing to an existing local
directory (defaults to NULL
). This argument is evaluated
only when write_to_file
is set to TRUE
. Files
including the raw PubMed records will be saved at the indicated
location. If NULL
, the current directory is used.
outfile_prefix
: string indicating a prefix used for
the name of files written to the local disk (defaults to
NULL
). If NULL
, a unique prefix (prefix
pattern: easypubmed_job_yyyymmddhhmmss_
) is automatically
generated. Files may be overwritten is a non-unique prefix is
used.
store_contents
: logical (defaults to
TRUE
). This argument is used to control whether a copy of
the raw records should be stored in the current easyPubMed
object (raw
slot). If write_to_file
is set to
TRUE
and store_contents
is set to
FALSE
, raw records are written to disk and no copies are
stored in the current object. This option is recommended for very large
queries and replaces the batch_pubmed_download()
function
(available until version 2.32). If both write_to_file
and
store_contents
are set to TRUE
, records will
be written to the local disk (backup copy) and also stored in the
current easyPubMed
object.
# Define Query String
my_query <- '"bladder cancer"[Ti] AND "2018"[PDAT]'
# Submit the Query
epm_xmpl_05 <- epm_query(my_query)
# Retrieve Records
epm_xmpl_05 <- epm_fetch(epm_xmpl_05, write_to_file = TRUE)
# Check if file exists
dir(pattern = '^easypubmed')
## [1] "easypubmed_job_202311201513_batch_01.txt"
Read Files From the Local Disc.
It is possible to import local files storing raw PubMed records for
further processing via the epm_import_xml()
function. This
function can be used if the following 3 conditions are met:
records were retrieved in the XML format. Other formats (e.g., the “medline” format) are NOT supported;
records were downloaded and written using
easyPubMed
, version 3.0
or later.
Compatibility with data/files downloaded using other tools or former
versions of the easyPubMed
library is NOT
guaranteed.
if multiple files are imported together, all files were written
as part of the same epm_fetch()
job. Note that different
jobs that used the same query are considered as independent
jobs.
Users should feed the epm_import_xml()
function a
character vector of file names (of length >= 1), where each element
indicates a text file to be read and imported.
epm_import_xml()
function will warn
that only a subset of the expected files were imported, and then the
parsing step will continue using all records included in the selected
file(s).# Import XML records from saved file
epm_xmpl_06 <- epm_import_xml(x = 'easypubmed_job_202311201513_batch_01.txt')
# Show Object
epm_xmpl_06
## --- easyPubMed object ---
##
## query_string: <"bladder cancer"[Ti] AND "2018"[PDAT]>
## expected_count: n=1597
## PMID_count: n=1597
## fetch_subjobs: n=1
## fetched_records: n=1597
## record_format: xml
## processed_data: n=0 (records)
## processed_data: n=0 (rows)
## easyPubMed_ver: 3.1
As we outlined above, information can be extracted from raw records
(“xml” format) via the epm_parse()
function. Results
(data.frame
) are stored in the same easyPubMed
object (data
slot) and can be requested via the
getEPMData()
method. Users can adjust the way information
are extracted and formatted from PubMed records by tweaking the
epm_parse()
function arguments. The most important
arguments are discussed below.
Compact vs. extended output.
A new feature of easyPubMed
(ver. 3.01 or later) is the
capacity of tuning the author information extraction process. The
compact_output
and max_authors
arguments can
be adjusted to get the desired behavior.
compact_output
: logical (defaults to
TRUE
). Author names are returned in a compact format
(i.e., author names are collapsed together) when this argument
is set to TRUE
, and each row in the final data.frame
corresponds to a PubMed record. If FALSE
, each row
corresponds to a single author of the publication and the
record-specific data are recycled for all included authors (legacy
approach, this is similar to the typical output of the corresponding
function in easyPubMed
ver. 2.30 and earlier).
When compact_output
is set to FALSE
, the
processed data row number is typically bigger than the number or raw
records that were retrieved. The behavior managed via the
compact_output
argument is a new feature of
easyPubMed
version 3.01 or later.
max_authors
: numeric, maximum number of authors to
retrieve. If this is set to -1, only the last author is extracted. If
this is set to 1, only the first author is returned. If this is set to
2, the first and the last authors are extracted. If this is set to any
other positive number (i), up to the leading (i-1) authors are retrieved
together with the last author. If this is set to a number larger than
the number of authors in a record, all authors are returned. Note that
at least 1 author has to be retrieved, therefore a value of 0 is not
accepted (coerced to -1).
Citations.
The epm_parse()
function can now extract citation
information (if available). This feature was introduced in
easyPubMed
version 3.01. The
max_references
and ref_id_type
arguments can
be adjusted to obtain information in the desired format.
max_references
: numeric, maximum number of
references to return (from each PubMed record).
ref_id_type
: string, must be one of the following
values: c('pmid', 'doi')
. Type of identifier used to
describe citation references.
In the example below, n=1,597 records were retrieved and processed. This took less about 7 min using a 2-vCPUs, 4Gb-memory machine running on Ubuntu 20.04.
my_query <- '"bladder cancer"[Ti] AND "2018"[PDAT]'
# Submit the Query
epm_xmpl_07 <- epm_query(my_query)
# Retrieve Records
epm_xmpl_07 <- epm_fetch(epm_xmpl_07)
# Parse (custom params)
epm_xmpl_07 <- epm_parse(epm_xmpl_07,
max_authors = 3, compact_output = TRUE,
max_references = 5, ref_id_type = 'pmid')
# Request parsed data
epm_data <- getEPMData(epm_xmpl_07)
# Columns of interest
cols_of_int <- c('pmid', 'doi', 'authors', 'jabbrv', 'year', 'references')
# Show an excerpt
head(epm_data[, cols_of_int])
## pmid doi
## 1 31572460 10.5114/aoms.2018.78743
## 2 31511849 10.21037/biotarget.2018.08.02
## 3 31411998 10.1016/j.euo.2018.07.009
## 4 31411974 10.1016/j.euo.2018.08.032
## 5 31317867 10.1016/j.euo.2018.11.010
## 6 31297313 10.1016/j.ajur.2018.06.006
## authors
## 1 Weijin Fu, Xiaoyun Wu, ..., Hua Mi
## 2 Danielle Scheunemann, Anjan K Pradhan, ..., Paul B Fisher
## 3 Jinhai Huo, Mohamed D Ray-Zack, ..., Stephen B Williams
## 4 Marco Racioppi, Luca Di Gianfrancesco, ..., PierFrancesco Bassi
## 5 Zhoobin H Bateni, Shane M Pearce, ..., Siamak Daneshmand
## 6 Sunny Goel, Rahul J Sinha, ..., Vishwajeet Singh
## jabbrv year references
## 1 Arch Med Sci 2019 25651787; 19460053; 17158539; 12944571; 24373477
## 2 Biotarget 2018 28882224; 27940575; 27915480; 23269072; 25494295
## 3 Eur Urol Oncol 2019 28055103; 23584347; 27375033; 28456635; 21502557
## 4 Eur Urol Oncol 2019 NA
## 5 Eur Urol Oncol 2020 16518661; 24373477; 22917985; 27026309; 22543204
## 6 Asian J Urol 2019 21454009; 11157016; 12944571; 21502557; 15939524
The new pipeline proposed in easyPubMed
version 3.01
replaces the old pipeline based on the following functions:
get_pubmed_ids()
, fetch_pubmed_data()
, and
table_articles_byAuth()
. We are planning to phase out these
functions by the end of 2024.
The current version of easyPubMed
still includes
revised versions of the get_pubmed_ids()
,
fetch_pubmed_data()
, and
table_articles_byAuth()
for legacy purposes. The new
versions of these functions should return output that is compatible with
old versions of easyPubMed
(with the exception of
get_pubmed_ids()
). Other functions (e.g.,
batch_pubmed_download ()
or
fetch_all_pubmed_ids()
) have been discontinued and/or
replaced as outlined below.
Function Replacement Map
article_to_df()
->
epm_parse_record()
articles_to_list()
->
discontinued
batch_pubmed_download()
->
epm_fetch()
[write_to_file = TRUE
]
custom_grep()
-> EPM_custom_grep()
[not exported]
fetch_all_pubmed_ids()
->
epm_fetch()
[format = 'uilist'
]
fetch_pubmed_data()
->
epm_fetch()
get_pubmed_ids_by_fulltitle()
->
epm_query_by_fulltitle()
get_pubmed_ids()
->
epm_query()
table_articles_byAuth()
->
epm_parse()
and then getEPMData()
trim_address()
-> discontinued
fetch_PMID_data()
->
epm_query_by_pmid()
and then
epm_fetch()
extract_article_ids()
->
EPM_detect_pmid()
[not exported]
fetch_pubmed_data_by_PMID()
->
epm_query_by_pmid()
and then
epm_fetch()
Dev version of easyPubMed
on
GitHub Website
Additional Resources and Tutorials will be made available in early 2024.
easyPubMed official website including news, vignettes, and further information https://www.data-pulse.com/dev_site/easypubmed/
Sayers, E. A General Introduction to the E-utilities (NCBI) https://www.ncbi.nlm.nih.gov/books/NBK25497/
PubMed Help (NCBI) https://www.ncbi.nlm.nih.gov/books/NBK3827/
Thank you very much for using easyPubMed and/or reading this vignette. Please, feel free to contact me (author/maintainer) for feedback, questions and suggestions: my email is <damiano.fantini(at)gmail(dot)com>.
If you use easyPubMed in a scientific publication, please cite this R package in the Materials and Methods section of the paper. Thanks!
I may be open to collaborations. If you have an idea you would like to discuss or develop based on what you read in this vignette, feel free to contact me via email.
easyPubMed Copyright (C) 2017-2023 Damiano Fantini. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] easyPubMed_3.01
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.29 R6_2.5.1 lifecycle_1.0.3 jsonlite_1.8.4
## [5] magrittr_2.0.3 evaluate_0.16 stringi_1.7.8 cachem_1.0.6
## [9] rlang_1.1.1 cli_3.6.1 rstudioapi_0.13 jquerylib_0.1.4
## [13] bslib_0.4.0 vctrs_0.6.2 rmarkdown_2.14 tools_4.2.1
## [17] stringr_1.5.0 glue_1.6.2 xfun_0.32 yaml_2.3.5
## [21] fastmap_1.1.0 compiler_4.2.1 htmltools_0.5.5 knitr_1.39
## [25] sass_0.4.2
Success! - by Damiano Fantini - Nov 26, 2023.