
This function reads data from a single GenBank file or from a directory of GenBank files. It supports selective extraction of information by specifying sections and features.
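To show what "selecting sections" means at the level of the flat file, here is a minimal sketch in base R. It assumes a single record whose top-level keywords (LOCUS, DEFINITION, FEATURES, ORIGIN, ...) start in column 1 and whose continuation lines are indented. `split_gbk_sections()` and the toy `record` are hypothetical and illustrative only; this is not how `read_gbk()` is implemented.

```r
# Hypothetical helper for illustration only (not read_gbk()'s implementation):
# group the lines of ONE GenBank record by their top-level keyword.
split_gbk_sections <- function(lines) {
  lines <- lines[lines != "//"]          # drop the end-of-record marker
  # Top-level keywords start in column 1; continuation lines and
  # sub-keywords such as "  ORGANISM" are indented, so they stay grouped.
  is_header <- grepl("^[A-Z]", lines)
  group <- cumsum(is_header)
  keep <- group > 0                      # ignore any text before the first keyword
  blocks <- split(lines[keep], group[keep])
  names(blocks) <- sub("^([A-Z]+).*$", "\\1", lines[is_header])
  blocks
}

# Toy single-record input (made up for this example)
record <- c(
  "LOCUS       TOY0001                   24 bp    DNA     linear   SYN 01-JAN-2024",
  "DEFINITION  Toy record used for illustration.",
  "ACCESSION   TOY0001",
  "FEATURES             Location/Qualifiers",
  "     gene            1..24",
  "                     /gene=\"toyA\"",
  "ORIGIN      ",
  "        1 atgaaacccg ggtttaaacc",
  "       21 cggg",
  "//"
)

sections <- split_gbk_sections(record)
names(sections)  # "LOCUS" "DEFINITION" "ACCESSION" "FEATURES" "ORIGIN"

# Keep only some sections, similar in spirit to the `sections` argument
sections[names(sections) %in% c("LOCUS", "DEFINITION")]
```

Repeated keywords such as REFERENCE would simply produce several list elements with the same name, which `%in%` keeps together.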

Usage

read_gbk(path, sections = NULL, features = NULL, origin = TRUE)

Arguments

path

A string giving the path to a single GenBank (.gbk) file or to a directory containing GenBank files.

sections

An optional character vector naming the sections of the GenBank file to extract (e.g., "LOCUS", "DEFINITION", "ACCESSION", "VERSION"). If `NULL` (the default), all available sections are extracted.

features

An optional character vector naming the feature types to extract from the FEATURES section of the GenBank file (e.g., "CDS", "gene", "mRNA"). If `NULL` (the default), all feature types present in the FEATURES section are extracted.

origin

A logical value; if `TRUE` (the default), the ORIGIN sequence data is included in the output.
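The `features` and `origin` arguments operate on two specific parts of a record: the feature table and the ORIGIN block. The base R sketch below illustrates both, assuming the standard flat-file layout in which feature keys start in column 6 (five leading spaces), qualifiers are indented further, and ORIGIN lines consist of a position number followed by residue letters. `filter_feature_types()`, `parse_origin()` and the toy inputs are hypothetical names made up for this example; the actual output format of `read_gbk()` may differ.

```r
# Hypothetical helpers for illustration only; they are not functions
# exported by the package that provides read_gbk().

# Keep only the feature blocks whose key is listed in `types`.
filter_feature_types <- function(feature_lines, types) {
  # A feature key starts in column 6: exactly five spaces, then a non-space.
  # Qualifier lines are indented by 21 spaces, so they never match.
  is_key <- grepl("^ {5}[^ ]", feature_lines)
  group <- cumsum(is_key)
  keep <- group > 0                      # drops the "FEATURES ..." header line
  blocks <- split(feature_lines[keep], group[keep])
  names(blocks) <- sub("^ {5}([^ ]+).*$", "\\1", feature_lines[is_key])
  blocks[names(blocks) %in% types]
}

# Collapse an ORIGIN block into a single sequence string.
parse_origin <- function(origin_lines) {
  body <- origin_lines[!grepl("^(ORIGIN|//)", origin_lines)]
  # Remove position numbers and spacing, keeping only residue letters
  paste(gsub("[^A-Za-z]", "", body), collapse = "")
}

features_block <- c(
  "FEATURES             Location/Qualifiers",
  "     source          1..24",
  "                     /mol_type=\"genomic DNA\"",
  "     gene            1..24",
  "                     /gene=\"toyA\"",
  "     CDS             1..24",
  "                     /gene=\"toyA\"",
  "                     /codon_start=1"
)
selected <- filter_feature_types(features_block, c("gene", "CDS"))
names(selected)  # "gene" "CDS"

origin_block <- c(
  "ORIGIN      ",
  "        1 atgaaacccg ggtttaaacc",
  "       21 cggg",
  "//"
)
parse_origin(origin_block)  # "atgaaacccgggtttaaacccggg"
```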

Value

A list containing the contents of the specified sections and features of the GenBank file. Each section and feature is returned as a separate list element.

Examples

if (FALSE) {
# Read all data from a GenBank file
gbk_data <- read_gbk("path/to/genbank_file.gbk")

# Read all data from a directory of GenBank files
gbk_data <- read_gbk("path/to/genbank/directory")

# Read only specific sections from a GenBank file
gbk_data <- read_gbk(
  "path/to/genbank_file.gbk",
  sections = c("LOCUS", "DEFINITION")
)

# Read specific features from the FEATURES section of a GenBank file
gbk_data <- read_gbk("path/to/genbank_file.gbk", features = c("gene", "CDS"))

# Read data without the origin sequence
gbk_data <- read_gbk("path/to/genbank_file.gbk", origin = FALSE)
}