
This function takes a list of GenBank features (as loaded by read_gbk()) and converts the selected feature type into a data frame. The input may contain multiple gene clusters, which are combined into a single result.

Usage

gbk_features_to_df(
  gbk_list,
  feature = "CDS",
  keys = NULL,
  process_region = TRUE
)

Arguments

gbk_list

A list of lists where each sub-list contains GenBank features for a specific gene cluster. Each sub-list is expected to have a named list of features, with each feature being a character vector.
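As a rough illustration of the nesting described above, the following hand-built list mimics one plausible shape of the input. The cluster name, the `FEATURES` element name, and the keys are assumptions made for this sketch; the object actually returned by read_gbk() may be organised differently.

```r
# Hypothetical stand-in for read_gbk() output (all names are illustrative).
mock_gbk <- list(
  cluster_A = list(
    FEATURES = list(
      CDS = list(
        # Each feature is a named character vector of key/value pairs.
        c(gene = "geneA", region = "1..300"),
        c(gene = "geneB", region = "complement(400..900)")
      )
    )
  )
)

# Inspect the nesting: cluster -> FEATURES -> feature type -> entries.
str(mock_gbk, max.level = 4)

# Pull out the first CDS entry of the first cluster.
first_cds <- mock_gbk[["cluster_A"]][["FEATURES"]][["CDS"]][[1]]
```

Inspecting a real object with `str()` in the same way is a quick check that it has the shape the function expects.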

feature

A string specifying the feature type to extract from each gene cluster's FEATURE list (e.g., "CDS" or "gene"). Defaults to "CDS".

keys

An optional vector of strings representing specific keys within the feature to retain in the final data frame. If `NULL` (the default), all keys within the specified feature are included.

process_region

A logical flag; when set to `TRUE` (the default), special processing is performed on the 'region' key (if present) to extract 'strand', 'start', and 'end' information.
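To make the idea of splitting a 'region' value into 'strand', 'start', and 'end' concrete, here is a minimal base R sketch. It assumes region strings follow the usual GenBank location syntax (for example `1..300` or `complement(400..900)`). The helper `parse_region()` and the strand labels are hypothetical, and the package's own parsing may differ.

```r
# Hypothetical helper, not part of the package: split GenBank-style
# location strings into strand, start, and end.
parse_region <- function(region) {
  # Locations wrapped in complement(...) lie on the reverse strand.
  strand <- ifelse(grepl("complement", region, fixed = TRUE),
                   "complement", "forward")
  # Collect every run of digits per string (assumes at least one number each).
  positions <- regmatches(region, gregexpr("[0-9]+", region))
  start <- vapply(positions, function(p) min(as.integer(p)), integer(1))
  end <- vapply(positions, function(p) max(as.integer(p)), integer(1))
  data.frame(strand = strand, start = start, end = end,
             stringsAsFactors = FALSE)
}

parsed <- parse_region(c("1..300", "complement(400..900)"))
```

Taking the minimum and maximum of all numbers is a simplification: a multi-segment location such as `join(...)` collapses to its outer bounds.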

Value

A data frame where each row corresponds to one feature of the selected type from the input list. The data frame includes a 'cluster' column indicating the gene cluster (source gbk file) each feature came from.
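The shape of the result can be sketched with a small base R reimplementation. This is only an approximation under stated assumptions, not the package's code: `features_to_df_sketch()` is a hypothetical name, the input is assumed to nest features under a `FEATURES` element, and keys missing from an entry are filled with `NA`.

```r
# Hypothetical sketch: one row per feature plus a 'cluster' column.
features_to_df_sketch <- function(gbk_list, feature = "CDS", keys = NULL) {
  rows <- list()
  for (cluster in names(gbk_list)) {
    for (entry in gbk_list[[cluster]][["FEATURES"]][[feature]]) {
      # Keep only the requested keys when 'keys' is supplied.
      if (!is.null(keys)) entry <- entry[intersect(keys, names(entry))]
      rows[[length(rows) + 1]] <- c(as.list(entry), cluster = cluster)
    }
  }
  all_keys <- unique(unlist(lapply(rows, names)))
  filled <- lapply(rows, function(r) {
    # Pad missing keys so every row has the same columns.
    for (k in setdiff(all_keys, names(r))) r[[k]] <- NA_character_
    as.data.frame(r[all_keys], stringsAsFactors = FALSE)
  })
  do.call(rbind, filled)
}

# Illustrative input with two clusters (all names are made up).
mock <- list(
  cluster_A = list(FEATURES = list(CDS = list(
    c(gene = "geneA", region = "1..300"),
    c(gene = "geneB", region = "complement(400..900)",
      product = "hypothetical protein")
  ))),
  cluster_B = list(FEATURES = list(CDS = list(
    c(gene = "geneC", region = "10..250")
  )))
)

df <- features_to_df_sketch(mock)
df_genes <- features_to_df_sketch(mock, keys = "gene")
```

Here `df` has three rows, one per CDS entry, and `df_genes` keeps only the 'gene' and 'cluster' columns.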

Examples

if (FALSE) {
gbk <- read_gbk("path/to/genbank_file.gbk")
df <- gbk_features_to_df(gbk)

# To extract only specific keys within the "CDS" feature
df <- gbk_features_to_df(gbk, feature = "CDS", keys = c("gene", "region"))

# To disable special processing of the 'region' key
df <- gbk_features_to_df(gbk, process_region = FALSE)
}