This function processes a list of GenBank features (as loaded by `read_gbk()`) and converts the features of a selected type into a data frame. It can process multiple gene clusters in a single call.
Arguments
- gbk_list
A list of lists in which each sub-list holds the GenBank features of one gene cluster. Each sub-list is expected to contain a named list of features, where each feature is a character vector.
- feature
A string specifying the feature type to extract from each gene cluster's FEATURE list (e.g., "CDS" or "gene"). Defaults to "CDS".
- keys
An optional character vector naming the keys within the feature to retain in the final data frame. If `NULL` (the default), all keys of the specified feature are included.
- process_region
A logical flag. When `TRUE` (the default), the 'region' key (if present) is parsed to extract 'strand', 'start', and 'end' information.
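The exact parsing applied to 'region' is not described here. As a rough illustration only, the sketch below assumes the 'region' key holds a GenBank location string such as `complement(600..1200)` and splits it into strand, start, and end. The helper `parse_region()` and the strand labels "forward"/"complement" are hypothetical and not part of the package.

```r
# Hypothetical helper (not the package's implementation): split GenBank
# location strings such as "100..500" or "complement(600..1200)" into
# strand, start, and end. Strand labels are assumed for illustration.
parse_region <- function(region) {
  # A location wrapped in complement(...) is on the reverse strand.
  strand <- ifelse(grepl("^complement\\(", region), "complement", "forward")
  # Pull out every run of digits; partial markers like "<" and ">" are skipped.
  nums <- regmatches(region, gregexpr("[0-9]+", region))
  # First number is taken as start, last as end (covers join(...) spans too).
  start <- vapply(nums, function(x) if (length(x)) as.numeric(x[1]) else NA_real_,
                  numeric(1))
  end <- vapply(nums, function(x) if (length(x)) as.numeric(x[length(x)]) else NA_real_,
                numeric(1))
  data.frame(strand = strand, start = start, end = end,
             stringsAsFactors = FALSE)
}

parse_region(c("100..500", "complement(600..1200)", "<1..>300"))
```

Taking the first and last coordinates is a simplification; multi-part locations such as `join(...)` collapse to their outer span.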
Value
A data frame in which each row corresponds to one feature from the input list. It includes a 'cluster' column identifying the source GenBank file.
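The base-R sketch below illustrates the kind of reshaping described above under a simplified, assumed input layout (cluster name, then feature type, then a list of named character vectors). The function `features_to_df_sketch()` and the mock `gbk_list` are hypothetical; the real `read_gbk()` output may be nested differently, and this is not the package's implementation.

```r
# Hypothetical mock input: cluster -> feature type -> list of features,
# each feature a named character vector of key/value pairs.
gbk_list <- list(
  cluster_A = list(CDS = list(
    c(region = "1..300", gene = "abcA", product = "kinase"),
    c(region = "complement(400..900)", gene = "abcB")
  )),
  cluster_B = list(CDS = list(
    c(region = "50..650", gene = "xyzA")
  ))
)

# Hypothetical sketch: one row per feature, plus a 'cluster' column.
features_to_df_sketch <- function(gbk_list, feature = "CDS", keys = NULL) {
  rows <- list()
  for (cluster in names(gbk_list)) {
    # A missing feature type yields NULL, so the loop body is skipped.
    for (feat in gbk_list[[cluster]][[feature]]) {
      if (!is.null(keys)) feat <- feat[intersect(keys, names(feat))]
      rows[[length(rows) + 1]] <- c(cluster = cluster, feat)
    }
  }
  if (length(rows) == 0) return(data.frame())
  # Union of all keys, in order of first appearance.
  all_cols <- unique(unlist(lapply(rows, names)))
  # Fill each row into a full-width vector; absent keys stay NA.
  mat <- t(vapply(rows, function(r) {
    v <- setNames(rep(NA_character_, length(all_cols)), all_cols)
    v[names(r)] <- r
    v
  }, character(length(all_cols))))
  as.data.frame(mat, stringsAsFactors = FALSE)
}

features_to_df_sketch(gbk_list)
features_to_df_sketch(gbk_list, keys = "gene")
```

Features lacking a key receive `NA` in that column, and `keys` restricts which columns are kept alongside 'cluster'.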
Examples
if (FALSE) {
  gbk <- read_gbk("path/to/genbank_file.gbk")
  df <- gbk_features_to_df(gbk)
  # To extract only specific keys within the "CDS" feature
  df <- gbk_features_to_df(gbk, feature = "CDS", keys = c("gene", "region"))
  # To disable special processing of the 'region' key
  df <- gbk_features_to_df(gbk, process_region = FALSE)
}