Reads protein sequences from a single FASTA file, or from every FASTA file in a directory, and extracts metadata from the FASTA headers. Metadata is expected as key-value pairs separated by an equals sign `=`. For example, from the header '>protein1 [gene=scnD] [protein=ScnD]', it extracts the key 'gene' with the value 'scnD' and the key 'protein' with the value 'ScnD'.
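The header parsing described above can be illustrated with a minimal base R sketch. The helper name `parse_header_pairs` is hypothetical and is not part of the package. The sketch assumes each pair is enclosed in square brackets, as in the example header; the function's actual implementation may differ.

```r
# Hypothetical helper (not part of the package): extract [key=value] pairs
# from a single FASTA header line using base R only.
parse_header_pairs <- function(header) {
  # Find every bracketed "[key=value]" token in the header
  tokens <- regmatches(header, gregexpr("\\[[^]=]+=[^]]*\\]", header))[[1]]
  # Drop the enclosing square brackets
  tokens <- substr(tokens, 2, nchar(tokens) - 1)
  # Split on the first "=" only, so a value may itself contain "="
  keys <- sub("=.*$", "", tokens)
  values <- sub("^[^=]*=", "", tokens)
  # Return a named character vector: names are keys, elements are values
  stats::setNames(values, keys)
}

pairs <- parse_header_pairs(">protein1 [gene=scnD] [protein=ScnD]")
print(pairs)
```

Here `pairs[["gene"]]` holds the value for the 'gene' key. A header without any bracketed pairs yields an empty named character vector.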
Arguments
- fasta_path
Path to the FASTA file or directory containing FASTA files.
- sequence
Logical; if `TRUE`, the protein sequences are included in the returned data frame.
- keys
An optional character vector naming the keys from the FASTA headers to retain in the final data frame. If `NULL` (the default), all keys found in the headers are included.
- file_extension
Extension of the FASTA files to be read from the directory (default is 'fasta').
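How `fasta_path` and `file_extension` might interact can be sketched as follows. This is an assumption about the behavior, not the package's code: the helper `resolve_fasta_files` is hypothetical, and it assumes the extension is matched at the end of the file name and only when the path is a directory.

```r
# Hypothetical helper (not part of the package): turn a file-or-directory
# path into the vector of FASTA files it refers to.
resolve_fasta_files <- function(fasta_path, file_extension = "fasta") {
  if (dir.exists(fasta_path)) {
    # Directory: keep only files whose names end in ".<file_extension>"
    pattern <- paste0("\\.", file_extension, "$")
    list.files(fasta_path, pattern = pattern, full.names = TRUE)
  } else if (file.exists(fasta_path)) {
    # Single file: use it as-is, regardless of its extension
    fasta_path
  } else {
    stop("Path does not exist: ", fasta_path)
  }
}

# Demonstrate on a temporary directory holding two ".fa" files and one other file
demo_dir <- file.path(tempdir(), "fasta_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a.fa", "b.fa", "notes.txt"))))
found <- basename(resolve_fasta_files(demo_dir, file_extension = "fa"))
print(found)
```

Under these assumptions, a directory whose files end in '.fa' returns nothing with the default extension 'fasta', which is why the directory example passes `file_extension = "fa"` explicitly.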
Examples
if (FALSE) {
# Read sequences from a single FASTA file
sequences_df <- read_fasta("path/to/single_file.fasta")
# Read all sequences from a directory of FASTA files
sequences_df <- read_fasta("path/to/directory/", file_extension = "fa")
# Read sequences and include the protein sequences in the output
sequences_df <- read_fasta("path/to/directory/", sequence = TRUE)
}