This function orchestrates the alignment of sequences in a specified directory using MUMmer, a tool for aligning large DNA or protein sequences. It handles GenBank and FASTA file formats and performs checks to ensure the necessary files are present.
Usage
mummer_alignment(
path,
cluster = NULL,
maptype = "many-to-many",
seqtype = "protein",
mummer_options = "",
filter_options = "",
remove_files = TRUE,
output_dir = tempdir()
)
Arguments
- path
The directory containing the sequence files.
- cluster
Optional vector of cluster names to consider for alignment. If NULL, clusters are inferred from file names. The order of the names determines the order in which the clusters are aligned.
- maptype
The type of mapping to perform; "many-to-many" or "one-to-one". "many-to-many" allows for multiple matches between clusters, "one-to-one" restricts alignments to unique matches between a pair.
- seqtype
The type of sequences, either "protein" or "nucleotide".
- mummer_options
Additional command line options for MUMmer. To see all available options, you can run `nucmer --help` or `promer --help` in the terminal depending on whether you are aligning nucleotide or protein sequences.
- filter_options
Additional options for filtering MUMmer results. To view all filtering options, run `delta-filter --help` in the terminal.
- remove_files
Logical indicating whether to remove intermediate files generated during the process; defaults to TRUE.
- output_dir
Optional directory for output files; defaults to `tempdir()`.
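Because `mummer_options` and `filter_options` are passed on to the MUMmer command-line tools, it can be useful to confirm that those tools are reachable before starting an alignment. The following is a minimal sketch using base R only; `check_mummer_tools()` is a hypothetical helper, not part of the documented API, and whether `mummer_alignment()` performs an equivalent check itself is not stated here.

```r
# Hypothetical helper (not part of the documented API): reports which
# MUMmer executables can be found on the PATH.
check_mummer_tools <- function(tools = c("nucmer", "promer", "delta-filter")) {
  found <- Sys.which(tools)  # returns "" for tools that are not found
  stats::setNames(nzchar(unname(found)), tools)
}

available <- check_mummer_tools()
if (!all(available)) {
  message(
    "Not found on PATH: ",
    paste(names(available)[!available], collapse = ", ")
  )
}
```

The result is a named logical vector, so a single tool can be checked with, for example, `available[["promer"]]` before running a protein alignment.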
References
Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL (2004). Versatile and open software for comparing large genomes. Genome Biology, 5(2), R12.
Examples
if (FALSE) {
# Basic alignment with default options
mummer_alignment(
path = "/path/to/sequences",
maptype = "many-to-many",
seqtype = "protein"
)
# Alignment with specific MUMmer options
mummer_alignment(
path = "/path/to/sequences",
maptype = "one-to-one",
seqtype = "protein",
mummer_options = "--maxgap=500 --mincluster=100",
filter_options = "-i 90"
)
}
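The `cluster` argument fixes both which clusters are aligned and in what order. The sketch below shows one way to build such a vector from the files in a directory. `list_cluster_names()` is a hypothetical helper, and it assumes that a cluster name is the file's base name without its extension and that the listed extensions cover the GenBank and FASTA files in use; the rule `mummer_alignment()` applies when `cluster = NULL` may differ.

```r
# Hypothetical helper: derive candidate cluster names from sequence files.
# Assumption: a cluster name is the file's base name without its extension,
# and the extensions below cover the GenBank/FASTA files in use.
list_cluster_names <- function(path, pattern = "\\.(gb|gbk|fa|fasta)$") {
  files <- list.files(path, pattern = pattern, ignore.case = TRUE)
  tools::file_path_sans_ext(files)
}

# Demonstration on a throwaway directory with empty placeholder files
demo_dir <- file.path(tempdir(), "mummer_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("clusterB.fasta", "clusterA.gbk", "notes.txt")))

# list.files() sorts alphabetically; reverse to choose an explicit order
clusters <- rev(list_cluster_names(demo_dir))

if (FALSE) {
  # Pass the explicitly ordered names instead of relying on inference
  mummer_alignment(path = demo_dir, cluster = clusters, seqtype = "protein")
}
```

Passing the vector explicitly makes the alignment order independent of how the files happen to be listed on disk.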