This function orchestrates the alignment of sequences in a specified directory using MUMmer, a tool for aligning large DNA or protein sequences. It handles GenBank and FASTA file formats and performs checks to ensure that the necessary files are present.
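As a rough illustration of the file checks described above, the sketch below scans a directory for sequence files and derives cluster names from the file names. The helper name `find_sequence_files()`, the list of recognised extensions, and the two-file minimum are assumptions made for this example; they are not taken from the function's actual implementation.

```r
# Hypothetical helper: list GenBank/FASTA files in a directory and derive
# cluster names from the file names. The extension list is an assumption.
find_sequence_files <- function(path,
                                extensions = c("gbk", "gb", "fasta", "fa", "fna", "faa")) {
  if (!dir.exists(path)) {
    stop("Directory does not exist: ", path)
  }

  # Match files ending in one of the assumed extensions (case-insensitive)
  pattern <- paste0("\\.(", paste(extensions, collapse = "|"), ")$")
  files <- list.files(path, pattern = pattern, full.names = TRUE,
                      ignore.case = TRUE)

  # Assumed rule: a pairwise alignment needs at least two input files
  if (length(files) < 2) {
    stop("At least two sequence files are needed for an alignment.")
  }

  # Cluster name = file name without directory and extension
  clusters <- tools::file_path_sans_ext(basename(files))
  stats::setNames(files, clusters)
}
```

Calling `find_sequence_files("/path/to/sequences")` would return a named character vector in which the names are the inferred cluster names and the values are the file paths.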
cluster = NULL,
maptype = "many-to-many",
seqtype = "protein",
mummer_options = "",
filter_options = "",
remove_files = TRUE,
output_dir = tempdir()
- path
The directory containing the sequence files.
- cluster
Optional vector of cluster names to consider for alignment. If NULL, clusters are inferred from file names. The order of the names determines the order in which the clusters are aligned.
- maptype
The type of mapping to perform, either "many-to-many" or "one-to-one". "many-to-many" allows multiple matches between clusters; "one-to-one" restricts alignments to unique matches between a pair of clusters.
- seqtype
The type of sequences, either "protein" or "nucleotide".
- mummer_options
Additional command line options for MUMmer. To see all available options, you can run `nucmer --help` or `promer --help` in the terminal depending on whether you are aligning nucleotide or protein sequences.
- filter_options
Additional options for filtering MUMmer results. To view all filtering options, run `delta-filter --help` in the terminal.
- remove_files
Logical indicating whether to remove intermediate files generated during the process; defaults to TRUE.
- output_dir
Optional directory for output files; defaults to `tempdir()`.
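To make the relationship between `seqtype`, `maptype`, and the underlying command line tools more concrete, the sketch below assembles the commands for a single pair of files. It assumes that "nucleotide" selects `nucmer` and "protein" selects `promer`, as the help hints above suggest, and that `maptype` is passed to `delta-filter` as `-m` (many-to-many) or `-1` (one-to-one). The helper `build_mummer_commands()` and its `reference`, `query`, and `prefix` arguments are hypothetical; the function's real internals may differ.

```r
# Hypothetical helper: build (but do not run) the aligner and filter commands
# for one pair of sequence files.
build_mummer_commands <- function(reference, query, prefix = "out",
                                  maptype = c("many-to-many", "one-to-one"),
                                  seqtype = c("protein", "nucleotide"),
                                  mummer_options = "",
                                  filter_options = "") {
  maptype <- match.arg(maptype)
  seqtype <- match.arg(seqtype)

  # Assumed mapping: promer for protein-level, nucmer for nucleotide-level
  aligner <- if (seqtype == "protein") "promer" else "nucmer"
  # delta-filter flags: -m = many-to-many, -1 = one-to-one
  map_flag <- if (maptype == "many-to-many") "-m" else "-1"

  # Drop empty option strings so no stray spaces end up in the command
  join <- function(parts) paste(parts[nzchar(parts)], collapse = " ")

  c(
    align = join(c(aligner, mummer_options, "-p", shQuote(prefix),
                   shQuote(reference), shQuote(query))),
    filter = join(c("delta-filter", map_flag, filter_options,
                    shQuote(paste0(prefix, ".delta"))))
  )
}

cmds <- build_mummer_commands(
  reference = "cluster1.fasta",
  query = "cluster2.fasta",
  maptype = "one-to-one",
  seqtype = "nucleotide",
  filter_options = "-i 90"
)
print(cmds)
```

Building the commands as strings first makes it easy to inspect them before anything is executed. Before running them, `Sys.which("nucmer")` can be used to confirm that the MUMmer executables are on the search path.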
Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL (2004). Versatile and open software for comparing large genomes. Genome Biology, 5(2), R12.
if (FALSE) {
# Basic alignment with default options
path = "/path/to/sequences",
maptype = "many-to-many",
seqtype = "protein"
# Alignment with specific MUMmer options
path = "/path/to/sequences",
maptype = "one-to-one",
seqtype = "protein",
mummer_options = "--maxgap=500 --mincluster=100",
filter_options = "-i 90"