
This function orchestrates the alignment of sequences in a specified directory using MUMmer, a tool for aligning large DNA or protein sequences. It accepts GenBank and FASTA files and checks that the necessary files are present.

Usage

mummer_alignment(
  path,
  cluster = NULL,
  maptype = "many-to-many",
  seqtype = "protein",
  mummer_options = "",
  filter_options = "",
  remove_files = TRUE,
  output_dir = tempdir()
)

Arguments

path

The directory containing the sequence files.

cluster

Optional vector of cluster names to consider for alignment. If NULL, clusters are inferred from file names. The order of the names determines the order in which the clusters are aligned.

maptype

The type of mapping to perform: "many-to-many" or "one-to-one". "many-to-many" allows multiple matches between clusters, whereas "one-to-one" restricts alignments to unique matches between a pair.

seqtype

The type of sequences, either "protein" or "nucleotide".

mummer_options

Additional command line options for MUMmer. To see all available options, you can run `nucmer --help` or `promer --help` in the terminal depending on whether you are aligning nucleotide or protein sequences.

filter_options

Additional options for filtering MUMmer results. To view all filtering options, run `delta-filter --help` in the terminal.

remove_files

Logical indicating whether to remove intermediate files generated during the process; defaults to TRUE.

output_dir

Optional directory for output files; defaults to `tempdir()`.
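
The command-line tools named under mummer_options and filter_options have to be reachable from the R session. The following is a minimal sketch of a pre-flight check using only base R. `mummer_tools_found()` is a hypothetical helper and not part of this package; the pairing of promer with protein and nucmer with nucleotide follows the argument descriptions above.

```r
# Hypothetical helper (not part of the package): report which MUMmer
# executables can be located on the PATH for a given sequence type.
mummer_tools_found <- function(seqtype = c("protein", "nucleotide")) {
  seqtype <- match.arg(seqtype)
  # Assumed pairing: promer for "protein", nucmer for "nucleotide"
  aligner <- if (seqtype == "protein") "promer" else "nucmer"
  tools <- c(aligner, "delta-filter")
  # Sys.which() returns "" for executables it cannot locate
  found <- nzchar(Sys.which(tools))
  names(found) <- tools
  found
}

# Named logical vector: TRUE where the executable was found
mummer_tools_found("protein")
```

If any entry is FALSE, the corresponding `--help` command will not run from this environment either, which usually points to a PATH problem rather than to the function's arguments.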

Value

A data frame combining all alignment results, or NULL if errors occur during processing.
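
Because a failure is signalled by a NULL return value, callers may want to check the result before passing it on. The following is a minimal sketch. `require_alignment()` is a hypothetical helper, and the data frame used to exercise it is a placeholder, not real alignment output.

```r
# Hypothetical helper: stop early when an alignment call returned NULL.
require_alignment <- function(result) {
  if (is.null(result)) {
    stop("Alignment failed: mummer_alignment() returned NULL.", call. = FALSE)
  }
  if (!is.data.frame(result)) {
    stop("Expected a data frame of alignment results.", call. = FALSE)
  }
  result
}

# Placeholder standing in for a successful result (not real output)
ok <- require_alignment(data.frame(placeholder = 1))

# A NULL result is converted into an informative error message
failed <- tryCatch(
  require_alignment(NULL),
  error = function(e) conditionMessage(e)
)
```

In practice the helper would wrap the call directly, for example `require_alignment(mummer_alignment(path = "/path/to/sequences"))`.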

References

Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL (2004). Versatile and open software for comparing large genomes. Genome Biology, 5(2), R12.

Examples

if (FALSE) {
# Basic alignment with default options
mummer_alignment(
  path = "/path/to/sequences",
  maptype = "many-to-many",
  seqtype = "protein"
)

# Alignment with specific MUMmer options
mummer_alignment(
  path = "/path/to/sequences",
  maptype = "one-to-one",
  seqtype = "protein",
  mummer_options = "--maxgap=500 --mincluster=100",
  filter_options = "-i 90"
)
}
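
Before running the examples above on a real directory, it can help to see which sequence files are present and which cluster names they suggest. The following is a sketch in base R. It assumes, which this page does not state, that cluster names correspond to file names without their extension and that the listed extensions cover the files of interest. `list_sequence_files()` is a hypothetical helper.

```r
# Hypothetical helper: list GenBank/FASTA files in a directory together
# with the cluster name each file name would suggest.
list_sequence_files <- function(path,
                                extensions = c("gbk", "gb", "fasta", "fa")) {
  # Match files ending in one of the assumed extensions
  pattern <- paste0("\\.(", paste(extensions, collapse = "|"), ")$")
  files <- list.files(path, pattern = pattern, ignore.case = TRUE,
                      full.names = TRUE)
  data.frame(
    cluster = tools::file_path_sans_ext(basename(files)),
    file = files,
    stringsAsFactors = FALSE
  )
}

# Demonstrate on a throwaway directory containing empty placeholder files
demo_dir <- file.path(tempdir(), "mummer_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("clusterA.gbk", "clusterB.fasta", "notes.txt")))

seq_files <- list_sequence_files(demo_dir)
seq_files$cluster
```

The resulting names could then be reordered and passed as the `cluster` argument to control the order in which clusters are considered.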