This dataset contains detailed information on genes involved in the biosynthesis of erythromycin, an antibiotic produced by the bacterium Saccharopolyspora erythraea, together with genes from several homologous gene clusters identified by antiSMASH. For each gene it includes identifiers, chromosomal positions, orientations, and annotations of product and function, as well as similarity and identity scores from BlastP analysis.
Format
A data frame with 148 observations and 16 variables:
- protein_id
Unique protein identifiers. A character vector.
- region
The chromosomal region of the gene, indicating start and end positions and strand. A character vector.
- translation
Amino acid sequence of the protein encoded by the gene. A character vector.
- cluster
Identifier of the gene cluster to which the gene belongs. A character vector.
- strand
The strand orientation ("forward" or "complement") of the gene. A character vector.
- start
The start position of the gene on the chromosome. A numeric vector.
- end
The end position of the gene on the chromosome. A numeric vector.
- rowID
A unique identifier for each row in the dataset. An integer vector.
- identity
The identity score from BlastP analysis, representing the percentage of identical matches. A numeric vector.
- similarity
The similarity score from BlastP analysis. Unlike identity, a similarity score typically also counts aligned residues that are biochemically similar rather than identical. A numeric vector.
- BlastP
The protein_id of the protein matched in the BlastP comparison, or NA if not applicable. A character vector.
- score
Score assigned based on the BlastP analysis, quantifying the match quality. A numeric vector.
- Gene
Gene name or identifier if available, otherwise NA. A character vector.
- Position
Formatted string indicating the gene's position and orientation on the chromosome. A character vector.
- Product
Description of the gene product. A character vector.
- Functions
Functional categorization of the gene. A character vector.
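
Examples
The columns described above can be combined in ordinary data frame operations. The following is a minimal sketch in base R, not taken from the package. It builds a small hypothetical data frame (`toy`) that mimics a few of the documented columns, so every value, the `length_bp` column, and the `min_identity` cutoff are illustrative assumptions rather than properties of the real dataset. It also assumes that `start` and `end` are 1-based and inclusive.

```r
# Hypothetical rows that mimic a few of the documented columns.
# None of these values come from the real dataset.
toy <- data.frame(
  protein_id = c("P1", "P2", "P3", "P4"),
  cluster    = c("query", "query", "hit_A", "hit_A"),
  strand     = c("forward", "complement", "forward", "complement"),
  start      = c(100, 1500, 200, 1700),
  end        = c(1299, 2099, 1399, 2299),
  identity   = c(NA, NA, 82.5, 41.0),
  BlastP     = c(NA, NA, "P1", "P2"),
  stringsAsFactors = FALSE
)

# Gene length in base pairs, assuming start and end are 1-based and inclusive.
toy$length_bp <- toy$end - toy$start + 1

# Keep rows that have a BlastP match at or above an arbitrary identity cutoff.
# Rows with NA in BlastP are dropped by the first condition.
min_identity <- 50
hits <- toy[!is.na(toy$BlastP) & toy$identity >= min_identity, ]

# Count genes per cluster and strand orientation.
strand_counts <- table(toy$cluster, toy$strand)

print(hits[, c("protein_id", "cluster", "identity", "BlastP", "length_bp")])
print(strand_counts)
```

The same subsetting pattern should apply to the real data frame once it is loaded, provided the column names match those listed under Format.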