XJU-China iGEM

Dry Lab
Comparative Analysis of Bacterial Strain Genera and Species

We processed the collected samples, purified individual bacterial strains, and sequenced their 16S rRNA genes. The 16S rRNA gene sequence of each strain was then aligned against reference databases such as SILVA and Greengenes.

Tools such as BLAST or RDP Classifier were used for similarity analysis and taxonomic assignment. After alignment, highly similar reference sequences were selected, and the genus and species of the strains they came from were recorded.
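The assignment step can be illustrated with a minimal sketch. It keeps the best hit per strain and reports species or genus depending on percent identity. The `Hit` record, the function name, and the cutoff values (98.7% for species, 94.5% for genus) are assumptions for illustration, based on commonly quoted rules of thumb; they are not the thresholds used in this project.

```python
from typing import NamedTuple, Optional

class Hit(NamedTuple):
    query: str       # identifier of the sequenced strain
    taxon: str       # "Genus species" of the reference sequence
    identity: float  # percent identity of the alignment

def assign_taxonomy(hits, species_cutoff=98.7, genus_cutoff=94.5):
    """Return {query: (genus, species)} from the best hit per query.

    The cutoffs are illustrative assumptions, not values from the source.
    """
    best = {}
    for h in hits:
        # Keep only the highest-identity hit for each query.
        if h.query not in best or h.identity > best[h.query].identity:
            best[h.query] = h

    result: dict[str, tuple[Optional[str], Optional[str]]] = {}
    for query, h in best.items():
        genus, _, species = h.taxon.partition(" ")
        if h.identity >= species_cutoff:
            result[query] = (genus, species)
        elif h.identity >= genus_cutoff:
            result[query] = (genus, None)   # genus-level assignment only
        else:
            result[query] = (None, None)    # too divergent to assign
    return result
```

In practice the hits would come from the BLAST or RDP Classifier output; here they are passed in as plain records so that the decision rule itself is visible.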

Experimental Results

This study comprehensively analyzed the microbial community structure and potential functional gene transmission characteristics of four different environmental samples, namely feces, dairy products, water sources, and soil.

The results showed that the sample types differed significantly in microbial community structure at every taxonomic level examined: phylum, class, order, family, and genus. The strains isolated from these samples spanned 120 genera and 286 species, for a total of 2,000 bacterial strains.

Plasmid Sequence Alignment

Before the experiment, reference sequences of target gene families or whole genomes were obtained from public databases (such as NCBI's GenBank, ENA, and UniProt).

A local alignment database was then built from the nucleic acid or protein sequences of the samples to be analyzed together with the known species sequences used as references.

Alignments were run with BLAST (Basic Local Alignment Search Tool). The query sequences were supplied to the program, the appropriate algorithm was selected (BLASTn for nucleic acid sequences, BLASTp for protein sequences), and parameters such as the expectation value (E-value) threshold and the alignment region length were set to control the sensitivity and specificity of the alignment.
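As a hedged illustration of this setup, the sketch below builds the command lines for the BLAST+ tools `makeblastdb`, `blastn`, and `blastp` without running them. The file names, the database name, and the default E-value of 1e-5 are placeholders chosen for the example, not the settings used in this project. The alignment-length criterion is not set here; in this sketch it would be applied when the results are filtered.

```python
def makeblastdb_cmd(fasta, db_name, molecule="nucl"):
    """Command to build a local BLAST database from a FASTA file."""
    if molecule not in ("nucl", "prot"):
        raise ValueError("molecule must be 'nucl' or 'prot'")
    return ["makeblastdb", "-in", fasta, "-dbtype", molecule, "-out", db_name]

def blast_cmd(query, db_name, out_path, program="blastn", evalue=1e-5):
    """Command for a blastn (nucleotide) or blastp (protein) search.

    Output is tabular (format 6) with the standard columns plus the
    query length, so that coverage can be computed afterwards.
    """
    if program not in ("blastn", "blastp"):
        raise ValueError("program must be 'blastn' or 'blastp'")
    return [
        program,
        "-query", query,
        "-db", db_name,
        "-evalue", str(evalue),      # E-value threshold (assumed value)
        "-outfmt", "6 std qlen",
        "-out", out_path,
    ]

# Hypothetical file names, for illustration only.
db_cmd = makeblastdb_cmd("reference_plasmids.fasta", "plasmid_db")
search_cmd = blast_cmd("query_plasmids.fasta", "plasmid_db", "hits.tsv")
```

With BLAST+ installed, each list can be passed to `subprocess.run(cmd, check=True)`.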

The alignment run produced a report listing the reference sequences similar to each query sequence, together with key indicators such as similarity percentage, alignment coverage, and E-value.

Based on these indicators, reference sequences highly similar to the query sequences were selected, and the differences between them, such as base substitutions, insertions, and deletions, were analyzed further.
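A report of this kind can be screened programmatically. The sketch below assumes BLAST tabular output (`-outfmt "6 std qlen"`, i.e. the twelve standard columns plus query length) and keeps hits that pass identity, coverage, and E-value cutoffs. The cutoff values and function names are illustrative assumptions.

```python
def parse_hits(lines):
    """Parse BLAST tabular lines (12 standard columns + qlen)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        qstart, qend, qlen = int(f[6]), int(f[7]), int(f[12])
        hits.append({
            "query": f[0],
            "subject": f[1],
            "identity": float(f[2]),   # percent identical positions
            "length": int(f[3]),       # alignment length
            "evalue": float(f[10]),
            # Share of the query covered by this alignment, in percent.
            "coverage": 100.0 * (abs(qend - qstart) + 1) / qlen,
        })
    return hits

def filter_hits(hits, min_identity=90.0, min_coverage=80.0, max_evalue=1e-5):
    """Keep hits that pass all three cutoffs (assumed values)."""
    return [
        h for h in hits
        if h["identity"] >= min_identity
        and h["coverage"] >= min_coverage
        and h["evalue"] <= max_evalue
    ]
```

Coverage here is computed per alignment segment; a query covered by several shorter segments would need those segments merged first.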
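The difference analysis can be sketched for a single pairwise alignment. The function below takes two gapped, equal-length aligned strings (with `-` as the gap character) and counts matching columns, substitutions, insertions, and deletions relative to the reference. The function name and the per-column counting convention are assumptions for illustration.

```python
def count_differences(query_aln, ref_aln):
    """Count per-column differences between two aligned sequences.

    A gap in the reference is counted as an insertion in the query;
    a gap in the query is counted as a deletion.
    """
    if len(query_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have equal length")
    counts = {"match": 0, "substitution": 0, "insertion": 0, "deletion": 0}
    for q, r in zip(query_aln.upper(), ref_aln.upper()):
        if q == "-" and r == "-":
            continue                      # empty column, ignore
        if r == "-":
            counts["insertion"] += 1
        elif q == "-":
            counts["deletion"] += 1
        elif q == r:
            counts["match"] += 1
        else:
            counts["substitution"] += 1
    return counts
```

Each gap column is counted separately, so a three-base deletion is reported as three deletions rather than one event.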

Aligning known plasmid sequences with plasmids from the same genus or species serves several purposes. It reveals which core functional modules (e.g., replicons, transfer elements) are conserved and where they differ, and it identifies the distribution and dynamics of functional elements such as resistance genes and mobile elements. Phylogenetic analysis of the aligned sequences clarifies evolutionary relationships and driving forces (e.g., recombination, selection pressure). The comparison also helps evaluate host adaptability (e.g., replication stability, transfer ability) and the transmission risk of resistance genes, and it provides targets for genetic engineering optimization and for clinical prevention and control of drug resistance.

This comprehensive approach enables the in-depth interpretation of plasmid functions, evolutionary strategies, and application value.

Experimental Results

After comparing the sequences of each plasmid, we constructed a phylogenetic tree to explore their functional similarity and characteristics. The results are shown in the figure.
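The tree-building method is not stated here, so the following is only a generic sketch of one distance-based approach: pairwise p-distances from pre-aligned, equal-length sequences, clustered with UPGMA into a Newick topology. The function names and the choice of UPGMA are assumptions for illustration, not a description of how the figure was produced.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of differing sites, ignoring columns with a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def upgma(names, seqs):
    """Return a Newick topology (no branch lengths) built by UPGMA."""
    # Each cluster is stored as id -> (newick string, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {
        frozenset((i, j)): p_distance(seqs[i], seqs[j])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the two closest clusters.
        pair = min(
            (frozenset(p) for p in combinations(clusters, 2)),
            key=lambda p: dist[p],
        )
        i, j = sorted(pair)
        (ni, si), (nj, sj) = clusters.pop(i), clusters.pop(j)
        # Distance to the merged cluster is the size-weighted average.
        for k in clusters:
            dist[frozenset((next_id, k))] = (
                si * dist[frozenset((i, k))] + sj * dist[frozenset((j, k))]
            ) / (si + sj)
        clusters[next_id] = (f"({ni},{nj})", si + sj)
        next_id += 1
    (newick, _), = clusters.values()
    return newick + ";"
```

UPGMA assumes a constant rate of change across lineages; methods such as neighbor joining or maximum likelihood relax that assumption and are often preferred for plasmid sequences.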

Plasmid ORF Prediction

Plasmid ORF (Open Reading Frame) prediction uses bioinformatics software to scan DNA sequences for regions that lie between a start codon and a stop codon, meet a length requirement, and often have a ribosome-binding site upstream.

Our bacterial strains come from a wide range of habitats and span many genera and species. Different bacteria adapt to different habitats, and this functional diversity is closely tied to their nucleic acid sequences.

The main value of ORF prediction is that it converts abstract DNA sequence into a concrete list of functional genes. This systematically reveals the replication and partitioning mechanisms of plasmids, adaptive genes such as antibiotic resistance genes, and the composition of mobile elements such as conjugative transfer elements.

Ultimately, this clarifies how plasmids persist stably in host bacteria, how they endow their hosts with new biological traits, and how much potential they have for horizontal transmission.
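The core of such a scan can be shown with a minimal sketch. It searches all six reading frames for stretches that run from an ATG start codon to the first in-frame stop codon and keeps those of at least `min_len` nucleotides. This is a simplification under stated assumptions: it ignores alternative start codons and ribosome-binding sites, treats the sequence as linear (so ORFs spanning the origin of a circular plasmid are missed), and the 90-nucleotide default is an arbitrary placeholder.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_len=90):
    """Return (strand, start, end) tuples, 1-based on the forward strand."""
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                    # first start codon in this frame
                elif start is not None and codon in STOPS:
                    end = i + 3                  # end includes the stop codon
                    if end - start >= min_len:
                        if strand == "+":
                            orfs.append(("+", start + 1, end))
                        else:
                            # Map back to forward-strand coordinates.
                            orfs.append(("-", n - end + 1, n - start))
                    start = None
    return orfs
```

Dedicated gene finders add statistical models of coding sequence on top of this basic scan, which matters for short plasmid genes.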

Experimental Results

Proteins encoded by sequences that recur across different plasmids fall mainly into four functional categories:

The first category is involved in the replication and repair of plasmid DNA, such as sequences encoding replication initiation proteins, plasmid recombination enzymes, DNA repair and recombination proteins, and tRNA ligases.

The second category covers sequences involved in the host's physiological metabolism and in energy generation. These sequences encode triosephosphate isomerases, regulator aspartate phosphatase E, alpha-1,3-glucan synthases, and phosphoglycolate phosphatases.

The third category of encoded proteins mainly functions in substance transport and signal transduction. Examples include sequences encoding maltose/maltodextrin transport system permease protein MalF, proteins of the NRT1/PTR family, and phosphate import ATP-binding proteins.

The fourth category consists of ORFs that encode proteins with specific functions, such as probable F-box proteins and small heat shock protein C4. F-box proteins are important components of the substrate ubiquitination process and participate in the assembly of ubiquitin ligases. Small heat shock protein C4 is expressed when cells are exposed to stresses such as high temperature or oxidative stress, and it helps protect cells from stress damage.

ORFs with different functions are thus predicted to take part in processes ranging from basic bacterial physiology and metabolism to stress responses. Selected plasmid ORF predictions are listed below.

| Plasmid | Strain | ORF | Encoded protein | No significant similarity found |
|---|---|---|---|---|
| pRM 4 | 2-8-4 | ORF9 | Putative replication protein | 7 |
| | | ORF8 | Kanamycin nucleotidyltransferase | |
| | | ORF11 | Flagellar brake protein YcgR | |
| | | ORF5 | Probable F-box protein | |
| | | ORF6 | DNA replication and repair protein | |
| pRM 6 | FH 2-3-8 | ORF9 | Protein NRT1/ PTR FAMILY | 11 |
| | | ORF12 | RNA polymerase II degradation factor 1 | |
| | | ORF10 | Plasmid recombination enzyme | |
| | | ORF1 | Maltose/maltodextrin transport system permease protein MalF | |
| pRM 7 | M6-66 | ORF6 | DNA primase TraC | 3 |
| | | ORF1 | Plasmid recombination enzyme type 3 | |
| | | ORF5 | Major capsid protein | |
| pRM 8 | M6-69 | ORF3 | Methyl-accepting chemotaxis protein III | 3 |
| | | ORF4 | Mobilization protein MobL | |
| | | ORF1 | Type II methyltransferase M.AgeI | |
| | | ORF2 | Chromosome partitioning protein ParA | |
| | | ORF8 | Tyrosine recombinase XerD (+ 6310-6876) | |
| | | ORF9 | Transposase InsF for insertion sequence IS3A (+ 346-693) | |
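
Grouping annotations into the four functional categories described above can be approximated with a keyword lookup. The sketch below is purely illustrative: the keyword lists are assumptions derived from the example proteins named in the text, the category labels are shorthand, and any annotation that matches no keyword is left unclassified rather than guessed.

```python
# Keyword lists are assumptions taken from the example proteins in the text.
CATEGORY_KEYWORDS = {
    "replication and repair": [
        "replication", "recombin", "repair", "primase", "trna ligase",
    ],
    "metabolism and energy": [
        "isomerase", "phosphatase", "synthase",
    ],
    "transport and signaling": [
        "transport", "permease", "nrt1", "import",
    ],
    "specific functions": [
        "f-box", "heat shock",
    ],
}

def classify(annotation):
    """Return the first category whose keyword occurs in the annotation."""
    text = annotation.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "unclassified"

def summarize(annotations):
    """Count annotations per category."""
    counts = {}
    for annotation in annotations:
        category = classify(annotation)
        counts[category] = counts.get(category, 0) + 1
    return counts
```

Keyword matching of this kind is crude, and a curated functional classification would be more reliable for a full annotation set.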