Executive Summary
Aquarius applied engineering principles to the development and testing of SynBio solutions for three major aquatic case studies. Each case study required multiple engineering cycles: we redesigned circuits after initial testing, altered our microcosm construction and sampling methods after preliminary experiments, and, most importantly, reassessed the feasibility of our engineering solutions based on new genomic and transcriptomic data generated through microcosm experiments.
Ultimately, our results highlight how water-associated factors affect chassis behavior and survival, and they inform future engineering for aquatic environments.
I. Marine Corrosion Prevention with Bacillus subtilis
We conducted several rounds of redesign and engineering to construct functional biofilm-enhancing circuits in multiple strains of B. subtilis. We verified successful incorporation of engineered constructs via iodine staining and assessed the corrosion-prevention capabilities of engineered strains through motility assays and microcosm experiments. A large number of genes, including various metabolic and flagellar assembly genes, were upregulated under microcosm conditions, indicating altered chassis functionality. Future engineering should address the effects of realistic aquatic conditions on engineered B. subtilis’s corrosion-prevention capabilities.
II. Freshwater Harmful Algal Bloom (HAB) Remediation with Cyanophage and Acinetobacter baylyi
We developed, conducted, and adapted procedures for cyanophage isolation assays with the goal of engineering novel cyanophages for HAB remediation. We found strong evidence of cyanophage presence on an agar assay, but we were unable to consistently propagate any phage. We concluded that cyanophage isolation may have limited feasibility due to unique characteristics of M. aeruginosa and HAB ecology. We pivoted to testing the HAB remediation potential of the naturally algicidal chassis bacterium A. baylyi. We developed and adapted lakewater microcosms for simulated real-world testing through multiple rounds of experimentation. Colony counts, 16S abundances, and transcriptomic data indicated that A. baylyi’s survival was reduced and behavior altered under realistic conditions. Future engineering should address A. baylyi survival limitations associated with real-world deployment.
III. Household Pipe Biofilm Removal Using Phage Therapy
We designed a targeted phage cocktail for mycobacterial biofilm removal and tested its anti-biofilm effectiveness under laboratory conditions and in simulated pipe-environment microcosms. We developed, 3D-printed, and adapted multiple pipe microcosm systems that realistically simulate elements of plumbing conditions, such as regular water flushing. We identified strategies for effective mycobacterial biofilm growth (for example, we found that biofilms grew more effectively under nonsterile conditions) to inform future mycobacterial microcosm development and experimentation with phage.
Bioinformatics & RNA-Seq Meta-Analysis
We engineered a bioinformatic pipeline for retrieving and processing large volumes of raw metagenomic data. Over several design-build-test-learn cycles, we made it efficient at extracting large amounts of FASTQ-format sequence data from NCBI and running it through Kraken and Bracken for taxonomic classification and abundance estimation. We used various techniques to tune and train a predictive model for species survival in aquatic environments. Finally, we designed and redesigned methods for our meta-analysis of existing RNA-seq data. Because analyzing massive datasets from scratch was too time-consuming, we instead queried AI chatbots using careful, highly specific prompt engineering (followed by thorough manual vetting) to obtain a comprehensive analysis.
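The Kraken step of such a pipeline produces a per-sample report that then has to be summarized into per-taxon counts. The sketch below is a hypothetical helper, not our pipeline code: it assumes the standard six-column Kraken2 report (percentage, clade reads, directly assigned reads, rank code, NCBI taxid, indented name) and keeps only species-level (`S`) rows. Bracken, which re-estimates abundances from the same report, would normally be run on the report as a separate step.

```python
"""Hypothetical helper: species-level read counts from a Kraken2 report."""

from typing import Dict


def species_counts(report_text: str, min_reads: int = 1) -> Dict[str, int]:
    """Map species name -> reads in that species' clade (rank code "S" only).

    Assumes the standard six tab-separated Kraken2 report columns:
    percent, clade reads, direct reads, rank code, taxid, indented name.
    """
    counts: Dict[str, int] = {}
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        rank = fields[3].strip()
        clade_reads = int(fields[1])
        name = fields[5].strip()  # Kraken2 indents names by tree depth
        # Exact match keeps species ("S") and drops sub-ranks such as "S1".
        if rank == "S" and clade_reads >= min_reads:
            counts[name] = clade_reads
    return counts


if __name__ == "__main__":
    # Made-up three-line report used only to show the output shape.
    demo = (
        "70.00\t700\t700\tU\t0\tunclassified\n"
        "30.00\t300\t0\tR\t1\troot\n"
        "30.00\t300\t300\tS\t1423\t              Bacillus subtilis\n"
    )
    print(species_counts(demo))  # {'Bacillus subtilis': 300}
```

Matching the rank code exactly (rather than with a prefix test) keeps strain-level rows such as `S1` from being double-counted under their parent species.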
Aquatic Case Study I: Corrosion Prevention Using Engineered Bacillus subtilis in Marine Environments
1. Plasmid Engineering
Summary: To compare Bacillus subtilis biofilm growth, and its ability to prevent corrosion, between laboratory conditions and natural seawater environments, we constructed vectors that enhance the biofilm-forming capabilities of B. subtilis strains to improve corrosion resistance on steel surfaces.
We selected four genetic targets known to influence biofilm properties:
- SinI – to promote biofilm thickness by alleviating SinR-mediated repression
- BslA – to enhance surface hydrophobicity through the formation of a protective protein layer
- TasA and the TapA-SipW-TasA operon (two separate targets) – to strengthen biofilm architecture via amyloid fiber production
The experimental strategy involved overexpressing each gene individually to assess its contribution to biofilm morphology and corrosion inhibition. Each plasmid construct includes:
- A xylose-inducible promoter (xylA) from the 2012 LMU-Munich iGEM team (Radeck et al., 2013)
- A ribosome binding site (RBS) adapted from the SpoVG RBS in the iGEM distribution kit
- Coding sequences (CDSs) for the four target genes, PCR-amplified from B. subtilis genomic DNA
- The B0015 terminator from the iGEM distribution kit
The backbone vector was the pBS1C plasmid (also from the 2012 LMU-Munich iGEM team), which was modified to conform to iGEM Type IIS assembly standards. Each construct carried a single target gene under an inducible promoter for regulated expression. Assembled plasmids were transformed into B. subtilis 168 and integrated into the amyE locus of the genome.
To comply with the iGEM Type IIS assembly standard, we first modified the pBS1C plasmid, which was originally constructed to fit the BioBrick standard. A point mutation was introduced via Gibson assembly to eliminate a BsaI site within the ampicillin resistance region. We then replaced the entire RFP BioBrick segment with a GFP BioBrick from pJump28 using restriction digestion and ligation. This segment includes BsaI sites after the BioBrick prefix (EcoRI, XbaI) and before the suffix (PstI, SpeI), enabling compatibility with both BioBrick and iGEM Type IIS assembly. The GFP reporter gene lets us visually confirm successful gene insertion. The resulting plasmid was named pBS1C-N.

Next, we PCR-amplified xylA and the four coding sequences (CDSs), adding BsaI recognition sites and standard fusion sequences according to iGEM Type IIS rules. One CDS, the tapA-sipW-tasA operon, contained two illegal internal BsaI sites. To resolve this, we split the operon into three fragments near the restriction sites, PCR-amplified the fragments, introduced point mutations to remove the BsaI sites, and added SapI sites with fusion sequences. These fragments were assembled into the iGEM L0 SapI-J04450 vector using SapI-based Golden Gate assembly. Once all parts were prepared, we performed the final BsaI Golden Gate assembly to insert the promoter-RBS-CDS-terminator unit into pBS1C-N.
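Internal BsaI sites like the two in the tapA-sipW-tasA operon have to be located on both strands before a part can be domesticated for Type IIS assembly. A minimal scan is sketched below; the function names are hypothetical, and the only biological fact it relies on is the BsaI recognition sequence GGTCTC (read as GAGACC on the opposite strand).

```python
"""Hypothetical helper: locate BsaI recognition sites on either strand."""

from typing import List, Tuple

BSAI_SITE = "GGTCTC"  # BsaI recognition sequence (top strand)
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def find_sites(seq: str, site: str = BSAI_SITE) -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for each site; '-' means the site
    reads GAGACC on the given strand, i.e. GGTCTC on the opposite strand."""
    seq = seq.upper()
    rc_site = reverse_complement(site)
    hits: List[Tuple[int, str]] = []
    for i in range(len(seq) - len(site) + 1):
        window = seq[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        elif window == rc_site:  # BsaI site is not palindromic: no double count
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    # Toy sequence with one site per strand (not a real B. subtilis sequence).
    print(find_sites("ATGGGTCTCAAAGAGACCTAA"))  # [(3, '+'), (12, '-')]
```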
Figure: pBS1C-N plasmid design.
The constructed vectors were validated using colony PCR and restriction enzyme digestion. All pBS1C-N constructs passed both tests and were confirmed by sequencing. However, only 2 out of 24 L0 SapI-J04450-tapA-sipW-TasA constructs were successful. Sequencing confirmed correct assembly in these two cases. None of the L2 plasmids passed validation; sequencing revealed missing elements such as promoter, RBS, CDS, or terminator in various combinations.
SapI-based assembly may suffer from low ligation efficiency because its three-base-pair fusion sites are short, which increases the likelihood of mismatched ligation. Further investigation is needed to determine why the L2 constructs failed.
Because all sequenced L2 vectors lacked the RBS component, we strongly suspected that the SpoVG RBS obtained from the distribution kit was faulty. We therefore sent the SpoVG plasmid from the distribution kit for sequencing and purchased SpoVG as a set of oligonucleotides.
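One reason short overhangs are error-prone is that there are only 4^3 = 64 possible three-base overhangs (versus 256 four-base ones), so a set of fusion sites is more likely to contain pairs that are identical, reverse-complementary, or one mismatch apart and can therefore mis-ligate. The sketch below is a simple heuristic screen of that kind. The one-mismatch threshold and the function names are assumptions, and it does not model the measured ligation-fidelity data that dedicated overhang-design tools use.

```python
"""Heuristic screen for Golden Gate fusion sites that could cross-ligate."""

from itertools import combinations
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def risky_pairs(overhangs: List[str], max_mismatch: int = 1) -> List[Tuple[str, str, int]]:
    """Flag overhang pairs within `max_mismatch` of each other or of each
    other's reverse complement (either could let the wrong ends anneal)."""
    flagged = []
    for a, b in combinations([o.upper() for o in overhangs], 2):
        distance = min(hamming(a, b), hamming(a, revcomp(b)))
        if distance <= max_mismatch:
            flagged.append((a, b, distance))
    return flagged


def palindromic(overhangs: List[str]) -> List[str]:
    """Overhangs equal to their own reverse complement can self-ligate.
    (Impossible for odd lengths such as SapI's 3 nt.)"""
    return [o for o in overhangs if o.upper() == revcomp(o)]


if __name__ == "__main__":
    # Made-up 3-nt set: AAT and ATT are reverse complements of each other.
    print(risky_pairs(["AAT", "ATT", "GCC"]))  # [('AAT', 'ATT', 0)]
```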
Figure: Constructed L2 pBS1C plasmid, where BslA is the coding region.
We ordered SpoVG oligos with BsaI cut sites and fusion sites at both ends. While the distribution-kit SpoVG plasmid was being sequenced, we annealed the ordered SpoVG oligos into double-stranded DNA.
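For a short part such as an RBS, the ordered oligos can be designed as two fully complementary strands carrying outward-facing BsaI sites. The sketch below shows one such layout. The part sequence, fusion sites, spacer, and padding bases are placeholders (not the real SpoVG RBS or the actual iGEM Type IIS fusion sites), and the function name is hypothetical. It relies on BsaI cutting GGTCTC(N1)/(N5), which leaves each 4-nt fusion site as a single-stranded overhang after digestion.

```python
"""Sketch: complementary oligo pair for a short BsaI-flanked part."""

from typing import Tuple

BSAI = "GGTCTC"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def bsai_oligos(part: str, left_fusion: str, right_fusion: str,
                spacer: str = "A", pad: str = "GC") -> Tuple[str, str]:
    """Return (top, bottom) oligos, both written 5'->3'.

    Top strand layout:
      pad + GGTCTC + spacer + left_fusion + part + right_fusion
          + revcomp(spacer) + GAGACC + revcomp(pad)
    Only the part itself is checked for internal BsaI sites; sites
    accidentally created at the junctions are not checked.
    """
    part = part.upper()
    if len(left_fusion) != 4 or len(right_fusion) != 4:
        raise ValueError("fusion sites are assumed to be 4 nt")
    if BSAI in part or revcomp(BSAI) in part:
        raise ValueError("part contains an internal BsaI site")
    top = (pad + BSAI + spacer + left_fusion.upper() + part + right_fusion.upper()
           + revcomp(spacer) + revcomp(BSAI) + revcomp(pad))
    return top, revcomp(top)


if __name__ == "__main__":
    # Placeholder inputs, for illustration only.
    top, bottom = bsai_oligos("AAAGGAGG", "TACT", "AATG")
    print(top)  # GCGGTCTCATACTAAAGGAGGAATGTGAGACCGC
    print(bottom)
```

Because the bottom oligo is the exact reverse complement of the top one, the two anneal into a blunt duplex, and BsaI digestion then exposes the two fusion overhangs.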
The SpoVG RBS in the iGEM distribution kit did not contain the expected sequence noted in the iGEM registry.
We incorporated our ordered SpoVG segment following the iGEM Type IIS assembly standard and built the vector. The integration vector was designed to integrate into the amyE locus of the Bacillus subtilis genome. Upon successful integration, the vector disrupts the native amylase gene, abolishing the strain’s ability to hydrolyze starch. This disruption enables a straightforward functional screen: colonies with successful genomic integration fail to degrade starch and stain dark blue with iodine solution, whereas wild-type colonies appear orange due to starch breakdown.
Using the purchased SpoVG oligonucleotides, we assembled each of the four coding sequences into the pBS1C-N backbone via Golden Gate assembly. We prepared chemically competent B. subtilis 168 and 3610 cells and transformed all four integration vectors into both strains.
After transformation, we verified the constructs using colony PCR, restriction enzyme digestion, and sequencing. Transformation into B. subtilis 168 yielded colonies for all four constructs on selective antibiotic plates, whereas no colonies were obtained for B. subtilis 3610. To further test for genomic integration, we plated the modified B. subtilis 168 strains on starch agar alongside unmodified B. subtilis 168 and B. subtilis 3610 controls. After 24 hours of incubation, iodine staining revealed uniform blue staining in all engineered strains, confirming their loss of starch degradation. Colony PCR with primers specific to the amyE locus further confirmed successful integration of the constructs into the genome.
Because B. subtilis 3610 has low natural competence, transformation of the engineered vectors into this strain was unsuccessful. The combination of iodine-starch staining and colony PCR provided robust evidence for successful genomic integration in B. subtilis 168. As a result, we obtained four genetically modified strains designed to prevent steel corrosion.
2. Morphology analysis
Summary: The natural strain, laboratory strain, and four engineered variants of Bacillus subtilis displayed distinct differences in biofilm morphology and motility. To compare their biofilm-forming capacity and overall performance, we systematically examined each strain’s motility, colony morphology, and growth dynamics under various conditions.
Brück et al. (2019) demonstrated that Bacillus subtilis 168 loses its ability to produce extracellular polymeric substances (EPS) during domestication, resulting in decreased biofilm formation. The wild-type strain B. subtilis 3610 retains robust biofilm-forming capabilities. We compared the wild-type strain (B. subtilis 3610), the laboratory strain (B. subtilis 168), and the four engineered variants in terms of motility, growth, and biofilm formation. Our modifications included:
- SinI: Upregulates biofilm-related genes overall and suppresses growth rate.
- BslA: Enhances surface hydrophobicity.
- TasA & TapA-SipW-TasA: Contribute to amyloid fiber production in the biofilm.
To evaluate growth dynamics, overnight cultures were subcultured into 96-well plates containing LB medium with xylose at induction levels ranging from 0.001% to 2% (v/v). Growth curves were recorded at 24 and 36 hours using a microplate reader, enabling comparisons across strains and induction conditions.
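Comparing growth across strains and xylose levels requires reducing each plate-reader curve to a single number. One common choice, assumed here rather than taken from our protocol, is the maximum specific growth rate: the steepest slope of ln(OD600) against time over a short sliding window. The function name, window size, and blank handling below are illustrative.

```python
"""Sketch: maximum specific growth rate from plate-reader OD readings."""

import math
from typing import Sequence


def max_growth_rate(times_h: Sequence[float], od: Sequence[float],
                    window: int = 4, blank: float = 0.0) -> float:
    """Steepest least-squares slope of ln(OD - blank) vs time (per hour)
    over any run of `window` consecutive readings."""
    if len(times_h) != len(od) or len(od) < window:
        raise ValueError("need matching series at least `window` points long")
    best = float("-inf")
    for start in range(len(od) - window + 1):
        t = times_h[start:start + window]
        # Clamp to a tiny positive value so blank-corrected zeros don't break log().
        y = [math.log(max(v - blank, 1e-6)) for v in od[start:start + window]]
        t_mean = sum(t) / window
        y_mean = sum(y) / window
        sxx = sum((ti - t_mean) ** 2 for ti in t)
        if sxx == 0:
            continue  # identical timestamps give no slope
        sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
        best = max(best, sxy / sxx)
    return best
```

Dividing ln 2 by the returned rate gives the doubling time in hours, which is often easier to compare between strains.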
Motility was assessed using semi-solid agar plates (0.3%, 0.7%, and 1.5% agar) supplemented with 0.2% xylose. Movement on 0.3% agar reflects flagellum-dependent swarming, while diffusion on 0.7% agar indicates EPS-dependent sliding. Spreading diameters were measured after 12 hours.
To evaluate biofilm formation and colony morphology, 3 μL of overnight LB cultures were inoculated onto various media:
- LB (baseline growth)
- LB + 0.03% xylose (non-biofilm-inducing)
- MSgg (biofilm-promoting minimal medium)
- MSgg + surface sprayed 0.03% xylose
- MSgg + 0.03% xylose mixed into agar (non-uniform inducer distribution)
- MSgg + 0.2% xylose (to assess promoter response at higher inducer levels)
Figure: MSgg agar plate supplemented with 0.03% xylose inducer, with each Bacillus subtilis strain directly spotted onto the agar surface.
In non-biofilm-promoting media, varying xylose concentrations had minimal impact on growth rate. The four engineered strains grew at rates similar to B. subtilis 168, all lower than wild-type B. subtilis 3610. Only B. subtilis 3610 exhibited both swarming and sliding motility. Among the B. subtilis 168-derived strains:
- B. subtilis 168 and B. subtilis 168 SinI variant showed strong swarming.
- B. subtilis 168 BslA and TapA-SipW-TasA variants displayed moderate swarming.
- B. subtilis 168 TasA variant showed neither swarming nor sliding, possibly due to toxicity from TasA overexpression.
B. subtilis 3610 formed a distinct, slightly brown biofilm. Among the B. subtilis 168-derived strains, only SinI+ exhibited biofilm-like folds and wavy margins; unmodified B. subtilis 168 (lab strain) showed a faint wavy margin, and the other strains formed smooth colonies without biofilm features. LB medium did not support biofilm formation, so it should be used for culturing rather than for biofilm assays. On MSgg with surface-applied inducer, colony growth was uneven because the inducer was distributed unevenly. Adding 0.03% or 0.2% xylose to MSgg did not substantially affect biofilm development, likely because the concentration difference was too small to change promoter activity.
3. Microcosm construction
Summary: To investigate differences between natural and laboratory environments, we constructed marine microcosms that simulate real-world conditions. The lab strain B. subtilis 168, the natural strain B. subtilis 3610, and the four modified strains were introduced into these systems, and steel chips were placed in the microcosms to assess corrosion resistance in a simulated natural environment. We performed molecular analyses, including 16S rRNA sequencing and RNA-seq, to explore microbial community dynamics and gene expression profiles associated with biofilm formation and corrosion prevention.
We evaluated the corrosion prevention capabilities of various Bacillus subtilis strains under simulated real-world conditions. Specifically, we compared the performance of the natural strain, laboratory strain, and four genetically modified variants. To achieve this, we constructed microcosms containing steel plates to mimic marine environments.
We recycled 1000 mL pipette tip boxes as microcosm water tanks. To simulate seawater with an average salinity of 35‰, we dissolved 35 g of Instant Ocean salt in 1 L of deionized water. Each microcosm was filled with 200 mL of this artificial seawater.
Seven microcosms were prepared:
- One served as a control with untreated seawater.
- Six were inoculated with the following strains: B. subtilis 3610 (wild-type), B. subtilis 168 (lab strain), B. subtilis SinI+, BslA+, TasA+, and TapA-SipW-TasA+. Each strain was induced with 0.2% xylose, and bacterial culture was added at 0.5% (v/v) of the seawater volume (1 mL).
We used A36 steel—commonly employed in marine applications—and cut it into 1.5 × 1.5 cm chips suitable for SEM analysis. Three chips were placed in each microcosm. We created three additional microcosms by adding 50 mL of overnight bacterial culture (with 0.2% xylose) to 150 mL of seawater, resulting in a 25% (v/v) bacterial concentration. Steel chips were also added to these microcosms. These setups were used for RNA extraction. Based on prior performance, we selected B. subtilis 168 SinI+, BslA+, and 3610 for RNA analysis.
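The two inoculum levels are expressed against different denominators: 0.5% refers to the seawater volume (1 mL into 200 mL of seawater), while 25% refers to the final volume (50 mL of culture in 200 mL total). The small helper below, with hypothetical function names, makes that arithmetic explicit.

```python
"""Inoculum arithmetic for the marine microcosms (illustrative helper)."""


def pct_of_seawater(culture_ml: float, seawater_ml: float) -> float:
    """Culture volume as a percentage of the seawater volume alone."""
    return 100.0 * culture_ml / seawater_ml


def pct_of_final_volume(culture_ml: float, seawater_ml: float) -> float:
    """Culture volume as a percentage of the combined final volume."""
    return 100.0 * culture_ml / (culture_ml + seawater_ml)


if __name__ == "__main__":
    # Low-inoculum microcosms: 1 mL culture into 200 mL seawater.
    print(pct_of_seawater(1, 200), round(pct_of_final_volume(1, 200), 3))  # 0.5 0.498
    # High-inoculum microcosms: 50 mL culture into 150 mL seawater.
    print(round(pct_of_seawater(50, 150), 1), pct_of_final_volume(50, 150))  # 33.3 25.0
```

At low inoculum the two conventions barely differ, but at 50 mL in 150 mL they give 33.3% versus 25%, so stating the denominator matters when comparing the two setups.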
Figure: Microcosms that mimic corrosion on steel in seawater.
Figure: Steel chip in a microcosm with 0.5% (1 mL) of B. subtilis 168 BslA+ culture added, after one day of incubation.
Steel chips were removed from each microcosm after 3 and 7 days. To prepare them for imaging, chips were cleaned with acetone and alcohol to remove surface biofilms. We documented their appearance with phone photography and analyzed their surfaces by SEM. Simultaneously, we extracted RNA from the three high-concentration microcosms using the NEB RNA extraction kit. The same strains were also cultured in a shaking incubator under laboratory conditions, and RNA was extracted from these cultures. RNA sequencing was performed on all six samples to compare gene expression between simulated natural and laboratory environments. Sequence data were processed on the Galaxy platform: adapters were removed with Trim Galore!, reads were aligned with HISAT2 to the appropriate B. subtilis reference genome (NC_000964.3 for B. subtilis 168-based strains and CP020102.1 for B. subtilis 3610), alignments were sorted with SAMtools, read counts were generated with HTSeq, and differential expression analysis was conducted with DESeq2. We used DAVID (Huang et al., 2009; Sherman et al., 2022) to annotate gene functions in the DESeq2 output.

Additionally, we pelleted bacteria from both the 0.5% and 25% microcosms of B. subtilis 168 SinI+, BslA+, and 3610, PCR-amplified the 16S rRNA gene, and sequenced the products to assess strain survival and microbial community composition in the microcosms. This analysis helped determine how the three strains survive in simulated environments at different inoculum concentrations and whether microbial dynamics within the microcosms contributed to the effects observed in SEM imaging.
Visually, the seawater control, which was not inoculated with B. subtilis, showed a thick oxidation layer on the steel surface, while the B. subtilis 168-treated samples showed moderate oxidation. Oxidation on the remaining samples was minimal. Scanning electron microscopy (SEM) further revealed extensive corrosion on both the three-day and seven-day control plates, consistent with severe surface degradation. In contrast, the three-day B. subtilis 168 sample displayed distinct corrosion lines interspersed with deposits of irregularly shaped compounds, resembling nanostructured calcium silicate (Johnston et al., 2008). Similar surface features appeared in the three-day B. subtilis 3610 sample and in one region of the seven-day SinI+ plate. Notably, in other regions of the three-day B. subtilis 3610 plate, as well as on the three-day and seven-day BslA+ and SinI+ samples, we observed a novel surface structure: parallel, unidirectional metallic crystals forming across the steel. These formations differ markedly from previously published SEM images of B. subtilis-related corrosion (Guo et al., 2017; Wang et al., 2020), which were obtained under sterile laboratory conditions. Because our microcosms were non-sterile and designed to mimic natural marine environments, these unique crystalline structures may arise from interactions among species in aqueous environments.
16S rRNA gene sequencing revealed substantial colonization by non-Bacillus subtilis species across all seawater microcosm samples, indicating that mixed microbial communities developed under non-sterile conditions. Among the three tested strains (wild-type B. subtilis 3610, SinI+, and BslA+), the wild-type strain showed minimal persistence within these communities, whereas both engineered strains displayed moderate survival. Community composition varied strongly with inoculum concentration. In microcosms inoculated with 25% bacterial culture, Ochrobactrum was the most abundant non-Bacillus genus. In contrast, microcosms inoculated with only 0.5% bacterial culture were consistently enriched in Acinetobacter. These results suggest that inoculum concentration not only shapes community composition but may also influence competitive interactions within the bacterial community, providing a potential ecological explanation for the observed differences in corrosion-protection outcomes described in Experiment 4.
RNA sequencing revealed that sporulation-related genes were downregulated in all three B. subtilis strains under marine microcosm conditions. In contrast, metabolic genes were upregulated across all strains, with the SinI+ and BslA+ variants showing increased expression of genes involved in ATP binding, rRNA binding, and translation. Notably, the BslA+ strain specifically upregulated genes associated with flagellar assembly, suggesting an enhanced motility response to the seawater environment. These RNA-seq data strongly suggest that the strains behave differently in simulated natural environments than in laboratory environments.
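Between DESeq2 and DAVID, the results table has to be reduced to lists of significant gene IDs. A sketch of that step follows. It assumes the results were exported as CSV with gene IDs in the first column and DESeq2's standard `log2FoldChange` and `padj` column names; the cutoffs (adjusted p < 0.05, |log2 fold change| ≥ 1) are common defaults, not necessarily the thresholds we used, and the function name is hypothetical.

```python
"""Sketch: split a DESeq2 results table into up/down gene lists for DAVID."""

import csv
import io
from typing import List, Tuple


def split_de_genes(csv_text: str, padj_max: float = 0.05,
                   min_abs_lfc: float = 1.0) -> Tuple[List[str], List[str]]:
    """Return (upregulated, downregulated) gene IDs.

    Assumes gene IDs in the first column plus DESeq2's standard
    `log2FoldChange` and `padj` columns.
    """
    up: List[str] = []
    down: List[str] = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    lfc_i = header.index("log2FoldChange")
    padj_i = header.index("padj")
    for row in reader:
        if not row or row[padj_i] in ("", "NA"):
            continue  # DESeq2 reports NA padj for filtered or outlier genes
        lfc, padj = float(row[lfc_i]), float(row[padj_i])
        if padj < padj_max and abs(lfc) >= min_abs_lfc:
            (up if lfc > 0 else down).append(row[0])
    return up, down
```

Keeping up- and downregulated genes in separate lists lets each be submitted to DAVID on its own, so enriched functions can be attributed to the direction of change.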
Aquatic Case Study II: Freshwater Algal Bloom Remediation
4. Microcystis Cyanophage Isolation for Harmful Algal Bloom Remediation
Summary: Cyanophages, viruses that infect cyanobacteria, are a promising tool for HAB remediation; however, 1) few M. aeruginosa cyanophages have been isolated, and 2) cyanophage deployment in real lakewater environments may be challenging. Our engineering process aimed to 1) isolate novel M. aeruginosa cyanophages for use in HAB remediation, 2) test their anti-algal effectiveness in flowing water environments, and 3) identify design principles to inform cyanophage engineering for HAB remediation in water systems.
We developed an initial wet-lab experimental pipeline for isolating cyanophages from environmental samples before beginning our experiments.
We collected environmental samples, started cyanophage enrichments, and assayed for cyanophage presence via a traditional “phage plating” method with agar plates. Phage plating typically involves pipetting a mixture of 1 ml bacterial culture, 1 ml potential phage filtrate, and 2 ml warm “top agar” onto an agar plate. The experimenter typically monitors the lawns over time for evidence of plaque formation that could indicate phage infection.
Figure: Environmental sample collection and filtration prior to cyanophage enrichment
M. aeruginosa did not grow when we plated only 2 ml of bacterial culture.
We found that we could produce uniform M. aeruginosa lawns by concentrating an initial 30 ml of healthy bacterial culture into 2 ml via centrifugation prior to plating.
Figure: Cyanophage plating assays
Phage plating was too resource-intensive to perform on a large scale. Given M. aeruginosa’s slow growth rate, we could not generate enough bacteria (30 ml per assay) to conduct phage plating assays in large quantities. To maximize our assay count (and to increase the probability of phage isolation), we began to conduct small liquid assays, which consisted of 4 ml bacterial culture and 1 ml potential phage filtrate, in parallel with plating assays.
Figure: Liquid cyanophage assays
We monitored liquid assays for evidence of bacterial clearing or cell death indicative of phage infection, and we screened our plating assays for cyanophage plaque formation.
We found evidence of cyanophage plaque formation on two out of 65 agar plates and evidence of bacterial clearing in 39 out of 205 liquid assays.
Figure: Putative cyanophage plaque formation on an agar plate.
We collected putative cyanophage lysates by 0.22 µm filtration of cleared assays and by flooding a plate that showed evidence of plaques with cyanophage buffer. We tested the reinfection potential of these putative lysates by using them to “infect” new assays. However, we were unable to consistently re-propagate any phage.
Drs. Steven Wilhelm and Gary LeCleir, University of Tennessee microbiologists with expertise in Microcystis ecology, told us that our results are consistent with a broader, though largely unreported, trend in the field: Microcystis phages are difficult to isolate and propagate. While our team’s preliminary literature review found 13 reports of successful Microcystis phage isolation, the Wilhelm Lab has contacted each associated research group and found that all but one of the phages (M. aeruginosa phage Ma-LMM01) were eventually “lost” after publication, i.e., they could not be continually propagated by their isolators. Dr. Wilhelm and Dr. LeCleir suggested that M. aeruginosa may evolve phage resistance within a single culture or assay. Without an infectable bacterial population, labs have no means to propagate their phage stocks, which eventually degrade in the absence of a host.
Our results support existing evidence that M. aeruginosa possesses strong anti-phage defense systems (likely including frequent genomic rearrangements and phage immunity provided by viral lysogens) that complicate phage isolation and could limit the feasibility of phage-based HAB treatments in the field. Importantly, about 7% of M. aeruginosa’s genome encodes transposases, which allow rapid rearrangement of the species’ abundant mobile genetic elements (Frangeul et al., 2008). Frequent genomic rearrangement events may allow M. aeruginosa to rapidly evolve phage resistance in the wild and limit phage propagation in vitro. Additionally, phages often enter the lysogenic state (integration and latency within the host genome) during Microcystis blooms (Huang et al., 2025). Lysogeny could limit the number of free-floating phages available for isolation, provide bacteria with immunity that prevents reinfection, and complicate the in vitro phage propagation process.
Given M. aeruginosa’s strong anti-phage defenses (both in vivo and in vitro) and the difficulty of isolating Microcystis cyanophages and maintaining them in the lab, we concluded that cyanophage are not a viable tool for HAB remediation in the short term. Developing cyanophage for effective remediation will require two approaches:
- Successful isolation of more novel Microcystis phages, which may require a longer experimental time course and larger “n” of assays than we conducted. We have developed a brief guidebook on Microcystis and M. aeruginosa phage isolation to help labs without experience in cyanobacterial cultivation get started.
- Engineering cyanophages to bypass host defense mechanisms, e.g., by modifying lysogeny genes or expanding phage host ranges. (Of course, Microcystis cyanophage engineering requires access to phages that can readily infect the host, so an engineering approach is reliant on successful phage isolation, per the previous goal.)
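The point about a larger "n" can be made concrete. If each independent assay had some small probability p of yielding a propagable phage, the number of assays needed for a given chance of at least one success is n = ln(1 - confidence) / ln(1 - p), rounded up. The sketch below assumes independent assays with a constant p; any p plugged into it is hypothetical, since the true per-assay success rate for Microcystis phage isolation is unknown.

```python
"""How many independent assays give a target chance of >= 1 success?"""

import math


def assays_needed(p_success: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - p_success)**n >= confidence."""
    if not 0.0 < p_success < 1.0:
        raise ValueError("p_success must be between 0 and 1 (exclusive)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1 (exclusive)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_success))


if __name__ == "__main__":
    # Hypothetical per-assay success probabilities.
    for p in (0.05, 0.01):
        print(p, assays_needed(p))
```

Because n grows roughly as 1/p, halving the per-assay success rate roughly doubles the number of assays required, which is why slow-growing hosts like M. aeruginosa make this bottleneck severe.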
In the short term, other biological HAB control strategies (e.g., remediation via naturally algicidal bacteria) may be more effective and simpler to develop.
5. Comparison of Acinetobacter baylyi Behavior in Laboratory vs Simulated Real-World Microcosms
Summary: Engineered bacteria—particularly, species that already possess natural algicidal properties—may present a more immediate alternative to phage that is easier to cultivate and optimize for in vivo HAB treatment. We chose to pivot our focus to Acinetobacter baylyi ADP1, a common lab strain with algicidal properties and potential applications for HAB remediation.
We hypothesized that A. baylyi would display lower survival and different patterns of gene expression when tested in realistic water conditions as compared to traditional cocultures. We aimed to 1) identify differences in A. baylyi persistence and functioning in realistic versus in vitro conditions and 2) identify design principles that would allow us to adapt the species for future real-world deployment.
We designed water microcosms to simulate two aspects of a lakewater environment: 1) turbulent water flow and 2) diverse microbial communities.
To simulate water flow, we equipped our microcosms with water pumps that generated consistent bubbling and water movement over the course of the experiment. To simulate microbial diversity, we filled our microcosm with water from Lake Matoaka, a local lake. Control cultures were static and grown in sterile BG-11 media. Microcosms and cocultures were both inoculated with A. baylyi and M. aeruginosa.
Figure: Preliminary lakewater microcosms and control cocultures containing A. baylyi and M. aeruginosa.
Figure: Lake Matoaka, Williamsburg, VA. We used lake water from Lake Matoaka to simulate realistic microbial communities in lake microcosms.
For our preliminary microcosm experiment, we collected 4 ml samples from each microcosm and control culture at each timepoint over a 21-day time course, pelleted them by centrifugation, and flash-froze them. We attempted DNA and RNA extractions using a standard phenol-chloroform-isoamyl alcohol (PCI) procedure and TRIzol reagent, respectively.
Figure: A bacterial pellet from the preliminary microcosm experiment. Initial pellets were too small for reliable nucleic acid extraction.
The 4 ml sample pellets yielded limited nucleic acid material, but we successfully amplified and sequenced 16S fragments for preliminary species/genus abundance analysis over the experimental time course.
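Tracking a chassis through a time course with 16S data comes down to converting per-sample read counts into relative abundances. The sketch below assumes reads have already been classified to genus and tallied per timepoint; the function name and the example counts are hypothetical, not our data.

```python
"""Relative abundance of one genus across 16S time-course samples (sketch)."""

from typing import Dict


def genus_fraction(counts_by_day: Dict[int, Dict[str, int]], genus: str) -> Dict[int, float]:
    """Fraction of classified reads assigned to `genus` on each sampling day."""
    fractions: Dict[int, float] = {}
    for day in sorted(counts_by_day):
        counts = counts_by_day[day]
        total = sum(counts.values())
        # Days with no classified reads are reported as 0.0 instead of dividing by zero.
        fractions[day] = counts.get(genus, 0) / total if total else 0.0
    return fractions


if __name__ == "__main__":
    # Made-up counts for illustration only.
    demo = {
        0: {"Acinetobacter": 900, "Microcystis": 100},
        7: {"Acinetobacter": 50, "Microcystis": 150, "Other": 800},
    }
    print(genus_fraction(demo, "Acinetobacter"))  # {0: 0.9, 7: 0.05}
```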
To improve nucleic acid yield, we increased our sample volumes to 15 ml (prior to pelleting) and switched from TRIzol to the NEB Monarch Spin RNA Isolation Kit for RNA extraction. We scaled up the volumes of our control cultures to compensate for the larger volumes removed at each sampling.
Figure: Preparation and centrifugation of large volumes of M. aeruginosa and A. baylyi prior to microcosm inoculation
Our new methods consistently yielded bright RNA bands. We amplified new 16S fragments for parallel species abundance analysis.
Figure: Sample collection during our second lakewater microcosm experiment.
16S data and colony counts indicated that Acinetobacter baylyi predominated upon initial inoculation of lakewater microcosms. Within a week, the A. baylyi population was largely outcompeted by native lakewater organisms, despite a high initial inoculation density. By comparison, A. baylyi predominated in sterile, static control cocultures throughout the experiment, suggesting that the microcosms’ water flow and microbial competitors reduced A. baylyi’s survival. Additionally, preliminary analysis of RNA sequencing data indicated that a number of A. baylyi genes were significantly differentially expressed between microcosm and control coculture conditions approximately 24 hours post-inoculation. This differential expression suggests that specific aspects of A. baylyi’s behavior and functioning are measurably affected by real-world conditions.
Our results indicate that A. baylyi is not yet ready for real world anti-HAB deployment. While the species may effectively persist and eliminate Microcystis in cocultures, its survival and functioning may be limited in a lake environment. Synthetic biologists should consider the effects of water flow and microbial community dynamics on chassis survival when adapting A. baylyi and other bacterial chassis for HAB remediation purposes and deployment in water environments.
Aquatic Case Study III: Removal of Biofilms in Household Pipes
6. Phage Cocktail Selection
Summary: To exploit the evolutionary adaptations of bacteriophages developed through their co-evolutionary dynamics with bacterial hosts, we formulated a phage cocktail aimed at maximizing biofilm disruption and bacterial lysis efficiency. The selected phages were chosen based on their genetic characteristics, lytic capabilities, and potential for biofilm interference.
Phage Raid is a member of the A1 subcluster of mycobacteriophages. Phages within this subcluster have demonstrated the ability to inhibit GroEL1, a gene implicated in biofilm maturation (Ojha, 2005). This inhibition occurs via lysogenic integration into the GroEL1 locus, thereby disrupting its function.
Phage CrimD, belonging to subcluster K1, is notable for being the first sequenced phage in its cluster. It possesses robust lytic machinery, including genes encoding the enzymes LysA and LysB, which target the mycobacterial cell wall. Additionally, K1 phages have been successfully genetically modified using CRISPY-BRED (Bacteriophage Recombineering of Electroporated DNA), a technique that enables precise genome editing. This method holds promise for enhancing lytic efficiency by removing integrase genes to prevent lysogeny.
Phage Neighly, classified within subcluster K3, is genetically similar to Phage Larvae (K5 subcluster), differing by only a single nucleotide. Comparative genomic analysis suggests that Neighly also encodes LysA and LysB. Furthermore, Neighly exhibits high phage titers, reaching concentrations of approximately 10¹³ PFU/mL, indicating strong replication and lytic potential.
Phages were obtained from the William & Mary Phage Lab. High-titer lysates were generated by plating phages on Mycobacterium smegmatis lawns in top agar, then flooding the plates with phage buffer and filtering the collected buffer to obtain purified lysate.
Phage efficacy was assessed under two conditions: controlled laboratory settings and simulated environmental microcosms.
Laboratory Assays: Plaque assays were conducted to evaluate lytic activity across a range of phage dilutions. This allowed for quantification of plaque-forming units and assessment of dose-dependent lysis efficiency.
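Plaque counts convert to a titer with standard arithmetic: titer (PFU/mL) = plaques counted ÷ (dilution plated × volume plated in mL). The sketch below is minimal and assumes plaques are counted on a single plate in the countable range. The function name and the numbers in the example are hypothetical, not our measurements.

```python
def pfu_per_ml(plaques: int, dilution: float, volume_ml: float) -> float:
    """Return phage titer in PFU/mL.

    plaques   -- plaques counted on one plate (ideally in the countable range)
    dilution  -- dilution of the lysate that was plated, e.g. 1e-6 for 10^-6
    volume_ml -- volume of diluted lysate plated, in mL
    """
    if plaques < 0 or dilution <= 0 or volume_ml <= 0:
        raise ValueError("plaques must be >= 0; dilution and volume must be > 0")
    # Undo the dilution and scale from the plated volume up to 1 mL.
    return plaques / (dilution * volume_ml)


# Hypothetical example: 150 plaques from 10 uL of a 10^-6 dilution.
titer = pfu_per_ml(150, 1e-6, 0.010)
print(f"{titer:.2e} PFU/mL")  # 1.50e+10 PFU/mL
```

Comparing titers across a dilution series in this way is what makes the dose-dependent lysis assessment quantitative.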
Microcosm Assays: To simulate real-world conditions, phages were introduced into tap water and flushed through PVC pipes colonized with mycobacterial biofilms. The cocktail included all three phages. Efficacy was measured via bacterial titer reduction, sustained phage presence, and microscopic analysis of biofilm degradation.
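Titer reduction is commonly reported on a log10 scale, which keeps laboratory and microcosm results comparable across orders of magnitude. The sketch below assumes titers (for example CFU/mL) are measured before and after treatment. The function name and example values are hypothetical, and this is not necessarily how our reductions were computed.

```python
import math


def log10_reduction(before: float, after: float) -> float:
    """Log10 reduction in bacterial titer (e.g. CFU/mL) between two time points."""
    if before <= 0 or after <= 0:
        # log10 is undefined at zero; substitute the assay's detection limit instead.
        raise ValueError("titers must be positive")
    return math.log10(before) - math.log10(after)


# Hypothetical titers, for illustration only: 1e8 -> 1e5 CFU/mL.
print(round(log10_reduction(1e8, 1e5), 2))  # 3.0
```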
Phages demonstrated significantly higher lytic efficiency under controlled laboratory conditions compared to PVC pipe microcosms. These findings show the complexity of translating in vitro efficacy to in situ applications and highlight the need for further optimization of phage delivery and stability in real-world settings.
7. Microcosm Construction
Summary: To evaluate bacteriophage-mediated biofilm lysis under conditions that mimic household plumbing, we developed a custom microcosm system using PVC piping, tap water flow, and engineered components.
Figure: Microcosms simulating household plumbing, built from clear PVC pipes and 3D-printed adaptors. The PVC pipes contain mycobacterial biofilms grown in nonsterile environments.
7a. 3D-Printed Adaptors for Microcosm Assembly
To simulate the hydraulic behavior of household PVC plumbing, we designed 3D-printed adaptors that enable controlled flushing and effluent collection. The adaptors were engineered to slow water exit velocity, allowing for prolonged interaction between phage and biofilm. Computer-Aided Design (CAD) models were created using Fusion360.
Design files were sliced using Bambu Studio and printed with the Bambu Lab X1 Carbon 3D printer. PETG HF filament was selected for its chemical inertness and compatibility with microbial growth environments.
Prototype adaptors were fitted to 2-inch PVC pipe segments and flushed with tap water to assess flow consistency and structural stability. Effluent velocity was calculated to ensure the system mimicked realistic plumbing conditions.
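Effluent velocity can be estimated from a timed collection. The flow rate Q is the collected volume divided by time, and the mean velocity is v = Q / A, where A = πd²/4 is the cross-sectional area of the outlet. The sketch below assumes a circular outlet and a timed-volume measurement. The outlet diameter and other values are hypothetical, since the adaptor's outlet dimensions are not given here.

```python
import math


def effluent_velocity(volume_l: float, seconds: float, outlet_diameter_mm: float) -> float:
    """Mean exit velocity (m/s) from a timed effluent collection.

    Flow rate Q = collected volume / time; outlet area A = pi * d^2 / 4; v = Q / A.
    """
    q_m3_per_s = (volume_l / 1000.0) / seconds   # litres -> cubic metres per second
    d_m = outlet_diameter_mm / 1000.0            # millimetres -> metres
    area_m2 = math.pi * d_m ** 2 / 4.0           # circular outlet assumed
    return q_m3_per_s / area_m2


# Hypothetical measurement: 0.5 L collected in 30 s through a 10 mm outlet.
v = effluent_velocity(0.5, 30.0, 10.0)
print(f"{v:.3f} m/s")  # 0.212 m/s
```

For a fixed flow rate, narrowing the outlet raises the exit velocity. The calculation therefore helps check whether an adaptor design slows effluent as intended.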
The adaptors successfully regulated water flow and maintained system integrity, validating their use in downstream biofilm lysis experiments.
7b. Single-Species Biofilm Development on PVC
To visualize biofilm formation, we used transparent PVC pipes and a strain of Mycobacterium smegmatis expressing red fluorescent protein (RFP). To decrease microbial competition and promote selective growth, biofilms were cultivated in media supplemented with hygromycin.
Sterile bottles containing 7H9 media and hygromycin were inoculated with M. smegmatis at a 1:20 dilution. PVC segments (~2 inches) were submerged to promote biofilm attachment.
Figure: PVC pipes containing M. smegmatis biofilms grown in hygromycin-containing media to promote single-species biofilm growth.
Following incubation, pipes were transferred to the microcosm adaptors. However, biofilm adhesion to PVC was minimal; most biomass remained suspended in the media and detached during handling.
Sterile conditions did not support sufficient biofilm adhesion to PVC surfaces, limiting the feasibility of flushing-based assays.
Literature suggests that Mycobacterium spp. form more robust biofilms when co-cultured with other microbes (Gomez-Smith et al., 2015). We hypothesized that nonsterile conditions would enhance biofilm adhesion through multispecies interactions and extracellular matrix production.
PVC segments were placed in nonsterile trays containing unsupplemented 7H9 media, allowing natural microbial colonization without antibiotic selection.
Figure: (Left) PVC pipes containing M. smegmatis biofilms grown in a nonsterile environment and with media with no antibiotic selection to promote multispecies biofilm growth. (Right) Single PVC pipe grown in conditions that allow multispecies biofilm formation.
Biofilms cultivated under nonsterile conditions demonstrated strong adhesion and structural integrity, remaining firmly attached to PVC surfaces even after repeated flushing with tap water. To investigate the microbial community structure and dynamics within these complex biofilms, we performed 16S rRNA metagenomic sequencing on samples grown under nonsterile conditions.
Cultivating biofilms under nonsterile conditions significantly enhanced their structural resilience and surface adhesion. This approach also made the biofilms more representative of environmental biofilms and therefore better suited for testing phage treatment. Within these biofilms, distinct color variations, particularly pink (attributed to RFP-expressing Mycobacterium smegmatis) and black, indicate the presence of multiple microbial species and likely symbiotic interactions. The 16S sequencing results suggested that the biofilm consisted largely of Acinetobacter but also contained various other species. However, this sequencing strategy is limited to bacteria and does not offer full insight into the community structure of these biofilms.
Bioinformatics
8. Pipeline Engineering
Summary: Metagenomic samples yield large FASTQ files that contain raw DNA sequence information. Given the immense size and quantity of these files, full local storage and processing were deemed impractical. Our engineering solution was designed to bypass this significant data burden by streamlining the acquisition process; we selectively accessed the necessary raw data and fed it directly into the classifier, eliminating the need for extensive local FASTQ file storage.
8a. Integration of FASTQ file retrieval tool
The NCBI SRA database is widely used and referenced in SynBio and bioinformatics. We decided to use SRA accession numbers as the identifier for each sample to allow integration with literature data and with future projects uploaded to various genomic databases.
During development, the first step of the pipeline used the ESearch tool to retrieve FASTQ files from the NCBI SRA database.
We then tested the ESearch integration using a small subset of samples, each associated with an SRA accession number.
After a few tests, we realized that ESearch was not suited to FASTQ retrieval. It queries Entrez databases and returns identifiers of matching records rather than the sequencing reads themselves, so we removed it. Instead, we moved forward with the SRA Toolkit and its fasterq-dump tool, which is well suited to reading in SRA accession numbers and retrieving the associated FASTQ files for processing in later steps.
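The retrieve-process-delete pattern described in this section's summary could be sketched as follows. Each run is fetched into a temporary directory with fasterq-dump, handed to a downstream step, and deleted immediately, so raw FASTQ files never accumulate locally. The helper names (`fasterq_dump_cmd`, `with_fastq`) are hypothetical, and the code assumes the SRA Toolkit is installed and on PATH. It illustrates the idea rather than reproducing our pipeline code.

```python
import shutil
import subprocess
import tempfile
from collections.abc import Callable
from pathlib import Path


def fasterq_dump_cmd(accession: str, outdir: Path, threads: int = 4) -> list[str]:
    """Build a fasterq-dump command that writes FASTQ files for one SRA run."""
    return [
        "fasterq-dump", accession,
        "--outdir", str(outdir),
        "--temp", str(outdir),     # keep scratch files in the same throwaway dir
        "--threads", str(threads),
        "--split-3",               # mates -> _1/_2 files; unpaired reads -> a third file
    ]


def with_fastq(accession: str, process: Callable[[list[Path]], None]) -> None:
    """Download one run into a temporary directory, pass the FASTQ paths to
    `process` (e.g. the classifier step), then delete everything."""
    tmp = Path(tempfile.mkdtemp(prefix=f"{accession}_"))
    try:
        subprocess.run(fasterq_dump_cmd(accession, tmp), check=True)
        process(sorted(tmp.glob("*.fastq")))
    finally:
        shutil.rmtree(tmp, ignore_errors=True)
```

Usage would look like `with_fastq("SRR…", classify_reads)`, where `classify_reads` is whatever downstream function consumes the FASTQ paths.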
8b. Integration of Kraken + Bracken for taxonomic processing
To create a database linking species abundance data (for evaluating community structure) with extracted environmental abiotic metadata, we required a pipeline that transforms metagenomic input into taxonomic abundance output. Kraken2 and Bracken form the essential final steps: Kraken2 assigns a taxonomic label to each read, and Bracken uses the Kraken2 report to re-estimate taxonomic abundances.
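The hand-off from Kraken2 to Bracken can be sketched as two command lines. The helper names are hypothetical. The flags are the tools' documented options, but the read length (150) and abundance threshold (10 reads) are assumptions. Bracken also requires a k-mer distribution file built for the chosen read length to exist in the database directory.

```python
import subprocess
from pathlib import Path


def kraken2_cmd(db: Path, r1: Path, r2: Path, report: Path, output: Path,
                threads: int = 8) -> list[str]:
    """Classify paired-end reads and write the per-taxon report Bracken reads."""
    return [
        "kraken2", "--db", str(db), "--threads", str(threads),
        "--report", str(report), "--output", str(output),
        "--paired", str(r1), str(r2),
    ]


def bracken_cmd(db: Path, report: Path, out: Path, read_len: int = 150,
                level: str = "S", threshold: int = 10) -> list[str]:
    """Re-estimate abundances at one taxonomic level (S = species) from a Kraken2 report."""
    return [
        "bracken", "-d", str(db), "-i", str(report), "-o", str(out),
        "-r", str(read_len), "-l", level, "-t", str(threshold),
    ]


def classify(db: Path, r1: Path, r2: Path, workdir: Path, sample: str) -> Path:
    """Run Kraken2 then Bracken for one sample; return the Bracken abundance table."""
    report = workdir / f"{sample}.k2report"
    subprocess.run(kraken2_cmd(db, r1, r2, report, workdir / f"{sample}.kraken"), check=True)
    table = workdir / f"{sample}.bracken"
    subprocess.run(bracken_cmd(db, report, table), check=True)
    return table
```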
First, Kraken2 and Bracken were installed from their official GitHub repositories, and a Kraken2 reference database was installed for processing.
After implementing Kraken2 and Bracken, we tested different Kraken2 reference databases, such as the standard and core_nt databases, to assess mapping success and the associated computational expense.
Testing showed that core_nt was too computationally heavy to run realistically for every sample or for future users. The more compact databases, such as the 8 GB and 16 GB databases, are computationally lighter, but we found them too sparse to map k-mers successfully. We therefore chose the standard database. The larger database would hinder reproducibility because of its computational overhead, and the smaller databases would lose valuable data through an excess of unmapped reads.
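One convenient summary of "mapping success" when comparing databases is the fraction of reads that Kraken2 classifies at all. The sketch below assumes the default six-column Kraken2 report, not the `--report-minimizer-data` variant, which adds columns. The function name is hypothetical, and this is not necessarily the statistic used in our comparison.

```python
def classified_fraction(report_text: str) -> float:
    """Fraction of reads Kraken2 assigned to any taxon, from a standard report.

    Assumed columns (tab-separated): percent, clade reads, direct reads,
    rank code, NCBI taxid, name. Rank 'U' = unclassified, 'R' = root.
    """
    unclassified = classified = 0
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        rank, clade_reads = fields[3], int(fields[1])
        if rank == "U":
            unclassified = clade_reads
        elif rank == "R":  # exact match, so sub-root codes like 'R1' are ignored
            classified = clade_reads
    total = classified + unclassified
    return classified / total if total else 0.0


# Hypothetical three-line report: 950 of 1,000 reads reach the root node.
example = (
    "  5.00\t50\t50\tU\t0\tunclassified\n"
    " 95.00\t950\t10\tR\t1\troot\n"
    " 90.00\t900\t0\tR1\t131567\t  cellular organisms\n"
)
print(classified_fraction(example))  # 0.95
```

Running the same samples against each candidate database and comparing this fraction alongside run time and memory makes the size-versus-sensitivity trade-off explicit.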
9. Predictive Model Tuning
Summary: To maximize AQUIRE’s predictive accuracy, we employed the Design-Build-Test cycle. This iterative engineering approach was essential for model enhancement and tuning, allowing us to systematically compare model accuracy and select the classification model best suited to each species.
9a. Assembling three candidate classification models
We initially aimed for a simple machine learning classifier, implementing logistic regression to avoid an overly complex model for predicting species survival based on taxonomic abundance and environmental abiotic variables.
We implemented the predictive model using a Python script and the scikit-learn package. This script leverages a logistic regression algorithm, which we trained using a portion of the known data from our assembled database, AQUERY.
We tested the trained model by running the remaining known AQUERY samples through the classifier. Scoring indicated that the data were not linearly separable, suggesting that logistic regression, a linear classifier, was insufficient for accurately classifying chassis survival based on abundance and environmental conditions.
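To illustrate why a linear model can fail, the sketch below fits logistic regression by batch gradient descent on a small, hypothetical XOR-style dataset. It trains on a portion of the data and scores the held-out remainder. Our script used scikit-learn, and this dependency-free example is not that script. Because the toy label depends on an interaction between the two features, no straight-line boundary separates the classes, and held-out accuracy stays at chance.

```python
import math


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent on log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            for j, xj in enumerate(xi):
                grad_w[j] += (p - yi) * xj
            grad_b += p - yi
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    """Share of correct predictions; z >= 0 is equivalent to p >= 0.5."""
    preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# Hypothetical toy data: the label depends on an interaction, not a linear trend.
points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)] * 5
X = [p for p, _ in points]
y = [label for _, label in points]
split = 12  # train on the first 12 samples, hold out the remaining 8
w, b = train_logistic(X[:split], y[:split])
print(accuracy(w, b, X[split:], y[split:]))  # 0.5, no better than chance
```

In scikit-learn, the equivalent would be `sklearn.linear_model.LogisticRegression` fit on a `train_test_split` of the data. Tree-based models can represent this kind of interaction directly.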
Recognizing that logistic regression was not an optimal classifier, we decided to test more flexible alternatives to better assess chassis survival. We elected to implement a Random Forest classifier and XGBoost, two powerful and widely used tree-based ensemble methods that can capture nonlinear relationships and feature interactions.
9b. Assessing model accuracy using evaluation metrics
To properly review model accuracy, we implemented a model evaluation step. The initial choice for this metric was F1 score, to enable a rough comparison between different model parameter configurations.
We added a dedicated post-training step to the Python script to output F1 scores, allowing standardized, quantitative model evaluation.
Multiple training iterations were performed on different model architectures. The resulting F1 scores were systematically recorded and reviewed to assess each model's initial performance against predefined benchmarks.
Based on the results of the testing phase, we switched from F1 score to AUC (area under the ROC curve). AUC offers a more robust comparison of overall model performance because it is threshold-independent: unlike F1 score, it does not require selecting an arbitrary classification cutoff.
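The difference between the two metrics is easy to see on a tiny example. F1 changes when the classification cutoff moves, while ROC AUC summarizes how well the scores rank positives above negatives across all cutoffs. The stdlib-only sketch below uses made-up labels and scores. In practice, scikit-learn's `f1_score` and `roc_auc_score` compute the same quantities.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 = 2TP / (2TP + FP + FN) after calling scores >= threshold positive."""
    tp = fp = fn = 0
    for t, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        tp += int(pred == 1 and t == 1)
        fp += int(pred == 1 and t == 0)
        fn += int(pred == 0 and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outranks a random
    negative (ties count as half), i.e. the normalized Mann-Whitney U statistic."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(round(f1_at_threshold(y, scores, 0.5), 3))  # 0.667
print(round(f1_at_threshold(y, scores, 0.3), 3))  # 0.8
print(roc_auc(y, scores))                          # 0.75
```

The same model scores yield two different F1 values depending only on the cutoff, while AUC stays fixed. This is why AUC is the fairer basis for comparing model architectures.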
10. RNA-Seq Meta-Analysis
Summary: We conducted an RNA-Seq meta-analysis to find design principles by identifying differentially expressed (DE) genes in chassis organisms across laboratory vs. natural environments. Initially, we used custom bioinformatic pipelines, but this proved too slow and resource-intensive. We then redesigned our process to integrate AI tools (Copilot, Gemini, ChatGPT) with our pipelines. By carefully vetting and curating the AI results, this hybrid approach allowed us to generate a more comprehensive and consistent analysis in significantly less time.
Given that we did not find a comprehensive assessment of differentially expressed (DE) genes and associated pathways in the literature from which we could extract design principles, we sought to generate this as part of our project. We did so by identifying literature and databases that compared chassis gene expression between laboratory conditions and more natural environments, and by identifying the gene lists through our own pipeline.
Our first design iteration focused exclusively on developing our own pipelines for:
- Analysis of transcriptomic data comparing laboratory versus environmental samples
- Analysis of metatranscriptomic data comparing laboratory versus environmental samples
- Functional analysis of differentially expressed genes
If a study already included a DE gene list, we incorporated it into our analysis; if not, we ran the pipeline ourselves.
We built pipelines for:
- Analyzing transcriptomic data using Galaxy and a Python-based pipeline (FASTQ → Trim Galore → HISAT2 → htseq-count → DESeq2)
- Analyzing metatranscriptomic data
- Performing functional analysis using DAVID Bioinformatics Resources
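The command-line portion of the transcriptomic pipeline might be sketched as follows for one paired-end sample. This is an illustration rather than our exact pipeline. The helper names, the unstranded-library setting (`-s no`), and the trimmed-file naming are assumptions. DESeq2 itself runs in R on the resulting count tables, so it is not shown.

```python
import subprocess
from pathlib import Path


def rnaseq_steps(sample: str, r1: Path, r2: Path, hisat2_index: str,
                 gtf: Path, outdir: Path):
    """Return (command, stdout_path) pairs for one paired-end sample.

    Trimmed-file names assume Trim Galore's usual naming for gzipped input
    (<name>_val_1.fq.gz / <name>_val_2.fq.gz); check them for your version.
    """
    def trimmed(fq: Path, mate: int) -> Path:
        stem = fq.name.removesuffix(".gz").removesuffix(".fastq").removesuffix(".fq")
        return outdir / f"{stem}_val_{mate}.fq.gz"

    sam = outdir / f"{sample}.sam"
    return [
        # 1. Adapter and quality trimming of both mates.
        (["trim_galore", "--paired", "-o", str(outdir), str(r1), str(r2)], None),
        # 2. Splice-aware alignment; HISAT2 keeps mates adjacent, which suits
        #    htseq-count's default name-ordered mode.
        (["hisat2", "-x", hisat2_index, "-1", str(trimmed(r1, 1)),
          "-2", str(trimmed(r2, 2)), "-S", str(sam)], None),
        # 3. Per-gene read counts (unstranded library assumed).
        (["htseq-count", "-f", "sam", "-s", "no", str(sam), str(gtf)],
         outdir / f"{sample}.counts.tsv"),
    ]


def run_steps(steps) -> None:
    """Execute each step, redirecting stdout to a file where one is given."""
    for cmd, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(cmd, check=True)
        else:
            with open(stdout_path, "w") as fh:  # htseq-count prints counts to stdout
                subprocess.run(cmd, stdout=fh, check=True)
```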
We tested the pipelines using numerous datasets from the studies that performed lab and environment comparisons. For each dataset, we evaluated the pipeline’s ability to successfully process raw sequencing data from FASTQ files through to DE outputs, to accurately identify DE genes, and to integrate functional enrichment results from DAVID into interpretable pathway-level insights.
We learned that conducting all of this analysis from scratch, particularly for studies that did not include DE gene lists, was taking an inordinate amount of time, and we were barely scratching the surface of the available data. In addition, while this approach was valuable, it did not capture studies that focused on a small number of genes, such as those using qRT-PCR for targeted analyses of gene expression differences between field and laboratory conditions.
We therefore redesigned our meta-analysis to include AI tools (Microsoft Copilot, Google Gemini, and OpenAI ChatGPT, both version 4 and ChatGPT+) using specific, targeted prompts and current prompt-engineering methods. Given the limitations of AI (e.g., hallucinations and errors), our redesign also included protocols for vetting AI outputs at each step.
We integrated our own data analyses with the results produced by AI to generate a more comprehensive analysis in a much shorter timeframe.
Integrating AI-generated results with those produced by our own pipelines, while consistently applying careful vetting and curation, yielded more complete, current, and consistent results in a fraction of the time.