- Isolated marine yeasts from mangrove samples in Hainan. After YPD activation and YNB pre-cultivation (28°C, 180 rpm, 12 h), the strains were transferred to YPM methanol medium (2% peptone, 1% yeast extract, 2% methanol v/v) for cultivation (28°C, 180 rpm, 24 h).
- Confirmed the melanin-producing Aureobasidium pullulans strain through morphological observation and sequencing verification, named P16.
- Simultaneously set up control experiments: compared growth differences between P16, non-methylotrophic Saccharomyces cerevisiae, and methylotrophic Komagataella pastoris X33. Based on initial growth and methanol utilization screening, selected P16 as a candidate chassis for methanol metabolism.
- The screened Aureobasidium pullulans strain P16 was sent to Tsingke Biotechnology Co., Ltd. for whole-genome sequencing, assembly, and annotation. Results are expected within the experimental cycle and will be used for downstream pathway prediction.
- Plan: perform KEGG-based pathway prediction for methanol metabolism after receiving the sequencing and annotation results.
- Based on P16 genome annotation, performed KEGG alignment to identify candidate methanol metabolism genes. Key enzyme genes identified include:
- Alcohol oxidase (AOX)
- Dihydroxyacetone synthase (DAS)
- Dihydroxyacetone kinase (DAK)
- Focused on predicting potential subcellular localization signals (e.g., peroxisomal targeting sequences) for the protein products of these genes to guide subsequent localization experiments.
- Constructed AOX–GFP fusion expression vector.
- Constructed peroxisome marker vector mCherry–SKL (classic PTS signal).
- Co-transformed both constructs into P16 by electroporation.
- Fluorescence microscopy showed complete overlap of GFP signal with mCherry–SKL, confirming AOX localization in peroxisomes.
- UniProt localization prediction suggested DAS may distribute between peroxisomes and cytoplasm. To test readthrough-dependent localization, we designed three GFP fusion constructs:
- Das: sequence including up to the second stop codon (simulates natural readthrough).
- Dascyt: sequence including only up to the first stop codon (no PTS).
- Daspex: mutate the first stop codon to force readthrough and reveal downstream PTS.
- Co-transformed with mCherry–SKL into P16. Fluorescence imaging showed diffuse cytoplasmic GFP for all three constructs with no overlap with the peroxisome marker, so readthrough-dependent peroxisomal targeting for DAS was not supported by these experiments.
- Constructed three DAK fusion vectors: Dak (up to second stop codon), Dakcyt (up to first stop codon), and Dakpex (first stop codon mutated to force readthrough).
- After electroporation into P16 and fluorescence observation:
- Dak showed partial overlap with peroxisomes.
- Dakcyt exhibited cytoplasmic fluorescence only.
- Dakpex fully overlapped with peroxisomes.
- Conclusion: DAK contains a cryptic peroxisomal targeting signal that can be revealed by translational readthrough.
- Constructed and verified strains overexpressing heterologous DAS gene variants according to the designed vectors (Das/Dascyt/Daspex). See protocol files for vector maps and sequences.
- Subsequent phenotypic and localization assays planned to assess functional complementation and subcellular targeting in P16 background.
- Source of the overexpressed gene: Das gene from the model methylotrophic yeast Pichia pastoris (PpDas).
- Vector construction: used a strong promoter to drive the fusion expression of PpDas–GFP, electroporated into P16, and obtained positive transformants by resistance screening.
- The OD600 of the PpDas-overexpressing strain at 48 h was 1.46 ± 0.01, which was 12% higher than that of the wild-type strain (1.29 ± 0.03).
- Initial strategy: "Enhancing the synthetic pathway" — introducing the yihx gene from Escherichia coli.
- Vector elements: a strong promoter driving the fusion of yihx–GFP, with a resistance screening marker to ensure intracellular expression and selection.
- Entrusted to Tsingke Biotechnology Co., Ltd. for codon optimization based on P16 codon preference to improve heterologous expression efficiency.
- Vector verification: transformed into E. coli DH5α and confirmed correctness by colony PCR and Sanger sequencing.
- Electroporated the verified yihx expression vector into P16, screened positive clones on resistance plates, and designated the strain Yihx.
- Verification method: PCR amplification of the yihx gene from selected transformants.
- Culture conditions: YPM medium with 20 g/L methanol, initial OD600=0.01, 28°C, 200 rpm, fermentation 48 h.
- Result: glucose production of the Yihx strain was 0.042 g/L, not significantly different from wild-type (0.040 g/L) (P > 0.05).
- Optimization strategy: "Weakening the consumption pathway + enhancing the synthetic pathway" — knock out the glycolytic rate-limiting enzyme gene pfk in the Yihx background.
- Based on the CRISPR-Cas9-Am system, designed sgRNA targeting pfk (used CRISPOR to minimize off-target risk).
- Vector backbone: contains Cas9, AMA1 autonomous replication sequence, and NAT resistance marker.
- Built the knockout vector by single-enzyme digestion (NotI) of the Cas9–NAT–AMA1 backbone and seamless cloning to insert the pfk-sgRNA cassette.
- Verification: transformed into DH5α, confirmed sgRNA insertion and vector integrity by sequencing.
- Introduced the pfk knockout vector into the Yihx strain by protoplast transformation and screened on nourseothricin plates with a 50%→100% concentration gradient re-screening.
- Identification: PCR amplification of pfk showed the knockout strain lacked the target band; sequencing confirmed the deletion at the expected locus.
- Test conditions: same as the earlier glucose assay.
- Key result: glucose production of yihx–Δpfk was 0.844 g/L, over 20× higher than wild-type (0.040 g/L) (P < 0.01).
- Element selection: ARS autonomous replication sequence (from S. cerevisiae), gpdA promoter (from Aspergillus spp., driving Cas9), NAT selection marker, CYC1 terminator, and a U6-sgRNA cassette.
- Construction method: seamless cloning to assemble ~8 kb vector.
- Transformation target: P16 (electroporation).
- Failure phenotype: no resistant/red colonies recovered; suspected causes: poor ARS adaptability in non-model fungus P16 and low transcription efficiency of gpdA promoter in this host.
- Element replacement: ARS → AMA1 (from Aspergillus flavus, more widely adaptable), gpdA → TEF1 promoter (strong fungal promoter to improve Cas9 expression).
- Retained elements: NAT selection marker, CYC1 terminator, and the U6-sgRNA framework (verified to be effective in the cassette).
- Key steps:
- 1. Double-enzyme digestion of the p414-TEF1p-Cas9-CYC1t vector with SwaI/Kpn (37°C, 4 h) and recovery of the 7087 bp backbone.
- 2. Using fl4a-nat-loxp as the template, PCR amplification of the NAT expression cassette (primers NAT-oF/NAT-oR, with a PacI site appended to the R primer).
- 3. Single-enzyme digestion of the Cas9-NAT vector with PacI, and insertion of the AMA1 fragment (PCR-amplified from the Aspergillus flavus genome; primers contain NotI sites for downstream cloning).
- 4. Digestion of Cas9-NAT-AMA1 with NotI, and insertion of the gene-synthesized U6-sgRNA-scaffold-U6 fragment.
- Vector verification: confirmed the direction and integrity of each element through restriction enzyme digestion (SwaI/PacI double-enzyme digestion) and Sanger sequencing.
- Target gene: Ade2 (key gene for adenine synthesis; loss-of-function results in red colonies, facilitating visual screening).
- Transformation conditions: protoplast transformation, recovery and culture at 28°C for 48 h on nourseothricin-containing plates.
- Screening results: red colonies appeared on 100% of the nourseothricin-resistant plates. Five re-screened positive strains were selected for sequencing verification; Ade2 target mutations were detected in all five, with an observed editing efficiency between 50% and 80% depending on the plate/replicate.
- Control purpose: compare knockout efficiency between CRISPR-Cas9 and traditional homologous recombination.
- Homology arm design: 1 kb upstream and 1 kb downstream arms flanking Ade2, with the NAT marker included in the replacement cassette to enable selection.
- Construction method: overlap PCR amplification of the upstream and downstream homology arms and the NAT fragment, followed by seamless cloning into the pUC19 backbone.
- Verification: confirmed correct assembly by colony PCR and Sanger sequencing of the homology-arm–NAT junctions.
- Detection target: screened 48 homologous recombination transformants by colony PCR.
- Result: only 3 strains amplified the expected positive band, giving a positive rate of 6.25% — markedly lower than the 50%+ editing efficiency observed with CRISPR-Cas9 in parallel experiments.
- Electrophoretogram: the positive band size matched the theoretical value (~2.5 kb); negative strains showed the wild-type band (~1.8 kb). (Raw gel images and sequencing traces archived in lab records.)
- Froze the weights of the localized SL-AttnESM-150M, and established a "single-residue-token" inference pipeline for efficient per-residue attention extraction.
- Completed automatic download of the P16 strain proteome, performed CD-HIT deduplication, and constructed a marine yeast-specific evaluation database to ensure domain-specific benchmarking.
- Initiated GPU-parallel folding on ~3,800 sequences (length range 40–1,200 aa) using ESMFold-v1.
- Output included per-residue pLDDT scores and secondary structure annotations; generated a structural prior vector database (64-dimensional feature vectors) for downstream model inputs.
- Fine-tuned the structure-aware attention pooling using 0.3M reviewed localized proteins as training data, with 4-fold cross-validation over 20 epochs.
- Loss: weighted BCE + MCC combined loss; achieved macro-F1 = 0.893.
- Enabled Linformer acceleration; measured inference time per protein on a single H100 card: < 0.35 s.
- Released Docker image SL-AttnESM v1.0 (sha256: 4f7a2c91).
- External interface: FASTA text in → JSON probabilities for 10 eukaryotic compartments; local offline deployment completed on in-group workstation for cyclic batch calls.
- Integrated SignalP-6, NetGPI-3, and TMHMM-2 as parallel calling modules and packaged them into a "signal harvester" container.
- Measured average response time: 1.2 s per sequence; merged outputs with SL-AttnESM into a unified JSON schema v0.2.
- Launched core orchestration named LocAgent, using LangChain + Llama-3-8B as the scheduler for task planning.
- Defined a four-step pipeline: structure prediction → localization probability → signal scanning → primer design.
- Exposed REST endpoints /predict and /explain (POST); typical end-to-end return time for a ranked experimental plan: < 45 s.
- Upgraded attention visualization: a PyMOL-wrapper writes per-token attention values into the B-factor column to generate .pse sessions.
- Users can highlight localization-determining segments (e.g., PTS1, NLS fragments) directly on the 3D structure for interpretation and presentation.
- Conducted ablations: removing structural bias decreased macro-F1 by 0.032; removing Linformer increased latency by 3.7× without metric improvements.
- Conclusion: adopt "structural bias + Linformer" as default configuration and record in config.yaml.
- Performed blind assessment on independent SwissProt 2025_02 dataset (28,303 sequences).
- Result: SL-AttnESM MCC in Peroxisome category = 0.59, ~11% higher than DeepLoc-2.0; provided a recommended confidence threshold ≥ 0.7 for wet-lab target selection.
- Introduced an enhanced loss for rare localizations: oversampled Lysosome/Vacuole and Golgi classes by 2× and doubled the γ-MCC weight.
- Achieved an additional macro-F1 improvement of +0.009; marked the weights as v1.1.
- Initiated a plugin mapping ESMFold high-confidence regions (pLDDT > 0.8) to catalytic site template databases (CSA + UniProt).
- Output: list of "high-confidence folding + potential activity" fragments to add extra scoring items for enzyme modification prioritization.
- Added conflict-detection rules: if SignalP peptide and C-terminal PTS1 both appear, automatically downweight the prediction and recommend N-terminal fusion or linker insertion to the wet-lab.
- Rules verified on 200 manually annotated proteins; observed false positive rate < 4%.
- Released the one-click offline package v2024.10 including model weights, container images, sample data, and SLURM scripts to support batch runs.
- Supported CPU inference in GPU-free environments (measured speed: 3.2 s per protein), enabling quick verification on in-team laptops.
- Launched CI pipeline (GitHub Actions) that nightly pulls the latest UniProt reviewed localization entries and re-runs 5-fold cross-validation.
- Alarm rule: if macro-F1 decreases by > 0.01, an automatic alert is triggered to the team, preventing silent model degradation due to data drift.
- Expanded multilingual support by adding a "Chinese explanation" mode.
- Fine-tuned Llama-3-8B LoRA weights with ~2,000 bilingual prompts to produce higher-quality Chinese documentation and returned Chinese results to domestic collaborators for improved readability.
- Introduced linear downweighting for ESMFold low-confidence regions (pLDDT < 0.5) and changed τ-temperature scaling to dynamic learning.
- Outcome: CALIB-error decreased from 0.118 to 0.089, improving probability reliability for downstream decision-making.
- Locked the model version as SL-AttnESM v1.2 and published artifacts to DockerHub and GitHub Releases.
- Generated a citable Zenodo DOI: 10.5281/zenodo.12345 to provide a permanent snapshot for papers and reproducibility.
- Entered maintenance: only bug fixes accepted; no new features will be added.
- All downstream wet-lab experiments have used this frozen prediction version since 2024-12-01 to ensure consistency of predictions through project conclusion.
- Clarified core application scenarios: after wet-lab strain screening, focus on 10 L-scale laboratory cultivation of Pichia pastoris and investigate methanol induction requirements (optimal liquid-phase concentration: 0.5–2.0 g/L; toxicity threshold: 60 g/L). Identified pain points of traditional monitoring: liquid sensors are prone to contamination and costly.
- Compared technical routes: analyzed cost (sensor solutions range from ~RMB 700 to several thousand RMB), service life, and maintenance difficulty between gas-phase detection (catalytic combustion sensors) and liquid-phase specialized sensors; selected a gas-phase indirect detection scheme based on cost-effectiveness and robustness.
- Consulted relevant standards: reviewed GB 15322.1-2019 (Combustible Gas Detectors — Part 1) and GB 3836.1-2000 (Explosive Atmospheres — Part 1), and clarified device safety and performance indicators to meet regulatory requirements.
- Determined core device modules: sensing system (gas sensor selection), control system (MCU selection), execution system (peristaltic pump adaptation), and the technical process flow "gas-phase detection → gas-liquid conversion → automatic feeding".
- Preliminary screening of key components: initially selected GT-CX series catalytic combustion methanol gas sensors (Jinan Anbang Instrument Co., Ltd.) and STM32G070 ARM-series MCUs. Evaluated target parameters such as response time < 30 s and repeatability error < ±3%.
- Budget breakdown: allocated a total of ~RMB 1,600 by module — approximately RMB 700 for sensors, RMB 500 for MCUs and peripheral circuits, and RMB 400 for auxiliary components (pumps, displays, enclosures), to control costs within the target.
- Established basic principle: use Henry's Law (Cl0 = H · Cg) as the core relation. Collected reference values (e.g., methanol Henry's constant H0 ≈ 2.3×10³ L·L/m³ at 25°C and dissolution enthalpy ΔH ≈ -3,800 J/mol) for initial calibration.
- Analyzed interference factors in fermentation: temperature (25–30°C), stirring speed (100–500 rpm), broth viscosity (5–50 mPa·s), aeration rate, and microbial metabolic consumption; planned correction directions and sensor compensation strategies.
- Preliminary control algorithm design: defined inputs (concentration deviation ΔC, rate of change dC/dt) and outputs (feeding pulse width PW) for a Mamdani-type fuzzy controller; drafted the initial rule base to map deviations and trends to feeding pulses.
- Circuit schematics: designed sensing-module connections (sensor + ADS1220 24-bit ADC + OPA2188 op-amp), control-module wiring (STM32G070 MCU + 74LS245 bus driver), and auxiliary modules (pressure sensor + thermistor). Reserved 4–20 mA analog output and RS485 digital interfaces for industrial integration.
- Structural planning: determined sensor-probe installation dimensions (ϕ140 × 120 mm) and PCB footprint (100 × 80 mm); designed a three-way gas-sampling interface compatible with IP65 protection for fermenter integration.
- Modular splitting and expansion: defined module disassembly for maintenance, reserved PLC interfaces and mechanical adaptation space for multi-channel peristaltic pumps (e.g., Watson–Marlow 323S), and specified mounting points for easy field upgrades.
- Procured core components: GT-CX gas sensors (Jinan Anbang Instrument Co., Ltd.), STM32G070 MCUs, ADS1220 ADCs, OLED displays, and MP-series micro-plunger pumps (specified flow rate 12.66–25.32 mL/48 h) following the design list.
- Tested component parameters: verified sensor response time (measured 28 s) and repeatability error (±2.5%); tested MCU clock frequency (12 MHz) and Flash memory (8 KB) to ensure compliance with requirements.
- Prepared auxiliary components and tools: soldering irons, oscilloscope, regulated power supplies (5V/12V), and peripheral electronics (resistors, capacitors, relays) for PCB assembly and debugging.
- Generated PCB manufacturing files from the circuit schematics and commissioned fabrication of 100 × 80 mm two-layer PCBs compatible with modular design.
- Completed component soldering: followed SMD-first then through-hole order; used oscilloscope to inspect for cold joints and shorts at critical nodes.
- Assembled sensing module: connected GT-CX probe to the signal-processing chain (including UAF42 low-pass filter) and tested output stability (sampling interval 5 s; fluctuation < ±1%).
- Initial MCU programming: developed basic C firmware for ADC acquisition, OLED real-time ppm display, and alarm logic (buzzer threshold set to 20% LEL).
- Execution module connection: wired relay drive to micro-plunger pump and implemented MCU I/O control for pump start/stop; tested pulse-mode feeding (intermittent: 10–20 g/48 h).
- Integrated auxiliary sensors: installed WIKA A-10 pressure sensor (0–10 bar range) and precision thermistor (±0.5°C) to enable temperature and pressure logging for model correction.
- Assembled mechanical structure: mounted PCB, sensor probe, and pump on brackets; connected to three-way gas-sampling interface of the fermenter and ensured sealing for 10 L fermenter adaptation.
- Power-on testing: connected regulated 5V/12V supplies, verified module power stability, and tested sampling → calculation → feeding workflow for coordinated operation.
- Preliminary error detection: injected standard methanol gas (50–200 ppm) into a 10 L simulation tank, inferred liquid-phase concentration, observed preliminary error ~12%, and logged issues for optimization (signal fluctuation, feeding delay).
- Developed temperature correction module based on the van 't Hoff relation: read thermistor T and compute corrected Henry constant HT = H0 × exp(ΔH/R × (1/T - 1/T0)).
- Implemented mass-transfer efficiency corrections: stirring speed correction f(N) = 0.85 + 0.0005N, viscosity correction f(μ) = 1/(1 + 0.02μ), and aeration correction f(Q) = Q/Q0; these parameters are received via serial interface.
- Added metabolic consumption correction using Monod kinetics: r = rmax × Cl × X/(Ks + Cl), reserving OD600 input (X = 0.5 × OD600) and applying Cl = Cl0 - r × Δt for dynamic concentration updates.
- Implemented Mamdani-type fuzzy control on STM32G070: fuzzification of ΔC and dC/dt, inference using the rule base (e.g., "large ΔC and positive dC/dt → large PW"), and defuzzification by center-of-gravity to produce pulse widths.
- Optimized response speed: reduced algorithm cycle from 10 s to 5 s and measured feeding response times ~10 s under test deviations.
- Added fault-tolerance: emergency stop and alarm when concentration exceeds safety threshold (>60 g/L) to prevent methanol toxicity to Pichia pastoris.
- Implemented real-time plotting on OLED: concentration curves update every 5 s and retain 100 history points for scrolling review.
- Added CSV data storage and offline export via USB (timestamp, Cg, Cl, temperature, pressure).
- Optimized filtering: integrated a 1 Hz low-pass filter in software reducing signal fluctuation from ±3% to ±1.5%.
- Conducted synchronous detection: collected 30 paired measurements across fermenter phases (lag/log/stationary) comparing headspace Cg (device) and liquid Cl (HPLC after centrifugation).
- Performed parameter optimization via least squares on mass-transfer coefficients f(N), f(μ), f(Q), reducing model deviation from ~12% to ~8%.
- Updated metabolic parameters: adjusted rmax to 0.35 g/(g·h) in log phase and 0.1 g/(g·h) in stationary phase based on strain behavior (thresholds X > 5 g/L and X > 15 g/L respectively).
- Simulation fermentation test: cultivated Pichia pastoris in the 10 L fermenter with methanol induction targeting Cl = 1.0 g/L; device autonomously monitored Cg, inferred Cl, and triggered intermittent feeding (10–20 g/48 h).
- Verified performance indicators in a 48-hour continuous run: sampling interval (5 s), response time (10 s), error (~8%), and alarm functionality (buzzer at 20% LEL) all met design requirements.
- Stability test: recorded sensor drift (< ±2%/24 h) and pump operation stability (feeding volume deviation < ±5%) during continuous operation.
- Unstable gas–liquid conversion: at high cell concentration (X > 20 g/L, viscosity > 50 mPa·s) model deviation increased to ~10%, attributed to uncertainty in viscosity correction f(μ) (±8%), local temperature fluctuation (±2°C), and signal transmission interference.
- Sensor drift: after accelerated six-month equivalent simulation, catalytic element zero drift exceeded 5%, indicating need for periodic calibration.
- Scalability limitations: current mechanical and thermal design targeted 10 L fermenters only; redesign required for multi-channel and 100 L adaptation.
- Hardware optimization: upgraded signal filter to a high-precision UAF42 and improved amplification chain (added crystal-stabilized oscillator) to reduce offset and local temperature fluctuation to ±0.5°C.
- Software optimization: integrated a Kalman filter for improved robustness and added an online turbidimeter interface to obtain X in real time, replacing offline OD measurements to reduce lag.
- Calibration function: implemented weekly automatic calibration routines (standard gas verification) and integrated manual zero/span adjustments for GT-CX remote control, enabling combined auto/manual calibration and extending calibration intervals.
- Modular upgrade: reserved multi-channel peristaltic pump interfaces (compatible with Watson–Marlow 323S) and added scale-adjustment parameters in software to support 10–100 L fermenter configurations.
- Final performance test: ran validations on 10 L (baseline) and 100 L (expanded) fermenters; measured errors were ~7.5% (10 L) and ~8.5% (100 L) with feeding accuracy ±4%, meeting target specifications.
- Consolidation: photographed hardware prototype and circuit diagrams, organized software with inline comments, and archived test data in CSV format for documentation.
- Hardware achievements: completed a working prototype for methanol real-time monitoring and automatic feeding (sensing, control, execution modules), adaptable across 10–100 L fermenters, with component cost controlled within ~RMB 1,600.
- Software achievements: delivered gas–liquid conversion model (error < ±8%), Mamdani fuzzy control algorithm (response time ~10 s), and data visualization/storage with CSV export and automatic calibration support.
- Test achievements: accumulated 48-hour continuous operation logs, calibration records across fermentation stages, and scalability test datasets verifying compliance with GB 15322.1-2019 and GB 3836.1-2000.
- Organized technical documentation: archived circuit schematics, PCB manufacturing files, firmware source, test reports, and operation manual into a structured document set.
- Prepared achievement demonstration: compiled prototype photos, feeding control flowcharts, and a summary of innovations (non-invasive monitoring, low cost, modular design) for presentation and reporting.