GenBank-Improvised

An Enhanced Interface for NCBI GenBank

Team: iGEM IISc 2025

Website: You can check out the tool here.

1. Overview

The GenBank-Improvised project aims to modernize and simplify interaction with NCBI's GenBank, addressing long-standing usability issues such as poor search accuracy, lack of context explanations, and weak visualization options.

Our tool integrates:

NCBI API access for robust backend querying
Smart Search, a ranking-based, context-aware query engine
Plasmid Visualization, a clean graphical interface to explore genetic constructs

The result is a unified web portal that lowers the learning curve for new users while offering more power and clarity for researchers.

This project was led by Aditey Nandan and involved several junior members of the iGEM-Juniors 2025 team. We ran the iGEM-Juniors team to give them a chance to work on a project in the first semester. They were introduced to important interdisciplinary concepts and learnt new tools (GitHub, VSCode, etc.) along with documentation, software reproducibility, full-stack web development, and software engineering. Their support has been invaluable, and this page documents the progress they have made as part of the team.

2. Identified Problems in GenBank

Through systematic exploration and feedback, we identified several shortcomings in the GenBank interface.

2.1 Search Lottery

Queries for proteins often yield irrelevant results (e.g., "cas9" returns random strains).
No easy way to confirm the correct source organism.
Our fix: A Smart Search module that prioritizes well-annotated entries and model organisms.

2.2 Missing Explanations

Cryptic field labels (SNP, CDS, WGS) lack on-site guidance.
Users must rely on external resources for basic definitions.
Our fix: A dedicated Docs section explaining key terms.

2.3 Single-Database BLAST

BLAST restricts searches to one database at a time (nr/nt, RefSeq, PDB).
No "search all" or automatic fallback across repositories.
Proposed fix: A Unified BLAST Interface that queries multiple databases seamlessly.
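The fallback idea above can be sketched as a small control loop. This is only an illustration under stated assumptions: the backend `run_search` is injected and hypothetical, the database labels are taken from the text rather than from any real BLAST client, and the hits are made-up placeholders.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A search backend takes (query, database label) and returns a list of hits.
# It is injected so the sketch stays independent of any real BLAST client.
SearchFn = Callable[[str, str], List[str]]


def search_with_fallback(
    query: str, databases: Sequence[str], run_search: SearchFn
) -> Tuple[Optional[str], List[str]]:
    """Try each database in order and stop at the first one that returns hits."""
    for db in databases:
        hits = run_search(query, db)
        if hits:
            return db, hits
    return None, []


def search_all(
    query: str, databases: Sequence[str], run_search: SearchFn
) -> Dict[str, List[str]]:
    """Query every database and keep only the ones that returned hits."""
    results: Dict[str, List[str]] = {}
    for db in databases:
        hits = run_search(query, db)
        if hits:
            results[db] = hits
    return results


# Stand-in backend with made-up hits, used only to exercise the control flow.
_FAKE_HITS = {"RefSeq": [], "nr/nt": ["hit-1"], "PDB": ["hit-2"]}


def fake_search(query: str, database: str) -> List[str]:
    return list(_FAKE_HITS.get(database, []))


if __name__ == "__main__":
    order = ["RefSeq", "nr/nt", "PDB"]
    print(search_with_fallback("ATGC", order, fake_search))
    print(search_all("ATGC", order, fake_search))
```

Keeping the backend as a parameter means the same loop could wrap whichever BLAST access method the tool eventually uses.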

2.4 Inflexible Genome Comparison

GenBank cannot directly compare annotated genomes or list common genes.
Proposed fix: A Comparative Genomics feature that aligns gene annotations from two genomes to identify overlaps and differences.
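As a rough sketch of what listing common genes could look like, the comparison reduces to set operations once each genome's gene names have been extracted. The function name and the case-insensitive matching are assumptions made for illustration, and a real comparison would also need to handle synonyms and orthologs with different names.

```python
from typing import Dict, Iterable, List


def compare_gene_sets(genes_a: Iterable[str], genes_b: Iterable[str]) -> Dict[str, List[str]]:
    """Compare two lists of gene names, ignoring case and surrounding spaces.

    Names are reported in lowercase because matching is case-insensitive.
    """
    a = {g.strip().lower() for g in genes_a if g.strip()}
    b = {g.strip().lower() for g in genes_b if g.strip()}
    return {
        "shared": sorted(a & b),   # genes annotated in both genomes
        "only_a": sorted(a - b),   # genes annotated only in the first genome
        "only_b": sorted(b - a),   # genes annotated only in the second genome
    }


if __name__ == "__main__":
    print(compare_gene_sets(["lacZ", "lacY", "bla"], ["LacZ", "bla", "tetA"]))
```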

2.5 No Intelligent Intra-File Search

Users can't find specific genes within large sequences unless they know exact labels.
"Ctrl+F" is often the only search method.
Proposed fix: A Context-aware Find function that matches synonyms, abbreviations, and functional terms (e.g., "nitrite reductase" finds "nirK").
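One simple way to approximate such a find function is a synonym table consulted alongside the literal labels. The sketch below assumes a small hand-written table (`SYNONYMS`) and plain substring matching; both are illustrative choices, not the project's actual method.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical synonym table: gene label -> functional names and abbreviations.
SYNONYMS: Dict[str, Set[str]] = {
    "nirK": {"nitrite reductase", "copper-containing nitrite reductase"},
    "lacZ": {"beta-galactosidase", "b-gal"},
    "bla": {"beta-lactamase", "ampicillin resistance"},
}


def find_features(
    query: str, feature_labels: Iterable[str], synonyms: Dict[str, Set[str]] = SYNONYMS
) -> List[str]:
    """Return the labels whose name or any listed synonym contains the query."""
    q = query.strip().lower()
    # Lowercase the table once so label lookups are case-insensitive.
    lookup = {gene.lower(): {s.lower() for s in terms} for gene, terms in synonyms.items()}
    matches: List[str] = []
    for label in feature_labels:
        key = label.lower()
        candidates = {key} | lookup.get(key, set())
        if q and any(q in candidate for candidate in candidates):
            matches.append(label)
    return matches


if __name__ == "__main__":
    labels = ["nirK", "lacZ", "ori"]
    print(find_features("nitrite reductase", labels))
    print(find_features("LACZ", labels))
```

A curated table is easy to audit but only covers the genes someone has listed, so it would likely need to be backed by a larger vocabulary in practice.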

2.6 Visualization Limitations

External tools like SnapGene offer visualization but lock editing behind paywalls.
Our fix: An open, interactive plasmid viewer supporting annotation, editing, and sequence export for educational use.

3. System Architecture

The project consists of three interconnected components:

Component	Description	Members
NCBI API Layer	Handles Entrez requests, parses results, caches data for efficiency, and serves structured JSON to the frontend.	Thrayambakesh
Smart Search Engine	Ranks and filters results based on annotation quality, model organism relevance, and text similarity.	Soham, Shivansh, Vedanta
Plasmid Visualization Tool	Uses interactive D3.js-based graphics to display circular or linear plasmid maps with feature hover details.	Divyansh, Govind, Ryan

Workflow: User → Search query → Smart Search → NCBI API → Parsed Results → Visualization + Docs
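To make the API layer's role concrete, here is a minimal sketch of building an Entrez ESearch request, extracting record IDs from its JSON reply, and caching raw responses. It assumes the public E-utilities `esearch.fcgi` endpoint with `retmode=json`. The `CachedClient` class and the injected `fetch` function are hypothetical names, and no network call is made in the sketch itself.

```python
import json
from typing import Callable, Dict, List
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term: str, db: str = "nuccore", retmax: int = 20) -> str:
    """Build an ESearch URL that asks for a JSON response."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_esearch_ids(response_text: str) -> List[str]:
    """Pull the list of record IDs out of an ESearch JSON response."""
    payload = json.loads(response_text)
    return payload.get("esearchresult", {}).get("idlist", [])


class CachedClient:
    """Caches raw responses by URL so repeated searches skip the network."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch              # injected: takes a URL, returns response text
        self._cache: Dict[str, str] = {}

    def search_ids(self, term: str, db: str = "nuccore", retmax: int = 20) -> List[str]:
        url = build_esearch_url(term, db, retmax)
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
        return parse_esearch_ids(self._cache[url])


if __name__ == "__main__":
    print(build_esearch_url("cas9", retmax=5))
```

In a real deployment the `fetch` argument would wrap an HTTP client and respect NCBI's request rate limits.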

4. Creation Timeline

Phase 1: Ideation

Compiled a list of existing online biotech tools.
Consulted bioengineering enthusiasts and faculty about problems with these resources.
Brainstormed possible solution implementations.

Phase 2: Prototyping

Created a preliminary framework in Python.
Learned to work without Biopython.
Presented the prototype to users of these tools for feedback.

Phase 3: Finalising

Optimised the code and converted it to JavaScript.
Built a GitHub site to host our tool.

5. Key Features

5.1 Smart Search

Intelligent ranking and synonym recognition
Autocomplete for organism and accession names
Filters for genome length, organism, and record type
Sorting by relevance, date, or completeness
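The ranking idea can be illustrated with a weighted score over the three signals named in this write-up: text similarity, model-organism relevance, and annotation quality. Everything specific here is an assumption for the sake of the sketch: the weights, the short model-organism list, the use of feature count as a proxy for annotation quality, and the placeholder accessions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple

# Illustrative list only; a real tool would use a curated set.
MODEL_ORGANISMS = ("escherichia coli", "saccharomyces cerevisiae", "homo sapiens", "mus musculus")


@dataclass
class Record:
    accession: str       # placeholder identifiers in this sketch
    title: str
    organism: str
    feature_count: int   # assumed proxy for how well annotated the entry is


def score(record: Record, query: str, weights: Tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Combine text similarity, model-organism bonus, and annotation depth."""
    w_text, w_model, w_annot = weights
    text_sim = SequenceMatcher(None, query.lower(), record.title.lower()).ratio()
    organism = record.organism.lower()
    is_model = 1.0 if any(organism.startswith(m) for m in MODEL_ORGANISMS) else 0.0
    annotation = min(record.feature_count, 50) / 50   # capped at 50 features
    return w_text * text_sim + w_model * is_model + w_annot * annotation


def rank(records: Sequence[Record], query: str) -> List[Record]:
    """Return records sorted from highest to lowest score."""
    return sorted(records, key=lambda r: score(r, query), reverse=True)


if __name__ == "__main__":
    records = [
        Record("REC_B", "lacZ gene", "Unclassified strain X", 2),
        Record("REC_A", "lacZ gene", "Escherichia coli K-12", 40),
    ]
    print([r.accession for r in rank(records, "lacZ")])
```

A linear score keeps the ranking easy to explain to users, at the cost of needing hand-tuned weights.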

5.2 Plasmid Visualization

Dynamic circular maps with zoom, hover, and labeling
Color-coded feature tracks (CDS, promoter, regulatory sites, etc.)
Optional export as SVG/PNG
Editable feature names for experimental design or teaching use
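The geometry behind a circular map is worth spelling out: each base-pair position maps to an angle, and each feature becomes an arc. The sketch below computes an SVG path string for one feature in Python. It assumes 0-based start coordinates with an exclusive end, position 0 at the top of the circle, and clockwise drawing. It does not use D3.js, and the function names are hypothetical.

```python
import math
from typing import Tuple


def bp_to_angle(position: int, plasmid_length: int) -> float:
    """Map a base-pair position to an angle in radians (0 = top, clockwise)."""
    return 2 * math.pi * (position % plasmid_length) / plasmid_length


def polar_to_xy(angle: float, radius: float, cx: float, cy: float) -> Tuple[float, float]:
    """Convert an angle to SVG coordinates, where the y axis points down."""
    return cx + radius * math.sin(angle), cy - radius * math.cos(angle)


def feature_arc_path(
    start: int, end: int, plasmid_length: int,
    radius: float = 100.0, cx: float = 120.0, cy: float = 120.0,
) -> str:
    """Return an SVG path for a feature arc, including ones crossing the origin.

    A feature spanning the whole plasmid (span of 0 after the modulo) is not
    handled here and would need to be drawn as a full circle instead.
    """
    span = (end - start) % plasmid_length
    a0 = bp_to_angle(start, plasmid_length)
    a1 = a0 + 2 * math.pi * span / plasmid_length
    x0, y0 = polar_to_xy(a0, radius, cx, cy)
    x1, y1 = polar_to_xy(a1, radius, cx, cy)
    large_arc = 1 if span > plasmid_length / 2 else 0   # SVG large-arc flag
    # Sweep flag 1 draws clockwise on screen.
    return f"M {x0:.2f} {y0:.2f} A {radius:.2f} {radius:.2f} 0 {large_arc} 1 {x1:.2f} {y1:.2f}"


if __name__ == "__main__":
    # A feature covering the first quarter of a 1000 bp plasmid.
    print(feature_arc_path(0, 250, 1000))
```

The resulting string can be placed in the `d` attribute of an SVG `path` element, which is also the kind of output an SVG export would contain.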

6. Results & Impact

Usability improvements:

Queries are faster and more relevant.
Novice users can understand GenBank fields directly from the interface.
Genome and plasmid data are visually interpretable at a glance.

Educational impact:

The visualization module doubles as a teaching tool for genetic constructs.

Our platform bridges the gap between beginner biology students and the overwhelming GenBank interface by providing clarity, context, and interactivity.

7. Future Work

Enable custom sequence uploads (FASTA/GenBank format)
Integrate BLAST alignment and motif search directly
Add plasmid editing and design functionality
Extend visualization to display comparative studies between plasmids
Create protein quaternary structure visualizations

8. Conclusion

GenBank-Improvised enhances the accessibility and functionality of one of biology's most essential databases.

By addressing its usability gaps - from confusing jargon to missing visualization - our project demonstrates how thoughtful design and modern web tools can make genomic data exploration faster, friendlier, and more educational.

The Team:

Thanks to Mayank Kumar Raj and Shrey Gupta for designing the logo of GenBank-Improvised.