GenBank-Improvised
An Enhanced Interface for NCBI GenBank
Team: iGEM IISc 2025
Website: The tool is hosted on iGEM servers, and you can check it out here. In case the link glitches for you, you can check out the GitHub site instead (we have not made any commits after 9th October 15:00 UTC, following the iGEM wiki freeze rules): https://igem-iisc-2025.github.io/GenBank-Improvised/
1. Overview
The GenBank-Improvised project aims to modernize and simplify interaction with NCBI's GenBank, addressing long-standing usability issues such as poor search accuracy, lack of context explanations, and weak visualization options.
Our tool integrates:
- NCBI API access for robust backend querying
- Smart Search, a ranking-based, context-aware query engine
- Plasmid Visualization, a clean graphical interface to explore genetic constructs
The result is a unified web portal that lowers the learning curve for new users while offering more power and clarity for researchers.

2. Identified Problems in GenBank
Through systematic exploration and feedback, we identified several shortcomings in the GenBank interface.
2.1 Search Lottery
- Queries for proteins often yield irrelevant results (e.g., searching "cas9" returns entries from seemingly arbitrary strains).
- No easy way to confirm the correct source organism.
- Our fix: A Smart Search module that prioritizes well-annotated entries and model organisms.
2.2 Missing Explanations
- Cryptic field labels (SNP, CDS, WGS) lack on-site guidance.
- Users must rely on external resources for basic definitions.
- Our fix: A dedicated Docs section explaining key terms.
2.3 BLAST's Blind Spots
- BLAST restricts searches to one database at a time (nr/nt, RefSeq, PDB).
- No "search all" or automatic fallback across repositories.
- Proposed fix: A Unified BLAST Interface that queries multiple databases seamlessly.
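The fallback behaviour proposed above can be sketched as a loop over databases. This is only an illustration: `search_with_fallback`, `fake_search`, and the database labels are hypothetical names, and the actual BLAST call is passed in as a function so that nothing here assumes a particular BLAST client.

```python
def search_with_fallback(sequence, databases, run_search):
    """Try each database in order and stop at the first one that returns hits.

    run_search is any callable (sequence, database) -> list of hits.
    Returns (database_name, hits), or (None, []) if every database is empty.
    """
    for db in databases:
        hits = run_search(sequence, db)
        if hits:
            return db, hits
    return None, []


# Stand-in for a real BLAST call (hypothetical data, for illustration only).
def fake_search(sequence, db):
    canned = {"PDB": ["hit_1", "hit_2"]}
    return canned.get(db, [])


found_db, found_hits = search_with_fallback("ATGGCC", ["RefSeq", "nt", "PDB"], fake_search)
```

A "search all" mode would collect hits from every database instead of returning at the first non-empty one.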
2.4 Inflexible Genome Comparison
- GenBank cannot directly compare annotated genomes or list common genes.
- Proposed fix: A Comparative Genomics feature that aligns gene annotations from two genomes to identify overlaps and differences.
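A minimal sketch of the comparison idea, assuming each genome's annotations have already been reduced to a list of gene names. The function name `compare_gene_sets` and the sample gene lists are illustrative and not part of the tool.

```python
def compare_gene_sets(genes_a, genes_b):
    """Compare two lists of gene names, ignoring case and surrounding spaces.

    Returns the genes shared by both genomes and the genes unique to each.
    """
    a = {g.strip().lower() for g in genes_a}
    b = {g.strip().lower() for g in genes_b}
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


result = compare_gene_sets(["recA", "lacZ", "gyrB"], ["RecA", "gyrB", "nirK"])
```

Matching by name alone misses genes that are annotated under different names in the two genomes, so a fuller version would also need sequence or product-level comparison.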
2.5 No Intelligent Intra-File Search
- Users can't find specific genes within large sequences unless they know exact labels.
- "Ctrl+F" is often the only search method.
- Proposed fix: A Context-aware Find function that matches synonyms, abbreviations, and functional terms (e.g., "nitrite reductase" finds "nirK").
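One simple way to approximate such a find function is a synonym table consulted alongside the literal label. The sketch below assumes a small hand-written table; `SYNONYMS` and `find_features` are hypothetical names, and a real version would need a much larger, curated vocabulary.

```python
# Hypothetical hand-written table: feature label -> functional names and aliases.
SYNONYMS = {
    "nirK": ["nitrite reductase", "copper-containing nitrite reductase"],
    "lacZ": ["beta-galactosidase", "b-gal"],
}


def find_features(query, feature_labels):
    """Return the labels that match the query directly or through a synonym."""
    q = query.strip().lower()
    if not q:
        return []
    hits = []
    for label in feature_labels:
        aliases = [label.lower()] + [s.lower() for s in SYNONYMS.get(label, [])]
        # A label matches if the query is a substring of the label or any alias.
        if any(q in alias for alias in aliases):
            hits.append(label)
    return hits


labels = ["nirK", "lacZ", "ampR"]
matches = find_features("nitrite reductase", labels)
```

Here a plain "Ctrl+F" for "nitrite reductase" would find nothing in the label list, while the synonym lookup returns the `nirK` feature.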
2.6 Visualization Limitations
- External tools like SnapGene offer visualization but with editing locked behind paywalls.
- Our fix: An open, interactive plasmid viewer supporting annotation, editing, and sequence export for educational use.
3. System Architecture
The project consists of three interconnected components:
Component | Description | Members |
---|---|---|
NCBI API Layer | Handles Entrez requests, parses results, caches data for efficiency, and serves structured JSON to the frontend. | Thrayambakesh |
Smart Search Engine | Ranks and filters results based on annotation quality, model organism relevance, and text similarity. | Soham, Shivansh, Vedanta |
Plasmid Visualization Tool | Uses interactive D3.js-based graphics to display circular or linear plasmid maps with feature hover details. | Divyansh, Govind, Ryan |
Workflow: User → Search query → Smart Search → NCBI API → Parsed Results → Visualization + Docs
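The request and parse steps of the API layer can be sketched as follows. This is a minimal sketch rather than the project's code: it builds an ESearch URL for NCBI's public E-utilities endpoint and pulls the ID list out of a JSON response. The function names are illustrative, and the sample response is hand-written with placeholder IDs so that the example runs without a network call.

```python
import json
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, db="nuccore", retmax=20):
    """Build an ESearch request URL that asks for JSON output."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_esearch_ids(payload):
    """Extract the list of record IDs from an ESearch JSON response body."""
    data = json.loads(payload)
    return data.get("esearchresult", {}).get("idlist", [])


url = build_esearch_url("cas9[Gene Name] AND Streptococcus pyogenes[Organism]")

# Hand-written sample response, trimmed to the fields used here (placeholder IDs).
sample = '{"esearchresult": {"count": "2", "idlist": ["111", "222"]}}'
ids = parse_esearch_ids(sample)
```

Because NCBI rate-limits E-utilities requests, placing a cache in front of the fetch step, as the API layer description mentions, avoids repeating identical queries.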
4. Creation Timeline
Phase 1: Ideation
- Compiled a list of existing online biotech tools.
- Consulted bioengineering enthusiasts and faculty about problems with these resources.
- Brainstormed possible solution implementations.
Phase 2: Prototyping
- Created a preliminary framework in Python.
- Learned to work without relying on Biopython.
- Presented the prototype to users of the existing tools for feedback.
Phase 3: Finalising
- Optimised the code and converted it to JavaScript.
- Built a GitHub site to host our tool.

5. Key Features
5.1 Smart Search
- Intelligent ranking and synonym recognition
- Autocomplete for organism and accession names
- Filters for genome length, organism, and record type
- Sorting by relevance, date, or completeness
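To make the ranking idea concrete, here is a small sketch that combines the three signals named in the architecture table (text similarity, annotation quality, model organism relevance) into one score. The weights, the model organism list, the record fields (`title`, `organism`, `feature_count`), and the use of feature count as a proxy for annotation quality are all assumptions made for illustration, not the tool's actual scoring.

```python
from difflib import SequenceMatcher

# Assumed values, chosen only for illustration.
MODEL_ORGANISMS = {"escherichia coli", "saccharomyces cerevisiae", "homo sapiens"}
WEIGHTS = {"text": 0.5, "annotation": 0.3, "model": 0.2}


def score_record(query, record):
    """Weighted sum of text similarity, annotation richness, and organism bonus."""
    text = SequenceMatcher(None, query.lower(), record["title"].lower()).ratio()
    # Proxy for annotation quality: number of annotated features, capped at 10.
    annotation = min(record["feature_count"], 10) / 10
    model = 1.0 if record["organism"].lower() in MODEL_ORGANISMS else 0.0
    return (WEIGHTS["text"] * text
            + WEIGHTS["annotation"] * annotation
            + WEIGHTS["model"] * model)


def rank_records(query, records):
    """Return records sorted from highest to lowest score."""
    return sorted(records, key=lambda r: score_record(query, r), reverse=True)


records = [
    {"accession": "EXAMPLE_B", "title": "lacZ", "organism": "Unknown bacterium", "feature_count": 1},
    {"accession": "EXAMPLE_A", "title": "lacZ", "organism": "Escherichia coli", "feature_count": 8},
]
ranked = rank_records("lacZ", records)
```

With identical titles, the well-annotated model organism entry ranks first, which is the behaviour described for Smart Search.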

5.2 Plasmid Visualization
- Dynamic circular maps with zoom, hover, and labeling
- Color-coded feature tracks (CDS, promoter, regulatory sites, etc.)
- Optional export as SVG/PNG
- Editable feature names for experimental design or teaching use
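The geometry behind a circular map can be illustrated with a short sketch that converts a feature's base-pair coordinates into an SVG arc path. The tool itself renders with D3.js; the Python below only shows the underlying arithmetic, and the function names and default radius are assumptions.

```python
import math


def bp_to_angle(position, plasmid_length):
    """Angle in radians: 0 at 12 o'clock, increasing clockwise."""
    return 2 * math.pi * (position % plasmid_length) / plasmid_length


def polar_to_xy(angle, radius, cx=0.0, cy=0.0):
    """Convert an angle on the map to SVG coordinates (the y axis points down)."""
    return (cx + radius * math.sin(angle), cy - radius * math.cos(angle))


def feature_arc_path(start, end, plasmid_length, radius=100.0):
    """SVG path for a feature drawn clockwise from start to end."""
    a0 = bp_to_angle(start, plasmid_length)
    a1 = bp_to_angle(end, plasmid_length)
    # Modulo keeps the span positive for features that cross the origin.
    span = (a1 - a0) % (2 * math.pi)
    x0, y0 = polar_to_xy(a0, radius)
    x1, y1 = polar_to_xy(a1, radius)
    large_arc = 1 if span > math.pi else 0
    # Sweep flag 1 draws the arc clockwise in SVG's coordinate system.
    return (f"M {x0:.2f} {y0:.2f} "
            f"A {radius:.2f} {radius:.2f} 0 {large_arc} 1 {x1:.2f} {y1:.2f}")


path = feature_arc_path(0, 1000, 4000)
```

A feature that spans the entire plasmid has identical start and end points and cannot be drawn as a single arc, so it would need to be split into two arcs.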

6. Results & Impact
Usability improvements:
- Queries are faster and more relevant.
- Novice users can understand GenBank fields directly from the interface.
- Genome and plasmid data are visually interpretable at a glance.
Educational impact:
The visualization module doubles as a teaching tool for genetic constructs.
Our platform bridges the gap between beginner biology students and the overwhelming GenBank interface by providing clarity, context, and interactivity.
7. Future Work
- Enable custom sequence uploads (FASTA/GenBank format)
- Integrate BLAST alignment and motif search directly
- Add plasmid editing and design functionality
- Extend visualization to display comparative studies between plasmids
- Create protein quaternary structure visualizations
8. Conclusion
GenBank-Improvised enhances the accessibility and functionality of one of biology's most essential databases.
By addressing its usability gaps - from confusing jargon to missing visualization - our project demonstrates how thoughtful design and modern web tools can make genomic data exploration faster, friendlier, and more educational.
The Team:

Thanks to Mayank Kumar Raj and Shrey Gupta for designing the logo of GenBank-Improvised.