Notebook
Lab Records

Included in this section:

  • Lab Notebook PDF - Complete experimental records and documentation
  • Computational Tools Used in the Notebook - The code and tools that supported our experimental work

Computational Tools Used in the Notebook

These are the computational tools used and referenced throughout our lab notebook:

Program 5: Plasmid Design & Sequence Modification Tool

This Google Colab notebook was used to modify NCBI sequences (like GPC3 from NM_004484.3) to meet our RADAR sensor design criteria. The tool performs automated sequence cleaning, codon-level analysis, and smart editing to generate functional sensor constructs.

Key Features:
  • NCBI sequence processing for gene bank sequences
  • RADAR sensor optimization and codon editing
  • Plasmid-ready output for synthetic biology applications

Usage: Input NCBI gene sequences → Output RADAR sensor-optimized sequences
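For reference, the codon-substitution rules applied by `sequence_edit` in the script below can be restated as a small lookup table. This snippet is a standalone summary for illustration, not part of the original tool; the central TGG codon is handled separately, becoming the UAG editing site.

```python
# Substitution rules from sequence_edit, restated as a lookup table
# (lowercase, matching the script's output). The middle TGG codon is
# handled separately: it becomes the UAG editing site.
SUBSTITUTIONS = {
    "taa": "gaa",  # stop (ochre) -> Glu
    "tag": "gag",  # stop (amber) -> Glu
    "tga": "gga",  # stop (opal)  -> Gly
    "atg": "agg",  # start        -> Arg
}

def recode(codon):
    """Return the recoded codon, or the codon unchanged if no rule applies."""
    c = codon.lower()
    return SUBSTITUTIONS.get(c, c)

print(recode("TAA"), recode("GCA"))  # gaa gca
```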

dna_sensor_generator.py
def main():
    # Paste the DNA sequence you want turned into a sensor sequence below
    dna = ""

    dna = clean_sequence(dna)
    codons = to_codons(dna)
    sequence = make_sequence(codons)
    print("Spaced sequence:", sequence)

    reverse = reverse_sequence(sequence)
    print("Reversed:", reverse)

    complement = reverse_complement(reverse)
    print("Reverse complement:", complement)

    edited_sequence = sequence_edit(complement)
    print("Edited sequence:", edited_sequence)

def clean_sequence(seq):
    """Remove non-alphabetic characters (digits, spaces, FASTA artifacts) from a DNA sequence"""
    return "".join(char for char in seq if char.isalpha())

def to_codons(seq):
    """Convert DNA sequence to codon triplets (any trailing partial codon is dropped)"""
    remainder = len(seq) % 3
    if remainder:
        print(f"{remainder} trailing letter(s) were excluded from your sequence.")
    length = len(seq) - remainder

    codons = []
    for i in range(0, length, 3):
        codons.append(seq[i:i+3])
    return codons

def make_sequence(codons):
    """Create a space-separated sequence from a codon list"""
    return " ".join(codons)  # also handles an empty codon list gracefully

def reverse_sequence(sequence):
    """Reverse the DNA sequence (slice with step -1 walks the string backwards)"""
    return sequence[::-1]

def reverse_complement(reverse):
    """Complement an already-reversed DNA sequence (lowercase output); together
    with reverse_sequence this yields the reverse complement"""
    pairs = {"a": "t", "t": "a", "c": "g", "g": "c"}
    # Non-ACGT characters (such as the codon spacers) pass through unchanged
    return "".join(pairs.get(char.lower(), char) for char in reverse)

def sequence_edit(complement):
    """Recode the sequence for the RADAR sensor: the middle TGG codon becomes
    the UAG editing site, and all other stop/start codons are substituted away
    so the engineered UAG is the only stop in frame"""
    new_sequence = [codon.lower() for codon in complement.split(" ")]

    # Pick the middle TGG codon as the ADAR-editable UAG stop site
    tgg_indices = [i for i, codon in enumerate(new_sequence) if codon == "tgg"]
    middle_index = tgg_indices[len(tgg_indices) // 2] if tgg_indices else None
    if middle_index is not None:
        new_sequence[middle_index] = "uag"

    # Recode every other stop (taa/tag/tga) and start (atg) codon
    substitutions = {"taa": "gaa", "tag": "gag", "tga": "gga", "atg": "agg"}
    for i, codon in enumerate(new_sequence):
        if i != middle_index:
            new_sequence[i] = substitutions.get(codon, codon)

    return make_sequence(new_sequence)

if __name__ == "__main__":
    main()
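As a quick sanity check, the cleaning, codon-splitting, reversal, and complement steps above can be exercised on a short hypothetical input. This is a condensed re-implementation for illustration only; output casing matches the script's (uppercase passthrough, lowercase complement):

```python
def demo_pipeline(dna):
    """Condensed restatement of the steps in dna_sensor_generator.py."""
    dna = "".join(c for c in dna if c.isalpha())               # clean_sequence
    codons = [dna[i:i+3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    spaced = " ".join(codons)                                  # make_sequence
    reverse = spaced[::-1]                                     # reverse_sequence
    pairs = {"a": "t", "t": "a", "c": "g", "g": "c"}
    complement = "".join(pairs.get(c.lower(), c) for c in reverse)
    return spaced, reverse, complement

# Hypothetical 6-nt input, not one of our actual constructs
spaced, reverse, complement = demo_pipeline("ATG GCA")
print(spaced)      # ATG GCA
print(reverse)     # ACG GTA
print(complement)  # tgc cat
```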

Program 6: Flow Cytometry Data Analysis

This Google Colab notebook was used to analyze our flow cytometry data from HBX, AKR1B10, GPC3, and AND-gated experiments. It performs statistical analysis, generates annotated plots, and calculates fold changes for our experimental results.

Key Features:
  • Statistical analysis with t-tests on replicate datasets
  • Annotated plotting for publication-ready visualizations
  • Fold change calculation and standardized gating

Usage: Input flow cytometry data → Output statistical analysis and annotated plots

flow_cytometry_analysis.py
# -*- coding: utf-8 -*-
"""RADARoutput_plotting_template.ipynb

Automatically generated by Colab.

Original file is located at
    https://colab.research.google.com/drive/1lti9eKx7lz_DEt0c3L70LSz1aiHnVvHw
"""

### IMPORT NECESSARY PYTHON PACKAGES ###

!pip install statannotations

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from IPython.display import display
from statannotations.Annotator import Annotator

"""**Now we need to upload the data file into the colab. Run the code below and then upload the csv file with your data**"""

from google.colab import files

# Upload a file from local computer
uploaded = files.upload()

# Read the uploaded CSV (this filename must match the file you uploaded above)
data = pd.read_csv('sample raw data for plotting - Sheet1.csv')


# print out data to see what we are working with

data.head()

# Calculate the average of the three 'normalized GFP output' columns and create a new column 'Average GFP Output'
data['Average GFP Output'] = data[['normalized GFP output', 'normalized GFP output.1', 'normalized GFP output.2']].mean(axis=1)

# Reshape the data for plotting
data_melted = data.melt(id_vars=['Unnamed: 0'], value_vars=['normalized GFP output', 'normalized GFP output.1', 'normalized GFP output.2'], var_name='Replicate', value_name='GFP Output')

# Create the figure and axes
fig, ax = plt.subplots(figsize=(5, 5))

# Create the swarm plot
sns.swarmplot(x='Unnamed: 0', y='GFP Output', data=data_melted, ax=ax, color='gray')

# Add markers for the average GFP output
sns.pointplot(x='Unnamed: 0', y='Average GFP Output', data=data, ax=ax, color='black', linestyles='', marker='_', markersize=20, markeredgewidth=3)


# Set labels and title
ax.set_xlabel('Sample')
ax.set_ylabel('GFP Output (normalized to background)')

# Rotate x-axis labels for better readability
plt.xticks(rotation=90, ha='right')

# Display the plot
plt.tight_layout()
plt.show()

"""# Task
Create a new column in the dataframe `data` that is the average of columns 2, 3, and 4. Generate a swarm plot using columns 2, 3, and 4, with each replicate plotted across the row. Add a bold dash marker representing the average of the three replicates for each row. Create a second plot using the same data, perform independent t-tests comparing the first and second samples, and the third and fourth samples. Display the statistical test results as stars on the second plot and print the p-values for these comparisons.
"""

# Reshape the data for plotting
data_melted = data.melt(id_vars=['Unnamed: 0'], value_vars=['normalized GFP output', 'normalized GFP output.1', 'normalized GFP output.2'], var_name='Replicate', value_name='GFP Output')

# Confirm the structure of the melted DataFrame by displaying its head
display(data_melted.head())

# Create a second figure and axes; statistical annotations will be added to this one
fig, ax = plt.subplots(figsize=(5, 5))

# Create the swarm plot
sns.swarmplot(x='Unnamed: 0', y='GFP Output', data=data_melted, ax=ax, color='gray')

# Add markers for the average GFP output
sns.pointplot(x='Unnamed: 0', y='Average GFP Output', data=data, ax=ax, color='black', linestyles='', marker='_', markersize=20, markeredgewidth=3)

# Set labels and title
ax.set_xlabel('Sample')
ax.set_ylabel('GFP Output (normalized to background)')

# Rotate x-axis labels for better readability
plt.xticks(rotation=90, ha='right')

# Adjust layout
plt.tight_layout()

# Display the plot
plt.show()

"""## Define pairs for statistical testing

### Subtask:
Specify the pairs of samples to be compared using independent t-tests.

**Reasoning**:
Create a list of tuples specifying the pairs of samples for independent t-tests based on the 'Unnamed: 0' column in the `data` DataFrame.
"""

# Get the sample names from the 'Unnamed: 0' column
sample_names = data['Unnamed: 0'].tolist()

# Specify the pairs for comparison
pairs = [(sample_names[0], sample_names[1]), (sample_names[2], sample_names[3])]

# Print the pairs to verify
print(pairs)

"""## Add statistical annotations to the plot

### Subtask:
Use `statannotations` to add significance stars to the specified pairs on the plot.

**Reasoning**:
Create an Annotator object, set the statistical test, and apply the annotations to the plot.
"""

# Create an Annotator object
annotator = Annotator(ax, pairs, data=data_melted, x='Unnamed: 0', y='GFP Output')

# Configure the statistical test
annotator.configure(test='t-test_ind', text_format='star', loc='outside')

# Run the test and add annotations
annotator.apply_and_annotate()

# Display the plot with annotations
fig

"""## Perform statistical tests and print p-values

### Subtask:
Manually perform independent t-tests for the specified pairs and print the resulting p-values.

**Reasoning**:
Perform independent t-tests for the specified pairs and print the resulting p-values.
"""

from scipy.stats import ttest_ind

# Filter data for the first pair (complete match -trigger vs complete match +trigger)
complete_match_minus_trigger = data_melted[data_melted['Unnamed: 0'] == 'complete match -trigger']['GFP Output']
complete_match_plus_trigger = data_melted[data_melted['Unnamed: 0'] == 'complete match +trigger']['GFP Output']

# Perform independent t-test for the first pair
ttest_complete = ttest_ind(complete_match_minus_trigger, complete_match_plus_trigger)

# Print the p-value for the first pair
print(f"P-value for 'complete match -trigger' vs 'complete match +trigger': {ttest_complete.pvalue}")

# Filter data for the second pair (partial match -trigger vs partial match +trigger)
partial_match_minus_trigger = data_melted[data_melted['Unnamed: 0'] == 'partial match -trigger']['GFP Output']
partial_match_plus_trigger = data_melted[data_melted['Unnamed: 0'] == 'partial match +trigger']['GFP Output']

# Perform independent t-test for the second pair
ttest_partial = ttest_ind(partial_match_minus_trigger, partial_match_plus_trigger)

# Print the p-value for the second pair
print(f"P-value for 'partial match -trigger' vs 'partial match +trigger': {ttest_partial.pvalue}")
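The fold-change calculation listed under Key Features does not appear among the auto-generated cells above. A minimal sketch of how it could be computed with the same DataFrame layout (sample labels in `Unnamed: 0`, replicate averages in `Average GFP Output`) follows; the numbers are invented purely to illustrate the arithmetic and are not experimental results:

```python
import pandas as pd

# Hypothetical averages in the notebook's column layout (values invented
# for illustration only, not measured data)
data = pd.DataFrame({
    "Unnamed: 0": ["complete match -trigger", "complete match +trigger"],
    "Average GFP Output": [1.0, 4.0],
})

# Index by sample label, then divide +trigger by -trigger
averages = data.set_index("Unnamed: 0")["Average GFP Output"]
fold_change = (averages["complete match +trigger"]
               / averages["complete match -trigger"])
print(f"Fold change (complete match): {fold_change:.1f}x")  # 4.0x
```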

"""## Summary:

### Data Analysis Key Findings

*   The average GFP output for each sample was calculated and added as a new column (`Average GFP Output`) to the `data` DataFrame.
*   A swarm plot was successfully generated displaying individual data points and the average GFP output for each sample, with replicate data plotted across rows.
*   Independent t-tests were performed comparing 'complete match -trigger' with 'complete match +trigger' and 'partial match -trigger' with 'partial match +trigger'.
*   The p-value for the comparison between 'complete match -trigger' and 'complete match +trigger' was approximately 0.0174.
*   The p-value for the comparison between 'partial match -trigger' and 'partial match +trigger' was approximately 0.0185.
*   Significance stars based on the t-test results were successfully added to the plot.

### Insights or Next Steps

*   Both comparisons showed statistically significant differences in GFP output (p < 0.05), suggesting that the "+trigger" condition has a significant impact on GFP expression for both complete and partial matches.
*   Further investigation into the magnitude of the GFP output change and potential biological implications of the observed differences between complete and partial matches under the "+trigger" condition could be valuable.

"""