Notebook
Lab Records
Included in this section:
- Lab Notebook PDF - Complete experimental records and documentation
- Computational Tools Used in the Notebook - The code and tools that supported our experimental work
Computational Tools Used in the Notebook
The following computational tools were used throughout our lab notebook:
Program 5: Plasmid Design & Sequence Modification Tool
This Google Colab notebook was used to modify NCBI sequences (such as GPC3 from NM_004484.3) to meet our RADAR sensor design criteria. The tool performs automated sequence cleaning, codon-level parsing, and targeted codon editing to generate functional sensor constructs.
Key Features:
- NCBI sequence processing for gene bank sequences
- RADAR sensor optimization and codon editing
- Plasmid-ready output for synthetic biology applications
Usage: Input NCBI gene sequences → Output RADAR sensor-optimized sequences
dna_sensor_generator.py
def main():
    # Input DNA sequence below
    dna = ""  # the sequence you want turned into a sensor sequence
    dna = clean_sequence(dna)
    codons = to_codons(dna)
    sequence = make_sequence(codons)
    print("Spaced sequence:", sequence)
    reverse = reverse_sequence(sequence)
    print("Reversed:", reverse)
    complement = reverse_complement(reverse)
    print("Reverse complement:", complement)
    edited_sequence = sequence_edit(complement)
    print("Edited sequence:", edited_sequence)


def clean_sequence(seq):
    """Remove non-alphabetic characters from a DNA sequence."""
    cleaned = ""
    for char in seq:
        if char.isalpha():
            cleaned += char
    return cleaned


def to_codons(seq):
    """Convert a DNA sequence to a list of codon triplets."""
    codons = []
    remainder = len(seq) % 3
    if remainder:
        print(f"{remainder} letter(s) were excluded from your sequence.")
    length = len(seq) - remainder
    for i in range(0, length, 3):
        codons.append(seq[i:i + 3])
    return codons


def make_sequence(codons):
    """Join a codon list into a space-separated sequence."""
    return " ".join(codons)


def reverse_sequence(sequence):
    """Reverse the DNA sequence (spaces included)."""
    return sequence[::-1]


def reverse_complement(reverse):
    """Complement each base of an already-reversed DNA sequence."""
    pairs = {"a": "t", "t": "a", "c": "g", "g": "c"}
    complement = ""
    for char in reverse:
        complement += pairs.get(char.lower(), char)
    return complement


def sequence_edit(complement):
    """Edit the sequence for sensor use: swap the middle TGG codon for the
    amber stop UAG, and replace stray stop/start codons elsewhere."""
    current_sequence = complement.split(" ")
    new_sequence = current_sequence[:]
    tgg_indices = [i for i, codon in enumerate(current_sequence) if codon.lower() == "tgg"]
    middle_index = None
    if tgg_indices:
        middle_index = tgg_indices[len(tgg_indices) // 2]
        new_sequence[middle_index] = "uag"
    # Replace stop codons (TAA, TAG, TGA) and the start codon (ATG)
    # everywhere except at the inserted UAG position.
    replacements = {"taa": "gaa", "tag": "gag", "tga": "gga", "atg": "agg"}
    for i in range(len(new_sequence)):
        if i == middle_index:
            continue
        codon = new_sequence[i].lower()
        new_sequence[i] = replacements.get(codon, codon)
    return make_sequence(new_sequence)


if __name__ == "__main__":
    main()
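To make the pipeline concrete, here is a minimal, self-contained sketch of the same steps applied to a 9-nt toy sequence (the input here is illustrative only, not one of our actual constructs):

```python
# Toy walk-through of the steps in dna_sensor_generator.py above.

def reverse_complement(seq):
    # Complement each base (case-insensitive); spaces pass through unchanged
    table = {"a": "t", "t": "a", "c": "g", "g": "c"}
    return "".join(table.get(ch.lower(), ch) for ch in seq)

dna = "ATGCCATGG"                       # hypothetical 9-nt input
codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
spaced = " ".join(codons)               # "ATG CCA TGG"
rc = reverse_complement(spaced[::-1])   # reverse first, then complement
print("Reverse complement:", rc)        # -> cca tgg cat

# Sensor edit: swap the middle TGG (Trp) codon for the amber stop UAG
codons2 = rc.split(" ")
tgg = [i for i, c in enumerate(codons2) if c == "tgg"]
if tgg:
    codons2[tgg[len(tgg) // 2]] = "uag"
print("Edited:", " ".join(codons2))     # -> cca uag cat
```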
Program 6: Flow Cytometry Data Analysis
This Google Colab notebook was used to analyze our flow cytometry data from HBX, AKR1B10, GPC3, and AND-gated experiments. It performs statistical analysis, generates annotated plots, and calculates fold changes for our experimental results.
Key Features:
- Statistical analysis with t-tests on replicate datasets
- Annotated plotting for publication-ready visualizations
- Fold change calculation and standardized gating
Usage: Input flow cytometry data → Output statistical analysis and annotated plots
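As a rough sketch of the fold-change step, the snippet below mimics the replicate-column layout of the CSV used in the notebook (the sample names and values are made up for illustration):

```python
import pandas as pd

# Toy replicate table mimicking the CSV layout used in the notebook
# (sample names and GFP values are illustrative only).
data = pd.DataFrame({
    "Unnamed: 0": ["sensor -trigger", "sensor +trigger"],
    "normalized GFP output":   [1.0, 4.8],
    "normalized GFP output.1": [1.1, 5.2],
    "normalized GFP output.2": [0.9, 5.0],
})

reps = ["normalized GFP output", "normalized GFP output.1",
        "normalized GFP output.2"]
data["Average GFP Output"] = data[reps].mean(axis=1)

# Fold change: mean +trigger signal over mean -trigger signal
fold_change = (data["Average GFP Output"].iloc[1]
               / data["Average GFP Output"].iloc[0])
print(f"Fold change: {fold_change:.1f}")  # -> Fold change: 5.0
```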
flow_cytometry_analysis.py
# -*- coding: utf-8 -*-
"""RADARoutput_plotting_template.ipynb
Automatically generated by Colab.
Original file is located at
https://colab.research.google.com/drive/1lti9eKx7lz_DEt0c3L70LSz1aiHnVvHw
"""
### IMPORT NECESSARY PYTHON PACKAGES ###
!pip install statannotations
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from IPython.display import display
from statannotations.Annotator import Annotator
"""**Now we need to upload the data file into the colab. Run the code below and then upload the csv file with your data**"""
from google.colab import files
# Upload the CSV file from your local computer
uploaded = files.upload()
# Read the uploaded file (the filename below must match the file you uploaded)
data = pd.read_csv('sample raw data for plotting - Sheet1.csv')
# print out data to see what we are working with
data.head()
# Calculate the average of the three 'normalized GFP output' columns and create a new column 'Average GFP Output'
data['Average GFP Output'] = data[['normalized GFP output', 'normalized GFP output.1', 'normalized GFP output.2']].mean(axis=1)
# Reshape the data for plotting
data_melted = data.melt(id_vars=['Unnamed: 0'], value_vars=['normalized GFP output', 'normalized GFP output.1', 'normalized GFP output.2'], var_name='Replicate', value_name='GFP Output')
# Create the figure and axes
fig, ax = plt.subplots(figsize=(5, 5))
# Create the swarm plot
sns.swarmplot(x='Unnamed: 0', y='GFP Output', data=data_melted, ax=ax, color='gray')
# Add markers for the average GFP output
sns.pointplot(x='Unnamed: 0', y='Average GFP Output', data=data, ax=ax, color='black', linestyles='', marker='_', markersize=20, markeredgewidth=3)
# Set labels and title
ax.set_xlabel('Sample')
ax.set_ylabel('GFP Output (normalized to background)')
# Rotate x-axis labels for better readability
plt.xticks(rotation=90, ha='right')
# Display the plot
plt.tight_layout()
plt.show()
"""# Task
Create a new column in the dataframe `data` that is the average of columns 2, 3, and 4. Generate a swarm plot using columns 2, 3, and 4, with each replicate plotted across the row. Add a bold dash marker representing the average of the three replicates for each row. Create a second plot using the same data, perform independent t-tests comparing the first and second samples, and the third and fourth samples. Display the statistical test results as stars on the second plot and print the p-values for these comparisons.
"""
# Reshape the data for plotting
data_melted = data.melt(id_vars=['Unnamed: 0'], value_vars=['normalized GFP output', 'normalized GFP output.1', 'normalized GFP output.2'], var_name='Replicate', value_name='GFP Output')
# Confirm the structure of the melted DataFrame by displaying its head
display(data_melted.head())
# Create the figure and axes
fig, ax = plt.subplots(figsize=(5, 5))
# Create the swarm plot
sns.swarmplot(x='Unnamed: 0', y='GFP Output', data=data_melted, ax=ax, color='gray')
# Add markers for the average GFP output
sns.pointplot(x='Unnamed: 0', y='Average GFP Output', data=data, ax=ax, color='black', linestyles='', marker='_', markersize=20, markeredgewidth=3)
# Set labels and title
ax.set_xlabel('Sample')
ax.set_ylabel('GFP Output (normalized to background)')
# Rotate x-axis labels for better readability
plt.xticks(rotation=90, ha='right')
# Adjust layout
plt.tight_layout()
# Display the plot
plt.show()
"""## Define pairs for statistical testing
### Subtask:
Specify the pairs of samples to be compared using independent t-tests.
**Reasoning**:
Create a list of tuples specifying the pairs of samples for independent t-tests based on the 'Unnamed: 0' column in the `data` DataFrame.
"""
# Get the sample names from the 'Unnamed: 0' column
sample_names = data['Unnamed: 0'].tolist()
# Specify the pairs for comparison
pairs = [(sample_names[0], sample_names[1]), (sample_names[2], sample_names[3])]
# Print the pairs to verify
print(pairs)
"""## Add statistical annotations to the plot
### Subtask:
Use `statannotations` to add significance stars to the specified pairs on the plot.
**Reasoning**:
Create an Annotator object, set the statistical test, and apply the annotations to the plot.
"""
# Create an Annotator object
annotator = Annotator(ax, pairs, data=data_melted, x='Unnamed: 0', y='GFP Output')
# Configure the statistical test
annotator.configure(test='t-test_ind', text_format='star', loc='outside')
# Run the test and add annotations
annotator.apply_and_annotate()
# Display the plot with annotations
fig
"""## Perform statistical tests and print p-values
### Subtask:
Manually perform independent t-tests for the specified pairs and print the resulting p-values.
**Reasoning**:
Perform independent t-tests for the specified pairs and print the resulting p-values.
"""
from scipy.stats import ttest_ind
# Filter data for the first pair (complete match -trigger vs complete match +trigger)
complete_match_minus_trigger = data_melted[data_melted['Unnamed: 0'] == 'complete match -trigger']['GFP Output']
complete_match_plus_trigger = data_melted[data_melted['Unnamed: 0'] == 'complete match +trigger']['GFP Output']
# Perform independent t-test for the first pair
ttest_complete = ttest_ind(complete_match_minus_trigger, complete_match_plus_trigger)
# Print the p-value for the first pair
print(f"P-value for 'complete match -trigger' vs 'complete match +trigger': {ttest_complete.pvalue}")
# Filter data for the second pair (partial match -trigger vs partial match +trigger)
partial_match_minus_trigger = data_melted[data_melted['Unnamed: 0'] == 'partial match -trigger']['GFP Output']
partial_match_plus_trigger = data_melted[data_melted['Unnamed: 0'] == 'partial match +trigger']['GFP Output']
# Perform independent t-test for the second pair
ttest_partial = ttest_ind(partial_match_minus_trigger, partial_match_plus_trigger)
# Print the p-value for the second pair
print(f"P-value for 'partial match -trigger' vs 'partial match +trigger': {ttest_partial.pvalue}")
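The stars that `statannotations` draws on the plot follow the conventional significance thresholds; a small sketch of that mapping (thresholds as we understand the default 'star' text format) applied to the p-values printed above:

```python
def p_to_stars(p):
    # Conventional significance thresholds (matching statannotations'
    # default 'star' text format)
    if p <= 1e-4:
        return "****"
    if p <= 1e-3:
        return "***"
    if p <= 1e-2:
        return "**"
    if p <= 0.05:
        return "*"
    return "ns"

print(p_to_stars(0.0174))  # complete match comparison -> *
print(p_to_stars(0.0185))  # partial match comparison  -> *
```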
"""## Summary:
### Data Analysis Key Findings
* The average GFP output for each sample was calculated and added as a new column (`Average GFP Output`) to the `data` DataFrame.
* A swarm plot was successfully generated displaying individual data points and the average GFP output for each sample, with replicate data plotted across rows.
* Independent t-tests were performed comparing 'complete match -trigger' with 'complete match +trigger' and 'partial match -trigger' with 'partial match +trigger'.
* The p-value for the comparison between 'complete match -trigger' and 'complete match +trigger' was approximately 0.0174.
* The p-value for the comparison between 'partial match -trigger' and 'partial match +trigger' was approximately 0.0185.
* Significance stars based on the t-test results were successfully added to the plot.
### Insights or Next Steps
* Both comparisons showed statistically significant differences in GFP output (p < 0.05), suggesting that the "+trigger" condition has a significant impact on GFP expression for both complete and partial matches.
* Further investigation into the magnitude of the GFP output change and potential biological implications of the observed differences between complete and partial matches under the "+trigger" condition could be valuable.
"""