NNU-CHINA - iGEM 2025

Aim

We aim to present a simple interface for our clients—large hospitals and community health centers—enabling them to gain intuitive insights into data through straightforward methods and accurately determine whether pregnant women require folic acid supplementation.

Analysis Process

1. Preliminary Analysis

We first collected 50 sample sets: 13 of the wild-type CC genotype, 23 of the heterozygous CT genotype, and 14 of the homozygous mutant TT genotype. We added two probes (FAM and ROX) to the test samples and observed their amplification curves via qPCR (quantitative polymerase chain reaction). The trends of the amplification curves varied significantly across genotypes (Figures 1-3).

Figure1

Figure2

Figure3

For example, in the wild-type CC genotype, the ΔRn (normalized fluorescence intensity difference) of FAM was clearly higher than that of ROX at the 40th cycle. In the homozygous mutant TT genotype, the result was the opposite: the ΔRn of ROX was significantly higher than that of FAM at the 40th cycle. In the heterozygous CT genotype, there was no fixed magnitude relationship between the ΔRn values of the two probes, and the disparity between their ΔRn values at the 40th cycle was not significant. Therefore, we need not only a plot that converts input data into smooth amplification curves but also a graph that allows more intuitive genotype judgment based on the ΔRn trends of the sample’s probes.

2. Secondary Analysis

Since the ΔRn of the FAM signal is consistently greater than that of the ROX signal in the CC genotype, and the ΔRn of the FAM signal is consistently smaller than that of the ROX signal in the TT genotype, we drew an analogy to positive and negative numbers. We selected the value of "FAM - ROX" as the x-axis coordinate: if x > 0, the genotype of the blood sample cannot be the TT type; similarly, if x < 0, the genotype of the sample cannot be the CC type.

Next, we needed to distinguish between the CT and CC genotypes, as well as between the CT and TT genotypes. Through data analysis, we found that although the magnitude relationship between the ΔRn values of FAM and ROX cannot be determined in the CT genotype, the difference in ΔRn between the two probes in the CT genotype is much smaller than that in the CC and TT genotypes.

3. Definition of Difference Value

How should we define this difference? For the magnitude difference between the two sets of ΔRn values, our initial approach was "ratio analysis"—dividing the larger ΔRn by the smaller ΔRn. However, regardless of the genotype, the smaller ΔRn value may be negative; moreover, due to the different FQ probes (fluorescent quantitative probes) used, the ratio fluctuates extremely widely. Therefore, we ultimately adopted "difference value analysis": using the ΔRn values at the 40th cycle, we first calculated the difference by subtracting the smaller value from the larger one, then divided this difference by the larger value. Since the larger value is always positive, the final calculated results (referred to as "relative difference") are all positive.

Analysis of the relative difference showed that the relative difference of the CT genotype is typically between 0–30%, while that of the CC and TT genotypes is approximately 80–110% (when the smaller ΔRn value is negative, the relative difference can exceed 100%).
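The relative difference defined above can be illustrated with a short MATLAB sketch. The ΔRn values below are hypothetical numbers chosen only to show the three typical cases (a large gap, a small gap, and a negative smaller value); they are not measurements from our samples.

```matlab
% Hypothetical cycle-40 ΔRn values, chosen only to illustrate the formula.
fam = [200000,  90000,  -5000];   % FAM ΔRn at cycle 40
rox = [ 20000, 110000, 150000];   % ROX ΔRn at cycle 40

larger  = max(fam, rox);          % element-wise larger ΔRn
smaller = min(fam, rox);          % element-wise smaller ΔRn

% Relative difference = (larger - smaller) / larger
relDiff = (larger - smaller) ./ larger;

for k = 1:numel(fam)
    fprintf('FAM=%d, ROX=%d -> relative difference = %.1f%%\n', ...
            fam(k), rox(k), 100 * relDiff(k));
end
```

In the third case the smaller value is negative, so the numerator exceeds the larger value and the relative difference rises above 100%, which matches the behaviour described in the text.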

Presentation of Differences and Visual Region Division

Box plot analysis further revealed that the upper fluctuation limit of the relative difference for both the CC and TT genotypes is around 110%, while the lower limit is approximately 70% for the CC genotype and around 80% for the TT genotype. In contrast, the relative difference for the CT genotype fluctuates within 0–50%. Since the relative differences of the CC and TT genotypes both fall into the "larger range" category, we set the boundary for this category at 70% (the lower of the two lower limits); for the CT genotype, we set the boundary at 50%. To simplify genotype judgment and enhance visual clarity, we used 60%, the midpoint between 50% and 70%, as the partition boundary on the coordinate axis. This allows more intuitive differentiation between CT and CC, or between CT and TT. For instance, if the ΔRn of FAM > the ΔRn of ROX and the relative difference > 60%, we can determine the sample’s genotype as the wild-type CC.

Figure4

After establishing the basic judgment criteria, we further observed that in the CT genotype, the absolute value of "FAM - ROX" does not exceed 50,000, while in the CC and TT genotypes it is above 50,000. Based on this, we refined the plot segmentation:

If a data point meets x (FAM - ROX value) > 50,000 and y (relative difference) > 0.6, it is determined to be the wild-type CC;

If it meets x (FAM - ROX value) < -50,000 and y (relative difference) > 0.6, it is determined to be the homozygous mutant TT;

If it meets -50,000 < x (FAM - ROX value) < 50,000 and y (relative difference) < 0.6, it is determined to be the heterozygous CT.

Figure5

For other blank regions, we made estimations based on the sign of x (FAM - ROX value) and the magnitude of y (relative difference). We used similar colors to mark "probable genotypes" in areas outside the threshold criteria, avoiding difficulties in interpreting extreme or borderline data.
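The region logic can be sketched in MATLAB as follows. The sample points are hypothetical, and the mapping of the "probable" regions (high y with small positive x read as probable CC, high y with small negative x as probable TT, low y with large |x| as probable CT) is our reading of the sign-and-magnitude rule above, stated here as an assumption rather than a fixed specification.

```matlab
% Hypothetical (x, y) points; x = FAM - ROX at cycle 40, y = relative difference.
x = [180000, -120000, 20000, 30000, -20000, 90000];
y = [  0.90,    0.95,  0.18,  0.80,   0.75,  0.40];

xThr = 50000;   % |FAM - ROX| boundary from the text
yThr = 0.6;     % relative-difference boundary from the text

labels = cell(size(x));
for k = 1:numel(x)
    if y(k) > yThr && x(k) > xThr
        labels{k} = 'CC';
    elseif y(k) > yThr && x(k) < -xThr
        labels{k} = 'TT';
    elseif y(k) < yThr && abs(x(k)) < xThr
        labels{k} = 'CT';
    elseif y(k) > yThr && x(k) > 0
        labels{k} = 'probable CC';   % assumed reading of the blank region
    elseif y(k) > yThr && x(k) < 0
        labels{k} = 'probable TT';   % assumed reading of the blank region
    elseif y(k) < yThr
        labels{k} = 'probable CT';   % assumed reading of the blank region
    else
        labels{k} = 'undetermined';  % point lies exactly on a boundary
    end
    fprintf('x=%8d, y=%.2f -> %s\n', x(k), y(k), labels{k});
end
```

Points that fall exactly on a boundary (for example y equal to 0.6) are left as "undetermined" in this sketch, because the thresholds in the text are strict inequalities.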

Modelling Verification

The Software We Developed Consists of Three Parts: Main View, Amplification Plot, and Genotype Determination

Part 1 is the Main View. It consists of three columns of data, each containing 40 data points.

The first column is the cycle number, with default values from 1 to 40. The second and third columns show the ΔRn (normalized fluorescence intensity difference) values of the FAM and ROX probes for the sample being tested, from the 1st to the 40th cycle, with default values from 41 to 80 and from 81 to 120, respectively.

Figure6 The main view

Below the Main View (Part 1), there are three buttons, whose functions are as follows:

1. Data paste field (80 data points)
Clicking this button pastes 80 copied numbers (40 FAM/ROX pairs) into the second and third columns at the positions of their respective cycles, replacing the default values. The numbers are read in alternating order: FAM for cycle 1, ROX for cycle 1, FAM for cycle 2, ROX for cycle 2, and so on. Please check that you are pasting exactly 80 data points. If the count is not 80, the program pops up a prompt: "Data count mismatch! 80 numbers required (40 groups of FAM/ROX), but actually recognized [X]. Please check the data format to ensure it contains only numbers and standard delimiters", where [X] is the actual number of data points detected in your clipboard. This helps you locate the cause of pasting errors.

2. Amplification Plot
Clicking this button generates the Amplification Plot from the data in the three columns of Part 1. If the plot shows two straight lines, the default values are still in place; please check whether you have completed the data pasting in the first step (the "Data paste field" button). The FAM probe is drawn in blue and the ROX probe in red. If everything is correct, you will see two smooth curves consistent with the amplification curve characteristics described in Preliminary Analysis.

3. Genotype Determination
Clicking this button will display a graph with the same background as Figure 5 (refer to Presentation of Differences and Visual Region Division). A single data point will be shown on the graph, and its position and marked genotype correspond to the sample’s genotype, as follows:

If the sample is of the wild-type CC genotype (x > 50,000), the data point will appear in the blue area in the upper right corner, marked with the genotype "CC".
If the sample is of the homozygous mutant TT genotype (x < -50,000), the data point will appear in the pink area in the upper left corner, marked with the genotype "TT".
Similarly, if the sample is of the heterozygous CT genotype, the data point will appear in the purple area in the lower part of the middle section, marked with the genotype "CT".

This graph directly outputs the genotype of the input blood sample according to the judgment method we provided (the logic based on the "FAM - ROX" value and the relative difference in Presentation of Differences and Visual Region Division).
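The paste step behind the "Data paste field" button can be illustrated without the clipboard. The following MATLAB sketch is a simplified stand-in for it: the pasted text is a hypothetical string with only 3 cycles instead of 40, and the variable names are illustrative. It shows how mixed delimiters are normalised and how the alternating values are split into FAM and ROX.

```matlab
% Hypothetical pasted text: 3 cycles instead of 40, with mixed delimiters.
pasted  = sprintf('10, 11\n20;21\t30 31');
nCycles = 3;                          % the software itself expects 40

% Normalise commas, semicolons and whitespace to single spaces,
% then read all numbers into a column vector.
cleaned = regexprep(pasted, '[,;\s]+', ' ');
nums    = sscanf(cleaned, '%f');

if numel(nums) == 2 * nCycles
    fam = nums(1:2:end).';            % odd positions  -> FAM, as a row
    rox = nums(2:2:end).';            % even positions -> ROX, as a row
    fprintf('FAM: %s\nROX: %s\n', mat2str(fam), mat2str(rox));
else
    fprintf('Data count mismatch: expected %d numbers, found %d.\n', ...
            2 * nCycles, numel(nums));
end
```

Checking the count before splitting mirrors the prompt described for the first button: a wrong number of values is reported instead of being silently written into the table.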

To verify the success of our software model construction, we used new blood samples that were not previously involved in data analysis and model building (to avoid sample overlap and ensure the objectivity of verification).

Figure7

In addition, we further tested 20 new samples for each genotype (CC, CT, and TT). Figure 7 shows the software output for three samples of different genotypes. The accuracy rate of genotype determination was 100% for all three genotypes. This result supports the scientific basis of the model and its practical applicability.

Code Demonstration


function ShowAndPastePlot()
% Input Interface
fig_input = figure('Name', 'Input Interface', 'Position', [100, 100, 700, 800]);

% Input Box
for i = 1:40
  uicontrol('Style','text','String',(['Cycle',num2str(i),':']),'Position', [30, 750-20*i, 120, 20]);
  uicontrol('Style','edit','Tag',(['X',num2str(i)]),'Position', [160, 750-20*i, 80, 20],'String',num2str(i));
  uicontrol('Style','text','String',(['FAM',num2str(i),':']),'Position', [280, 750-20*i, 120, 20]);
  uicontrol('Style','edit','Tag',(['Y1',num2str(i)]),'Position', [410, 750-20*i, 80, 20],'String',num2str(i+40));
  uicontrol('Style','text','String',(['ROX',num2str(i),':']),'Position', [530, 750-20*i, 120, 20]);
  uicontrol('Style','edit','Tag',(['Y2',num2str(i)]),'Position', [660, 750-20*i, 80, 20],'String',num2str(i+80));
end

% Paste Button
uicontrol('Style','pushbutton',...
  'String','Data paste field (80 data points)',...
  'Position', [80, 30, 160, 30],...
  'Callback', @pasteToInput);

% Amplification Plot button
uicontrol('Style','pushbutton',...
  'String','Amplification Plot',...
  'Position', [250, 30, 180, 30],...
  'Callback', @drawOverlappedCurves);

% Genotype Determination
uicontrol('Style','pushbutton',...
  'String','Genotype Determination',...
  'Position', [440, 30, 160, 30],...
  'Callback', @showCalculatedPoint);

% Paste function (data distribution: 1→Y1(1), 2→Y2(1), 3→Y1(2), 4→Y2(2), ...)
function pasteToInput(~,~)
  % clipboard content reading
  clipboardData = clipboard('paste');
  
  % Raw Data Preprocessing
  clipboardData = regexprep(clipboardData, '[，,;；\t\n\r]', ' ');  % Replace commas, semicolons, tabs, and newlines with spaces
  clipboardData = regexprep(clipboardData, '\s+', ' ');  % Collapse multiple spaces into a single space
  
  % Using sscanf to extract numbers more reliably
  nums = sscanf(clipboardData, '%f');
  
  % Remove any NaN values
  nums = nums(~isnan(nums));
  
  % Check the data count and process (80 numbers: 40 groups of Y1 and Y2)
  if length(nums) == 80
      % Data distribution: 1→Y1(1), 2→Y2(1), 3→Y1(2), 4→Y2(2), ...
      for i = 1:40
          % Calculate the index of the current group in nums (odd number → Y1, even number → Y2)
          y1_index = 2*i - 1;  % 1,3,5,...,79
          y2_index = 2*i;      % 2,4,6,...,80
          
          set(findobj(fig_input, 'Tag',(['Y1',num2str(i)])), 'String', num2str(nums(y1_index)));
          set(findobj(fig_input, 'Tag',(['Y2',num2str(i)])), 'String', num2str(nums(y2_index)));
      end
      msgbox('Pasting completed successfully!');
  else
      % Error message
      msgbox(['Data count mismatch! 80 numbers required (40 groups of FAM/ROX), but actually recognized ', num2str(length(nums)), '.' ...
  ' Please check the data format to ensure it contains only numbers and standard delimiters.']);
      % Command-line debugging information
      fprintf('Count of extracted numbers: %d\n', length(nums));
      % nums is a column vector; transpose it so num2str returns a single row
      fprintf('The first ten data points: %s\n', num2str(nums(1:min(10, length(nums))).'));
  end
end

% Display of the smooth curve
function drawOverlappedCurves(~,~)
  x = 1:40; % Fix the X-axis range to 1–40
  y1 = zeros(1,40); y2 = zeros(1,40);
  % Read the data of the two curves
  for i = 1:40
      y1(i) = str2double(get(findobj(fig_input,'Tag',(['Y1',num2str(i)])),'String'));
      y2(i) = str2double(get(findobj(fig_input,'Tag',(['Y2',num2str(i)])),'String'));
  end
  fprintf('The first five data points of Curve 1: %s | The first five data points of Curve 2: %s\n', num2str(y1(1:5)), num2str(y2(1:5)));

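  % Resample both curves on a denser grid. Note: 'linear' keeps straight
  % segments between cycles; 'pchip' or 'spline' would give a visibly smoother curve.
  x_smooth = linspace(1,40,200);
  y1_smooth = interp1(x,y1,x_smooth,'linear');
  y2_smooth = interp1(x,y2,x_smooth,'linear');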

  % 1. Create a new plotting window
  fig_plot = figure('Name', 'Smooth Curve', 'Position', [200, 200, 800, 500]);
  set(fig_plot, 'Color', 'white');

  % 2. Create axes
  ax = axes('Parent', fig_plot);
  set(ax, 'Color', 'white');
  hold(ax, 'on');

  % 3. Plotting of smooth curves
  plot(ax, x_smooth, y1_smooth, '-', 'LineWidth',1, 'Color','blue', 'DisplayName','FAM');
  plot(ax, x_smooth, y2_smooth, '-', 'LineWidth',1, 'Color','red', 'DisplayName','ROX');

  % 4. Axis labels, title, grid, and legend
  xlabel(ax, 'Cycle', 'FontSize',12, 'Color','black');
  ylabel(ax, '△Rn', 'FontSize',12, 'Color','black');
  title(ax, 'Amplification Plot', 'FontSize',14, 'Color','black');
  grid(ax, 'on'); grid(ax, 'minor');
  legend(ax, 'Location', 'best', 'FontSize',11);
  axis(ax, 'tight');

  hold(ax, 'off');
end

% Genotype Determination
function showCalculatedPoint(~,~)
  % Retrieve the 40th data group
  y1_40 = str2double(get(findobj(fig_input,'Tag','Y140'),'String'));
  y2_40 = str2double(get(findobj(fig_input,'Tag','Y240'),'String'));
  
  % Calculate the x and y coordinates
  x_coord = y1_40 - y2_40;
  max_val = max(y1_40, y2_40);
  
  % Avoid division by zero situations
  if max_val == 0
      y_coord = 0;
  else
      y_coord = abs(x_coord) / max_val;
  end
  
  % Create a new chart window
  fig_point = figure('Name', 'Genotype Determination', 'Position', [200, 200, 800, 500]);
  set(fig_point, 'Color', 'white');
  
  % Create axes and set the range
  ax = axes('Parent', fig_point);
  set(ax, 'Color', 'white');
  xlim(ax, [-300000, 300000]);
  ylim(ax, [0, 1.2]);
  hold(ax, 'on');
  
  % Color definition (hex to RGB conversion)
  color1 = hex2rgb('#FDABF7');   % y:0.6-1.2, x:-300000--50000
  color2 = hex2rgb('#FED4FB');   % y:0.6-1.2, x:-50000-0
  color3 = hex2rgb('#DCF2F5');   % y:0.6-1.2, x:0-50000
  color4 = hex2rgb('#B9E6EC');   % y:0.6-1.2, x:50000-300000
  color5 = hex2rgb('#E6DFEE');   % y:0-0.6, x:-300000--50000 and 50000-300000
  color6 = hex2rgb('#CABBDB');   % y:0-0.6, x:-50000-50000
  
  % Background color setting
  patch(ax, [-300000, -50000, -50000, -300000], [0.6, 0.6, 1.2, 1.2], color1, 'EdgeColor', 'none');
  patch(ax, [-50000, 0, 0, -50000], [0.6, 0.6, 1.2, 1.2], color2, 'EdgeColor', 'none');
  patch(ax, [0, 50000, 50000, 0], [0.6, 0.6, 1.2, 1.2], color3, 'EdgeColor', 'none');
  patch(ax, [50000, 300000, 300000, 50000], [0.6, 0.6, 1.2, 1.2], color4, 'EdgeColor', 'none');
  patch(ax, [-300000, -50000, -50000, -300000], [0, 0, 0.6, 0.6], color5, 'EdgeColor', 'none');
  patch(ax, [50000, 300000, 300000, 50000], [0, 0, 0.6, 0.6], color5, 'EdgeColor', 'none');
  patch(ax, [-50000, 50000, 50000, -50000], [0, 0, 0.6, 0.6], color6, 'EdgeColor', 'none');
  
  % Initialize the properties of data points
  marker = 'o';          % The default shape is a circle
  color = 'green';       % The default color is green
  name = '';             % The default name
  
  %  Set data point properties based on different conditions
  if y_coord > 0.6
      if x_coord > 50000
          % y>0.6 and x>50000: triangle, #1A6FDF, CC
          marker = '^';
          color = hex2rgb('#1A6FDF');
          name = 'CC';
      elseif x_coord < -50000
          % y>0.6 and x<-50000: square, #FF0000, TT
          marker = 's';
          color = hex2rgb('#FF0000');
          name = 'TT';
      end
  end
  
  if y_coord < 0.6
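      % A chained comparison (50000 > x > -50000) is always true in MATLAB, so test both bounds explicitly
      if x_coord > -50000 && x_coord < 50000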
          % y<0.6 and -50000
