Software Engineering

Version 1

We designed an interface that enables researchers to iterate through protein design cycles using LEAPS-Software through dialogue with a Large Language Model (LLM). We refer to this interface as the “conversational interface”. The aim of the conversational interface is to allow researchers to safely launch LEAPS-Software experiments by simply expressing requirements in natural language, without requiring prior knowledge of highly specialized settings. In other words, the design philosophy minimizes the operational burden on researchers by using the LLM as an orchestrator that transforms natural language input from users into control commands understood by the system.

Each turn of the dialogue proceeds as follows:

  • Dataset Input

Researchers submit datasets in CSV or TSV format to the LLM. The LLM validates these files and confirms whether they conform to the expected format.

  • Goal Setting

Researchers submit target values to the LLM in natural language. The LLM validates the input and confirms whether it conforms to the expected format.

  • Result Verification

Researchers review results and consult with the LLM. This enables feedback for appropriate parameter configuration.

By routing each turn through the LLM, we aimed to accept flexible input and let researchers use the application without a learning curve. However, while the conversational interface increases flexibility, its high degree of freedom can scatter configurations across the dialogue. To address this, we adopted a feature called Tool Calling, described below.
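The Dataset Input turn can be sketched as a deterministic check behind the LLM's confirmation. The following is a minimal, hypothetical sketch (the column names and rules are illustrative, not the project's actual validator):

```typescript
// Hypothetical sketch of the dataset check performed in the Dataset Input
// turn: verify that a CSV/TSV payload has a non-empty header row and a
// consistent column count. Rules and names are illustrative.
type DatasetCheck = { ok: boolean; errors: string[] };

function validateDataset(text: string, delimiter: "," | "\t" = ","): DatasetCheck {
  const errors: string[] = [];
  const rows = text.trim().split(/\r?\n/).map((line) => line.split(delimiter));
  if (rows.length < 2) errors.push("expected a header row and at least one data row");
  const width = rows[0]?.length ?? 0;
  if (width === 0 || rows[0].some((h) => h.trim() === "")) {
    errors.push("header row must not contain empty column names");
  }
  rows.forEach((row, i) => {
    if (row.length !== width) {
      errors.push(`row ${i + 1}: expected ${width} columns, got ${row.length}`);
    }
  });
  return { ok: errors.length === 0, errors };
}
```

Because the check itself is deterministic, the LLM only relays its result; the same file always yields the same verdict.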

Design

Here we present how we planned the system.

Backend

The backend was built on three premises: "identical inputs reach identical results (reproducibility)", "what happened can be explained retrospectively (auditability)", and "failures can be safely reversed (reversibility)". We defined pairs of natural-language intents and their corresponding functions as schemas using JSON Schema, and validated behavior schema-first by defining the API before writing the code.
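One such intent–function pair might look like the following. This is a hypothetical sketch in the JSON Schema style used for LLM tool definitions; the function name `diagnose_run` and its parameters are illustrative, not the project's actual schema:

```typescript
// Hypothetical tool definition pairing a natural-language intent with a
// callable function, expressed in the JSON Schema style. The name and
// parameters are illustrative assumptions.
const diagnoseRunTool = {
  name: "diagnose_run", // function the LLM may call for "why did this fail?" intents
  description: "Identify the likely cause of an unsatisfactory design run.",
  parameters: {
    type: "object",
    properties: {
      run_id: { type: "string", description: "Identifier of the run to inspect" },
      aspect: { type: "string", enum: ["dataset", "goal", "parameters"] },
    },
    required: ["run_id"],
  },
} as const;
```

Defining such schemas before the code gives the three premises something concrete to validate against: the schema fixes what a call looks like (reproducibility), names it (auditability), and bounds it (reversibility).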

Frontend

The frontend was designed with the core principle that researchers can proceed from input to execution without screen transitions. We first drafted a user journey in text form and verified whether the conversation flow led to the intended results. We decided to develop the prototype only after confirming that the backend could exercise appropriate control.

Build

Here we present how we implemented the system.

Backend

The backend prioritized modern technologies, adopting Neon as the database and UploadThing as storage. By using storage for file preservation and the database for file management, we followed the principle of separation of responsibilities.

We now explain Tool Calling, the core of this system. Tool Calling lets the LLM analyze the intent behind a researcher's natural-language input and call the most appropriate function. For example, suppose a user is dissatisfied with the results and needs information to make improvements. When the user asks a question intended to identify the cause, the LLM calls a diagnostic function. Researchers can thus give instructions flexibly, without a learning curve, while the underlying system converges on a deterministic configuration.
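The dispatch step can be sketched as follows. In production the LLM itself selects the tool; here a keyword stub stands in for that intent analysis, and the tool names are illustrative:

```typescript
// Minimal sketch of Tool Calling dispatch: an utterance is mapped to one
// registered function. chooseTool is a keyword stub standing in for the
// LLM's intent analysis; tool names are illustrative assumptions.
type Tool = (args: Record<string, string>) => string;

const tools: Record<string, Tool> = {
  diagnose_run: ({ run_id }) => `diagnosing run ${run_id}`,
  set_goal: ({ target }) => `goal set to ${target}`,
};

// Stand-in for the LLM's intent analysis.
function chooseTool(utterance: string): { name: string; args: Record<string, string> } {
  if (/why|cause|wrong/i.test(utterance)) {
    return { name: "diagnose_run", args: { run_id: "latest" } };
  }
  return { name: "set_goal", args: { target: utterance } };
}

function handleTurn(utterance: string): string {
  const { name, args } = chooseTool(utterance);
  return tools[name](args);
}
```

Whatever the phrasing, only a registered function ever runs, which is how free-form dialogue converges to a deterministic configuration.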

Frontend

The frontend prioritized modern technologies, adopting Next.js as the framework and shadcn as the UI library. Using the SWR fetching library, we achieved optimistic updates and differential rendering to improve response speed.
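The optimistic-update pattern (which SWR's `mutate` supports via its `optimisticData`/`rollbackOnError` options) can be sketched framework-free as: render the new value immediately, then roll back if the server call fails. This sketch is illustrative, not the project's actual code:

```typescript
// Framework-free sketch of an optimistic update: show the new value before
// the request completes, and roll back to the previous value on failure.
async function optimisticUpdate<T>(
  current: T,
  next: T,
  render: (value: T) => void,
  commit: (value: T) => Promise<void>,
): Promise<T> {
  render(next); // update the UI immediately, without waiting for the server
  try {
    await commit(next);
    return next;
  } catch {
    render(current); // roll back on failure
    return current;
  }
}
```

The perceived latency drops because the UI never waits on the network in the success case.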

We designed the UI to avoid interrupting the research flow. The conversational interface lowers the learning cost because requirements can be expressed in natural language, and the LLM retains context across a continuous research session. To leverage these advantages, we implemented a chat-based UI.

Test

Here we present how we validated the system.

Backend

For the backend, we centered validation on scenario-driven testing that assumed actual research, preparing cases including ambiguous instructions. Tool Calling was tested to cover all branches. Reproducibility was confirmed by verifying that identical instructions yielded identical results. Auditability was confirmed by asking why a particular function was called and checking whether it matched the intent of the instruction. Reversibility was confirmed by verifying that correct fallback occurred even when errors arose.
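The reproducibility check can be sketched as: resolve the same instruction twice and assert that the resulting configurations are byte-for-byte identical. `buildConfig` below is a hypothetical stand-in for the pipeline's deterministic output, with a fixed seed and canonical serialization:

```typescript
// Sketch of the reproducibility test: identical instructions must resolve
// to identical configurations. buildConfig is a hypothetical stand-in for
// the Tool Calling pipeline's output.
function buildConfig(instruction: string): string {
  const params = {
    goal: instruction.includes("stability") ? "maximize_stability" : "default",
    seed: 42, // fixed seed: no hidden randomness in the resolved config
  };
  // Serialize with sorted keys so the byte representation is canonical.
  return JSON.stringify(params, Object.keys(params).sort());
}

function isReproducible(instruction: string): boolean {
  return buildConfig(instruction) === buildConfig(instruction);
}
```

String equality of the canonical serialization is a strict but simple way to verify "identical instructions yield identical results".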

Frontend

For the frontend, we centered on testing with actual users. From the initial mock stage, we had users operate the system under production-like conditions. During observations, we meticulously recorded the moments at which users stumbled.

Learn

Here we present how we improved the system.

Backend

For the backend, we controlled input and output through Tool Calling validation. In practice, however, it became clear that covering every pattern was impossible: the combinations exploded, making it impractical to address all pathways exhaustively. We also found that improving operability easily induces goal drift, with the application at risk of degenerating into a mere chat tool.

Frontend

For the frontend, we improved UI/UX based on feedback. However, in reality, user-driven dialogue made “the first step” unclear, leaving users uncertain about where to begin. This revealed the limitations of the conversational interface and emphasized the importance of UI/UX that eliminates confusion.

Version 2

We designed an interface that provides a clear pathway, centered on the principle of a UI that "doesn't make users think", enabling researchers to advance through the LEAPS-Software protein design cycle via stepwise forms. We refer to this interface as the "form-based interface". Building on the insight from Version 1's conversational interface that confusion arose easily at the initial stage, the form-based interface structures the items to be input. In other words, researchers reach their destination deterministically simply by answering questions, and the LEAPS-Software settings are automatically snapshotted. The design philosophy thus minimizes room for goal drift.

Each step of the form proceeds as follows:

  • Dataset Input

Researchers input datasets in CSV or TSV format into text boxes. These are immediately validated to confirm whether they conform to the expected format.

  • Goal Setting

Researchers input target values via select boxes. These are immediately validated to confirm whether they conform to the expected format.

  • Advanced Settings

Researchers can optionally modify advanced settings. This restores flexibility for experienced users and keeps the interface adaptable to varied research needs.

By providing these steps, users can proceed to execution without confusion. Even if incorrect input occurs, users can self-correct based on appropriate feedback.
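The stepwise progression above can be sketched as gating: a step unlocks only after every earlier step has passed validation, and an invalid step is where the user is sent back to. The step names mirror the list; the gating logic is an illustrative sketch:

```typescript
// Sketch of stepwise form gating: execution unlocks only once every step
// has validated, and the user is directed to the first invalid step.
// Step names mirror the form's steps; the logic is illustrative.
type Step = "dataset" | "goal" | "advanced";

const order: Step[] = ["dataset", "goal", "advanced"];

function firstIncompleteStep(valid: Record<Step, boolean>): Step | "execute" {
  for (const step of order) {
    if (!valid[step]) return step; // send the user back to the first invalid step
  }
  return "execute"; // all steps valid: execution is unlocked
}
```

Pointing at exactly one next action is what lets users self-correct without guidance.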

Design

Here we present how we planned the system.

Backend

The backend prioritized deterministic input and deterministic output to match the form-based interface. First, we defined necessary parameters as schemas and saved configurations as files when forms were submitted. Next, we designed task management, which was not considered in Version 1. We also defined appropriate access controls to strengthen security beyond the previous version.
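Saving the configuration on submission can be sketched as content-addressed snapshotting: serialize the validated parameters canonically and derive the snapshot id from a hash of the bytes, so identical inputs map to identical snapshots. This is a hypothetical sketch, not the project's actual storage code:

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of configuration snapshotting on form submission:
// canonical serialization plus a content hash as the snapshot id, so that
// identical parameters always produce the identical snapshot.
function snapshot(params: Record<string, unknown>): { id: string; body: string } {
  const body = JSON.stringify(params, Object.keys(params).sort());
  const id = createHash("sha256").update(body).digest("hex").slice(0, 12);
  return { id, body };
}
```

Content addressing also gives auditability for free: a snapshot id on a result uniquely names the configuration that produced it.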

Frontend

The frontend was planned to complete the process from input to output without screen transitions, based on the principle of progressive disclosure. By repeating the cycle of creating prototypes and receiving feedback, we achieved rapid improvement. Validation performed at each step was realized by defining typed schemas, designed to reject invalid input in forms before submission. Additionally, we resolved the reduced flexibility imposed by forms by allowing advanced settings.
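The per-step validation can be sketched as a typed field schema: each field declares a parser that returns either a typed value or an error message, so invalid input is rejected in the form before submission. The project uses zod for this; the hand-rolled stand-in below only illustrates the shape, and the goal-field rules are assumptions:

```typescript
// Hand-rolled stand-in for the typed (zod-style) field schemas: a parser
// either yields a typed value or an error message shown next to the field.
// The goal-field rules shown are illustrative assumptions.
type FieldResult<T> = { ok: true; value: T } | { ok: false; error: string };

const goalField = {
  parse(raw: string): FieldResult<number> {
    const value = Number(raw);
    if (!Number.isFinite(value)) return { ok: false, error: "target must be a number" };
    if (value <= 0) return { ok: false, error: "target must be positive" };
    return { ok: true, value };
  },
};
```

Because the error message lives next to the parser, the same schema drives both rejection and the corrective feedback shown on screen.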

Furthermore, to prevent unintended use of LEAPS-Software, we decided to disclose information about how the software should be used.

Build

Here we present how we implemented the system.

Backend

The backend prioritized modern technologies, adopting Neon as the database and Vercel Blob as storage. By using storage for file preservation and the database for file management, we followed the principle of separation of responsibilities.

Tasks were assigned states to make optimal use of computational resources. Specifically, we prepared a state variable taking the values "pending / running / success / failure / interrupted" and managed the tasks in a queue.
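The lifecycle above can be sketched as a small state machine that rejects illegal transitions. The transition table is our reading of the five states (in particular, re-queueing interrupted tasks back to pending is an assumption):

```typescript
// Sketch of the task lifecycle: the five states and the transitions we
// assume are permitted between them. Re-queueing "interrupted" back to
// "pending" is an assumption, not confirmed project behavior.
type TaskState = "pending" | "running" | "success" | "failure" | "interrupted";

const transitions: Record<TaskState, TaskState[]> = {
  pending: ["running"],
  running: ["success", "failure", "interrupted"],
  interrupted: ["pending"], // assumed: interrupted tasks are re-queued
  success: [], // terminal
  failure: [], // terminal
};

function advance(state: TaskState, next: TaskState): TaskState {
  if (!transitions[state].includes(next)) {
    throw new Error(`illegal transition ${state} -> ${next}`);
  }
  return next;
}
```

Making illegal transitions throw is what allows resumption after an interruption to behave deterministically: a task can only re-enter the queue through the one permitted path.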

Frontend

The frontend prioritized modern technologies, adopting Next.js as the framework and shadcn as the UI library. For validation, we used React Hook Form and zod for type-safe management. We thoroughly implemented error feedback through immediate validation.

Test

Here we present how we validated the system.

Backend

For the backend, we conducted validation through user testing under conditions as close as possible to real operation. We had users directly input CSV files collected from the database and confirmed the entire process from job acceptance through queue addition and state transitions. We also deliberately interrupted the network and observed whether resumption operated deterministically.

Frontend

For the frontend, we centered on usage assuming production conditions. We observed whether researchers encountering the form for the first time could proceed from new registration to result verification without interruption. When errors occurred, we focused on whether users could understand from the screen alone what input to correct and how, evaluating the effectiveness of explanatory text and placeholders. The results showed a high on-the-spot correction rate, with many users able to correct their datasets without guidance.

Learn

Here we present how we improved the system.

Backend

Compared with Version 1, Version 2 further eliminated ambiguity. By switching from the conversational interface to the form-based interface, the combinatorial explosion caused by Tool Calling disappeared, and identical input leads straightforwardly to identical output. Because snapshots are always retained, recovery from failures can now be performed mechanically by procedure. Furthermore, moving jobs to state management made the allocation of computational resources easier to manage.

Frontend

The "invisible first step" barrier revealed in Version 1 was resolved by Version 2's form-based interface. Input items are disclosed progressively and the necessary options are presented directly, so researchers proceed without confusion. Immediate validation does not point out errors in a batch afterwards but encourages corrections on the spot, breaking the chain of setbacks. Although we were concerned about reduced flexibility, the advanced-settings layer lets experienced users go deeper only when necessary. As a result, the time to execution was shortened. The form functioned as a UI that "doesn't make users think", achieving, in a more direct manner, the low learning cost intended in Version 1.


© 2025 - Content on this site is licensed under a Creative Commons Attribution 4.0 International license

The repository used to create this website is available at gitlab.igem.org/2025/tsukuba.