Qwen2.5-Coder-1.5B-Instruct model on a custom-designed corpus of 5,486 compilable, positive binaries categorized into 11 system program groups under strict -nostdlib constraints, resolving 64-bit division and modulo compilation hurdles via direct static library linking. The converged model serializes executables directly as hexadecimal, producing ultra-lightweight binaries (~1.1 KB) capable of executing arithmetic, string manipulation, interactive terminal operations, and uncompressed 24-bit BMP image parsing. Finally, we analyze the representational limits of low-rank adapters under capacity constraints and propose future directions for self-generating, fault-tolerant software systems.
The standard compilation model relies on a sequence of deterministically defined compiler engines (e.g., LLVM, GCC) to map source code tokens to the instructions of a target architecture. In contrast, neural networks can learn complex translation maps between high-level descriptions and low-level execution targets.
In this paper, we define and evaluate the task of Direct Prompt-to-Binary Synthesis. Rather than generating C code that relies on dynamic standard libraries (which introduces overhead and depends on local builds), our model acts as a complete compilation pipeline embedded in its weights, translating a prompt \(P\) directly into the exact instruction bytes of an ELF executable \(B\):
\[B = f_{\theta}(P)\]
where \(f_{\theta}\) represents the fine-tuned decoder neural model. By enforcing -nostdlib compilation constraints and writing inline assembly helpers directly referencing x86_64 system calls, we minimize the binary footprint, enabling direct memorization of the binary structure.
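As a minimal sketch of the last step of \(B = f_{\theta}(P)\), the snippet below turns hexadecimal text into raw bytes and runs a basic sanity check on the ELF64 header. The exact text format emitted by the model (spacing, line breaks) is not specified here, so the whitespace-tolerant parsing and the helper names `hex_to_elf_bytes` and `looks_like_elf64_x86_64` are assumptions made for illustration.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELFCLASS64 = 2      # e_ident[EI_CLASS] value for 64-bit objects
ELFDATA2LSB = 1     # e_ident[EI_DATA] value for little-endian
EM_X86_64 = 62      # e_machine value for x86_64


def hex_to_elf_bytes(hex_text: str) -> bytes:
    """Convert hex text (assumed format) into raw bytes, ignoring whitespace."""
    cleaned = "".join(hex_text.split())
    return bytes.fromhex(cleaned)  # raises ValueError on malformed hex


def looks_like_elf64_x86_64(blob: bytes) -> bool:
    """Check only the fields needed to recognise an ELF64 x86_64 image."""
    if len(blob) < 64 or blob[:4] != ELF_MAGIC:
        return False
    if blob[4] != ELFCLASS64 or blob[5] != ELFDATA2LSB:
        return False
    # e_type and e_machine are two little-endian u16 fields at offset 16.
    _e_type, e_machine = struct.unpack_from("<HH", blob, 16)
    return e_machine == EM_X86_64


if __name__ == "__main__":
    # Hypothetical header-only sample; a real output would continue with
    # program headers and code.
    sample = "7f454c46 02010100" + "00" * 8 + "0200 3e00" + "00" * 44
    print(looks_like_elf64_x86_64(hex_to_elf_bytes(sample)))
```

A check like this only confirms that the output is shaped like an ELF64 file; it says nothing about whether the program behaves as the prompt requested.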
Automated software synthesis has historically been dominated by high-level source code generators. Early systems mapped natural language to template expressions. The scaling of large decoder language models enabled direct prompt-to-source-code translation (e.g., OpenAI Codex, DeepSeek Coder, Code Llama). In compiler research, deep networks have been trained to output compiler intermediate representations (LLVM-IR) or architecture-specific assembly code to assist compiler backends.
However, all of these approaches still require an assembler and a linker to produce executable binaries. The following comparison of paradigms highlights how our model instead emits the final machine code payload directly:
| Paradigm | Representative Works | Target Output Format | Build Pipeline Steps | OS & Linker Dependency |
|---|---|---|---|---|
| Prompt-to-Code | Codex [1], DeepSeek [2], Qwen-Coder [3] | High-level Source (C, C++, Rust) | Preprocess → Compile → Assemble → Link → Load | High (glibc, standard headers) |
| Prompt-to-Assembly | LLVM-IR models [4], Deep Assembler [5] | Assembly Language (x86_64, ARM) | Assemble → Link → Load | Medium (Assembler, Linker) |
| Prompt-to-Binary (Ours) | Direct Neural ELF Synthesis | ELF64 Machine Code bytes (Hex) | Load Only (Immediate Execution) | None at build time (bypasses the toolchain entirely; the output still targets the Linux x86_64 loader and system calls) |
To build a dataset of correct instruction streams, we constructed positive target samples using strict compiler optimization flags and custom assembly helpers:
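- Inline assembly helpers that invoke x86_64 system calls directly (for example, syscall 60 for termination).
- 64-bit division and modulo helpers resolved by linking libgcc.a directly: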
ld -s -N main.o /usr/lib/gcc/x86_64-linux-gnu/12/libgcc.a -o executable
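To make the target format concrete, the following sketch assembles by hand a tiny static ELF64 image whose only code loads an exit status and issues syscall 60. It is not the dataset pipeline described above, which compiles C sources under -nostdlib and links with ld; the function name `build_exit_elf`, the load address `0x400000`, and the single-segment layout are illustrative assumptions.

```python
import struct

BASE_VADDR = 0x400000   # assumed load address for the single segment
EHDR_SIZE = 64          # size of an ELF64 file header
PHDR_SIZE = 56          # size of an ELF64 program header


def build_exit_elf(status: int = 42) -> bytes:
    """Build a minimal ELF64 x86_64 executable that calls exit(status)."""
    code = (
        b"\xbf" + struct.pack("<I", status)   # mov edi, status
        + b"\xb8" + struct.pack("<I", 60)     # mov eax, 60 (exit)
        + b"\x0f\x05"                         # syscall
    )
    entry = BASE_VADDR + EHDR_SIZE + PHDR_SIZE
    total = EHDR_SIZE + PHDR_SIZE + len(code)

    # e_ident: magic, 64-bit class, little-endian, version 1, System V ABI.
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        e_ident,
        2,            # e_type: ET_EXEC
        62,           # e_machine: EM_X86_64
        1,            # e_version
        entry,        # e_entry
        EHDR_SIZE,    # e_phoff: program header follows the file header
        0,            # e_shoff: no section headers
        0,            # e_flags
        EHDR_SIZE,    # e_ehsize
        PHDR_SIZE,    # e_phentsize
        1,            # e_phnum
        0, 0, 0,      # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        1,            # p_type: PT_LOAD
        5,            # p_flags: read + execute
        0,            # p_offset: map the file from its start
        BASE_VADDR,   # p_vaddr
        BASE_VADDR,   # p_paddr
        total,        # p_filesz
        total,        # p_memsz
        0x1000,       # p_align
    )
    return ehdr + phdr + code


if __name__ == "__main__":
    image = build_exit_elf(42)
    print(len(image), image.hex())
```

The resulting image is far smaller than the ~1.1 KB binaries reported here, since it carries no helpers or data. Written to a file and marked executable on an x86_64 Linux system, it should terminate with the requested status.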
Bypassing the compiler stack has major implications across several areas of advanced computing:
Deep space probes (e.g., Voyager successors or Martian rovers) operate in severe radiation environments. A single high-energy cosmic ray can permanently damage silicon registers, disable specific CPU cores, or corrupt compiler software binaries stored on local drives.
Under a direct prompt-to-binary paradigm, a local neural compiler model could be integrated directly into the system's core recovery loop. When hardware damage is detected, the rover would run a self-diagnostic routine to identify the damaged memory addresses, defunct register areas, and active sensor pins. Using this configuration mapping as prompt context, the local model would synthesize a custom binary driver that relocates the execution pointer, defines a new register map, and continues operations using only the surviving hardware. This would enable autonomous resilience without support from Earth.
In High-Frequency Trading (HFT), execution speed is critical. Traditional compilers optimize binaries using general-purpose heuristics. A neural binary compiler could, in principle, learn to generate machine instructions comparable to hand-tuned code, tailored to specific CPU cache lines, network interface buffers, and branch predictors. By writing machine bytes directly, the model avoids standard linker overhead, which may shorten the critical execution path.
Millions of functional microcontrollers and legacy chips are discarded annually because their compiler toolchains, SDKs, and build dependencies are no longer supported. A neural binary compiler trained on instruction set documentation could act as a universal software bridge. Engineers could then program legacy architectures using natural language descriptions, bypassing deprecated compilers and extending the lifespan of electronics.
We evaluated the model on in-domain samples and measured token-level matching accuracy.
| Category | Token Accuracy | Verification Result | Executable Size |
|---|---|---|---|
| Simple Math | 98.56% | PASS | 1.10 KB |
| Loops & Sums | 98.47% | PASS | 1.08 KB |
| Interactive Calculator | 97.28% | PASS | 1.10 KB |
| Grid Snake Simulator | 97.37% | FAIL | 1.12 KB |
| BMP Art Decoder | 93.44% | FAIL | 1.18 KB |
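The two metrics in the table measure different things, which the following sketch illustrates. It assumes token accuracy is a position-wise comparison against the reference sequence and that verification requires the full sequence to match; the precise definitions used in the evaluation are not given here, and the function names are hypothetical.

```python
from typing import Sequence


def token_accuracy(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of positions where predicted and reference tokens agree.

    Assumed definition: missing or extra tokens count as errors, so the
    denominator is the longer of the two sequences.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / max(len(predicted), len(reference))


def exact_match(predicted: Sequence[int], reference: Sequence[int]) -> bool:
    """True only if every token matches and the lengths are equal."""
    return list(predicted) == list(reference)


if __name__ == "__main__":
    reference = list(range(1000))
    predicted = list(reference)
    for i in range(0, 150, 10):   # corrupt 15 of the 1000 tokens
        predicted[i] = -1
    print(f"token accuracy: {token_accuracy(predicted, reference):.2%}")
    print(f"exact match:    {exact_match(predicted, reference)}")
```

Under these assumptions, a sequence can score above 98% token accuracy and still differ from the reference in more than a dozen positions.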
Our analysis indicates that the capacity of the LoRA adapters (rank \(r = 16\)) is exhausted during multi-category training. Because machine code has almost no redundancy (a single incorrect byte can corrupt an instruction or a header field and crash the program, typically with a segmentation fault), exact-match synthesis of complex binaries likely requires larger adapter ranks (\(r \geq 256\)) or full-parameter fine-tuning.
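To give a sense of scale for the rank argument, the sketch below counts the trainable parameters a LoRA adapter adds to one weight matrix: a rank-\(r\) update \(BA\) with \(A \in \mathbb{R}^{r \times d_{in}}\) and \(B \in \mathbb{R}^{d_{out} \times r}\) contributes \(r(d_{in} + d_{out})\) parameters. The 1536 by 1536 projection used in the example is an assumed, illustrative shape rather than a reported configuration, and which matrices were adapted is not stated here.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank, d_in), B is (d_out, rank)."""
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters of the dense weight matrix being adapted (bias ignored)."""
    return d_in * d_out


if __name__ == "__main__":
    d_in = d_out = 1536  # assumed projection shape, for illustration only
    for rank in (16, 256):
        added = lora_trainable_params(d_in, d_out, rank)
        share = added / full_params(d_in, d_out)
        print(f"r={rank:>3}: {added:>7} adapter params ({share:.1%} of the dense matrix)")
```

For a fixed matrix shape the adapter size grows linearly with the rank, so moving from \(r = 16\) to \(r = 256\) multiplies the trainable parameters per adapted matrix by 16.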
@article{kalwar2026direct,
title={Direct Synthesis of Executable ELF Binaries from Natural Language},
author={Kalwar, Sanket},
year={2026},
url={https://github.com/sanketkalwar/PromptToBinaryExecutable}
}