Introducing REE: Reproducible Execution Environment

If an AI result matters, someone else should be able to reproduce it. Right now, that is harder than it sounds.

Modern models usually run on GPUs using kernels that are fast but not reproducible across machines. Two people can use the same model, the same prompt, and the same settings, yet still end up with different outputs once the job moves between different hardware. For chat and creative work, that is usually fine. For benchmarks, outsourced inference, model judgements, or market settlement, it is not.

Introducing REE: Gensyn’s Reproducible Execution Environment. REE is a containerised runtime for AI inference that makes runs reproducible across supported hardware, and emits a receipt that other people can check for themselves.

Why this is hard

GPUs get their speed from parallel execution. To maximise throughput, they do not enforce a single fixed order for every floating-point operation. That is usually the right trade-off for performance, but it also means tiny numerical differences can appear from run to run. Those differences can compound over the course of a model run, resulting in a completely different output.
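A minimal Python sketch of why reduction order matters: floating-point addition is not associative, so regrouping the same operands can change the result.

```python
# Floating-point addition is not associative. 1.0 is smaller than the
# rounding step (ulp) of 1e20, so it vanishes if added to -1e20 first,
# but survives if the large values cancel first. Same operands, two
# groupings, two different answers.
cancel_first = (1e20 + -1e20) + 1.0   # large values cancel, then add 1.0
absorb_first = 1e20 + (-1e20 + 1.0)   # 1.0 is absorbed into -1e20

print(cancel_first)  # 1.0
print(absorb_first)  # 0.0
```

A GPU reducing thousands of partial sums in whatever order the scheduler produces is doing exactly this, at scale, on every layer of the model.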

This is why “deterministic enough” often falls apart the moment a workload leaves one machine. Standard determinism settings can help you get repeatable behaviour on the same hardware. They do not solve the harder problem, which is getting the same result on different GPU types with different low-level implementations.

If someone gives you an AI result that you cannot rerun and reproduce yourself, you still have to take their word for it.

What REE does

REE packages the full reproducible inference pipeline into a single workflow.

At the top, the SDK handles model export, compilation, execution, and decoding. The compiler converts ONNX models into PyTorch modules and routes supported operations through reproducible implementations. Under that sits RepOps, Gensyn’s reproducible operator layer, which fixes reduction order, uses correctly rounded maths for key functions, and does the extra precision work needed to keep outputs stable across supported hardware.
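As a toy illustration of the fixed-reduction-order idea (a sketch of the principle only, not RepOps' actual implementation or API), consider a sum that always pairs its operands the same way:

```python
def tree_sum(xs):
    """Sum xs with a fixed binary-tree pairing.

    Toy sketch of fixed reduction order: every conforming implementation
    performs the same additions in the same pairing, so the result is
    bit-identical regardless of how the work is scheduled.
    """
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return tree_sum(xs[:mid]) + tree_sum(xs[mid:])

# A left-to-right fold of the same values can disagree in the last bits,
# which is exactly the cross-machine drift a fixed order eliminates.
values = [1.0, 1e20, -1e20, 1.0]
print(tree_sum(values))  # 0.0
print(sum(values))       # 1.0
```

The real operator layer also has to pin down rounding behaviour for functions like exp and softmax, but the principle is the same: pick one order, and make every machine follow it.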

You do not need to think about those layers separately when you use REE. It ships as a containerised workflow with a terminal UI. You pick a supported model, provide a prompt, run the job, and get an output that another party can reproduce.

Receipts, validation and verification

A REE run produces two outputs: the generated text and a receipt. The receipt binds the job inputs to the job output, including the model, prompt, configuration, and generated result.

From there, the other party has two options.

They can validate the receipt, which checks that the receipt is internally consistent and has not been tampered with.

Or they can verify it, which means rerunning the same job on their own hardware and confirming they get the same output.

That second step is the point. Verification is not a screenshot, a hosted log, or an API response that asks you to trust the operator. It is independent re-execution.

A receipt is not a lie detector. It does not tell you whether the prompt was truthful, whether the underlying model is correct, or whether a benchmark was well designed. It tells you something narrower but more useful: this output really came from this model, with these inputs, under a runtime another party can reproduce.
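The validate/verify split can be sketched with a hash-based commitment. This is an illustrative stand-in, not REE's actual receipt format; `run_job` is a hypothetical placeholder for a reproducible runtime.

```python
import hashlib
import json

def make_receipt(model, prompt, config, output):
    # Bind the job inputs to the job output with a digest over all fields.
    # (Illustrative sketch only; not REE's actual receipt format.)
    body = {"model": model, "prompt": prompt, "config": config, "output": output}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "digest": digest}

def validate(receipt):
    # Validation: is the receipt internally consistent, i.e. untampered?
    body = {k: v for k, v in receipt.items() if k != "digest"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() == receipt["digest"]

def verify(receipt, run_job):
    # Verification: independently re-execute the job and compare outputs.
    # run_job is a hypothetical stand-in for a reproducible runtime.
    return run_job(receipt["model"], receipt["prompt"], receipt["config"]) == receipt["output"]
```

Tampering with any bound field breaks validation, while verification succeeds only if re-execution reproduces the output exactly — which is precisely what a reproducible runtime makes possible across different machines.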

Why this matters

The first use case is evaluation. If a model score changes, teams should be able to rerun the exact job instead of arguing about environment drift, silent dependency changes, or whether someone used a slightly different setup.

The second is third-party inference. If one party is paying another to run a model, they should be able to check the result, rather than relying on reputation alone.

The third is any product where model output becomes part of a shared decision. That already matters for verifiable judging and market settlement, and it will matter more as AI moves deeper into systems where multiple parties need a clean audit trail.

This is the gap REE is built to close. It turns “trust me, I ran it” into “trust, but verify”.

Try it now

REE is available today and supports 40+ open-source models. To get started, visit the website.

If you'd like to chat about a potential integration or use case for REE, get in touch with us.