Efficient Generative Transformer Operators for Million-Point PDEs

1 Sorbonne Université, CNRS, ISIR, F-75005 Paris, France
2 Criteo AI Lab, Paris, France
* Equal contribution.

Abstract

We introduce ECHO, a transformer–operator framework for generating million-point PDE trajectories. While existing neural operators (NOs) have shown promise for solving partial differential equations, they remain limited in practice due to poor scalability on dense grids, error accumulation during dynamic unrolling, and task-specific design. ECHO addresses these challenges through three key innovations. (i) It employs a hierarchical convolutional encode–decode architecture that achieves a 100× spatio-temporal compression while preserving fidelity on mesh points. (ii) It incorporates a training and adaptation strategy that enables high-resolution PDE solution generation from sparse input grids. (iii) It adopts a generative modeling paradigm that learns complete trajectory segments, mitigating long-horizon error drift. The training strategy decouples representation learning from downstream task supervision, allowing the model to tackle multiple tasks such as trajectory generation, forward and inverse problems, and interpolation. The generative model further supports both conditional and unconditional generation. We demonstrate state-of-the-art performance on million-point simulations across diverse PDE systems featuring complex geometries, high-frequency dynamics, and long-term horizons.

The ECHO framework

ECHO is a transformer-based operator built on an encode–generate–decode framework designed for efficient spatio-temporal PDE modeling at scale. It handles million-point trajectories on arbitrary domains (see the figure below). ECHO is the first generative transformer operator to address forward and inverse tasks under a unified formalism while operating in a compressed latent space, allowing it to scale to high-resolution inputs from arbitrary domains.

Figure: The ECHO framework.
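To make the framework concrete, here is a minimal PyTorch sketch of the encode–generate–decode flow. All module names, depths, and shapes are illustrative assumptions for exposition, not the released implementation: the two stride-2 stages below compress far less than the reported 100×, which would require a deeper hierarchy.

```python
# Minimal sketch of an encode-generate-decode pipeline (illustrative only;
# module names, depths, and hyper-parameters are assumptions, not ECHO's code).
import torch
import torch.nn as nn

class EncodeGenerateDecode(nn.Module):
    def __init__(self, channels=4, latent=64):
        super().__init__()
        # Encoder: hierarchical 3D convolutions compress time and space
        # jointly; each stride-2 stage halves T, H, and W.
        self.encoder = nn.Sequential(
            nn.Conv3d(channels, latent, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv3d(latent, latent, kernel_size=3, stride=2, padding=1),
        )
        # Generator: a transformer over flattened latent tokens that emits
        # the latent of a whole trajectory segment in one pass.
        layer = nn.TransformerEncoderLayer(d_model=latent, nhead=4,
                                           batch_first=True)
        self.generator = nn.TransformerEncoder(layer, num_layers=2)
        # Decoder: transposed convolutions restore full resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(latent, latent, 4, stride=2, padding=1),
            nn.GELU(),
            nn.ConvTranspose3d(latent, channels, 4, stride=2, padding=1),
        )

    def forward(self, u):              # u: (B, C, T, H, W) conditioning frames
        z = self.encoder(u)            # compact latent (B, c, t, h, w)
        B, c, t, h, w = z.shape
        tokens = z.flatten(2).transpose(1, 2)       # (B, t*h*w, c)
        tokens = self.generator(tokens)             # segment-level generation
        z = tokens.transpose(1, 2).reshape(B, c, t, h, w)
        return self.decoder(z)         # full-resolution trajectory segment

u = torch.randn(1, 4, 8, 64, 64)       # 8 frames on a 64x64 grid
print(EncodeGenerateDecode()(u).shape) # torch.Size([1, 4, 8, 64, 64])
```

Stacking additional stride-2 stages deepens the hierarchy and raises the compression ratio at the cost of latent resolution.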

The design of our model ECHO follows three key principles:

  • (i) Hierarchical spatio-temporal compression: for realistic deployment, compression must act jointly on space and time. We advocate deep encoder–decoders that reduce resolution hierarchically, yielding compact yet faithful spatio-temporal latents.
  • (ii) Rethinking the auto-regressive process: next-frame(s) prediction remains the dominant training paradigm but suffers from error drift. We instead introduce a robust procedure that generates entire trajectory segments conditioned on selected frames, capturing long-range temporal dependencies and enforcing horizon-wide consistency.
  • (iii) From deterministic to generative modeling: we leverage a stochastic modeling formulation to generate trajectory distributions. This lets the model handle partial or noisy observations and cope with the physical information loss inherent to the compression step (see the training sketch after this list).
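The sketch below illustrates what one training step of such a conditional generative model over latent segments could look like. We assume a flow-matching-style objective and a hypothetical `denoiser` interface purely for illustration; the exact stochastic formulation used in the paper may differ.

```python
# Hedged sketch of principle (iii): conditional generation of latent
# trajectory segments with a flow-matching-style loss (an assumption).
import torch
import torch.nn as nn

def generative_step(denoiser, z_traj, z_cond, opt):
    """One training step. z_traj: (B, N, d) latent tokens of a full segment
    (the target); z_cond: (B, M, d) latent tokens of conditioning frames."""
    t = torch.rand(z_traj.shape[0], 1, 1)        # per-sample time in [0, 1]
    noise = torch.randn_like(z_traj)
    z_t = (1 - t) * noise + t * z_traj           # linear noise-to-data path
    target = z_traj - noise                      # constant velocity along path
    # Condition by concatenating frame tokens, then keep segment predictions.
    pred = denoiser(torch.cat([z_t, z_cond], dim=1), t)[:, : z_traj.shape[1]]
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Smoke test with a trivial stand-in denoiser (hypothetical architecture).
class TinyDenoiser(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d + 1, 128), nn.GELU(),
                                 nn.Linear(128, d))
    def forward(self, z, t):
        t = t.expand(-1, z.shape[1], 1)          # broadcast time to all tokens
        return self.net(torch.cat([z, t], dim=-1))

model = TinyDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
z_traj, z_cond = torch.randn(2, 32, 64), torch.randn(2, 4, 64)
print(generative_step(model, z_traj, z_cond, opt))
```

At inference, sampling would integrate the learned velocity field from noise to data, conditioned on the encoded frames, yielding a distribution over trajectory segments rather than a single point estimate.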

The figure below illustrates the benefits of principles (i)–(iii): (left) our spatio-temporal encoder achieves a compression-ratio versus relative L2 error trade-off that is markedly superior to state-of-the-art baselines, enabling large-scale applications; (center) its trajectory-generation procedure is far less prone to error accumulation, enabling long-horizon forecasts; and (right) the generative modeling paradigm outperforms deterministic alternatives.

Figure: Motivation for ECHO.

Comparison with baselines

Figure: Quantitative comparison of the neural solver with baselines.

Scaling and Generative Capabilities

Figure: Quantitative comparison of the neural solver with baselines.

Visualizations

BibTeX