Related Work

Thermal scene reconstruction has recently advanced substantially, driven by the convergence of radiance field modeling, generative diffusion techniques, and physically grounded thermal modeling. Our work builds upon and extends three foundational directions: dynamic 3D reconstruction from drone imagery, thermodynamics-aware modeling of time-varying thermal phenomena, and perceptual guidance for infrared image super-resolution.

Dynamic Drone-based Reconstruction.

Tang et al. introduced DroneSplat [Fig. 1a], a robust 3D Gaussian Splatting (3DGS) framework tailored to aerial imagery collected by drones in uncontrolled environments. They address two critical challenges: dynamic distractors (e.g., vehicles, people) and the sparse viewpoints typical of single-flight drone captures. Their method combines adaptive masking, based on segmentation and statistical residuals, with a voxel-guided optimization strategy initialized by multi-view stereo. This combination allows the system to suppress non-static elements and improve novel-view rendering under limited coverage. Figure 2a shows how DroneSplat improves geometric consistency and removes artifacts in dynamic scenes compared to standard 3DGS [Fig. 2b].

Figure 1a: Comparison between DroneSplat and vanilla 3D Gaussian Splatting (3DGS) on drone-captured imagery. Given input views (top left) from in-the-wild drone footage, DroneSplat accurately reconstructs static scene geometry and removes dynamic distractors such as moving vehicles. Renderings from both input and novel viewpoints (middle and bottom rows) show that DroneSplat produces cleaner, temporally consistent reconstructions, while 3DGS exhibits ghosting artifacts and distortion in dynamic regions. The bottom visualization shows the sparse camera trajectories used to generate the 3D model. Figure adapted from Tang et al. [2025].
Figure 2a: DroneSplat
Figure 2b: 3DGS
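
To make the idea of masking by statistical residuals concrete, the following is a minimal sketch, not DroneSplat's actual implementation. It assumes grayscale images stored as nested lists and flags a pixel as a likely distractor when its rendering error exceeds the image-wide mean by k standard deviations. The function name residual_mask and the threshold rule are our own illustrative assumptions, and the segmentation component described above is omitted.

```python
from statistics import mean, pstdev


def residual_mask(rendered, observed, k=2.0):
    """Return a boolean mask marking pixels with unusually large residuals.

    A pixel is flagged when its absolute rendering error exceeds
    mean + k * std of the errors over the whole image (an assumed rule).
    """
    residuals = [
        [abs(r - o) for r, o in zip(r_row, o_row)]
        for r_row, o_row in zip(rendered, observed)
    ]
    flat = [v for row in residuals for v in row]
    threshold = mean(flat) + k * pstdev(flat)
    return [[v > threshold for v in row] for row in residuals]


# Toy 4x4 example: the render matches the photo everywhere except
# one pixel, which stands in for a moving vehicle.
rendered = [[0.5] * 4 for _ in range(4)]
observed = [row[:] for row in rendered]
observed[1][2] = 1.0

mask = residual_mask(rendered, observed)
flagged = [(y, x) for y, row in enumerate(mask) for x, hit in enumerate(row) if hit]
print(flagged)
```

Pixels flagged in this way would be excluded from the photometric loss, so that transient objects do not get baked into the static reconstruction.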

Thermodynamics-guided 4D Thermal Modeling.

While prior work has focused largely on static thermal reconstruction, Yang et al. proposed NTR-Gaussian [Fig. 3a], a framework that models thermal dynamics over time. Leveraging aerial thermal infrared (TIR) imagery and synthetic RGB frames, their method predicts temperature evolution across night scenes using 4D Gaussian Splatting informed by physical laws. A pair of networks estimates emissivity, heat capacity, and convective heat transfer, so that the temperature at any future time can be inferred by numerical integration. This approach represents the first physically consistent modeling of time-varying thermal fields in outdoor 3D scenes. Figure 3b illustrates the thermodynamic theory behind NTR-Gaussian: the interplay between daytime solar absorption and nighttime radiative and convective heat dissipation.

Figure 3a: On the left: The NTR-Gaussian framework learns a time-varying thermal representation using discrete, time-stamped aerial TIR images aligned with synthetic RGB views, leveraging 3D Gaussian Splatting for continuous spatiotemporal modeling. On the right: The NTR dataset includes multi-temporal aerial TIR and corresponding synthetic RGB imagery across four distinct outdoor scenes, enabling training and evaluation of dynamic thermal reconstruction methods. Figure adapted from Yang et al. [2025].

Figure 3b: Thermodynamic theory of outdoor temperature variation. During the day, objects absorb heat from solar radiation; at night, they primarily release it through radiative emission and convective exchange. Figure adapted from Yang et al. [2025].
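
As an illustration of how a temperature can be propagated forward in time from such parameters, the sketch below integrates a generic lumped nighttime energy balance with forward Euler steps. This is a textbook-style model under our own assumptions, not the exact formulation of NTR-Gaussian: the function name integrate_cooling, the constant air and sky temperatures, and all parameter values are hypothetical.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def integrate_cooling(t0, emissivity, heat_capacity, h_conv,
                      t_air, t_sky, duration_s, dt=1.0):
    """Forward-Euler integration of an assumed nighttime energy balance:

        C * dT/dt = -eps * sigma * (T^4 - T_sky^4) - h * (T - T_air)

    Temperatures are in kelvin, heat_capacity (C) in J m^-2 K^-1,
    and h_conv (h) in W m^-2 K^-1.
    """
    temp = t0
    for _ in range(int(round(duration_s / dt))):
        radiative = emissivity * SIGMA * (temp ** 4 - t_sky ** 4)
        convective = h_conv * (temp - t_air)
        temp -= dt * (radiative + convective) / heat_capacity
    return temp


# Hypothetical surface parameters, chosen only for illustration.
params = dict(emissivity=0.9, heat_capacity=2.0e5, h_conv=10.0,
              t_air=285.0, t_sky=270.0)
after_1h = integrate_cooling(300.0, duration_s=3600.0, **params)
after_2h = integrate_cooling(300.0, duration_s=7200.0, **params)
print(round(after_1h, 2), round(after_2h, 2))
```

In this toy setting the surface cools monotonically toward an equilibrium between the sky and air temperatures. In a learned pipeline, the hand-picked constants would be replaced by per-point network predictions.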

Perceptually guided Diffusion for Infrared Super-Resolution.

Li et al. proposed DifIISR, a diffusion-based model for infrared image super-resolution (IISR) that goes beyond visual fidelity to incorporate perceptual relevance for downstream tasks such as segmentation and detection. The method introduces gradient-based guidance during the reverse diffusion process, using both visual frequency priors and perceptual features from pretrained models such as VGG and SAM. As a result, DifIISR achieves improved performance not only in PSNR and SSIM, but also in high-level tasks under low-light or noisy infrared conditions. Figure 4a demonstrates how their method preserves fine texture and semantic integrity compared to prior IISR methods.

Figure 4a: Qualitative comparison of infrared image super-resolution methods. DifIISR demonstrates superior visual quality and structural consistency compared to existing methods. Figure adapted from Li et al. [2025].
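
The mechanism of gradient-based guidance can be sketched with a toy sampler. The example below is not DifIISR: the denoiser is a hypothetical stand-in that shrinks its input toward zero, and the guidance loss is a simple squared distance to a target vector, taking the place of the frequency and perceptual (VGG, SAM) terms. It only shows how the gradient of a loss, evaluated at the predicted mean, shifts each reverse step.

```python
import math
import random


def guidance_grad(x, target):
    """Gradient of the toy loss 0.5 * ||x - target||^2 with respect to x."""
    return [xi - ti for xi, ti in zip(x, target)]


def toy_denoiser(x):
    """Hypothetical stand-in for a learned denoiser: shrinks x toward zero."""
    return [0.9 * xi for xi in x]


def guided_sample(target, scale, steps=50, seed=0):
    """Run a toy reverse process, shifting each step by -scale * gradient."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure noise
    for t in range(steps, 0, -1):
        sigma = 0.05 * (t - 1) / steps  # noise level reaches 0 at the last step
        mu = toy_denoiser(x)
        grad = guidance_grad(mu, target)
        x = [m - scale * g + sigma * rng.gauss(0.0, 1.0)
             for m, g in zip(mu, grad)]
    return x


def distance(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


target = [1.0, -2.0]
unguided = guided_sample(target, scale=0.0)
guided = guided_sample(target, scale=0.5)
print(distance(unguided, target), distance(guided, target))
```

With scale set to zero the sampler follows the denoiser alone. A positive scale pulls the samples toward regions where the guidance loss is small, which is the role the perceptual and frequency terms play in guiding super-resolution.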

Conclusion

To build a real-time 3D thermal Gaussian splatting pipeline, it is essential to integrate the complementary strengths of the three aforementioned directions. First, handling dynamic aerial imagery in uncontrolled environments requires robust scene modeling under sparse and inconsistent viewpoints, as addressed by DroneSplat. Second, accurate thermal reconstruction demands temporal and temperature-aware representations grounded in physical principles, as demonstrated by NTR-Gaussian. Finally, the inherent limitations of infrared imagery, such as low contrast, noise, and loss of detail, must be mitigated through perceptually guided enhancement techniques, as introduced in DifIISR. These works highlight essential tasks that we aim to adapt and integrate into a unified system for real-time, high-fidelity 3D thermal reconstruction.