Thermal scene reconstruction has recently seen substantial advances driven by the convergence of radiance field modeling, generative diffusion techniques, and physically grounded thermal modeling. Our work builds upon and extends three foundational directions: dynamic 3D reconstruction from drone imagery, thermodynamics-aware modeling of time-varying thermal phenomena, and perceptual guidance for infrared image super-resolution.
Dynamic Drone-based Reconstruction.
Tang et al. introduced DroneSplat [Fig. 1a], a robust 3D Gaussian Splatting (3DGS) framework tailored to aerial imagery collected by drones in uncontrolled environments. They address two critical challenges: dynamic distractors (e.g., vehicles, people) and the sparse viewpoints typical of single-flight drone captures. Their method integrates adaptive masking based on segmentation and statistical residuals, and uses a voxel-guided optimization strategy initialized by multi-view stereo. This combination allows the system to suppress non-static elements and improve novel view rendering under limited coverage. Figure 2a shows how DroneSplat improves geometric consistency and removes artifacts in dynamic scenes compared to standard 3DGS [Fig. 2b].
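The residual-driven masking idea can be made concrete with a short sketch. The snippet below is a simplified illustration, not DroneSplat's exact formulation: it flags pixels whose photometric error is a robust statistical outlier and promotes them to whole segmentation masks, so entire moving objects are excluded from the splatting loss. The function name, the median-plus-MAD threshold, and the 50% promotion rule are illustrative assumptions.

```python
import numpy as np

def distractor_mask(rendered, observed, seg_masks, k=3.0):
    """Flag likely dynamic-distractor pixels from rendering residuals.

    rendered, observed : (H, W, 3) float arrays in [0, 1]
    seg_masks          : iterable of (H, W) boolean arrays from a segmenter
    Returns a boolean (H, W) mask that is True where pixels should be
    ignored by the reconstruction loss.
    """
    residual = np.abs(rendered - observed).mean(axis=-1)   # per-pixel L1 error
    med = np.median(residual)
    mad = np.median(np.abs(residual - med)) + 1e-8         # robust spread estimate
    outliers = residual > med + k * 1.4826 * mad           # robust outlier test

    mask = np.zeros(residual.shape, dtype=bool)
    for seg in seg_masks:
        # Promote a segment to "distractor" when most of it is outlying,
        # so whole moving objects (cars, people) are masked together.
        if seg.any() and outliers[seg].mean() > 0.5:
            mask |= seg
    return mask
```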



Thermodynamics-guided 4D Thermal Modeling.
While prior work has focused largely on static thermal reconstruction, Yang et al. proposed NTR-Gaussian [Fig. 3a], a framework that models thermal dynamics over time. By leveraging aerial TIR imagery and synthetic RGB frames, their method predicts temperature evolution across night scenes using 4D Gaussian Splatting informed by physical laws. A pair of networks estimates emissivity, heat capacity, and the convective heat transfer coefficient, enabling temperature to be inferred at any future time via numerical integration. This approach represents the first physically consistent modeling of time-varying thermal fields in outdoor 3D scenes. Figure 3b illustrates the thermodynamic model behind NTR-Gaussian, highlighting the interplay between daytime solar absorption and nighttime radiative and convective heat dissipation.
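The temporal rollout can be pictured as a lumped surface energy balance integrated forward in time. The sketch below is a generic forward-Euler integrator for nighttime cooling under radiative and convective losses; the effective areal heat capacity, the omission of solar input, and the parameter values are illustrative assumptions, and this is not NTR-Gaussian's actual network or integration scheme.

```python
import numpy as np

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def integrate_night_cooling(T0, T_air, T_sky, eps, h, heat_cap, dt=60.0, steps=360):
    """Forward-Euler integration of a lumped nighttime energy balance.

    dT/dt = -( eps * SIGMA * (T^4 - T_sky^4) + h * (T - T_air) ) / C
    where C is an effective areal heat capacity (J m^-2 K^-1).
    T0, eps, h, heat_cap may be arrays (one value per surface element),
    so one call rolls a whole scene's temperatures forward in time.
    """
    T = np.asarray(T0, dtype=float).copy()
    history = [T.copy()]
    for _ in range(steps):
        radiative = eps * SIGMA * (T**4 - T_sky**4)   # net thermal emission
        convective = h * (T - T_air)                  # exchange with ambient air
        T = T - dt * (radiative + convective) / heat_cap
        history.append(T.copy())
    return np.stack(history)  # (steps + 1, ...) temperature trajectory in Kelvin

# Example: a surface at 300 K cooling for six hours under a 285 K sky and 288 K air.
traj = integrate_night_cooling(T0=300.0, T_air=288.0, T_sky=285.0,
                               eps=0.95, h=10.0, heat_cap=2.0e5)
```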


Perceptually guided Diffusion for Infrared Super-Resolution.
Li et al. proposed DifIISR, a diffusion-based model for infrared image super-resolution (IISR) that goes beyond visual fidelity to incorporate perceptual relevance for downstream tasks such as segmentation and detection. The method introduces gradient-based guidance during the reverse diffusion process, using both visual frequency priors and perceptual features from pretrained models such as VGG and SAM. As a result, DifIISR improves not only PSNR and SSIM but also performance on high-level tasks under low-light or noisy infrared conditions. Figure 4a demonstrates how their method preserves fine texture and semantic integrity compared to prior IISR methods.
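The guidance mechanism can be viewed as a small modification of a standard reverse-diffusion step. The sketch below assumes a diffusers-style scheduler (alphas_cumprod, step(...).prev_sample) and placeholder denoiser, perceptual_loss, and lr_image objects; it shows the generic pattern of steering the sample with the gradient of a perceptual loss, rather than DifIISR's exact guidance terms.

```python
import torch

def guided_reverse_step(x_t, t, denoiser, scheduler, lr_image, perceptual_loss, scale=1.0):
    """One reverse-diffusion step steered by the gradient of a perceptual loss."""
    x_t = x_t.detach().requires_grad_(True)
    eps = denoiser(x_t, t)  # predicted noise at timestep t

    # Estimate the clean image x0 from the noisy sample (standard DDPM identity).
    alpha_bar = scheduler.alphas_cumprod[t]
    x0_pred = (x_t - torch.sqrt(1.0 - alpha_bar) * eps) / torch.sqrt(alpha_bar)

    # Perceptual consistency with the degraded input, e.g. a VGG feature distance.
    loss = perceptual_loss(x0_pred, lr_image)
    grad = torch.autograd.grad(loss, x_t)[0]

    # Nudge the sample against the perceptual gradient, then take the usual step.
    x_t_guided = (x_t - scale * grad).detach()
    return scheduler.step(eps.detach(), t, x_t_guided).prev_sample
```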

Conclusion
To build a real-time 3D thermal Gaussian splatting pipeline, it is essential to integrate the complementary strengths of the three directions above. First, handling dynamic aerial imagery in uncontrolled environments requires robust scene modeling under sparse and inconsistent viewpoints, as addressed by DroneSplat. Second, accurate thermal reconstruction demands time- and temperature-aware representations grounded in physical principles, as demonstrated by NTR-Gaussian. Finally, the inherent limitations of infrared imagery, such as low contrast, noise, and loss of detail, must be mitigated through perceptually guided enhancement techniques, as introduced in DifIISR. Together, these works define the core tasks that we aim to adapt and integrate into a unified system for real-time, high-fidelity 3D thermal reconstruction.
