A SYSTEMATIC REVIEW OF ENERGY EFFICIENT DEPLOYMENT STRATEGIES FOR LARGE LANGUAGE MODELS IN REAL WORLD APPLICATIONS


Sweta Kumari, Dr. Shashank Swami
Department of Computer Science & Engineering, Vikrant University, Gwalior, Madhya Pradesh
Abstract
The widespread use of Large Language Models (LLMs) in practice has raised mounting concern over energy usage and sustainability, as inference workloads rapidly come to represent a larger share of the total cost of AI systems. This article reviews the published literature on energy-efficient deployment strategies for LLMs, covering the energy challenges that arise at the deployment stage, methods for optimizing models for deployment, and system-level and hardware-aware serving strategies. Inference-dominated workloads, infrastructure heterogeneity across cloud, edge, and on-device settings, and system-level overheads are discussed as the main contributors to deployment-time energy waste. The review also evaluates model compression, parameter-efficient adaptation, conditional computation, adaptive serving policies, and accelerator-aware execution as means of reducing energy expenditure under real-world constraints. Overall, the article argues that more coherent, comprehensive deployment frameworks and standardized energy assessment procedures are prerequisites for operating LLMs in large-scale production settings in a scalable and sustainable manner.
Keywords: Large Language Models; Energy-Efficient Deployment; Inference Optimization; Green AI; Edge–Cloud Computing
Journal Name: EPRA International Journal of Multidisciplinary Research (IJMR)

Published on: 2026-02-19
Vol: 12
Issue: 2
Month: February
Year: 2026
Copyright © 2026 EPRA JOURNALS. All rights reserved