• Journal of Electronic Science and Technology
  • Vol. 23, Issue 1, 100290 (2025)
Jian-Dong Yao, Wen-Bin Hao, Zhi-Gao Meng*, Bo Xie, Jian-Hua Chen and Jia-Qi Wei
Author Affiliations
  • State Grid Sichuan Electric Power Company Chengdu Power Supply Company, Chengdu, 610041, China
    DOI: 10.1016/j.jnlest.2024.100290
    Jian-Dong Yao, Wen-Bin Hao, Zhi-Gao Meng, Bo Xie, Jian-Hua Chen, Jia-Qi Wei. Adaptive multi-agent reinforcement learning for dynamic pricing and distributed energy management in virtual power plant networks[J]. Journal of Electronic Science and Technology, 2025, 23(1): 100290
    Fig. 1. Interaction dynamics between the DSO and VPPs within the MAMDP framework.
    Fig. 2. Average cumulative reward for all agents across training episodes: MARL learning curve (above) and zoomed-in view of final 5000 episodes (below).
    Fig. 3. DSO’s pricing and net demand over a week.
    Fig. 4. Temporal dynamics of key state variables in the VPP network over a representative week.
    Fig. 5. Computational time and solution quality as the number of VPPs increases from 10 to 200.
    Fig. 6. System’s performance over a 30-day period following a permanent 15% reduction in average renewable generation capacity.
    | Parameter category | Parameter name | Value | Description |
    |---|---|---|---|
    | MARL hyperparameter | Actor learning rate | 1×10⁻⁴ | Learning rate for actor network updates |
    | | Critic learning rate | 5×10⁻⁴ | Learning rate for critic network updates |
    | | Discount factor (γ) | 0.99 | Discount factor for future rewards |
    | | Exploration rate (ε) | 0.1 | Initial exploration rate for epsilon-greedy policy |
    | | Replay buffer size | 1×10⁶ | Capacity of experience replay buffer |
    | | Batch size | 256 | Number of samples per training iteration |
    | Network architecture | Actor network | [64, 32] | Hidden layer sizes for the actor network |
    | | Critic network | [128, 64] | Hidden layer sizes for the critic network |
    | System parameters | Number of VPPs | 10 | Total number of VPPs in the network |
    | | Simulation time steps | 8760 | Number of hourly time steps (1 year) |
    | | Battery capacity | 1000 kWh | Energy storage capacity per VPP |
    | | Renewable generation limit | 500 kW | Maximum renewable generation capacity per VPP |
    | | Grid frequency limits | [49.8, 50.2] Hz | Allowable range for grid frequency |
    Table 1. MARL and system parameters.
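For readers who want to reuse the settings in Table 1, the sketch below collects them in a single immutable Python configuration object. The class name, field names, and the two helper methods are illustrative assumptions and do not come from the paper; only the numeric values are transcribed from the table. The `effective_horizon` helper uses the common rule of thumb 1 / (1 − γ), which the source does not state.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MarlConfig:
    """Values transcribed from Table 1; all field names are hypothetical."""

    # MARL hyperparameters
    actor_lr: float = 1e-4
    critic_lr: float = 5e-4
    gamma: float = 0.99
    epsilon_init: float = 0.1
    replay_buffer_size: int = 1_000_000
    batch_size: int = 256
    # Network architecture (hidden layer sizes)
    actor_hidden: Tuple[int, ...] = (64, 32)
    critic_hidden: Tuple[int, ...] = (128, 64)
    # System parameters
    num_vpps: int = 10
    time_steps: int = 8760  # hourly steps over one year
    battery_capacity_kwh: float = 1000.0
    renewable_limit_kw: float = 500.0
    freq_limits_hz: Tuple[float, float] = (49.8, 50.2)

    def effective_horizon(self) -> float:
        """Rule-of-thumb horizon (in time steps) implied by gamma: 1 / (1 - gamma)."""
        return 1.0 / (1.0 - self.gamma)

    def frequency_ok(self, freq_hz: float) -> bool:
        """Check a frequency reading against the allowable range."""
        low, high = self.freq_limits_hz
        return low <= freq_hz <= high


config = MarlConfig()
```

With γ = 0.99 the rule of thumb gives a horizon of roughly 100 hourly steps, a little over four days of the 8760-step simulation year.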
    | Model | Reduction in costs (%) | Increase in VPP profits (%) |
    |---|---|---|
    | MARL (ours) | 18.73 | 22.46 |
    | Stackelberg game | 12.58 | 15.29 |
    | MPC | 14.92 | 17.81 |
    | SARL | 16.05 | 19.37 |
    Table 2. Economic efficiency comparison.
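To make the gap in Table 2 concrete, the following sketch computes how many percentage points separate the MARL row from the strongest baseline in each column. The function name and data layout are assumptions made for illustration; the numbers are copied from the table and no new results are introduced.

```python
# (cost reduction %, VPP profit increase %) per model, copied from Table 2.
economic = {
    "MARL (ours)": (18.73, 22.46),
    "Stackelberg game": (12.58, 15.29),
    "MPC": (14.92, 17.81),
    "SARL": (16.05, 19.37),
}


def margin_over_best_baseline(results, ours, column):
    """Return (best baseline name, margin in percentage points) for one column."""
    baselines = [(name, vals) for name, vals in results.items() if name != ours]
    best_name, best_vals = max(baselines, key=lambda item: item[1][column])
    return best_name, round(results[ours][column] - best_vals[column], 2)


cost_margin = margin_over_best_baseline(economic, "MARL (ours)", 0)
profit_margin = margin_over_best_baseline(economic, "MARL (ours)", 1)
```

In both columns the strongest baseline is SARL, so the margins are measured against the single-agent reinforcement learning row.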
    | Model | Convergence time (h) | Scalability (max VPPs) |
    |---|---|---|
    | MARL (ours) | 8.64 | 127 |
    | Stackelberg game | 2.31 | 43 |
    | MPC | 5.17 | 76 |
    | SARL | 11.89 | 92 |
    Table 3. Computational performance comparison.
    | Model | Scenario changes | Unexpected events | Overall score |
    |---|---|---|---|
    | MARL (ours) | 89.27 | 83.15 | 86.21 |
    | Stackelberg game | 62.43 | 58.79 | 60.61 |
    | MPC | 75.68 | 71.92 | 73.80 |
    | SARL | 81.36 | 76.54 | 78.95 |
    Table 4. Adaptability score (0–100).
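The page does not say how the overall score in Table 4 is computed. In every row, however, it coincides with the arithmetic mean of the two sub-scores, and the short check below verifies that observation. Treat the formula as an inference from the tabulated values, not as the authors' stated definition; the helper name and tolerance are assumptions.

```python
# (scenario changes, unexpected events, overall score), copied from Table 4.
adaptability = {
    "MARL (ours)": (89.27, 83.15, 86.21),
    "Stackelberg game": (62.43, 58.79, 60.61),
    "MPC": (75.68, 71.92, 73.80),
    "SARL": (81.36, 76.54, 78.95),
}


def mean_matches_overall(scenario, unexpected, overall, tol=0.005):
    """True if the reported overall score equals the mean of the sub-scores
    to within rounding at two decimal places."""
    return abs((scenario + unexpected) / 2 - overall) < tol


checks = {model: mean_matches_overall(*row) for model, row in adaptability.items()}
```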
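    | Parameter | –50% | –25% | Base | +25% | +50% |
    |---|---|---|---|---|---|
    | Number of VPPs | –8.73 | –3.42 | 0 | 2.91 | 4.68 |
    | Renewable energy penetration | –12.56 | –5.87 | 0 | 7.23 | 11.95 |
    | Price volatility | 5.32 | 2.14 | 0 | –2.89 | –6.71 |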
    Table 5. Sensitivity analysis results (percentage change in system performance).
    | Condition | Battery | Flexible load | Renewable curtailment |
    |---|---|---|---|
    | High demand | 78.4% | 89.2% | 2.3% |
    | Low demand | 34.6% | 12.7% | 15.8% |
    | High renewable | 82.1% | 8.9% | 7.5% |
    | Low renewable | 45.3% | 67.8% | 0.1% |
    Table 6. VPP resource utilization (percentage of capacity).
    | Event type | Metric | MARL | Stackelberg | MPC | SARL |
    |---|---|---|---|---|---|
    | Renewable drop (30%) | Cost increase | 8.37% | 14.62% | 11.28% | 10.05% |
    | | Stability index | –3.21% | –7.89% | –5.43% | –4.76% |
    | | Recovery time (h) | 2.34 | 4.81 | 3.67 | 3.12 |
    | Demand spike (25%) | Cost increase | 6.93% | 12.37% | 9.84% | 8.51% |
    | | Stability index | –2.78% | –6.42% | –4.95% | –3.89% |
    | | Recovery time (h) | 1.87 | 3.95 | 2.83 | 2.41 |
    | Price forecast error (20%) | Cost increase | 4.52% | 9.76% | 7.31% | 6.18% |
    | | Stability index | –1.43% | –4.28% | –3.12% | –2.35% |
    | | Recovery time (h) | 1.26 | 2.73 | 2.05 | 1.68 |
    Table 7. System performance under unexpected events (percentage deviation from normal operations).
    | Forecast error | MARL cost increase | MARL stability index | MARL recovery time (h) |
    |---|---|---|---|
    | 5% | 1.28% | –0.54% | 0.37 |
    | 10% | 2.76% | –1.19% | 0.83 |
    | 15% | 4.65% | –2.03% | 1.42 |
    | 20% | 6.93% | –3.11% | 2.18 |
    | 25% | 9.87% | –4.46% | 3.09 |
    Table 8. System response to renewable generation forecast errors.
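One way to read Table 8 is to look at how much each metric changes per additional 5 percentage points of forecast error. The sketch below computes those increments and checks whether they grow in magnitude from step to step, which would mean the degradation is faster than linear over the tabulated range. The helper names are assumptions for illustration; the values are copied from the table and the check makes no claim beyond the five listed error levels.

```python
forecast_error_pct = [5, 10, 15, 20, 25]

# Columns of Table 8, in the same order as forecast_error_pct.
table8 = {
    "cost_increase_pct": [1.28, 2.76, 4.65, 6.93, 9.87],
    "stability_index_pct": [-0.54, -1.19, -2.03, -3.11, -4.46],
    "recovery_time_h": [0.37, 0.83, 1.42, 2.18, 3.09],
}


def increments(values):
    """Change between consecutive rows, rounded to the table's precision."""
    return [round(b - a, 2) for a, b in zip(values, values[1:])]


def grows_faster_than_linear(values):
    """True if each step is larger in magnitude than the one before it."""
    steps = [abs(step) for step in increments(values)]
    return all(later > earlier for earlier, later in zip(steps, steps[1:]))


summary = {name: grows_faster_than_linear(col) for name, col in table8.items()}
```

For the cost column, `increments` yields steps of about 1.48, 1.89, 2.28, and 2.94 percentage points, so each extra 5 points of forecast error costs more than the previous one.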