Adaptive multi-agent reinforcement learning for dynamic pricing and distributed energy management in virtual power plant networks

Jian-Dong Yao; Wen-Bin Hao; Zhi-Gao Meng; Bo Xie; Jian-Hua Chen; Jia-Qi Wei

doi:10.1016/j.jnlest.2024.100290


Journal of Electronic Science and Technology
Vol. 23, Issue 1, 100290 (2025)


Jian-Dong Yao, Wen-Bin Hao, Zhi-Gao Meng*, Bo Xie, Jian-Hua Chen, and Jia-Qi Wei

Author Affiliations

State Grid Sichuan Electric Power Company Chengdu Power Supply Company, Chengdu, 610041, China



Jian-Dong Yao, Wen-Bin Hao, Zhi-Gao Meng, Bo Xie, Jian-Hua Chen, Jia-Qi Wei. Adaptive multi-agent reinforcement learning for dynamic pricing and distributed energy management in virtual power plant networks[J]. Journal of Electronic Science and Technology, 2025, 23(1): 100290


Fig. 1. Interaction dynamics between the DSO and VPPs within the MAMDP framework.


Fig. 2. Average cumulative reward for all agents across training episodes: MARL learning curve (above) and zoomed-in view of final 5000 episodes (below).


Fig. 3. DSO’s pricing and net demand over a week.


Fig. 4. Temporal dynamics of key state variables in the VPP network over a representative week.


Fig. 5. Computational time and solution quality as the number of VPPs increases from 10 to 200.


Fig. 6. System’s performance over a 30-day period following a permanent 15% reduction in average renewable generation capacity.


Parameter category	Parameter name	Value	Description
MARL hyperparameter	Actor learning rate	1×10^–4	Learning rate for actor network updates
	Critic learning rate	5×10^–4	Learning rate for critic network updates
	Discount factor (γ)	0.99	Discount factor for future rewards
	Exploration rate (ε)	0.1	Initial exploration rate for epsilon-greedy policy
	Replay buffer size	1×10^6	Capacity of experience replay buffer
	Batch size	256	Number of samples per training iteration
Network architecture	Actor network	[64, 32]	Hidden layer sizes for the actor network
	Critic network	[128, 64]	Hidden layer sizes for the critic network
System parameters	Number of VPPs	10	Total number of VPPs in the network
	Simulation time steps	8760	Number of hourly time steps (1 year)
	Battery capacity	1000 kWh	Energy storage capacity per VPP
	Renewable generation limit	500 kW	Maximum renewable generation capacity per VPP
	Grid frequency limits	[49.8, 50.2] Hz	Allowable range for grid frequency

Table 1. MARL and system parameters.
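For readers who want to reproduce the setup, the values in Table 1 can be collected into a small configuration object. The following is only a minimal sketch: the class names `MARLConfig` and `SystemConfig`, the field names, and the helper `frequency_ok` are hypothetical and do not come from the article; only the numeric values are taken from Table 1.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MARLConfig:
    """Learning hyperparameters from Table 1 (field names are illustrative)."""
    actor_lr: float = 1e-4                       # actor learning rate
    critic_lr: float = 5e-4                      # critic learning rate
    gamma: float = 0.99                          # discount factor
    epsilon: float = 0.1                         # initial exploration rate
    replay_buffer_size: int = 1_000_000          # replay buffer capacity
    batch_size: int = 256                        # samples per training iteration
    actor_hidden: Tuple[int, ...] = (64, 32)     # actor hidden layer sizes
    critic_hidden: Tuple[int, ...] = (128, 64)   # critic hidden layer sizes


@dataclass(frozen=True)
class SystemConfig:
    """System parameters from Table 1 (field names are illustrative)."""
    num_vpps: int = 10                           # VPPs in the network
    time_steps: int = 8760                       # hourly steps in one year
    battery_capacity_kwh: float = 1000.0         # storage capacity per VPP
    renewable_limit_kw: float = 500.0            # max renewable generation per VPP
    freq_limits_hz: Tuple[float, float] = (49.8, 50.2)

    def frequency_ok(self, freq_hz: float) -> bool:
        """Return True if freq_hz lies within the allowable band (hypothetical helper)."""
        low, high = self.freq_limits_hz
        return low <= freq_hz <= high


if __name__ == "__main__":
    marl, system = MARLConfig(), SystemConfig()
    print(marl)
    print(system)
    print("50.1 Hz within limits:", system.frequency_ok(50.1))
```

Freezing the dataclasses keeps the tabulated values from being changed by accident during an experiment; a sweep such as the sensitivity analysis would create a new instance with `dataclasses.replace` instead.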


Model	Reduction in costs (%)	Increase in VPP profits (%)
MARL (ours)	18.73	22.46
Stackelberg game	12.58	15.29
MPC	14.92	17.81
SARL	16.05	19.37

Table 2. Economic efficiency comparison.


Model	Convergence time (h)	Scalability (max VPPs)
MARL (ours)	8.64	127
Stackelberg game	2.31	43
MPC	5.17	76
SARL	11.89	92

Table 3. Computational performance comparison.


Model	Scenario changes	Unexpected events	Overall score
MARL (ours)	89.27	83.15	86.21
Stackelberg game	62.43	58.79	60.61
MPC	75.68	71.92	73.80
SARL	81.36	76.54	78.95

Table 4. Adaptability score (0–100).
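The overall scores in Table 4 appear to be consistent with the simple arithmetic mean of the two component scores, for example (89.27 + 83.15) / 2 = 86.21 for MARL. This excerpt does not state how the overall score is defined, so the equal weighting is an assumption, and the function names below are hypothetical. The short sketch checks the assumption against every row of the table.

```python
import math

# (scenario changes, unexpected events, reported overall score), copied from Table 4
TABLE_4 = {
    "MARL (ours)": (89.27, 83.15, 86.21),
    "Stackelberg game": (62.43, 58.79, 60.61),
    "MPC": (75.68, 71.92, 73.80),
    "SARL": (81.36, 76.54, 78.95),
}


def mean_of_components(scenario: float, unexpected: float) -> float:
    """Equal-weight mean of the two component scores (assumed definition)."""
    return (scenario + unexpected) / 2.0


def matches_reported(tol: float = 0.005) -> dict:
    """Map each model to whether the mean agrees with the reported overall
    score to within rounding at two decimal places."""
    return {
        model: math.isclose(mean_of_components(s, u), overall, abs_tol=tol)
        for model, (s, u, overall) in TABLE_4.items()
    }


if __name__ == "__main__":
    for model, ok in matches_reported().items():
        s, u, overall = TABLE_4[model]
        print(f"{model}: mean={mean_of_components(s, u):.2f} "
              f"reported={overall:.2f} consistent={ok}")
```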


Parameter	–50%	–25%	Base	+25%	+50%
Number of VPPs	–8.73	–3.42	0	2.91	4.68
Renewable energy penetration	–12.56	–5.87	0	7.23	11.95
Price volatility	5.32	2.14	0	–2.89	–6.71

Table 5. Sensitivity analysis results (percentage change in system performance).


Condition	Battery	Flexible load	Renewable curtailment
High demand	78.4%	89.2%	2.3%
Low demand	34.6%	12.7%	15.8%
High renewable	82.1%	8.9%	7.5%
Low renewable	45.3%	67.8%	0.1%

Table 6. VPP resource utilization (percentage of capacity).


Event type	Metric	MARL	Stackelberg	MPC	SARL
Renewable drop (30%)	Cost increase	8.37%	14.62%	11.28%	10.05%
	Stability index	–3.21%	–7.89%	–5.43%	–4.76%
	Recovery time (h)	2.34	4.81	3.67	3.12
Demand spike (25%)	Cost increase	6.93%	12.37%	9.84%	8.51%
	Stability index	–2.78%	–6.42%	–4.95%	–3.89%
	Recovery time (h)	1.87	3.95	2.83	2.41
Price forecast error (20%)	Cost increase	4.52%	9.76%	7.31%	6.18%
	Stability index	–1.43%	–4.28%	–3.12%	–2.35%
	Recovery time (h)	1.26	2.73	2.05	1.68

Table 7. System performance under unexpected events (percentage deviation from normal operations).


Forecast error	MARL cost increase	MARL stability index	MARL recovery time (h)
5%	1.28%	–0.54%	0.37
10%	2.76%	–1.19%	0.83
15%	4.65%	–2.03%	1.42
20%	6.93%	–3.11%	2.18
25%	9.87%	–4.46%	3.09