[1] Asmus P.. Microgrids, virtual power plants and our distributed energy future. Electr. J., 23, 72-82(2010).
[4] P. Lombardi, M. Powalko, K. Rudion, Optimal operation of a virtual power plant, in: Proc. of IEEE Power & Energy Society General Meeting, Calgary, Canada, 2009, pp. 1–6.
[6] Khaksari A., Steriotis K., Makris P., Tsaousoglou G., Efthymiopoulos N., Varvarigos E.. Electricity market equilibria analysis on the value of demand-side flexibility portfolios’ mix and the strategic demand aggregators’ market power. Sustain. Energy Grids, 38, 101399(2024).
[8] Y.H. Peng, Big Data, Machine Learning and Challenges of High Dimensionality in Financial Administration, Ph.D. dissertation, University of Brasilia, Brasilia, 2019.
[16] Saad W., Han Z., Poor H.V., Başar T.. Game-theoretic methods for the smart grid: an overview of microgrid systems, demand-side management, and smart grid communications. IEEE Signal Proc. Mag., 29, 86-105(2012).
[17] Papavasiliou A., Mou Y.-T., Cambier L., Scieur D.. Application of stochastic dual dynamic programming to the real-time dispatch of storage under renewable supply uncertainty. IEEE T. Sustain. Energ., 9, 547-558(2019).
[18] L. Buşoniu, R. Babuška, B. De Schutter, Multiagent reinforcement learning: An overview, in: D. Srinivasan, L.C. Jain (Eds.), Innovations in Multi-Agent Systems and Applications - 1, Springer, Berlin, Germany, 2010, pp. 183–221.
[19] J.N. Foerster, I.M. Assael, N. de Freitas, S. Whiteson, Learning to communicate with deep multiagent reinforcement learning, in: Proc. of the 30th Intl. Conf. on Neural Information Processing Systems, Red Hook, USA, 2016, pp. 2145–2153.
[20] K.Q. Zhang, Z.R. Yang, T. Başar, Multiagent reinforcement learning: a selective overview of theories and algorithms, in: K.G. Vamvoudakis, Y. Wan, F.L. Lewis, D. Cansever (Eds.), Handbook of Reinforcement Learning and Control, Springer, Cham, Switzerland, 2021, pp. 321–384.
[21] Nguyen T.T., Nguyen N.D., Nahavandi S.. Deep reinforcement learning for multiagent systems: a review of challenges, solutions, and applications. IEEE T. Cybernetics, 50, 3826-3839(2020).
[22] Hu J.-L., Wellman M.P.. Nash Q-learning for general-sum stochastic games. J. Mach. Learn. Res., 4, 1039-1069(2003).
[23] M. Lauer, M.A. Riedmiller, An algorithm for distributed reinforcement learning in cooperative multiagent systems, in: Proc. of the 17th Intl. Conf. on Machine Learning, Stanford, USA, 2000, pp. 535–542.
[24] R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, I. Mordatch, Multiagent actor-critic for mixed cooperative-competitive environments, in: Proc. of the 31st Intl. Conf. on Neural Information Processing Systems, Long Beach, USA, 2017, pp. 6382–6393.
[25] M. Tokic, G. Palm, Value-difference based exploration: adaptive control between epsilon-greedy and softmax, in: Proc. of the 34th Annual German Conf. on AI, Berlin, Germany, 2011, pp. 335–346.
[27] Liu W.-R., Zhuang P., Liang H., Peng J., Huang Z.-W.. Distributed economic dispatch in microgrids based on cooperative reinforcement learning. IEEE T. Neur. Net. Lear., 29, 2192-2203(2018).
[30] Wang H.-W., Huang T.-W., Liao X.-F., Abu-Rub H., Chen G.. Reinforcement learning in energy trading game among smart microgrids. IEEE T. Ind. Electron., 63, 5109-5119(2016).
[31] Y.D. Yang, J.Y. Hao, M.Y. Sun, Z. Wang, C.J. Fan, G. Strbac, Recurrent deep multiagent Q-learning for autonomous brokers in smart grid, in: Proc. of the 27th Intl. Joint Conf. on Artificial Intelligence, Stockholm, Sweden, 2018, pp. 569–575.
[35] Yang Q.-L., Wang G., Sadeghi A., Giannakis G.B., Sun J.. Two-timescale voltage control in distribution grids using deep reinforcement learning. IEEE T. Smart Grid, 11, 2313-2323(2019).
[39] P. Vytelingum, S.D. Ramchurn, T.D. Voice, A. Rogers, N.R. Jennings, Trading agents for the smart electricity grid, in: Proc. of the 9th Intl. Conf. on Autonomous Agents and Multiagent Systems, Toronto, Canada, 2010, pp. 897–904.
[42] G. Dalal, K. Dvijotham, M. Vecerik, T. Hester, C. Paduraru, Y. Tassa, Safe exploration in continuous action spaces [Online]. Available: https://arxiv.org/abs/1801.08757, January 2018.
[43] Busoniu L., Babuska R., De Schutter B.. A comprehensive survey of multiagent reinforcement learning. IEEE T. Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38, 156-172(2008).
[47] T.P. Lillicrap, J.J. Hunt, A. Pritzel, et al., Continuous control with deep reinforcement learning [Online]. Available: https://arxiv.org/abs/1509.02971, July 2019.
[48] J.N. Foerster, G. Farquhar, T. Afouras, N. Nardelli, S. Whiteson, Counterfactual multiagent policy gradients, in: Proc. of the 32nd AAAI Conf. on Artificial Intelligence, New Orleans, USA, 2018, pp. 2974–2982.
[49] A. Vaswani, N. Shazeer, N. Parmar, et al., Attention is all you need, in: Proc. of the 31st Intl. Conf. on Neural Information Processing Systems, Long Beach, USA, 2017, pp. 6000–6010.
[50] J.C. Jiang, C. Dun, T.J. Huang, Z.Q. Lu, Graph convolutional reinforcement learning, in: Proc. of the 8th Intl. Conf. on Learning Representations, Addis Ababa, Ethiopia, 2020, pp. 1–13.
[51] S. Iqbal, F. Sha, Actor-attention-critic for multiagent reinforcement learning, in: Proc. of the 36th Intl. Conf. on Machine Learning, Long Beach, USA, 2019, pp. 2961–2970.
[52] T. Schaul, J. Quan, I. Antonoglou, D. Silver, Prioritized experience replay [Online]. Available: https://arxiv.org/abs/1511.05952, February 2016.
[53] M. Plappert, R. Houthooft, P. Dhariwal, et al., Parameter space noise for exploration, in: Proc. of the 6th Intl. Conf. on Learning Representations, Vancouver, Canada, 2018, pp. 1–18.
[55] D.P. Kingma, J. Ba, Adam: a method for stochastic optimization, in: Proc. of the 3rd Intl. Conf. on Learning Representations, San Diego, USA, 2015, p. 6.
[56] T.P. Lillicrap, J.J. Hunt, A. Pritzel, et al., Continuous control with deep reinforcement learning, in: Proc. of the 4th Intl. Conf. on Learning Representations, San Juan, Puerto Rico, 2016, pp. 1–14.
[57] J. Nocedal, S.J. Wright, Numerical Optimization, 2nd ed., Springer, New York, USA, 2006.
[58] J. Achiam, D. Held, A. Tamar, P. Abbeel, Constrained policy optimization, in: Proc. of the 34th Intl. Conf. on Machine Learning, Sydney, Australia, 2017, pp. 22–31.
[59] F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, Safe model-based reinforcement learning with stability guarantees, in: Proc. of the 31st Intl. Conf. on Neural Information Processing Systems, Long Beach, USA, 2017, pp. 908–919.
[60] Z. Sheebaelhamd, K. Zisis, A. Nisioti, D. Gkouletsos, D. Pavllo, J. Kohler, Safe deep reinforcement learning for multiagent systems with continuous action spaces [Online]. Available: https://arxiv.org/abs/2108.03952, August 2021.
[61] A.P. Dobos, PVWatts Version 5 Manual, National Renewable Energy Laboratory, Golden, USA, 2014.
[63] S. Ong, N. Clark, Commercial and Residential Hourly Load Profiles for All TMY3 Locations in the United States, DOE Open Energy Data Initiative (OEDI), National Renewable Energy Lab. (NREL), Golden, USA, 2014.
[65] Dong L., Tu S.-Q., Li Y., Pu T.-J.. A Stackelberg game model for dynamic pricing and energy management of multiple virtual power plants using metamodel-based optimization method. Power Syst. Technol., 44, 973-981(2020).
[68] J. Snoek, H. Larochelle, R.P. Adams, Practical Bayesian optimization of machine learning algorithms, in: Proc. of the 25th Intl. Conf. on Neural Information Processing Systems, Lake Tahoe, USA, 2012, pp. 2951–2959.
[69] P. Henderson, R. Islam, P. Bachman, J. Pineau, D. Precup, D. Meger, Deep reinforcement learning that matters, in: Proc. of the 32nd AAAI Conf. on Artificial Intelligence, New Orleans, USA, 2018, pp. 3207–3214.
[70] P. Moritz, R. Nishihara, S. Wang, et al., Ray: a distributed framework for emerging AI applications, in: Proc. of the 13th USENIX Conf. on Operating Systems Design and Implementation, Carlsbad, USA, 2018, pp. 561–577.
[74] Zhang Z.-D., Zhang D.-X., Qiu R.C.. Deep reinforcement learning for power system applications: an overview. CSEE J. Power Energy, 6, 213-225(2019).
[75] B. McMahan, E. Moore, D. Ramage, S. Hampson, B.A. y Arcas, Communication-efficient learning of deep networks from decentralized data, in: Proc. of the 20th Intl. Conf. on Artificial Intelligence and Statistics, Fort Lauderdale, USA, 2017, pp. 1273–1282.
[76] A. Gupta, C. Devin, Y.X. Liu, P. Abbeel, S. Levine, Learning invariant feature spaces to transfer skills with reinforcement learning, in: Proc. of the 5th Intl. Conf. on Learning Representations, Toulon, France, 2017, pp. 1–14.