Author Affiliations
1 School of Information, Beijing Wuzi University, Beijing 101149, China
2 National Center for Materials Service Safety, University of Science and Technology Beijing, Beijing 100083, China
Fig. 1. Network of the concept graph.
Fig. 2. Network structure leading to the overlap of neighbor node sets.
Fig. 3. Hierarchical logical structure of the largest connected subnet.
Fig. 4. Cumulative degree distribution of the largest connected subnet.
Fig. 5. Relationship between degree and k-shell.
Fig. 6. Time cost of NetworkX for avg(l).
Fig. 7. Relationships of RealAvg(l) and AppAvg(l) to n.
Fig. 8. Average clustering coefficient distribution corresponding to degree.
Fig. 9. Analysis of degree and average degree of neighbor nodes.
Fig. 10. Size of the giant component when nodes are removed.
| k | Concept quantity | Concept proportion | k | Instance quantity | Instance proportion | k | SubConcept quantity | SubConcept proportion |
|---|---|---|---|---|---|---|---|---|
| 1 | 2061496 | 0.464809 | 1 | 9610146 | 0.831317 | 1 | 18 | $1.91208 \times 10^{-5}$ |
| 2 | 1165725 | 0.262838 | 2 | 1014355 | 0.087746 | 2 | 140735 | 0.149498132 |
| 3 | 538652 | 0.121451 | 3 | 312310 | 0.027016 | 3 | 186330 | 0.197932191 |
| 4 | 252438 | 0.056918 | 4 | 156703 | 0.013555 | 4 | 125273 | 0.133073361 |
| 5 | 130975 | 0.029531 | 5 | 96100 | 0.008313 | 5 | 78365 | 0.083244546 |
| 6 | 74760 | 0.016856 | 6 | 64809 | 0.005606 | 6 | 52113 | 0.055357915 |
| 7 | 46336 | 0.010447 | 7 | 46409 | 0.004015 | 7 | 37615 | 0.039957169 |
| 8 | 31509 | 0.007104 | 8 | 34801 | 0.00301 | 8 | 29314 | 0.031139292 |
| 9 | 22506 | 0.005074 | 9 | 27177 | 0.002351 | 9 | 23419 | 0.024877229 |
| 10 | 16510 | 0.003723 | 10 | 21928 | 0.001897 | 10 | 19484 | 0.020697208 |
| 11 | 12921 | 0.002913 | 11 | 18095 | 0.001565 | 11 | 16465 | 0.017490224 |
| 12 | 10012 | 0.002257 | 12 | 14935 | 0.001292 | 12 | 14188 | 0.015071443 |
| 13 | 8188 | 0.001846 | 13 | 12688 | 0.001098 | 13 | 12491 | 0.013268776 |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |
| 32773 | 1 | $2.25 \times 10^{-7}$ | 6716 | 1 | $8.65 \times 10^{-8}$ | 364276 | 1 | $1.06227 \times 10^{-6}$ |
| Total | 4435143 | 1 | Total | 11560144 | 1 | Total | 941383 | 1 |

Table 1. Degree distribution of the concept graph network.
| Algorithm | Parameters | Time complexity | Time cost |
|---|---|---|---|
| NetworkX | — | — | More than 15 d |
| SNEBF | m = 15 114 834; n = 33 377 320 | m × 2n = 15 114 834 × 2n | About 5.22 a |
| SNESO | nl = 12; n = 33 377 320 | nl × 3.2n = 19.2 × 2n | 3.49 min (measured: 3.80 min) |

Table 2. Time complexity of the subset extraction algorithms.
| Algorithm | Parameters | Space complexity | Memory cost |
|---|---|---|---|
| NetworkX | — | — | 40 GB |
| ESNSO | SubNeti, NeighborsSet, MaxSubNet | 31724479 | 5.23 GB |

Table 3. Space complexity of the algorithms.
| k | Concept quantity | Concept percentage (%) | Instance quantity | Instance percentage (%) | SubConcept quantity | SubConcept percentage (%) | Total quantity |
|---|---|---|---|---|---|---|---|
| 1 | 1468308 | 40.3 | 8636049 | 82.0 | 0 | 0.0 | 10104357 |
| 2 | 1014641 | 27.9 | 968156 | 9.2 | 138875 | 14.8 | 2121672 |
| 3 | 503109 | 13.8 | 309716 | 2.9 | 185665 | 19.8 | 998490 |
| 4 | 242308 | 6.7 | 156248 | 1.5 | 125045 | 13.3 | 523601 |
| 5 | 127833 | 3.5 | 96027 | 0.9 | 78311 | 8.3 | 302171 |
| 6 | 73429 | 2.0 | 64778 | 0.6 | 52090 | 5.6 | 190297 |
| 7 | 45834 | 1.3 | 46398 | 0.4 | 37604 | 4.0 | 129836 |
| 8 | 31237 | 0.9 | 34799 | 0.3 | 29301 | 3.1 | 95337 |
| 9 | 22401 | 0.6 | 27173 | 0.3 | 23417 | 2.5 | 72991 |
| 10 | 16439 | 0.5 | 21925 | 0.2 | 19488 | 2.1 | 57852 |
| 11 | 12880 | 0.4 | 18095 | 0.2 | 16464 | 1.8 | 47439 |
| 12 | 9981 | 0.3 | 14934 | 0.1 | 14187 | 1.5 | 39102 |
| 13 | 8169 | 0.2 | 12688 | 0.1 | 12503 | 1.3 | 33360 |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |
| Total | 3639631 | ··· | 10536663 | ··· | 938540 | ··· | 15114838 |

Table 4. Degree distribution analysis of the largest connected subnet.
| node | k | node | k |
|---|---|---|---|
| factor | 364343 | Event | 113364 |
| feature | 204130 | company | 110609 |
| issue | 202331 | program | 93963 |
| product | 174283 | technique | 92341 |
| item | 159164 | application | 90644 |
| area | 144595 | organization | 90605 |
| topic | 137781 | Name | 87637 |
| service | 137398 | Case | 85863 |
| activity | 124670 | method | 84157 |
| information | 114500 | project | 82122 |

Table 5. Top 20 nodes with the highest degree in core.
| t | n | e | t | n | e | t | n | e | t | n | e |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 415491 | 936536 | 60 | 27654 | 66704 | 150 | 9492 | 19845 | 600 | 1788 | 2637 |
| 20 | 119367 | 303323 | 70 | 23089 | 54533 | 200 | 6850 | 13315 | 700 | 1453 | 2059 |
| 30 | 66272 | 169774 | 80 | 19567 | 45510 | 300 | 4336 | 7566 | 800 | 1076 | 1487 |
| 40 | 45410 | 114629 | 90 | 17042 | 38921 | 400 | 3013 | 4920 | 900 | 922 | 1222 |
| 50 | 34467 | 85085 | 100 | 15086 | 33983 | 500 | 2318 | 3577 | 1000 | 770 | 994 |

Table 6. Threshold networks and the number of nodes.
| Layer | Quantity | Layer | Quantity | Layer | Quantity | Layer | Quantity |
|---|---|---|---|---|---|---|---|
| 1 | 2 | 4 | 4639826 | 7 | 11921 | 10 | 73 |
| 2 | 553406 | 5 | 639119 | 8 | 1609 | 11 | 16 |
| 3 | 9202185 | 6 | 66347 | 9 | 327 | 12 | 3 |

Table 7. Subnet structure and quantity of nodes.
| Layer | t = 10 | t = 20 | t = 30 | t = 50 | t = 100 | t = 200 | t = 300 | t = 500 | t = 1000 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 2 | 26993 | 10212 | 6226 | 3409 | 1534 | 748 | 456 | 258 | 88 |
| 3 | 223659 | 65471 | 34244 | 16456 | 6445 | 2437 | 1305 | 558 | 103 |
| 4 | 131967 | 36095 | 21646 | 12077 | 5712 | 2829 | 1722 | 832 | 205 |
| 5 | 28219 | 6590 | 3563 | 2037 | 1116 | 656 | 661 | 438 | 108 |
| 6 | 3745 | 821 | 505 | 418 | 206 | 129 | 157 | 187 | 128 |
| 7 | 766 | 143 | 74 | 55 | 62 | 41 | 30 | 29 | 89 |
| 8 | 98 | 25 | 12 | 11 | 7 | 4 | 3 | 13 | 29 |
| 9 | 27 | 6 | — | 2 | 2 | 3 | — | 1 | 13 |
| 10 | 12 | 2 | — | — | — | 1 | — | — | 5 |
| 11 | 3 | — | — | — | — | — | — | — | — |

Table 8. Layer structure and node number in each layer.
| k | $N_k$ | $k_{nn}(k)$ | k | $N_k$ | $k_{nn}(k)$ |
|---|---|---|---|---|---|
| 1 | 10104357 | 31235.02 | 159164 | 1 (item) | 122.577 |
| 2 | 2121672 | 13384.27 | 174283 | 1 (product) | 98.812 |
| 3 | 998490 | 10435.94 | 202331 | 1 (issue) | 91.266 |
| 4 | 523601 | 10231.12 | 204130 | 1 (feature) | 85.926 |
| 5 | 302171 | 10388.98 | 364343 | 1 (factor) | 56.088 |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |

Table 9. Part of Nk and knn(k).
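
For readers who want to reproduce quantities of the kind listed in Table 9 on a small graph, the following is a minimal Python sketch. It assumes the usual definition in which knn(k) is the mean, over all nodes of degree k, of the average degree of each node's neighbors, and it assumes an undirected simple graph. The function name `degree_neighbor_stats` and the toy edge list are illustrative and do not come from the paper, which may compute these values differently.

```python
from collections import defaultdict


def degree_neighbor_stats(edges):
    """Return {k: (N_k, knn_k)} for an undirected simple graph.

    N_k is the number of nodes with degree k; knn_k is the mean, over
    those nodes, of the average degree of their neighbors.
    Assumption: this is the standard definition, not necessarily the
    exact procedure used in the paper.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors[u].add(v)
        neighbors[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in neighbors.items()}

    count = defaultdict(int)      # N_k
    total = defaultdict(float)    # sum of per-node neighbor-degree averages
    for node, nbrs in neighbors.items():
        k = degree[node]
        count[k] += 1
        total[k] += sum(degree[n] for n in nbrs) / k

    return {k: (count[k], total[k] / count[k]) for k in sorted(count)}


# Hypothetical toy graph: one hub ("factor") plus one extra edge.
toy_edges = [("factor", "a"), ("factor", "b"), ("factor", "c"), ("a", "b")]
stats = degree_neighbor_stats(toy_edges)
for k, (n_k, knn_k) in stats.items():
    print(k, n_k, round(knn_k, 3))
```

In the toy graph the single degree-1 node attaches to the hub, so its neighbor average is high, while the hub itself has a low neighbor average. This is the same qualitative pattern as the rows of Table 9, where knn(k) is large for small k and small for the highest-degree nodes.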