Principal Component Analysis

From EosPedia
Latest revision as of 02:10, 8 August 2014

Principal Component Analysis (PCA) is a method that, given a set of multivariate vectors, finds the axis in the multivariate space along which the projected vectors have the largest variance (the principal axis). It then successively finds further axes, each orthogonal to (uncorrelated with) the previous ones, that have the largest remaining variance.
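The definition above can be illustrated with a minimal sketch. This is not the code used by mrcImagePCA. It takes the principal axes to be the eigenvectors of the data's covariance matrix, sorted by decreasing eigenvalue, where each eigenvalue equals the variance of the data projected onto its axis. The sketch uses NumPy and synthetic 2-D points stretched along the (1, 1) direction, so the first axis should come out close to ±(0.707, 0.707).

```python
import numpy as np


def principal_axes(data):
    """Return (eigenvalues, axes) of the covariance matrix, largest variance first.

    data: array of shape (n_samples, n_features); each row is one vector.
    axes[:, k] is the k-th principal axis (unit length).
    """
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh is for symmetric matrices; it returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]


rng = np.random.default_rng(0)
# Synthetic points: std 3 along x and 0.5 along y, then rotated by 45 degrees
base = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
rotation = np.array([[c, -s], [s, c]])
points = base @ rotation.T

values, axes = principal_axes(points)
print(values)       # variance along each principal axis, largest first
print(axes[:, 0])   # first principal axis, roughly +/-(0.707, 0.707)
```

The second axis returned by `eigh` is automatically orthogonal to the first, which corresponds to the "no correlation" condition in the definition.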

Fig-PCA.png

Execution example of PCA

PCA of each image

This example classifies multiple images, mainly using mrcImagePCA.


Images in the Input file:
Input-PCA.png
The image is rotated to 10 different angles (vertical direction), and 10 different noise patterns are added (horizontal direction), giving 100 images in total.
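The input images are shown only as a figure. The following hedged sketch shows one way to generate a comparable test set: it rotates a hypothetical 39 x 39 test object with nearest-neighbour sampling and adds Gaussian noise. Several details are assumptions:

- The 39 x 39 size is taken from the -NX 39 -NY 39 options of the mrcImagePCA command, on the assumption that they give the image size.
- The angles 1, 37, ..., 325 are inferred from the file names in NO2_ROI_LIST.
- Fresh noise is drawn for every image, which may differ from how the original 10 noise patterns were made.

```python
import numpy as np


def rotate_nearest(image, degrees):
    """Rotate a 2-D image about its centre using nearest-neighbour sampling.

    Output pixels whose source position falls outside the image are set to 0.
    """
    ny, nx = image.shape
    cy, cx = (ny - 1) / 2.0, (nx - 1) / 2.0
    theta = np.deg2rad(degrees)
    yy, xx = np.mgrid[0:ny, 0:nx]
    # Inverse mapping: for every output pixel, find the source pixel it comes from
    xs = np.cos(theta) * (xx - cx) + np.sin(theta) * (yy - cy) + cx
    ys = -np.sin(theta) * (xx - cx) + np.cos(theta) * (yy - cy) + cy
    xi = np.rint(xs).astype(int)
    yi = np.rint(ys).astype(int)
    inside = (xi >= 0) & (xi < nx) & (yi >= 0) & (yi < ny)
    out = np.zeros(image.shape, dtype=float)
    out[inside] = image[yi[inside], xi[inside]]
    return out


def make_dataset(image, angles, n_noise, sigma, seed=0):
    """Stack of len(angles) * n_noise images: angle is the outer loop, noise the inner."""
    rng = np.random.default_rng(seed)
    stack = []
    for angle in angles:
        rotated = rotate_nearest(image, angle)
        for _ in range(n_noise):
            stack.append(rotated + rng.normal(0.0, sigma, size=image.shape))
    return np.array(stack)


# Hypothetical 39 x 39 test object (an L shape), so that rotations are distinguishable
base = np.zeros((39, 39))
base[17:22, 8:30] = 1.0
base[8:17, 8:13] = 1.0
images = make_dataset(base, angles=range(1, 360, 36), n_noise=10, sigma=0.3)
print(images.shape)  # (100, 39, 39)
```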


First, compute the principal axes with mrcImagePCA.


Contents of NO2_ROI_LIST:
Target-1-0-0-0.nroi
Target-1-0-0-1.nroi
Target-1-0-0-2.nroi
Target-1-0-0-3.nroi
Target-1-0-0-4.nroi
Target-1-0-0-5.nroi
Target-1-0-0-6.nroi
Target-1-0-0-7.nroi
Target-1-0-0-8.nroi
Target-1-0-0-9.nroi
Target-37-0-0-0.nroi
Target-37-0-0-1.nroi

...

Target-289-0-0-8.nroi
Target-289-0-0-9.nroi
Target-325-0-0-0.nroi
Target-325-0-0-1.nroi
Target-325-0-0-2.nroi
Target-325-0-0-3.nroi
Target-325-0-0-4.nroi
Target-325-0-0-5.nroi
Target-325-0-0-6.nroi
Target-325-0-0-7.nroi
Target-325-0-0-8.nroi
Target-325-0-0-9.nroi


Contents of TEST_PCA_LIST:
Target-1-0-0-0.tpca
Target-1-0-0-1.tpca
Target-1-0-0-2.tpca
Target-1-0-0-3.tpca
Target-1-0-0-4.tpca
Target-1-0-0-5.tpca
Target-1-0-0-6.tpca
Target-1-0-0-7.tpca
Target-1-0-0-8.tpca
Target-1-0-0-9.tpca
Target-37-0-0-0.tpca
Target-37-0-0-1.tpca

...

Target-289-0-0-8.tpca
Target-289-0-0-9.tpca
Target-325-0-0-0.tpca
Target-325-0-0-1.tpca
Target-325-0-0-2.tpca
Target-325-0-0-3.tpca
Target-325-0-0-4.tpca
Target-325-0-0-5.tpca
Target-325-0-0-6.tpca
Target-325-0-0-7.tpca
Target-325-0-0-8.tpca
Target-325-0-0-9.tpca
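The two lists share one naming pattern, so they can be generated rather than typed by hand. The sketch below rests on an assumption inferred only from the visible entries: names have the form Target-<angle>-0-0-<noise>.<extension>, with angles 1, 37, ..., 325 in 36-degree steps and noise indices 0 to 9. The meaning of the two middle zeros is not stated on this page and is simply copied.

```python
# Hypothetical reconstruction of the naming scheme seen in NO2_ROI_LIST / TEST_PCA_LIST
ANGLES = range(1, 360, 36)      # 1, 37, 73, ..., 325 (10 angles), inferred from the lists
NOISE_PATTERNS = range(10)      # 0 .. 9


def list_names(extension):
    """File names in list order: angle is the outer loop, noise pattern the inner."""
    return [f"Target-{angle}-0-0-{noise}.{extension}"
            for angle in ANGLES for noise in NOISE_PATTERNS]


def write_list(path, names):
    """Write one file name per line, as in the lists above."""
    with open(path, "w") as handle:
        handle.write("\n".join(names) + "\n")
```

For example, `write_list("NO2_ROI_LIST", list_names("nroi"))` and `write_list("TEST_PCA_LIST", list_names("tpca"))` would write 100-line files in the same order as the lists shown.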


Command
mrcImagePCA -i NO2_ROI_LIST -o TEST_PCA_LIST -NX 39 -NY 39 -numE 20 -O EIGEN_INFO -E eigen -EPS 100;
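The page does not explain the command's options. The following NumPy sketch of a comparable computation is not a reimplementation of mrcImagePCA. It assumes that -NX 39 -NY 39 give the image size and that -numE 20 gives the number of components kept. Under those assumptions, each image is flattened into one 39 x 39 = 1521-element vector, the vectors are centred, and a singular value decomposition yields the eigenvalues, the cumulative contribution ratio and each image's projection onto the leading axes. The second column of EIGEN_INFO and the -EPS option are not reproduced, because their meaning is not given here.

```python
import numpy as np


def image_pca(images, num_components):
    """PCA of a stack of images, each flattened into one vector.

    images: array of shape (n_images, ny, nx)
    Returns (eigenvalues, cumulative_percent, scores), where scores[i, k] is the
    projection of image i onto the k-th principal axis.
    """
    n = images.shape[0]
    vectors = images.reshape(n, -1).astype(float)   # one row of ny*nx values per image
    centered = vectors - vectors.mean(axis=0)
    # SVD of the centred data avoids forming the (ny*nx) x (ny*nx) covariance matrix
    _u, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigenvalues = s ** 2 / (n - 1)                  # variance along each principal axis
    cumulative = 100.0 * np.cumsum(eigenvalues) / eigenvalues.sum()
    scores = centered @ vt[:num_components].T
    return eigenvalues[:num_components], cumulative[:num_components], scores


rng = np.random.default_rng(1)
stack = rng.normal(size=(100, 39, 39))   # placeholder data, not the example images
eigenvalues, cumulative, scores = image_pca(stack, num_components=20)
for k, (value, cum) in enumerate(zip(eigenvalues, cumulative)):
    print(f"{k:3d}  {value:12.2f}  {cum:6.2f}")
```

SVD is used here instead of an explicit covariance eigendecomposition because there are far fewer images (100) than pixels (1521). Both give the same principal axes.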


After the command finishes, check the eigenvalues.


Contents of EIGEN_INFO:
   0   485  13783745.48  16.25
   1   600  6874158.21  24.36
   2   997  6040647.42  31.48
   3   529  5425460.64  37.88
   4   834  4720681.32  43.45
   5   879  3932086.98  48.08
   6   842  3632776.78  52.37
   7   645  3182620.81  56.12
   8   566  2449230.98  59.01
   9  1116  1328891.76  60.57
  10  1031  1287023.24  62.09
  11   579  1257054.49  63.57
  12  1080  1214056.15  65.01
  13   856  1161105.65  66.38
  14   934  1144996.99  67.73

...

The rows are sorted in descending order of eigenvalue (3rd column), and the 4th column is the cumulative contribution ratio (%) up to that component. See the figure below. In this case, the eigenvalues of components 0 to 8 are clearly larger than the rest, and together these components explain about 60% (59.01%) of the variance.
EigenValuePCA-mrcImagePCA.png
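The reading of the 4th column as a cumulative percentage can be checked directly against the listed values. The sketch below parses the rows shown above, skips the 2nd column (its meaning is not stated on this page), and estimates the total variance from the first row. It then compares each eigenvalue's share of that total with the increase in the 4th column. A helper also reports how many leading components are needed to reach a given cumulative percentage.

```python
EIGEN_INFO_TEXT = """\
   0   485  13783745.48  16.25
   1   600  6874158.21  24.36
   2   997  6040647.42  31.48
   3   529  5425460.64  37.88
   4   834  4720681.32  43.45
   5   879  3932086.98  48.08
   6   842  3632776.78  52.37
   7   645  3182620.81  56.12
   8   566  2449230.98  59.01
   9  1116  1328891.76  60.57
  10  1031  1287023.24  62.09
  11   579  1257054.49  63.57
  12  1080  1214056.15  65.01
  13   856  1161105.65  66.38
  14   934  1144996.99  67.73
"""


def parse_eigen_info(text):
    """Return (index, eigenvalue, cumulative_percent) per row; column 2 is skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines or a "..." marker
        index, _unused, eigenvalue, cumulative = fields
        rows.append((int(index), float(eigenvalue), float(cumulative)))
    return rows


def components_needed(rows, percent):
    """Smallest number of leading components whose cumulative ratio reaches percent."""
    for index, _eigenvalue, cumulative in rows:
        if cumulative >= percent:
            return index + 1
    return None  # threshold not reached within the listed rows


rows = parse_eigen_info(EIGEN_INFO_TEXT)
total = rows[0][1] / (rows[0][2] / 100.0)   # estimated total variance from row 0
previous = 0.0
for index, eigenvalue, cumulative in rows:
    # share of this component vs. the step in the cumulative column
    print(f"{index:3d}  {100.0 * eigenvalue / total:6.2f}  {cumulative - previous:6.2f}")
    previous = cumulative
print(components_needed(rows, 59.0))   # 9 (indices 0 to 8)
```

The two printed columns agree to within rounding, which supports the cumulative-ratio interpretation.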


Next, look at scatter plots of the 1st to 3rd components.
The output files specified with mrcImagePCA's -o option (listed in TEST_PCA_LIST) store, for each image, its vector components in descending order of eigenvalue. By using the leading components of these data, you can therefore see which group each image belongs to. In addition, mrcImageMakeDump can output mrcImage data as ASCII.


Data collected from each file, up to the 10th component (one row per image, one column per component):
	-1002.110000	1962.390000	2375.080000	3780.900000	1531.830000	-3511.960000	-524.329000	1190.540000	-1106.170000	337.342000
	-1111.780000	2439.510000	2452.540000	3826.020000	1630.650000	-3519.130000	-457.767000	1531.510000	-316.514000	-2399.750000
	-844.584000	2207.500000	2577.200000	3895.480000	1722.810000	-3401.740000	-573.914000	961.414000	-1120.780000	75.002400
	-897.296000	2107.620000	2308.710000	3974.960000	1590.460000	-3559.020000	-836.757000	1690.460000	-332.499000	46.332400
	-639.501000	2286.200000	2513.990000	3868.320000	1741.350000	-3316.310000	-553.213000	1443.870000	-1044.260000	560.677000
	-1015.980000	2549.020000	2049.920000	3854.560000	1503.460000	-3118.820000	-919.956000	1212.420000	-792.175000	1047.500000
	-892.673000	2168.280000	2455.920000	3951.430000	1400.510000	-3498.790000	-528.413000	1509.180000	-1141.580000	10.826800
	-799.775000	2190.870000	2994.040000	3730.140000	1208.160000	-3002.190000	-538.733000	800.946000	-1115.250000	380.240000
	-1061.460000	2100.710000	2348.670000	3881.800000	1573.210000	-3440.970000	-606.476000	1363.520000	-649.180000	422.532000
	-782.003000	2198.650000	2594.880000	3976.130000	1891.720000	-3371.260000	-531.849000	1410.830000	-957.755000	148.813000
	-4295.390000	4650.010000	2406.060000	-2699.870000	1602.340000	2108.780000	1198.180000	-963.790000	565.743000	256.211000
	-4371.810000	4724.700000	2581.440000	-2274.290000	1625.690000	1392.540000	1677.500000	-492.734000	770.713000	-2612.010000

...

	533.205000	-1655.000000	3206.460000	1451.350000	-4268.840000	-120.817000	900.958000	-2478.230000	428.414000	361.083000
	667.314000	-1569.670000	2828.330000	1229.300000	-4102.610000	-108.603000	1067.760000	-2409.760000	875.312000	-147.388000
	703.368000	-1789.180000	3114.700000	1694.560000	-4408.100000	-286.116000	1112.690000	-2717.040000	595.316000	-136.777000
	546.967000	-2147.200000	3076.260000	1691.570000	-4386.750000	-567.557000	963.625000	-2624.400000	913.221000	-49.355200
	567.210000	-1505.020000	2555.640000	1290.990000	-4242.580000	-407.482000	1022.360000	-2779.230000	636.427000	308.664000
	727.232000	-1522.510000	2804.310000	1861.250000	-4377.870000	-163.006000	1417.020000	-2410.950000	776.618000	186.569000
	538.177000	-1556.390000	2774.820000	1342.150000	-4350.580000	-378.349000	1186.870000	-2627.670000	619.576000	60.747200
	466.937000	-1725.230000	3004.230000	1525.800000	-4514.230000	-370.642000	1165.460000	-2520.760000	654.709000	54.430900


Incidentally, the steps up to this point can be executed by running This Makefile on the Input file.


Treat each block of 10 consecutive rows in this file as the data for one rotation angle, and draw scatter plots that use the columns (components) as axes.

Output-PCA.png

Scatter plot of the 1st component (vertical axis) against the 2nd component (horizontal axis)


Output1-PCA.png

Scatter plot of the 1st component (vertical axis) against the 3rd component (horizontal axis)


Output2-PCA.png

Scatter plot of the 2nd component (vertical axis) against the 3rd component (horizontal axis)
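The grouping used for these plots (10 consecutive rows per rotation angle, columns as axes) can be sketched in plain Python. The function names are hypothetical and the code only prepares the point sets. Each group could then be drawn in its own colour with a plotting library, for example matplotlib's `pyplot.scatter`.

```python
def parse_components(text):
    """Parse whitespace-separated rows (one image per row) into lists of floats."""
    return [[float(v) for v in line.split()] for line in text.splitlines() if line.strip()]


def group_by_angle(rows, per_angle=10):
    """Split consecutive rows into blocks of per_angle rows (one block per angle)."""
    return [rows[i:i + per_angle] for i in range(0, len(rows), per_angle)]


def scatter_points(groups, x_col, y_col):
    """For each group, the (x, y) pairs to plot; columns are 0-based component indices."""
    return [[(row[x_col], row[y_col]) for row in group] for group in groups]


# First two rows of the table above, first three components only
sample = """\
-1002.110000\t1962.390000\t2375.080000
-1111.780000\t2439.510000\t2452.540000
"""
groups = group_by_angle(parse_components(sample), per_angle=10)
# 1st component on the vertical axis, 2nd on the horizontal axis, as in the first plot
print(scatter_points(groups, x_col=1, y_col=0))
```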


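Classify the images based on these scatter plots. If about 10 separate groups (one per rotation angle) can be distinguished, the images can be almost fully classified in the three dimensions spanned by the 1st to 3rd components.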
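Whether the images are "almost classified" in three dimensions can be put into a number. This is an assumed check, not one described on this page. Take the first three components of each image, compute the centroid of each angle's 10 rows, and count how many images lie closest to their own group's centroid. Because the known angle labels are used to build the centroids, this is an optimistic resubstitution score rather than blind clustering. The demo data are placeholders, not the example's results.

```python
import numpy as np


def nearest_centroid_accuracy(scores, labels, n_components=3):
    """Fraction of images that lie closest to the centroid of their own group.

    scores: (n_images, n_columns) component values; only the first n_components are used.
    labels: group label of each image (here: the rotation angle index).
    """
    points = np.asarray(scores, dtype=float)[:, :n_components]
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.array([points[labels == c].mean(axis=0) for c in classes])
    # Euclidean distance from every point to every group centroid
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    predicted = classes[np.argmin(distances, axis=1)]
    return float(np.mean(predicted == labels))


# Rows are assumed to be ordered as in the table above: 10 consecutive rows per angle
labels = np.arange(100) // 10
rng = np.random.default_rng(2)
centres = np.arange(10)[:, None] * np.array([1000.0, 0.0, 0.0])   # placeholder groups
scores = centres[labels] + rng.normal(0.0, 50.0, size=(100, 3))
print(nearest_centroid_accuracy(scores, labels))
```

With the real data, `scores` would be the first three columns of the 100-row table.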