Principal Component Analysis

From EosPedia

Principal Component Analysis (PCA) is a method for a data set of multivariate vectors. It first finds the axis in the multivariate space (the principal axis) along which the projections of the vectors have the largest variance. It then finds further axes one after another, each orthogonal to (uncorrelated with) the axes already found and having the largest remaining variance.

Fig-PCA.png
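The definition above can be illustrated with a minimal pure-Python sketch. It builds the sample covariance matrix of the mean-centred data. It then finds the principal axes one at a time using power iteration with deflation, which mirrors the definition directly: first the largest-variance axis, then the next orthogonal one. This is a textbook technique chosen for clarity. It is not necessarily how mrcImagePCA computes its axes, and the function `pca` and its parameters are hypothetical.

```python
import math
import random


def pca(data, num_components, iters=1000, tol=1e-12, seed=0):
    """Return (mean, axes, variances) for the leading principal axes of `data`.

    data: list of equal-length numeric vectors (one per observation).
    axes are unit vectors, returned in order of decreasing variance.
    """
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d) of the mean-centred data.
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]

    axes, variances = [], []
    for _ in range(num_components):
        # Power iteration: repeatedly multiply by cov and normalise; the vector
        # converges to the eigenvector with the largest remaining eigenvalue.
        rng = random.Random(seed)
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in any direction
                break
            w = [x / norm for x in w]
            converged = max(abs(a - b) for a, b in zip(w, v)) < tol
            v = w
            if converged:
                break
        # Rayleigh quotient v^T C v = variance of the data projected onto v.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        axes.append(v)
        variances.append(lam)
        # Deflation: subtract lam * v v^T so the next pass finds the
        # largest-variance direction orthogonal to the axes already found.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, axes, variances
```

For example, `pca([[3, 0], [-3, 0], [0, 1], [0, -1]], 2)` finds the x axis first, with variance 6. It then finds the y axis, with variance 2/3. For 39 x 39 images (1521 dimensions), this naive covariance construction is slow in pure Python. It is only meant to show the idea.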

Execution example of PCA

PCA of a set of images

This example classifies multiple images, mainly using mrcImagePCA.


Input images
Input-PCA.png
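The image is rotated to 10 different angles (vertical direction of the figure), and 10 different noise patterns are added to each rotated image (horizontal direction). This gives 100 images in total.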
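A test set like this one can be sketched as follows. The bar-shaped template, the nearest-neighbour rotation, the Gaussian noise model, the noise level `sigma` and the seeds are all hypothetical choices, because the source does not say how its input images were made. The angles 1, 37, ..., 325 degrees (steps of 36 degrees) are inferred from the file names listed below. The 39 x 39 size comes from the `-NX 39 -NY 39` options of the mrcImagePCA command.

```python
import math
import random

SIZE = 39  # image size taken from -NX 39 -NY 39


def make_template(size=SIZE):
    """Hypothetical test object: a horizontal bar from the centre towards the right edge."""
    img = [[0.0] * size for _ in range(size)]
    c = size // 2
    for y in range(c - 2, c + 3):
        for x in range(c, size - 4):
            img[y][x] = 1.0
    return img


def rotate(img, degrees):
    """Nearest-neighbour rotation about the image centre; pixels whose
    source position falls outside the image are set to 0."""
    size = len(img)
    c = (size - 1) / 2.0
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[0.0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            dx, dy = x - c, y - c
            # Inverse mapping: rotate the output position by -degrees to find
            # which input pixel lands here.
            sx = round(c + cos_t * dx + sin_t * dy)
            sy = round(c - sin_t * dx + cos_t * dy)
            if 0 <= sx < size and 0 <= sy < size:
                out[y][x] = img[sy][sx]
    return out


def add_noise(img, sigma, seed):
    """Add independent Gaussian noise (assumed noise model) using a fixed seed."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in img]


def make_dataset(sigma=0.3):
    """10 angles x 10 noise realisations = 100 images, keyed by file stem."""
    base = make_template()
    data = {}
    for angle in range(1, 360, 36):  # 1, 37, ..., 325 (inferred from file names)
        rotated = rotate(base, angle)
        for k in range(10):
            data[f"Target-{angle}-0-0-{k}"] = add_noise(rotated, sigma, seed=angle * 10 + k)
    return data
```

Writing each image to an .nroi file is left out, because that format is specific to the Eos tools.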


First, calculate the principal axes with mrcImagePCA.


Contents of NO2_ROI_LIST (the input list given with -i)
Target-1-0-0-0.nroi
Target-1-0-0-1.nroi
Target-1-0-0-2.nroi
Target-1-0-0-3.nroi
Target-1-0-0-4.nroi
Target-1-0-0-5.nroi
Target-1-0-0-6.nroi
Target-1-0-0-7.nroi
Target-1-0-0-8.nroi
Target-1-0-0-9.nroi
Target-37-0-0-0.nroi
Target-37-0-0-1.nroi

...

Target-289-0-0-8.nroi
Target-289-0-0-9.nroi
Target-325-0-0-0.nroi
Target-325-0-0-1.nroi
Target-325-0-0-2.nroi
Target-325-0-0-3.nroi
Target-325-0-0-4.nroi
Target-325-0-0-5.nroi
Target-325-0-0-6.nroi
Target-325-0-0-7.nroi
Target-325-0-0-8.nroi
Target-325-0-0-9.nroi


Contents of TEST_PCA_LIST (the output list given with -o)
Target-1-0-0-0.tpca
Target-1-0-0-1.tpca
Target-1-0-0-2.tpca
Target-1-0-0-3.tpca
Target-1-0-0-4.tpca
Target-1-0-0-5.tpca
Target-1-0-0-6.tpca
Target-1-0-0-7.tpca
Target-1-0-0-8.tpca
Target-1-0-0-9.tpca
Target-37-0-0-0.tpca
Target-37-0-0-1.tpca

...

Target-289-0-0-8.tpca
Target-289-0-0-9.tpca
Target-325-0-0-0.tpca
Target-325-0-0-1.tpca
Target-325-0-0-2.tpca
Target-325-0-0-3.tpca
Target-325-0-0-4.tpca
Target-325-0-0-5.tpca
Target-325-0-0-6.tpca
Target-325-0-0-7.tpca
Target-325-0-0-8.tpca
Target-325-0-0-9.tpca
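Both list files follow a single naming pattern, so they can be generated instead of typed by hand. This sketch assumes the 36-degree angle step visible in the listings (1, 37, ..., 289, 325). It copies the middle `-0-0-` part literally, since its meaning is not explained here. `target_names` and `write_list` are hypothetical helpers.

```python
def target_names(ext):
    """File names following the Target-<angle>-0-0-<noise>.<ext> pattern above.

    Order: all 10 noise patterns of one angle, then the next angle.
    """
    return [f"Target-{angle}-0-0-{k}.{ext}"
            for angle in range(1, 360, 36)  # assumed step of 36 degrees
            for k in range(10)]


def write_list(path, names):
    """Write one file name per line."""
    with open(path, "w") as f:
        f.write("\n".join(names) + "\n")
```

Calling `write_list("NO2_ROI_LIST", target_names("nroi"))` and `write_list("TEST_PCA_LIST", target_names("tpca"))` would produce lists in the same order as shown above.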


Command
mrcImagePCA -i NO2_ROI_LIST -o TEST_PCA_LIST -NX 39 -NY 39 -numE 20 -O EIGEN_INFO -E eigen -EPS 100;


After the command finishes, check the eigenvalues.


Contents of EIGEN_INFO
   0   485  13783745.48  16.25
   1   600  6874158.21  24.36
   2   997  6040647.42  31.48
   3   529  5425460.64  37.88
   4   834  4720681.32  43.45
   5   879  3932086.98  48.08
   6   842  3632776.78  52.37
   7   645  3182620.81  56.12
   8   566  2449230.98  59.01
   9  1116  1328891.76  60.57
  10  1031  1287023.24  62.09
  11   579  1257054.49  63.57
  12  1080  1214056.15  65.01
  13   856  1161105.65  66.38
  14   934  1144996.99  67.73

...

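The rows are sorted in descending order of eigenvalue (3rd column). The 4th column is the cumulative contribution ratio in percent. See the figure below. In this case, the eigenvalues of components 0 to 8 are clearly larger than those that follow; the eigenvalue roughly halves between components 8 and 9. Together, components 0 to 8 explain about 60% of the total variance (59.01% cumulative).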
EigenValuePCA-mrcImagePCA.png
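The 4th column can be recomputed from the eigenvalues. It is the running sum of eigenvalues divided by the total variance, which is the sum of all eigenvalues, including those not listed. The sketch below assumes that total is known. As a rough check, estimating the total from the first row (13783745.48 / 0.1625) reproduces the listed percentages to within a few hundredths of a percentage point. The small differences come from the rounding of 16.25. The function names are hypothetical.

```python
def cumulative_contribution(eigenvalues, total_variance):
    """Cumulative contribution ratio (%) after each component."""
    out, running = [], 0.0
    for lam in eigenvalues:
        running += lam
        out.append(100.0 * running / total_variance)
    return out


def components_needed(eigenvalues, total_variance, target_percent):
    """Smallest number of leading components whose cumulative ratio reaches
    target_percent, or None if the listed eigenvalues never reach it."""
    for i, c in enumerate(cumulative_contribution(eigenvalues, total_variance)):
        if c >= target_percent:
            return i + 1
    return None
```

With the first ten eigenvalues above and the estimated total, `components_needed(..., 60.0)` returns 10, which matches the 60.57% listed for component 9.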


Next, look at scatter plots of the 1st to 3rd components.
The files specified with mrcImagePCA's -o option store, for each image, the elements of its vector on the principal axes, in descending order of eigenvalue. By looking at the leading components of this data, you can therefore see which group each image belongs to. The mrcImage data can also be written out as ASCII text with mrcImageMakeDump.
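Each row stored this way is the image's coordinates on the principal axes. The minimal sketch below assumes the standard PCA convention: subtract the mean image, then take dot products with unit-length axes sorted by decreasing eigenvalue. Whether mrcImagePCA subtracts the mean is not stated here. `project` and `nearest_group` are hypothetical helpers, and `nearest_group` assumes group centres are already known.

```python
def project(vector, mean, axes, k):
    """Coordinates of `vector` on the first k principal axes.

    Assumes `axes` are unit vectors sorted by decreasing eigenvalue and that
    the mean image is subtracted first (standard PCA convention).
    """
    centred = [v - m for v, m in zip(vector, mean)]
    return [sum(a * c for a, c in zip(axis, centred)) for axis in axes[:k]]


def nearest_group(scores, centroids):
    """Index of the group centroid closest to `scores` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda g: sum((s - c) ** 2 for s, c in zip(scores, centroids[g])))
```

Keeping only the first few coordinates (a small `k`) keeps only the components with the largest variance, which is why the leading components are enough to tell the groups apart.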


The 1st to 10th components collected from each file (one row per image)
	-1002.110000	1962.390000	2375.080000	3780.900000	1531.830000	-3511.960000	-524.329000	1190.540000	-1106.170000	337.342000
	-1111.780000	2439.510000	2452.540000	3826.020000	1630.650000	-3519.130000	-457.767000	1531.510000	-316.514000	-2399.750000
	-844.584000	2207.500000	2577.200000	3895.480000	1722.810000	-3401.740000	-573.914000	961.414000	-1120.780000	75.002400
	-897.296000	2107.620000	2308.710000	3974.960000	1590.460000	-3559.020000	-836.757000	1690.460000	-332.499000	46.332400
	-639.501000	2286.200000	2513.990000	3868.320000	1741.350000	-3316.310000	-553.213000	1443.870000	-1044.260000	560.677000
	-1015.980000	2549.020000	2049.920000	3854.560000	1503.460000	-3118.820000	-919.956000	1212.420000	-792.175000	1047.500000
	-892.673000	2168.280000	2455.920000	3951.430000	1400.510000	-3498.790000	-528.413000	1509.180000	-1141.580000	10.826800
	-799.775000	2190.870000	2994.040000	3730.140000	1208.160000	-3002.190000	-538.733000	800.946000	-1115.250000	380.240000
	-1061.460000	2100.710000	2348.670000	3881.800000	1573.210000	-3440.970000	-606.476000	1363.520000	-649.180000	422.532000
	-782.003000	2198.650000	2594.880000	3976.130000	1891.720000	-3371.260000	-531.849000	1410.830000	-957.755000	148.813000
	-4295.390000	4650.010000	2406.060000	-2699.870000	1602.340000	2108.780000	1198.180000	-963.790000	565.743000	256.211000
	-4371.810000	4724.700000	2581.440000	-2274.290000	1625.690000	1392.540000	1677.500000	-492.734000	770.713000	-2612.010000

...

	533.205000	-1655.000000	3206.460000	1451.350000	-4268.840000	-120.817000	900.958000	-2478.230000	428.414000	361.083000
	667.314000	-1569.670000	2828.330000	1229.300000	-4102.610000	-108.603000	1067.760000	-2409.760000	875.312000	-147.388000
	703.368000	-1789.180000	3114.700000	1694.560000	-4408.100000	-286.116000	1112.690000	-2717.040000	595.316000	-136.777000
	546.967000	-2147.200000	3076.260000	1691.570000	-4386.750000	-567.557000	963.625000	-2624.400000	913.221000	-49.355200
	567.210000	-1505.020000	2555.640000	1290.990000	-4242.580000	-407.482000	1022.360000	-2779.230000	636.427000	308.664000
	727.232000	-1522.510000	2804.310000	1861.250000	-4377.870000	-163.006000	1417.020000	-2410.950000	776.618000	186.569000
	538.177000	-1556.390000	2774.820000	1342.150000	-4350.580000	-378.349000	1186.870000	-2627.670000	619.576000	60.747200
	466.937000	-1725.230000	3004.230000	1525.800000	-4514.230000	-370.642000	1165.460000	-2520.760000	654.709000	54.430900


Incidentally, the steps up to this point can be carried out by applying this Makefile to the input files and running the following commands.


Treat each block of 10 consecutive rows of this file as the data for one rotation angle, and draw scatter plots that use the columns (components) as axes.
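This grouping can be sketched as below. The sketch assumes that the rows are in the same order as the file lists (10 noise patterns per angle) and that the "..." elision marker has been removed from the table. Plotting is left to any plotting tool, and all function names are hypothetical.

```python
def parse_rows(text):
    """Parse whitespace-separated numeric rows, skipping blank lines."""
    return [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]


def group_by_angle(rows, per_group=10):
    """Split rows into consecutive blocks of `per_group` rows (one block per angle)."""
    return [rows[i:i + per_group] for i in range(0, len(rows), per_group)]


def scatter_points(group, col_x, col_y):
    """(x, y) pairs for one group; columns are 0-based (0 = 1st component)."""
    return [(row[col_x], row[col_y]) for row in group]
```

For the first plot below (1st component vertical, 2nd horizontal), each angle's points would be `scatter_points(group, 1, 0)`, drawn with a different marker per group.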

Output-PCA.png

Scatter plot of the 1st (vertical axis) and 2nd (horizontal axis) components


Output1-PCA.png

Scatter plot of the 1st (vertical axis) and 3rd (horizontal axis) components


Output2-PCA.png

Scatter plot of the 2nd (vertical axis) and 3rd (horizontal axis) components


Classify the images based on the scatter plots. If the points separate into 10 groups (one per rotation angle), the images can be almost completely classified in the 3D space of the first three components.
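The source classifies the images by inspecting the plots. As an automated alternative, which is a swapped-in technique and not part of the original procedure, one could run k-means with k = 10 on the first three components. One could then check how consistently each block of 10 rows, that is, each rotation angle, falls into a single cluster. The sketch uses Lloyd's algorithm with a deterministic farthest-point initialisation. `kmeans` and `block_purity` are hypothetical names.

```python
def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation.

    Returns (labels, centres); labels[i] is the cluster index of points[i].
    """
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Initialisation: start from the first point, then repeatedly add the point
    # farthest from every centre chosen so far.
    centres = [list(points[0])]
    while len(centres) < k:
        far = max(points, key=lambda p: min(d2(p, c) for c in centres))
        centres.append(list(far))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [min(range(k), key=lambda j: d2(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


def block_purity(labels, per_group=10):
    """Fraction of points that share the majority cluster of their block of rows."""
    hits = 0
    for i in range(0, len(labels), per_group):
        block = labels[i:i + per_group]
        hits += max(block.count(lab) for lab in set(block))
    return hits / len(labels)
```

With the 100 rows parsed as numbers, `labels, _ = kmeans([row[:3] for row in rows], 10)` followed by `block_purity(labels)` gives a single score. A value close to 1.0 corresponds to the case described above, where the angles separate almost completely.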