A Novel Clustered Support Vector Machine with Reduced Support Vectors for Big Data Classification
Gokkul Nath T. S. and Ramanathan R.*
Department of Electronics and Communication Engineering,
Amrita School of Engineering, Coimbatore,
Amrita Vishwa Vidyapeetham, Amrita University, India.
Email: r_ramanathan@cb.amrita.edu*
Abstract— Big data classification demands support vector models with a huge number of support vectors, which are complex and prone to overfitting. In this paper, we propose a method that mitigates the overfitting problem and improves the generalization of the model by reducing the number of support vectors. Using the proposed approach, the number of support vectors is reduced by an average of 90% relative to the count obtained when training with the conventional SVM approach. The proposed method thus improves on the conventional method by reducing the number of support vectors at a small cost in accuracy. It is suited to applications involving long-term prediction, such as weather/climate prediction, and to time-critical applications that require rapid performance with a tolerable loss of accuracy.
I. INTRODUCTION
Support vector machines (SVMs) are powerful binary classifiers based on optimal hyperplanes with hard or soft margins. SVM-based solutions have been applied successfully to problems ranging from text categorization [1,2], face recognition, detection, and verification [3,4], speech recognition, and bankruptcy prediction [5] to bioinformatics [6], remotely sensed data analysis [7], information and image retrieval, time series forecasting, and information security. SVM classifiers are fast, can use kernels, yield sparse solutions, and lead to a convex optimization problem with no local optima; moreover, the margin being optimized can be controlled easily. SVMs overcome traditional issues such as overfitting and the curse of dimensionality, and they have a well-established theoretical foundation and implementation. They are gaining popularity and developing rapidly because they possess several attractive features: an elegant mathematical formulation, good generalization capability, and reliably good performance. Reducing the number of support vectors is crucial because a large number of support vectors makes the model more complex and prone to overfitting.
II. CONVENTIONAL SVM METHOD
Vladimir Vapnik, along with his colleagues, developed the support vector machine framework on which the conventional method described below is based.
A. Linear SVM Model: The hyperplane separating the positive and negative examples is given by the following equation:
Figure 1: Linear SVM Model [16]
v = w · x + b

where w is the vector normal to the hyperplane, x is the input vector, and b is the bias term. The hyperplane separating the positive and negative instances is given by w · x + b = 0, and the support vectors lie on the planes w · x + b = ±1. The margin between these planes is 2/‖w‖. Maximization of the margin can therefore be expressed by the following optimization problem:

minimize (1/2)‖w‖²  such that  yᵢ(w · xᵢ + b) ≥ 1 for all i.
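As an illustration of these definitions (not the authors' implementation), the following minimal sketch trains a linear SVM with the open-source scikit-learn library and reads off w, b, and the support vectors; the toy data are assumed purely for demonstration.

# Minimal linear SVM sketch using scikit-learn (illustrative only).
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two linearly separable classes (assumed for demonstration).
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],   # positive examples
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])  # negative examples
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

w = clf.coef_[0]        # vector normal to the hyperplane
b = clf.intercept_[0]   # bias term
print("w =", w, "b =", b)
print("support vectors:")
print(clf.support_vectors_)

# Decision value v = w . x + b; its sign gives the predicted class.
x_new = np.array([2.0, 2.0])
print("v =", np.dot(w, x_new) + b)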
B. Nonlinear SVM: In this case, the hyperplane separating the positive and negative example sets is obtained by applying the kernel trick, i.e., mapping the given data to a higher-dimensional space in which the positive and negative example sets are linearly separable by a hyperplane. Figure 2 shows nonlinear data mapped to a higher dimension using a kernel function and the construction of a hyperplane in the higher-dimensional space.
Figure 2: Example for Nonlinear SVM Model [13]
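As a sketch of the kernel trick (an illustrative choice of library and kernel, not mandated by the paper), an RBF kernel computes inner products in the higher-dimensional space implicitly, so the hyperplane can be constructed without ever forming the mapping explicitly:

# Nonlinear SVM sketch: the RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)
# implicitly maps the inputs to a higher-dimensional space.
from sklearn.svm import SVC
from sklearn.datasets import make_circles

# Concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma=1.0)
clf.fit(X, y)

print("training accuracy:", clf.score(X, y))
print("number of support vectors:", clf.support_vectors_.shape[0])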
C. Types of SVM Models:
1) Hard Margin SVM:
If the data set is linearly separable, two hyperplanes are selected such that the distance between them is as large as possible and no misclassification is allowed. As the data approach linear inseparability, the width of the margin becomes very small and, consequently, the performance of the classifier decreases; for truly inseparable data no hard-margin solution exists. Thus, hard margins are usually preferred only when the data are linearly separable.
2) Soft Margin SVM
When the data set is nonlinear and inseparable, a loss function is introduced that establishes a trade-off between the width of the margin and the misclassification error, i.e., instances falling on the wrong side of the hyperplane. Soft margins can be applied to all problems.
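In practice this trade-off is exposed as a penalty parameter C. The short sketch below (with toy data assumed purely for illustration) shows the typical effect: a small C tolerates more misclassification and widens the margin, while a large C penalizes errors heavily.

# Soft-margin trade-off sketch: the penalty parameter C balances margin
# width against misclassification of training points.
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Two overlapping clusters, so some misclassification is unavoidable.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print("C=%g: accuracy=%.3f, nSV=%d"
          % (C, clf.score(X, y), clf.support_vectors_.shape[0]))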
D. Strategy to Solve the SVM Problem:
SVMs are constructed by reducing the machine learning problem to an optimization problem, and these optimization tasks are solved using techniques such as linear programming and quadratic programming.
A huge quadratic programming (QP) optimization problem must be solved in order to train an SVM. Hence, to reduce the computational complexity, this huge QP problem is successively broken down into smaller QP problems. These smaller QP problems can be solved analytically, thereby reducing the computational complexity and the training time.
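For reference, the large QP in question is the standard SVM dual (a textbook formulation, not restated in the paper itself); decomposition methods such as Platt's sequential minimal optimization (SMO) solve it by repeatedly optimizing the smallest possible subproblem, two Lagrange multipliers at a time, analytically:

\max_{\alpha}\ \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{subject to} \quad
\sum_{i=1}^{n} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C .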
E. Open-Source SVM Libraries:
SVMlight and LIBSVM are two widely used open-source SVM libraries.
III. PROPOSED CLUSTERED SVM
Reduction in the number of support vectors is essential because it reduces the computational complexity of the model, which in turn allows real-time applications to be implemented on low-power computing devices and reduces the hardware requirement. It also reduces latency, since fewer support vectors make classification faster and make the SVM model easier to upgrade or reconfigure if required. Models trained with the conventional SVM method typically contain a considerable number of support vectors, many of which are not essential; the model therefore becomes complex and inefficient. Hence, the proposed method can be used to suppress the number of support vectors by extracting only the essential ones, which leads to a less complex model. The penalty parameter (C) and the kernel parameter (gamma), which define the model, must be computed. Figure 3 depicts the steps involved in reducing the support vectors with the proposed method.
Figure 3: Flowchart of Proposed Method
The process begins by loading the training instances, along with their corresponding class labels, in the required syntax. A grid search is then performed to obtain the optimum C and gamma parameters: the values yielding the best cross-validation accuracies are taken. These C and gamma parameters are then used again when training the reduced, clustered model, as sketched below.
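What follows is a minimal sketch of one plausible realization of this pipeline, assuming k-means is the clustering step and that the final model is retrained on the cluster centroids; the paper does not pin these details down, so choices such as n_clusters and the synthetic data are illustrative assumptions.

# Illustrative sketch of the clustered-SVM pipeline: grid-search C and gamma,
# cluster each class with k-means, then train on the cluster centroids.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

# Step 1: grid search for C and gamma via cross-validation accuracy.
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
                    cv=5)
grid.fit(X, y)
C, gamma = grid.best_params_["C"], grid.best_params_["gamma"]

# Conventional model, for comparison.
conventional = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)

# Step 2: cluster each class separately and keep only the centroids.
n_clusters = 20  # illustrative cluster size
Xc, yc = [], []
for label in np.unique(y):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    km.fit(X[y == label])
    Xc.append(km.cluster_centers_)
    yc.append(np.full(n_clusters, label))
Xc, yc = np.vstack(Xc), np.concatenate(yc)

# Step 3: retrain on the reduced, representative set with the same C, gamma.
clustered = SVC(kernel="rbf", C=C, gamma=gamma).fit(Xc, yc)

print("conventional nSV:", conventional.support_vectors_.shape[0])
print("clustered    nSV:", clustered.support_vectors_.shape[0])
print("accuracy on full training set:", clustered.score(X, y))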
IV. EXPERIMENTATION AND VALIDATION
To verify the effectiveness and efficiency of the proposed method, historical climate data of Trichy, Delhi, Coimbatore, Calcutta, Bombay, and Chennai were taken. These data were used to train SVM models using both the conventional and the proposed clustered approaches.
[Figure: grouped bar chart comparing the number of support vectors of the conventional and clustered models for TRY, DLI, CBE, CAL, BOM, and CHN; y-axis: Number of Support Vectors.]
Figure 4: Comparison of Number of Support Vectors
Figure 4 compares the number of support vectors of the models obtained with the conventional SVM approach and with the clustered approach. Chennai has the highest number of support vectors when trained using the conventional method (1010), and this count is reduced to 111 (an 89% reduction) when the clustered approach is followed. Delhi shows the highest percentage reduction (92.6%), from 543 support vectors to 40. On average, the proposed method yields a 90% reduction in the number of support vectors.
[Figure: grouped bar chart comparing the accuracy (%) of the conventional and clustered models for TRY, DLI, CBE, CAL, BOM, and CHN.]
Figure 5: Comparison of Accuracy
Figure 5 compares the accuracies of the models obtained with the conventional SVM approach and with the clustered approach. The performance of the proposed approach is close to that of the conventional method. The Delhi and Bombay models obtained using the clustered approach (89.87% and 91.97%) performed better than those of the conventional SVM method (89.2% and 91.6%), while Trichy and Chennai showed reductions in accuracy of 4.6% and 2.53%, respectively, along with the reduction in support vectors. Hence, we can infer that the proposed approach performs similarly to the conventional SVM method, with a maximum deviation of about 5% in accuracy.
[Figure: line plot of the number of support vectors (nSV) versus cluster size (0-40) for CBE, DLI, TRY, CHN, CAL, and BOM.]
Figure 6: Number of Support Vectors vs. Cluster Size
Using the proposed clustering approach, a substantial reduction in the number of support vectors can be observed. Figure 6 illustrates how the number of support vectors falls as the cluster size increases: with more clusters, the redundant and non-essential support vectors are progressively eliminated.
[Figure: line plot of accuracy (%) versus cluster size (0-40) for CBE, DLI, TRY, CHN, CAL, and BOM.]
Figure 7: Accuracy vs. Cluster Size
Figure 7 shows the relationship between cluster size and accuracy. We infer that the accuracy increases with cluster size, reaches a maximum at the point where the extracted support vectors are exactly the essential ones, and then declines. Once the extraction of essential support vectors has saturated, increasing the cluster size further yields no appreciable improvement in the accuracy of the model. In certain cases, such as Bombay, the accuracy may even decrease, because extraction beyond saturation can miss some essential support vectors. Hence, the cluster size must be chosen appropriately so that the requirements are met.
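A simple way to choose the cluster size is to sweep it and record the accuracy, the support vector count, and the clustering time; the sketch below (continuing the illustrative pipeline above, with the same assumed data and k-means clustering step) does exactly that.

# Illustrative sweep over cluster size, recording accuracy, nSV, and the
# clustering-time overhead discussed in Figures 6-8.
import time
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
C, gamma = 10.0, 0.1  # assumed values from a prior grid search

for n_clusters in (5, 10, 20, 30, 40):
    t0 = time.time()
    Xc, yc = [], []
    for label in np.unique(y):
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
        km.fit(X[y == label])
        Xc.append(km.cluster_centers_)
        yc.append(np.full(n_clusters, label))
    clustering_time = time.time() - t0

    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(np.vstack(Xc),
                                                  np.concatenate(yc))
    print("k=%2d: accuracy=%.3f, nSV=%d, clustering time=%.2fs"
          % (n_clusters, clf.score(X, y),
             clf.support_vectors_.shape[0], clustering_time))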
[Figure: line plot of clustering time versus cluster size (0-40) for CBE, DLI, TRY, CHN, CAL, and BOM.]
Figure 8: Clustering Time vs. Cluster Size
Figure 8 shows the relationship between the clustering-time overhead and the cluster size used in training the model. It can be observed that as the cluster size increases, the clustering time increases exponentially, since the number of combinations of clusters to evaluate grows.
V. CONCLUSIONS
Thus, the proposed method reduces the computational complexity of the model by reducing the number of support vectors, at the cost of extra clustering-time overhead and computational resources. The results were verified by applying the proposed method to the weather prediction problem for different cities using historical climate data. An average 90% reduction in the number of support vectors was observed, with a deviation of at most 5% in accuracy. The proposed method thus proves to be a strong alternative to the conventional SVM method. Future work includes extending the method to multi-class problems and applying it to time-critical applications that require rapid performance with a tolerable loss of accuracy.
REFERENCES
[1] Joachims T., "Text categorization with support vector machines: learning with many relevant features," Springer Berlin Heidelberg, 1998.
[2] Ramanathan R., Ponmathavan S., Valliappan N., Thaneshwaran L., Nair A. S., and Soman K. P., "Optical character recognition for English and Tamil using support vector machines," in ACT 2009 - International Conference on Advances in Computing, Control and Telecommunication Technologies, Trivandrum, Kerala, 2009.
[3] Ganapathiraju A., Hamaker J. E., and Picone J., "Applications of support vector machines to speech recognition," IEEE Transactions on Signal Processing, vol. 52, no. 8, 2004.
[4] Ramanathan R., Nair A. S., Sagar V. V., Sriram N., and Soman K. P., "A support vector machines approach for efficient facial expression recognition," in ARTCom 2009 - International Conference on Advances in Recent Technologies in Communication and Computing, Kottayam, Kerala, 2009.
[5] Shin K. S., Lee T. S., and Kim H. J., "An application of support vector machines in bankruptcy prediction model," Expert Systems with Applications, vol. 28, no. 1, 2005.
[6] Zhou X. and Tuck D. P., "MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data," Bioinformatics, vol. 23, no. 9, 2007.
[7] Xia X. L., Lyu M. R., Lok T. M., and Huang G. B., "Methods of decreasing the number of support vectors via k-mean clustering," in Advances in Intelligent Computing (ICIC 2005), Springer Berlin Heidelberg, 2005.
[8] Adankon M. M. and Cheriet M., "Model selection for the LS-SVM. Application to handwriting recognition," Pattern Recognition, vol. 42, no. 12, 2009.
[9] Khemchandani R. and Chandra S., "Twin support vector machines for pattern classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 5, 2007.
[10] Segata N. and Blanzieri E., "Fast local support vector machines for large datasets," in Machine Learning and Data Mining in Pattern Recognition, Springer Berlin Heidelberg, pp. 295-310, 2009.
[11] Tian Y., Shi Y., and Liu X., "Recent advances on support vector machines research," Technological and Economic Development of Economy, vol. 18, no. 1, 2012.
[12]
[13] Moore A. (2003). Support Vector Machines Slides [Online]. Available FTP: http://www.autonlab.org/tutorials/svm15.pdf