Sun-Yuan Kung
Dept. of Electrical Engineering
Princeton University
USA
Kernel learning machines for compressive privacy protection of internet/cloud data
Abstract
In the internet era, we experience a phenomenon of "digital everything". Owing to its quantitative (volume and velocity) and qualitative (variety) challenges, it is imperative to address the various computational aspects of big data. In particular, the curse of high feature dimensionality raises grave concerns about computational complexity and over-training. We shall explore various projection methods, e.g., principal component analysis (PCA), for dimension reduction - a prelude to both visualization of big data and privacy preservation.
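As an illustration of the kind of projection discussed above (not code from the course itself), here is a minimal NumPy sketch of PCA-based dimension reduction; the function name and interface are hypothetical.

```python
import numpy as np

def pca_project(X, m):
    """Project the rows of X onto the m leading principal components."""
    Xc = X - X.mean(axis=0)                # center the data
    C = Xc.T @ Xc / (len(X) - 1)           # sample covariance matrix
    eigval, eigvec = np.linalg.eigh(C)     # eigenvalues in ascending order
    W = eigvec[:, ::-1][:, :m]             # top-m principal directions
    return Xc @ W                          # reduced-dimension representation
```

The m-dimensional output can then be used for visualization or passed on as a compressed, lower-dimensional representation of the original data.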
In the internet era, we benefit greatly from the combination of packet switching, bandwidth, processing, and storage capacities in the cloud. However, "big data" often carries a connotation of "big brother": as the data collected on consumers like us grows exponentially, attacks on our privacy become a real threat. New technologies are needed to better protect our privacy when we upload personal data to the cloud. To this end, we shall explore joint optimization over three design spaces: (a) feature space, (b) classification space, and (c) privacy space. This prompts a new paradigm called compressive privacy (CP), which explores information systems that simultaneously perform utility space maximization (delivering the intended data mining, classification, and learning tasks) and privacy space minimization (safeguarding personal/private information).
An important development in CP is a "privatizing" discriminant component analysis (DCA), which offers a compression scheme to enhance privacy protection in contextual and collaborative learning environments. DCA can be viewed as a supervised PCA which simultaneously rank-orders (1) the sensitive components and (2) the privatized components.
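The exact DCA formulation is given in Kung's paper listed under the course materials; as a rough, hypothetical illustration of a supervised, scatter-based projection in the same spirit, consider the following ridge-regularized Fisher-style sketch (the function name, regularization choice, and criterion are assumptions for illustration only, not the course's definition of DCA).

```python
import numpy as np

def dca_like_projection(X, labels, m, rho=1e-3):
    """Sketch of a supervised, DCA-style projection: rank directions by
    (regularized) between-class vs. within-class scatter and keep the top m.
    The discarded trailing components carry little classification utility and
    are natural candidates to suppress for privacy. Illustrative only."""
    labels = np.asarray(labels)
    Xc = X - X.mean(axis=0)
    d = X.shape[1]
    S_w = rho * np.eye(d)               # within-class scatter, ridge-regularized
    S_b = np.zeros((d, d))              # between-class scatter
    for c in np.unique(labels):
        Xk = Xc[labels == c]
        mu = Xk.mean(axis=0, keepdims=True)
        S_w += (Xk - mu).T @ (Xk - mu)
        S_b += len(Xk) * mu.T @ mu
    # Generalized eigenproblem S_b w = lambda S_w w, solved via S_w^{-1} S_b
    eigval, eigvec = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigval.real)[::-1]
    W = eigvec[:, order[:m]].real
    return Xc @ W
```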
Big data analysis usually involves nonlinear data analysis, the two most promising approaches for which are the kernel learning machine (KLM) and the deep learning machine (DLM). The safest possible protection is to withhold privy data from sharing in the first place. This scheme, however, presents a formidable challenge in developing machine learning tools for incomplete data analysis (IDA). Fortunately, KLM can naturally be extended to all types of nonvectorial data analysis, including IDA. Moreover, KLM can facilitate:
- Intrinsic space and privacy: Reduce the number of training vectors that need to be stored in the cloud (SVM), or even make it unnecessary to share any training data at all via the intrinsic kernel approach.
- Auto-encoder for privacy: Compare two nonlinear auto-encoders (KLM and DLM) as data minimizers.
- Kernel learning machine for privacy: For example, partially-specified feature vectors can be pairwise correlated to yield a similarity or kernel function for a KLM. Thereafter, SVM or kernel ridge regression (KRR) supervised learning classifiers may be trained and deployed for prediction applications; see the sketch after this list.
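As a hedged sketch of the last bullet, the following assumes a Gaussian kernel evaluated only over pairwise-observed coordinates, with NaN as the missing-value marker; these conventions are illustrative choices, not the course's prescription.

```python
import numpy as np

def gaussian_kernel_incomplete(X, Y=None, gamma=0.5):
    """Gaussian kernel where each pair is compared only on coordinates
    observed (non-NaN) in both vectors -- one simple way to build a
    similarity function from partially-specified feature vectors."""
    Y = X if Y is None else Y
    K = np.empty((len(X), len(Y)))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            mask = ~np.isnan(x) & ~np.isnan(y)                  # pairwise-complete entries
            d2 = np.sum((x[mask] - y[mask]) ** 2) if mask.any() else 0.0
            K[i, j] = np.exp(-gamma * d2)
    return K

def krr_fit(K, y, rho=1e-2):
    """Kernel ridge regression: alpha = (K + rho I)^{-1} y."""
    return np.linalg.solve(K + rho * np.eye(len(K)), y)

def krr_predict(K_test_train, alpha):
    """Prediction on test points: K(test, train) @ alpha."""
    return K_test_train @ alpha
```

Note that the classifier is fitted from the kernel (similarity) matrix and the labels alone, which is the mechanism the bullets above point to for kernel-based learning from incomplete data.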
This course will strive to cover various theoretical foundations and their associated privacy-relevant applications, including: estimation and classification, information theory, statistical analysis, and subspace optimization.
Course materials
- S.-Y. Kung. Machine learning for compressive privacy. Slides from the EWSCS 2016 course. [pdf]
- Videos from the lectures.
- S.-Y. Kung. Kernel Methods and Machine Learning. Cambridge Univ. Press, 2014. [doi link]
- S.-Y. Kung. Discriminant component analysis for privacy protection and visualization of big data. Multimedia Tools and Appl., to appear. [doi link]