21st Estonian Winter School in Computer Science (EWSCS)
XXI Eesti Arvutiteaduse Talvekool

Palmse, Estonia, February 28 - March 4, 2016

Sun-Yuan Kung

Dept. of Electrical Engineering
Princeton University
USA

Kernel learning machines for compressive privacy protection of internet/cloud data

Abstract

In the internet era, we experience a phenomenon of "digital everything". Big data poses quantitative (volume and velocity) and qualitative (variety) challenges, so it is imperative to address its computational aspects. In particular, the curse of high feature dimensionality raises grave concerns about computational complexity and over-training. We shall explore various projection methods, e.g., principal component analysis (PCA), for dimension reduction, a prelude to visualization of big data and privacy preservation.
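As a rough illustration of projection-based dimension reduction, the following minimal sketch computes a PCA projection with NumPy. The function name pca_project, the toy data, and the choice of two retained components are assumptions made for this example only and are not taken from the course materials.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest
    W = eigvecs[:, order]                    # projection matrix (d x k)
    return Xc @ W, W, eigvals[order]

# Hypothetical toy data: 200 samples with 5 features whose variance
# is concentrated in 2 latent directions, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

Z, W, variances = pca_project(X, k=2)
print(Z.shape)  # reduced representation: one 2-dimensional row per sample
```

Here the five original features are replaced by two coordinates per sample, which is the kind of compression that makes visualization feasible.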

In the internet era, we benefit greatly from the combination of packet switching, bandwidth, processing and storage capacities in the cloud. However, "big data" often has a connotation of "big brother": since the data being collected on consumers like us is growing exponentially, attacks on our privacy are becoming a real threat. New technologies are needed to better assure our privacy protection when we upload personal data to the cloud. To this end, we shall explore joint optimization over three design spaces: (a) feature space, (b) classification space, and (c) privacy space. This prompts a new paradigm called compressive privacy (CP) to explore information systems which simultaneously perform utility space maximization (deliver intended data mining, classification, and learning tasks) and privacy space minimization (safeguard personal/private information).

An important development in CP is a "privatizing" discriminant component analysis (DCA), which offers a compression scheme to enhance privacy protection in contextual and collaborative learning environments. DCA can be viewed as a supervised PCA which can simultaneously rank order (1) the sensitive components and (2) the privatized components.
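To make the idea of a supervised, rank-ordered projection more concrete, here is a minimal sketch. It is not the DCA formulation taught in the course, which this abstract does not spell out. As a stand-in it uses a regularized Fisher-style generalized eigenproblem, and the function name discriminant_components, the ridge parameter rho, and the toy data are all assumptions of this example.

```python
import numpy as np

def discriminant_components(X, y, rho=1e-3):
    """Rank projection directions by how well they separate the classes in y.

    Assumed stand-in for DCA: solves Sb w = lambda (Sw + rho I) w.
    """
    d = X.shape[1]
    mean = X.mean(axis=0)
    Sb = np.zeros((d, d))                    # between-class scatter
    Sw = np.zeros((d, d))                    # within-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        diff = (mc - mean)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)
        Sw += (Xc - mc).T @ (Xc - mc)
    # Whiten with the Cholesky factor of the regularized within-class scatter,
    # which turns the generalized problem into an ordinary symmetric one.
    L = np.linalg.cholesky(Sw + rho * np.eye(d))
    Linv = np.linalg.inv(L)
    scores, V = np.linalg.eigh(Linv @ Sb @ Linv.T)
    order = np.argsort(scores)[::-1]         # most discriminative first
    return Linv.T @ V[:, order], scores[order]

# Hypothetical toy data: two classes that differ only along feature 0.
rng = np.random.default_rng(0)
n = 100
X0 = rng.normal(size=(n, 3))
X1 = rng.normal(size=(n, 3))
X1[:, 0] += 4.0
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

W, scores = discriminant_components(X, y)
top = W[:, 0]                                # leading discriminant direction
```

Under these assumptions the components come out ordered by how much label-relevant information they carry. Whether the leading or the trailing components are retained would depend on whether the label represents the utility task or the private attribute.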

Big data analysis usually involves nonlinear data analysis, for which the two most promising approaches are the kernel learning machine (KLM) and the deep learning machine (DLM). The safest possible protection is to withhold the private data from sharing in the first place. This scheme, however, presents a formidable challenge in developing a machine learning tool for incomplete data analysis (IDA). Fortunately, KLM can naturally be extended to all types of nonvectorial data analysis, including IDA. Moreover, KLM can facilitate:

This course will strive to cover various theoretical foundations and their associated privacy-relevant applications, including estimation and classification, information theory, statistical analysis, and subspace optimization.

Course materials

Last changed May 7, 2016 12:52 Europe/Helsinki (GMT +03:00) by local organizers, ewscs16(at)cs.ioc.ee
EWSCS'16 page: http://cs.ioc.ee/ewscs/2016/