Data Sciences Workshop
You can download some of the presentations. See below.


Keynote speakers 


Dr. René Vidal, Associate Professor of Biomedical Engineering
Johns Hopkins University, USA

Title: Algebraic, Sparse and Low Rank Subspace Clustering ( Slides: pdf )

Video: Watch Here

Abstract: In the era of data deluge, the development of methods for discovering structure in high-dimensional data is becoming increasingly important. Traditional approaches often assume that the data is sampled from a single low-dimensional manifold. However, in many applications in signal/image processing, machine learning and computer vision, data in multiple classes lie in multiple low-dimensional subspaces of a high-dimensional ambient space. In this talk, I will present methods from algebraic geometry, sparse representation theory and rank minimization for clustering and classification of data in multiple low-dimensional subspaces. I will show how these methods can be extended to handle noise, outliers as well as missing data. I will also present applications of these methods to video segmentation and face clustering.

Biography: Professor Vidal received his B.S. degree in Electrical Engineering (highest honors) from the Pontificia Universidad Catolica de Chile in 1997 and his M.S. and Ph.D. degrees in Electrical Engineering and Computer Sciences from the University of California at Berkeley in 2000 and 2003, respectively. He has been on the faculty of the Center for Imaging Science in the Department of Biomedical Engineering of The Johns Hopkins University since 2004, where he is currently an Associate Professor. He was co-editor of the book "Dynamical Vision" and has co-authored more than 180 articles in biomedical image analysis, computer vision, machine learning, hybrid systems, robotics and signal processing. He has received many awards for his work including the 2012 J.K. Aggarwal Prize, the 2009 ONR Young Investigator Award, the 2009 Sloan Research Fellowship, the 2005 NSF CAREER Award, and best paper awards at ICCV-3DRR 2013, PSIVT 2013, CDC 2012, MICCAI 2012, CDC 2011 and ECCV 2004. Dr. Vidal has been Associate Editor of Medical Image Analysis, the IEEE Transactions on Pattern Analysis and Machine Intelligence, the SIAM Journal on Imaging Sciences and the Journal of Mathematical Imaging and Vision, Program Chair for ICCV 2015, CVPR 2014, WMVC 2009 and PSIVT 2007, Area Chair for MICCAI 2013 and 2014, ICCV 2007, 2011 and 2013, and CVPR 2005 and 2013, and program committee member for all major conferences in computer vision, machine learning and medical imaging. He is a fellow of the IEEE and a member of the ACM and SIAM.

Dr. Akram Aldroubi, Professor of Mathematics
Vanderbilt University, USA

Title: Subspace Segmentation and Its Applications ( Slides: pdf )

Video: Watch here

Abstract: The subspace segmentation problem is fundamental in many applications. The goal is to cluster data drawn from an unknown union of subspaces. We will state the problem and describe its connection to other areas of mathematics and engineering. We then review the mathematical and algorithmic methods created to solve this problem and some of its particular cases. We also describe the problem of motion tracking in videos and its connection to the subspace segmentation problem and compare the various techniques for solving it.

Biography: Akram Aldroubi is a Professor of Mathematics at Vanderbilt University, and he is a Fellow of the American Mathematical Society. He has authored and co-authored over 100 research publications related to modern harmonic analysis and its applications. He is the Co-editor in Chief of the international Journal on Sampling Theory in Signal and Image Processing (STSIP), and he is on the editorial board of several other mathematical journals.


Invited speakers

Dr. Guangcan Liu, Professor
School of Information and Control
Nanjing University of Information Science and Technology, China

Title: Robust Subspace Clustering in High Dimension: A Deterministic Result ( Slides: pptx )

Video: Watch here

Abstract: It is of great interest to explore the problem of robust subspace clustering: given a collection of data points approximately drawn from a union of multiple subspaces, the goal is to segment the points into their respective subspaces and to remove possible errors as well. In general, without any assumptions about the data, this problem is very hard to solve with certainty. Fortunately, today's data is often high-dimensional and massive, and thus the sum of those subspaces very often has fairly low rank, i.e., the union of multiple subspaces can be regarded as a single low-dimensional subspace. This fact drives us to propose a simple yet effective method for subspace clustering. Similar to prevalent clustering methods, our method adopts a two-stage framework: it first learns an affinity matrix from the given data points and then uses spectral clustering techniques to produce the final clustering results. The inference of the affinity matrix is formulated as a nuclear norm minimization problem, termed Low-Rank Representation (LRR), which seeks the lowest-rank representation among all the candidates that can represent each data point as a linear combination of the other points. It is shown that the convex program associated with LRR solves the subspace clustering problem in the following sense: under certain conditions, LRR can exactly recover the authentic row projector from a given set of data points possibly contaminated by outliers. Since the subspace membership of the data points is provably determined by the authentic row projector, this further implies that LRR can solve the robust subspace clustering problem well under certain conditions.
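The first stage of the framework described in the abstract above can be illustrated with a minimal sketch. It assumes the noise-free, outlier-free case, where the nuclear norm problem min ||Z||_* subject to X = XZ has the known closed-form solution Z = V V^T, with V taken from the skinny SVD of X; this matrix is the row projector the abstract refers to. The synthetic data, the variable names, and the use of NumPy are assumptions made for illustration. The sketch does not implement the robust convex program from the talk, and the spectral clustering stage is omitted.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 5 points from each of
# two random 2-dimensional subspaces of R^6, stored as columns of X.
rng = np.random.default_rng(0)
basis_1 = rng.standard_normal((6, 2))
basis_2 = rng.standard_normal((6, 2))
X = np.hstack([
    basis_1 @ rng.standard_normal((2, 5)),
    basis_2 @ rng.standard_normal((2, 5)),
])

# Skinny SVD of the data matrix, keeping only the numerically nonzero
# singular values.
_, s, Vt = np.linalg.svd(X, full_matrices=False)
rank = int(np.sum(s > 1e-10 * s[0]))
V = Vt[:rank].T

# Noise-free low-rank representation: the projector onto the row space of X.
Z = V @ V.T

# Affinity matrix that a spectral clustering step would consume.
affinity = np.abs(Z)

# For independent subspaces, affinities between points from different
# subspaces should vanish up to rounding error.
cross_block_max = affinity[:5, 5:].max()
within_block_max = affinity[:5, :5].max()
print(rank, cross_block_max, within_block_max)
```

In this idealized setting the affinity matrix is block diagonal, so any reasonable graph clustering step would separate the two groups. With noise or outliers the closed form no longer applies and the convex program has to be solved numerically.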

Biography: Dr. Guangcan Liu received the bachelor's degree in mathematics and the Ph.D. degree in computer science and engineering from Shanghai Jiao Tong University, Shanghai, China, in 2004 and 2010, respectively. He was a Post-Doctoral Researcher with the National University of Singapore, Singapore, from 2011 to 2012, the University of Illinois at Urbana-Champaign, Champaign, IL, USA, from 2012 to 2013, Cornell University, Ithaca, NY, USA, from 2013 to 2014, and Rutgers University, Piscataway, NJ, USA, in 2014. Since 2014, he has been a Professor with the School of Information and Control, Nanjing University of Information Science and Technology, Nanjing, China. His research interests mainly include machine learning, computer vision, and image processing.

Dr. Anna Little, Assistant Professor
Department of Mathematics
Jacksonville University, USA

Title: Estimating the Intrinsic Dimension of High-Dimensional Data Sets ( Slides: pdf )

Video: Watch here

Abstract: This talk discusses a novel approach for estimating the intrinsic dimension of noisy, high-dimensional point clouds. A general class of sets that are locally well approximated by k-dimensional planes but embedded in a D-dimensional Euclidean space, with D >> k, is considered. The dimension is estimated via a new multiscale algorithm that generalizes principal component analysis (PCA). The classical PCA approach recovers the dimension when the data is linear but fails when the data is non-linear, overestimating the intrinsic dimension. This new multiscale algorithm exploits the low-dimensional structure of the data, so that its power depends on k rather than D, and is robust to small sample size, noise, and non-linearities in the data.

Biography: Anna Little has served as an assistant professor of Mathematics at Jacksonville University in Jacksonville, FL since fall 2012. She received her undergraduate degree from Samford University in 2006 and a PhD in mathematics from Duke University in 2011, where she worked under Dr. Mauro Maggioni to develop a multiscale algorithm for intrinsic dimension estimation of high-dimensional data sets. In addition to high-dimensional data analysis, her research interests include multiscale methods, clustering algorithms, statistics, and machine learning.

Dr. Yue M. Lu, Assistant Professor
School of Engineering and Applied Sciences
Harvard University

Title: Randomized Kaczmarz Algorithm and Its Cousins: Exact Performance Analysis and Large System Dynamics ( Slides: pdf )

Video: Watch here

Abstract: The randomized Kaczmarz algorithm (RKA) is a simple but efficient method for solving large-scale over-determined systems through random iterative projections. Although the algorithm has been used for some time, it was only recently that Strohmer and Vershynin established its exponential convergence in the mean square sense. A flurry of work followed on performance bounds and the optimization of the algorithm. In this talk, I will present an exact analysis of the algorithm for both noisy and noiseless cases. In particular, I will show how to compute the exact mean square error (MSE) in the value reconstructed by RKA using a simple "lifting trick": the empirical MSE is the trace of the empirical error covariance, whose evolution can be described by a random linear dynamical system in a higher-dimensional lifted space. For the noiseless case, I will show how to compute the error exponent, i.e., the exponential decay rate of the MSE, and describe how to optimize the row-selection probabilities to speed up convergence. The typical convergence of the algorithm is much faster than the decay rate of the MSE suggests; I will define a "quenched" error exponent to characterize the typical convergence and apply statistical physics-based bounds to approximate it. Our analysis agrees with numerical results, which also indicate that previous upper bounds in the literature for both the noisy and noiseless cases can often be several orders of magnitude too high. Finally, I will show how to extend our analysis to other related randomized algorithms, both in finite dimensions and in the infinite-dimensional large-system limit.
Joint work with Ameya Agaskar (Harvard & MIT Lincoln Laboratory) and Chuang Wang (Harvard).
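The "random iterative projections" mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration for a consistent (noiseless) system, not the analysis presented in the talk. The function name, the example system, and the choice of row-selection probabilities proportional to squared row norms (the scheme analyzed by Strohmer and Vershynin) are assumptions made for illustration.

```python
import random


def randomized_kaczmarz(A, b, iterations=2000, seed=0):
    """Approximate the solution of a consistent system A x = b.

    A is a list of rows and b a list of right-hand sides. Each step picks
    one row at random and projects the current iterate onto the hyperplane
    defined by that row's equation.
    """
    rng = random.Random(seed)
    # Assumed row-selection rule: probability proportional to squared norm.
    row_norms_sq = [sum(a * a for a in row) for row in A]
    x = [0.0] * len(A[0])
    for _ in range(iterations):
        i = rng.choices(range(len(A)), weights=row_norms_sq)[0]
        row = A[i]
        # Signed residual of equation i at the current iterate.
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        step = residual / row_norms_sq[i]
        # Orthogonal projection onto {x : <row, x> = b[i]}.
        x = [xj + step * a for xj, a in zip(x, row)]
    return x


# Small over-determined, consistent example (hypothetical data).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0], [3.0, 2.0]]
x_true = [1.0, -2.0]
b = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]

x_hat = randomized_kaczmarz(A, b)
print(x_hat)
```

Changing the `weights` argument is the natural place to experiment with other row-selection probabilities, which the abstract identifies as a way to speed up convergence.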

Biography: Yue M. Lu attended the University of Illinois at Urbana-Champaign, where he received the M.Sc. degree in Mathematics and the Ph.D. degree in Electrical Engineering, both in 2007. He was a Research Assistant at the University of Illinois at Urbana-Champaign, and a postdoctoral researcher at the Audiovisual Communications Laboratory at Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland. Since September 2010, he has been an Assistant Professor of Electrical Engineering at Harvard University, directing the Signals, Information, and Networks Group (SING) at the School of Engineering and Applied Sciences. 
He received the Most Innovative Paper Award of IEEE International Conference on Image Processing (ICIP) in 2006, the Best Student Paper Award of IEEE ICIP in 2007, and the Best Student Presentation Award at the 31st SIAM SEAS Conference in 2007. Student papers supervised and coauthored by him won the Best Student Paper Award of IEEE International Conference on Acoustics, Speech and Signal Processing in 2011 and the Best Student Paper Award of IEEE Global Conference on Signals and Information Processing (GlobalSIP) in 2014.
He has been an Associate Editor of the IEEE Transactions on Image Processing since December 2014, and an Elected Member of the IEEE Image, Video, and Multidimensional Signal Processing Technical Committee since January 2015.

Dr. Don Hong, Professor
Center for Computational Science and Department of Mathematical Sciences
Middle Tennessee State University
Vanderbilt University

Title: High Dimensional Data Analysis with Applications in IMS and fMRI Processing ( Slides: pdf )

Video: Watch here

Abstract: Many high-dimensional data sets, such as imaging mass spectrometry (IMS) and functional magnetic resonance imaging (fMRI) data, are of the hyper-spectral imaging (HSI) type. Advanced mathematical tools and statistical techniques can not only provide significance analysis of experimental data sets but can also help in finding new data features and patterns, guiding the design of biological experiments, and driving the development of computational tools. In this talk, we will discuss challenges in HSI-type data processing and report some recent progress using statistical computing methods for hyper-spectral imaging type medical data processing, especially IMS cancer data analysis and fMRI applications in AD and autism studies.
This is joint work with Qiang Wu, Lu Xiong, Jingsai Liang, and Xin Yang.

Biography: Don Hong earned his Ph.D. in Mathematics from Texas A&M University in 1993, held a postdoctoral position at the University of Texas-Austin, and served on the faculty at East Tennessee State University. He has been a professor in the Center for Computational Sciences and the Department of Mathematical Sciences of Middle Tennessee State University (MTSU) since 2005. He was co-editor of the book "Quantitative Medical Data Analysis Using Mathematical Tools and Statistical Techniques" and has co-authored the book "Real Analysis with Introduction to Wavelets" and over 50 articles in computational sciences. Dr. Hong is on the editorial board of several computational science journals including the Journal of Health and Medical Informatics, International Journal of Computational Mathematics, Journal of Applied Functional Analysis, International Journal of Mathematics and Computer Science, and American Research Journal of Mathematics. He also serves as the coordinator of the actuarial science program at MTSU.

Dr. Laura Balzano, Assistant Professor
Department of Electrical Engineering and Computer Science
University of Michigan, Ann Arbor

Title: Subspace Clustering with Missing Data ( Slides: pdf )

Video: Watch here

Abstract: Many big data problems require algorithms that can handle missing data. For a subspace or union of subspaces model, we are fortunately able to leverage results on incomplete data projections to estimate the model in the presence of missing data. This talk will discuss two algorithms based on these ideas. We will also discuss theoretical results on when it is possible to identify a union of subspaces given incomplete data.

Biography: Laura Balzano is an assistant professor in Electrical Engineering and Computer Science at the University of Michigan. Laura received her BS, MS, and Ph.D. in Electrical Engineering from Rice University, the University of California, Los Angeles, and the University of Wisconsin, respectively. She received the Outstanding MS Degree of the year award from the UCLA EE Department, and the Best Dissertation award from the University of Wisconsin ECE Department. She has worked as a software engineer at Applied Signal Technology, Inc. on signal processing software for massive data. Her PhD was supported by a 3M fellowship. Her main research focus is on statistical signal processing, estimation, optimization, and modeling with highly incomplete or corrupted data, and its applications in computer vision, network monitoring, and environmental sensing.

Dr. Jerome Baudry, Associate Professor
Department of Biochemistry & Cellular and Molecular Biology
University of Tennessee
UT/ORNL Center for Molecular Biophysics
Institute of Biomedical Engineering

Title: Supercomputer-Based Drug Discovery: Finding the Needle in the Data Haystack

Video: Watch here

Abstract: Virtual screening is a computational biology technique that has long been used, for instance in the pharmaceutical industry, to discover molecules that can bind to protein targets. Recent technological and fundamental developments based on the availability of petaflop supercomputers can revolutionize this approach and tackle complex system problems that are the hallmark of biology.
However, the amount and the complexity of the data that must be analyzed and understood are very challenging. I will present real case applications that aim at clustering the data in chemical and biological spaces that represent the function and properties of the biomolecules of interest. We will discuss how to translate this avalanche of data into biological knowledge through PCA and complex systems approaches.

Biography: Professor Jerome Baudry joined the Center for Molecular Biophysics in 2008 as an Assistant Professor in the Department of Biochemistry & Cellular and Molecular Biology at the University of Tennessee, Knoxville. Dr. Baudry obtained his Ph.D. in Molecular Biophysics with the highest honors from the University of Paris-06, France (University Pierre and Marie Curie). He subsequently joined the group of Klaus Schulten at the University of Illinois at Urbana-Champaign as a post-doc. After his post-doctoral work, Dr. Baudry worked in the pharmaceutical industry as a Research Scientist, and then accepted a Senior Research Scientist position back in Illinois. Prior to his appointment in Tennessee, Dr. Baudry was Research Assistant Professor in the School of Chemical Sciences at the University of Illinois, Urbana-Champaign. The Baudry laboratory develops and applies methods and protocols in computational molecular biophysics for structure-based molecular discovery. The lab works on several targets relevant to human and animal health as well as on targets of agrochemical interest. The theoretical approach is complemented by close collaborations with experimental groups.

Dr. Tim Wallace, Ph.D.
Computer and Information Systems Engineering
Tennessee State University

Title: Application of Subspace Clustering in DNA Sequence Analysis

Video: Watch here

Abstract: Identification and clustering of orthologous genes play an important role in developing evolutionary models, such as validating convergent and divergent phylogeny and predicting functional proteins in newly sequenced species of unverified nucleotide protein mappings. Here, we introduce an application of subspace clustering to orthologous gene sequences and discuss the initial results. The working hypothesis is based upon the concept that genetic changes between nucleotide sequences coding for proteins among selected species and groups lie within a union of subspaces for unique clusters of the orthologous groups. In this talk, we will discuss the recent experimental findings and compare them with the main hypothesis and predictions, as well as with simulations from a perfect binary random mutation tree.
This work includes contributions from Dr. A. Sekmen and Dr. X. Wang.

Biography: Dr. Wallace completed his doctorate at Tennessee State University in 2014 under the advisement of Dr. Ali Sekmen with committee members from the Departments of Computer Science and Electrical and Computer Engineering. His current activities include research and development of statistical algorithms in information theoretic areas such as predictive analytics and high-dimensional data mining. In addition, his research interests include methods and techniques for diverse computational challenges in mathematics, biology, biomedical imaging, and molecular radiation physics.

Daniel Pimentel, Ph.D. Candidate
Advisor: Dr. Robert Nowak
Electrical and Computer Engineering
University of Wisconsin-Madison

Title: On the Difficulties of Subspace Clustering with Missing Data ( Slides: pdf )

Video: Watch here

Abstract: We love subspaces. We observe a phenomenon and try to find a line that explains it. We get our hands on some data, and try to find a subspace that fits it. But sometimes one subspace is not enough. Data are often better explained by multiple lines, or more generally, unions of subspaces. Hence the importance of subspace clustering: infer the set of subspaces that best fit a dataset.
In many relevant applications missing data are common, so subspace clustering with missing data (SCMD) is a task we would very much like to perform. Nevertheless, the sample complexity of SCMD remains an important open problem. In this talk I will discuss the difficulties of this task and introduce the problem of subspace identifiability from canonical projections, which sheds new light on the SCMD problem.

Michael Northington, Ph.D. Candidate
Advisor: Dr. Alexander Powell
Department of Mathematics
Vanderbilt University

Title: Balian-Low Type Uncertainty Principles for Shift Invariant Spaces with Extra Invariance ( Slides: pdf )

Video: Watch here

Abstract: Shift invariant subspaces of L^2(R^d), such as certain spline spaces, wavelet spaces, and Paley-Wiener spaces, are commonly used in applications. Recently, there has been interest in studying finitely generated shift invariant spaces which are endowed with extra invariance by some non-integer translation. We will introduce the theory of shift invariant spaces and explain how this extra invariance assumption causes obstructions to the localization of the generators of the space.

Biography: Michael Northington is a graduate assistant at Vanderbilt University where he is co-advised by Alexander Powell and Doug Hardin. He received a BS from Austin Peay State University and an MS from the University of Mississippi. His current areas of research are applied harmonic analysis, inverse problems, and machine learning.