PI: Yiming Yang, Carnegie Mellon University
Students involved: Hanxiao Liu, Yuexin Wu, Wei-Cheng Chang, Jingzhou Liu, Ruochen Xu
Important problems in the big-data era involve predictions based on heterogeneous sources of information and the dependency structures in data. In recommendation systems, for example, predictions need to be made not only based on observed user ratings over items (movies, books, music, shopping products, etc.), but also based on information such as demographic data of users and textual descriptions of items. In event detection from textual data (news stories, tweets, maintenance reports, legal documents, etc.), joint inference must be based on who (agents), what (event types or topics), where (locations) and when (dates), and also on the connections among agents (in social networks), topics (in an event-type ontology), locations (in a map) and temporal co-occurrences. The fundamental research questions therefore include: 1) how to develop a unified optimization framework for predictions based on heterogeneous information and dependency structures across various kinds of tasks, 2) how to make inference computationally tractable when the combined space of model parameters is extremely large, and 3) how to significantly enhance the predictive power of the system by leveraging massively available unlabeled data in addition to human-annotated training data, which are often sparse.
This project addresses the above challenges through the approaches described below.
The proposed work has yielded significant impact on both machine learning algorithms and real-world applications in multiple fields, as illustrated below.
Transductive Learning over Graphs
We developed a novel graph-based transductive learning framework, Transductive Learning over Product Graph (TOP), which simultaneously extracts multi-type associations from different sources of data, maps heterogeneous types of objects and relations onto a unified product graph, and performs joint inference about topic labels of documents via transductive label propagation over the product graph. This approach is particularly effective in transductive learning scenarios where labeled documents are very sparse and unlabeled documents are massively available, and where the manifold structures are highly informative but vary across different fields of co-occurrence data.
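The label-propagation step can be illustrated with a small sketch. It assumes the product graph is the Kronecker (tensor) product of two toy graphs, and it uses the textbook iterative label-spreading update f <- alpha * S f + (1 - alpha) * y rather than the convex solver of the ICML 2016 paper. The graph sizes, the seed label, and the helper names `kron`, `normalize` and `propagate` are hypothetical.

```python
import math


def kron(a, b):
    """Kronecker product of two square adjacency matrices (nested lists).

    Node (i, j) of the product graph has index i * len(b) + j and is linked to
    (k, l) exactly when i-k is an edge in `a` and j-l is an edge in `b`.
    """
    m = len(b)
    size = len(a) * m
    return [[a[i // m][k // m] * b[i % m][k % m] for k in range(size)]
            for i in range(size)]


def normalize(adj):
    """Symmetric normalization S = D^{-1/2} A D^{-1/2}; assumes no isolated nodes."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def propagate(s, y, alpha=0.8, iters=200):
    """Iterate f <- alpha * S f + (1 - alpha) * y, starting from f = y."""
    f = list(y)
    n = len(f)
    for _ in range(iters):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


# Hypothetical toy inputs: a triangle (e.g. 3 compounds) and a 3-node path (e.g. 3 proteins).
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
s = normalize(kron(triangle, path))

# One observed association: the pair (compound 0, protein 0), i.e. product node 0.
y = [1.0] + [0.0] * 8
scores = propagate(s, y)
for idx, val in enumerate(scores):
    print(f"pair ({idx // 3}, {idx % 3}): {val:.4f}")
```

The dense product matrix here has (n1 * n2)^2 entries, so this construction only suits toy sizes; a solver meant to scale would avoid materializing the product graph explicitly.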
In our experiments with an Enzyme multi-source dataset (445 compounds, 664 proteins) and a subset of DBLP publication records (34K authors, 11K papers and 22 venues) (Figure 1), TOP successfully scaled to the large cross-graph relational learning (CGRL) problem and significantly outperformed other representative approaches (Figure 3) (Hanxiao Liu and Yiming Yang, ICML 2016).
Fig 1. Prediction of associations among heterogeneous graphs on the Enzyme (left) and DBLP (right) datasets. The blue edges represent the within-graph relations and the red edges represent the cross-graph interactions.
To further improve performance when propagating labels over heterogeneous graphs, we developed a new framework, Graph Convolutional Matrix Completion (GCMC) (Figure 2), which leverages graph convolutional neural networks to learn adaptive feature representations for all nodes (KDIR 2018). We also reduced the computational cost of the neural networks via a first-order Chebyshev polynomial approximation. This modification combines the flexibility of label propagation with the fast computation of the feature learning process.
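A first-order Chebyshev approximation of graph convolution is commonly implemented with the renormalized propagation rule H' = ReLU(D~^{-1/2} (A + I) D~^{-1/2} H W) introduced by Kipf and Welling. The sketch below applies one such layer to a toy user-item bipartite graph and scores a user-item pair by a dot product, as described for Fig 2. It is an assumption that GCMC uses exactly this normalization; the graph, the weights `w`, and the helper names are hypothetical, and training is omitted.

```python
import math


def matmul(a, b):
    """Dense matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def renormalized_adjacency(adj):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2}, the renormalized first-order propagation matrix."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(a_hat, h, w):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def predict(h, u, v):
    """Score a (user, item) pair by the dot product of their final representations."""
    return sum(x * y for x, y in zip(h[u], h[v]))


# Hypothetical bipartite graph: users are nodes 0-1, items are nodes 2-3.
# Observed interactions: user0-item2, user0-item3, user1-item3.
adj = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]
a_hat = renormalized_adjacency(adj)
features = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # one-hot input
w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2], [0.2, 0.1]]  # untrained toy weights
hidden = gcn_layer(a_hat, features, w)
print(predict(hidden, 1, 2))  # score for the unobserved pair user1-item2
```

Adding the identity before normalizing keeps each node's own features in its update and keeps the spectrum of A_hat bounded, which is what allows a single cheap matrix product per layer instead of a higher-order polynomial filter.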
Fig 2. Architecture of the Graph Convolutional Matrix Completion (GCMC) network. The input bipartite graph B is used to extract features/signals, which are fed to graph convolutional neural networks with Chebyshev polynomial approximation. The prediction is made by taking the dot product between the final feature representations.
As a complementary direction, we also developed a novel nonparametric framework (Hanxiao Liu and Yiming Yang, AISTATS 2016) that performs semi-supervised learning and optimizes the Laplacian spectrum of the data manifold simultaneously. The new technique can be incorporated into any homogeneous transductive learning algorithm, including our own ICML’16 work over product graphs. The new formulation leads to a convex optimization problem that can be efficiently solved via the bundle method, and can be interpreted as asymptotically minimizing the generalization error bound of semi-supervised learning with respect to the graph spectrum. Experiments on benchmark datasets in various domains (text, image, audio) show that the proposed method outperforms existing graph-based semi-supervised learning algorithms.
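For orientation, the idea of optimizing the Laplacian spectrum can be written in a generic spectral-transform form. This is a standard formulation, not necessarily the exact AISTATS 2016 objective: L is the graph Laplacian with eigenpairs (lambda_i, u_i), \mathcal{L} is the set of labeled nodes, \ell is a loss, gamma > 0 is a regularization weight, and the nonnegative, nondecreasing weights r_i take the place of the eigenvalues so that smoother eigenvectors are penalized less.

```latex
L = \sum_{i=1}^{n} \lambda_i \, u_i u_i^{\top}, \qquad 0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n

\min_{f \in \mathbb{R}^{n}} \; \sum_{j \in \mathcal{L}} \ell(f_j, y_j)
  \;+\; \gamma \, f^{\top} \Big( \sum_{i=1}^{n} r_i \, u_i u_i^{\top} \Big) f,
  \qquad 0 \le r_1 \le r_2 \le \cdots \le r_n
```

Setting r_i = lambda_i recovers the usual Laplacian penalty f^T L f = sum_i lambda_i (u_i^T f)^2; treating the r_i as variables instead of fixing them is what turns the spectrum itself into part of the optimization.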
Fig 3. The results of TOP (our method), LTKM (low-rank tensor kernel machine), NN (nearest neighbor), RSVM (ranking SVM), TF (tensor factorization) and GRTF (graph-regularized tensor factorization) on benchmark data.
Analogical Learning for Multi-relational Learning
For knowledge base completion, we developed a novel framework that explicitly imposes analogical structures on multi-relational embeddings (Figure 4). Our model is both theoretically grounded and computationally scalable, and it significantly outperformed a large number of representative baseline methods on benchmark datasets. It also offers a unified view by subsuming several representative methods recently developed in machine learning for multi-relational learning (ICML 2017).
Fig 4. Commutative diagram for the analogy between the Solar System (red) and the Rutherford-Bohr Model (blue) (atom system). The new relation (nucleus attracts charge) is inferred by analogy from the existing mirror structures.
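To make the analogical constraint concrete: as we understand the ICML 2017 formulation, each relation is a normal matrix, all relation matrices commute with one another, and such matrices can be put in an almost-diagonal form made of real scalars and 2x2 blocks [[a, -b], [b, a]]. The sketch below builds two relations in that form, checks that they commute, and scores a triple (s, r, o) by the bilinear form v_s^T B_r v_o. The embedding values, relation names, and helper names are hypothetical.

```python
def almost_diagonal(scalars, pairs):
    """Dense relation matrix: real scalars on the diagonal, then 2x2 blocks [[a, -b], [b, a]]."""
    n = len(scalars) + 2 * len(pairs)
    m = [[0.0] * n for _ in range(n)]
    for i, s in enumerate(scalars):
        m[i][i] = s
    offset = len(scalars)
    for k, (a, b) in enumerate(pairs):
        i = offset + 2 * k
        m[i][i], m[i][i + 1] = a, -b
        m[i + 1][i], m[i + 1][i + 1] = b, a
    return m


def matmul(x, y):
    """Dense matrix product of nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def score(v_s, rel, v_o):
    """Bilinear triple score v_s^T B_r v_o."""
    return sum(v_s[i] * rel[i][j] * v_o[j]
               for i in range(len(v_s)) for j in range(len(v_o)))


# Two hypothetical relations sharing the same block layout (1 scalar + one 2x2 block).
attracts = almost_diagonal([2.0], [(0.5, 1.0)])
revolves_around = almost_diagonal([-1.0], [(0.3, -0.7)])

# Commutativity mirrors the commutative diagram in Fig 4:
# composing two relations in either order yields the same map.
ab = matmul(attracts, revolves_around)
ba = matmul(revolves_around, attracts)
print(all(abs(ab[i][j] - ba[i][j]) < 1e-12 for i in range(3) for j in range(3)))  # True

print(score([1.0, 0.0, 0.0], attracts, [1.0, 0.0, 0.0]))  # 2.0
print(score([0.0, 1.0, 0.0], attracts, [0.0, 0.0, 1.0]))  # -1.0
```

Each 2x2 block acts like multiplication by the complex number a + bi, which is why blocks with the same layout always commute while still allowing asymmetric relations (the block is not symmetric when b is nonzero).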
We also enhanced knowledge transfer through graph-based kernel induction (AAAI 2017). Our new framework does not require a shared feature space; instead, it uses a parallel corpus to calibrate domain-specific kernels into a unified kernel for label propagation across languages/domains, enabling semi-supervised learning from both labeled and unlabeled data. Our experiments on benchmark datasets showed that the proposed method outperforms other state-of-the-art transfer learning methods (Figure 5).
Fig 5. The results of KerTL (our method) and other state-of-the-art methods on the benchmark datasets APR and MNIST.
Other accomplishments partially supported by the NSF grant include a deep learning framework for extreme multi-label text classification (SIGIR 2017), a large-scale kernel approximation algorithm (IJCAI 2017), and a cross-lingual distillation framework for text classification (ACL 2017).
Epidemiological Trend Prediction
With support from the National Science Foundation (NSF) of the United States and the Japan Science and Technology Agency (JST), we have recently initiated an interdisciplinary collaboration with epidemiologists at Hokkaido University in Japan. Specifically, we have collected influenza data from different regions in the USA and Japan, and developed the first neural network framework for trend forecasting in epidemiological modeling (Wu SIGIR 2018). As illustrated in Figure 6, our model uses a CNN layer to capture the correlations among the influenza status of different regions in each week, and an RNN layer to model long- and short-term dependencies in the trend over time. Residual links are added to speed up training and reduce the risk of overfitting.
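A minimal forward pass can show how the three pieces fit together. This sketch makes several simplifying assumptions that are not taken from the paper: the convolution uses filters that span all regions in a week (so it reduces to a dense layer followed by ReLU), the recurrent part is a plain tanh RNN, a residual link adds the hidden state from `skip` steps earlier, and a linear layer maps the last hidden state to one forecast per region. All weights, the toy series, and the function names are hypothetical, and training is omitted.

```python
import math


def dense(w, x):
    """Matrix-vector product w @ x with nested lists."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def cnn_rnn_res_forward(series, w_conv, w_in, w_rec, w_out, skip=2):
    """Forward pass over a list of weekly region vectors; returns one forecast per region."""
    states = []                      # hidden states s_0, s_1, ...
    s = [0.0] * len(w_rec)           # initial hidden state
    for x_t in series:
        # "CNN" step: filters spanning all regions mix the week's regional values.
        c_t = [max(0.0, v) for v in dense(w_conv, x_t)]
        # RNN step: combine the convolved features with the previous hidden state.
        s = [math.tanh(a + b) for a, b in zip(dense(w_in, c_t), dense(w_rec, s))]
        # Residual link: add the hidden state from `skip` steps earlier, when it exists.
        if len(states) >= skip:
            s = [a + b for a, b in zip(s, states[-skip])]
        states.append(s)
    return dense(w_out, s)           # linear read-out to one value per region


# Hypothetical toy data: 4 weeks x 3 regions of influenza activity.
series = [[0.1, 0.2, 0.1], [0.2, 0.3, 0.2], [0.4, 0.5, 0.3], [0.6, 0.6, 0.5]]
w_conv = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]      # 2 filters over 3 regions
w_in = [[0.8, 0.1], [0.2, 0.7]]
w_rec = [[0.5, 0.0], [0.0, 0.5]]
w_out = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]     # 2 hidden units -> 3 regions
out = cnn_rnn_res_forward(series, w_conv, w_in, w_rec, w_out)
print(out)
```

The residual addition gives later hidden states a direct path to earlier ones, which is the usual motivation for such links: gradients can flow back without passing through every tanh step.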
Fig 6. Our deep learning framework. The top portion is the multi-dimensional time series data; each dashed box is the vector representation of the influenza measures in different regions of a country. The middle portion consists of the CNN modules, and the bottom portion consists of the RNN modules with residual links in between. The blue (solid) curves in the top portion are past trends, and the red (dashed) curves are future trends. The system is designed to forecast the epidemic trend at each time step based on past signals.
We tested our model on the collected USA/Japan influenza data and compared it with representative time series methods, including widely used autoregressive models such as GAR (global autoregression) and VAR (vector autoregression), as well as Gaussian process regression. The results show that our new approach performed consistently better, as shown in Figure 7.
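For comparison, the GAR baseline fits a single set of autoregressive coefficients shared by all regions. The sketch below uses lag order 1 for brevity (an assumption; the experiments may use longer lags), fits y_t = a * y_{t-1} + b by pooled least squares, and rolls the fitted model forward 2 steps, the horizon used in Fig 7. VAR differs in letting each region's forecast depend on the other regions' past values. The toy series and the helper names `fit_gar1` and `forecast` are hypothetical.

```python
def fit_gar1(regions):
    """Pooled least squares for y_t = a * y_{t-1} + b, shared across all regions."""
    xs, ys = [], []
    for series in regions:
        xs.extend(series[:-1])   # lagged values
        ys.extend(series[1:])    # targets
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def forecast(a, b, last_value, horizon):
    """Iterate the fitted AR(1) model `horizon` steps ahead."""
    y = last_value
    for _ in range(horizon):
        y = a * y + b
    return y


# Hypothetical regional series generated exactly by y_t = 0.5 * y_{t-1} + 1.
regions = [[4.0, 3.0, 2.5, 2.25], [0.0, 1.0, 1.5, 1.75]]
a, b = fit_gar1(regions)
print(round(a, 6), round(b, 6))                                # 0.5 1.0
print([round(forecast(a, b, r[-1], 2), 6) for r in regions])   # [2.0625, 1.9375]
```

Pooling all regions into one regression is what makes GAR "global": it has very few parameters, but it cannot express that one region's trend leads another's.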
Fig 7. The results of CNNRNN-Res (our method) and other methods on the Japan-Prefectures and US-Regions datasets with the prediction horizon set to 2, measured in Pearson correlation (higher is better).
A similar model that utilizes graph information for general time-series prediction was also developed partially under the NSF grant (Lai SIGIR 2018).
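Fig 7 reports accuracy as the Pearson correlation between forecasts and observed values. A minimal version of that metric is sketched below. How the experiments aggregate it across regions (for example, averaging per-region correlations) is not stated here, so the single computation over one sequence is an assumption, and the example values are made up.

```python
import math


def pearson(pred, truth):
    """Pearson correlation between two equal-length sequences (higher is better)."""
    n = len(pred)
    mean_p, mean_t = sum(pred) / n, sum(truth) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, truth))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in truth))
    return cov / (std_p * std_t)


observed = [1.0, 2.0, 4.0, 3.0]
print(round(pearson([1.1, 2.1, 3.9, 3.2], observed), 3))
print(pearson([2.0, 4.0, 8.0, 6.0], observed))  # forecast proportional to the truth
```

Because the metric ignores scale and offset, a forecast that tracks the shape of the epidemic curve scores well even if its absolute level is off; that is worth keeping in mind when comparing it with error-based metrics.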
Mon Oct 8 12:04:44 EDT 2018