## Publications

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

### Selected papers
Learning Generative Adversarial Networks from Multiple Data Sources. Trung Le, Quan Hoang, Hung Vu, Tu Dinh Nguyen, Hung Bui and Dinh Phung. In Proc. of the 28th Int. Joint Conf. on Artificial Intelligence (IJCAI), Aug. 2019.

Abstract: Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. For this problem, we enrich both GANs’ formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN’s effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair.

Three-Player Wasserstein GAN via Amortised Duality. Nhan Dam, Quan Hoang, Trung Le, Tu Dinh Nguyen, Hung Bui and Dinh Phung. In Proc. of the 28th Int. Joint Conf. on Artificial Intelligence (IJCAI), Aug. 2019.

Abstract: We propose a new formulation for learning generative adversarial networks (GANs) using optimal transport cost (the general form of Wasserstein distance) as the objective criterion to measure the dissimilarity between target distribution and learned distribution. Our formulation is based on the general form of the Kantorovich duality which is applicable to optimal transport with a wide range of cost functions that are not necessarily a metric. To make optimising this duality form amenable to gradient-based methods, we employ a function that acts as an amortised optimiser for the innermost optimisation problem. Interestingly, the amortised optimiser can be viewed as a mover since it strategically shifts around data points. The resulting formulation is a sequential min-max-min game with 3 players: the generator, the critic, and the mover where the new player, the mover, attempts to fool the critic by shifting the data around. Despite involving three players, we demonstrate that our proposed formulation can be solved reasonably effectively via a simple alternative gradient learning strategy. Compared with the existing Lipschitz-constrained formulations of Wasserstein GAN on CIFAR-10, our model yields significantly better diversity scores than weight clipping and comparable performance to gradient penalty method.

Learning How to Active Learn by Dreaming. Thuy-Trang Vu, Ming Liu, Dinh Phung and Gholamreza Haffari. In Proc. of Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy, Jul. 2019.

A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization. Dai Quoc Nguyen, Thanh Vu, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL), Minneapolis, USA, Jun. 2019. URL: https://arxiv.org/abs/1808.04122

Abstract: In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). Our CapsE represents each triple as a 3-column matrix where each column vector represents the embedding of an element in the triple. This 3-column matrix is then fed to a convolution layer where multiple filters are operated to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains state-of-the-art link prediction results for knowledge graph completion on two benchmark datasets: WN18RR and FB15k-237, and outperforms strong search personalization baselines on SEARCH17 dataset.

Probabilistic Multilevel Clustering via Composite Transportation Distance. Viet Huynh, Nhat Ho, Dinh Phung and Michael I. Jordan. In Proc. of Int. Conf. on Artificial Intelligence and Statistics (AISTATS), Okinawa, Japan, Apr. 2019. URL: https://arxiv.org/abs/1810.11911

Abstract: We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence. Our method involves solving a joint optimization problem over spaces of probability measures to simultaneously discover grouping structures within groups and among groups. By exploiting the connection of our method to the problem of finding composite transportation barycenters, we develop fast and efficient optimization algorithms even for potentially large-scale multilevel datasets. Finally, we present experimental results with both synthetic and real data to demonstrate the efficiency and scalability of the proposed approach.

Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection. Tue Le, Tuan Nguyen, Trung Le, Dinh Phung, Paul Montague, Olivier De Vel and Lizhen Qu. In International Conference on Learning Representations (ICLR), 2019. URL: https://openreview.net/forum?id=ByloIiCqYQ

Robust Anomaly Detection in Videos using Multilevel Representations. Hung Vu, Tu Dinh Nguyen, Trung Le, Wei Luo and Dinh Phung. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), Honolulu, USA, 2019. URL: https://github.com/SeaOtter/vad_gan

Robust Bayesian Kernel Machine via Stein Variational Gradient Descent for Big Data. Khanh Nguyen, Trung Le, Tu Nguyen, Geoff Webb and Dinh Phung. In Proc. of the 24th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), London, UK, Aug. 2018. ACM.

Abstract: Kernel methods are powerful supervised machine learning models for their strong generalization ability, especially on limited data to effectively generalize on unseen data. However, most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization, making them infeasible to apply to large-scale datasets. This issue is exacerbated when kernel methods are used in conjunction with a grid search to tune their kernel parameters and hyperparameters which brings in the question of model robustness when applied to real datasets. In this paper, we propose a robust Bayesian Kernel Machine (BKM), a Bayesian kernel machine that exploits the strengths of both the Bayesian modelling and kernel methods. A key challenge for such a formulation is the need for an efficient learning algorithm. To this end, we successfully extended the recent Stein variational theory for Bayesian inference for our proposed model, resulting in fast and efficient learning and prediction algorithms. Importantly our proposed BKM is resilient to the curse of kernelization, hence making it applicable to large-scale datasets and robust to parameter tuning, avoiding the associated expense and potential pitfalls with current practice of parameter tuning. Our extensive experimental results on 12 benchmark datasets show that our BKM without tuning any parameter can achieve comparable predictive performance with the state-of-the-art LIBSVM and significantly outperforms other baselines, while obtaining significant speedup in terms of the total training time compared with its rivals.

MGAN: Training Generative Adversarial Nets with Multiple Generators. Quan Hoang, Tu Dinh Nguyen, Trung Le and Dinh Phung. In International Conference on Learning Representations (ICLR), 2018. URL: https://openreview.net/forum?id=rkmu5b0a-

Abstract: We propose in this paper a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapsing problem and delivering state-of-the-art results. A minimax formulation was able to establish among a classifier, a discriminator, and a set of generators in a similar spirit with GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them will be randomly selected as final output similar to the mechanism of a probabilistic mixture model. We term our method Mixture Generative Adversarial Nets (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators’ distributions and the empirical data distribution is minimal, whilst the JSD among generators’ distributions is maximal, hence effectively avoiding the mode collapsing problem. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by the generators.

Geometric Enclosing Networks. Trung Le, Hung Vu, Tu Dinh Nguyen and Dinh Phung. In Proc. of the 27th Int. Joint Conf. on Artificial Intelligence (IJCAI), Jul. 2018.

Abstract: Training model to generate data has increasingly attracted research attention and become important in modern world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of minimal enclosing ball to train a generator G(z) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring data generated are also lying on the data manifold learned from training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely simple and easy-to-control optimization formulation, avoidance of mode collapsing and efficiently learn data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthesis and real-world datasets to illustrate the behaviors, strength and weakness of our proposed GEN, in particular its ability to handle multi-modal data and quality of generated data.

A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network. Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2018. URL: https://arxiv.org/abs/1712.02121

Abstract: We introduce a novel embedding method for knowledge base completion task. Our approach advances state-of-the-art (SOTA) by employing a convolutional neural network (CNN) for the task which can capture global relationships and transitional characteristics. We represent each triple (head entity, relation, tail entity) as a 3-column matrix which is the input for the convolution layer. Different filters having a same shape of 1x3 are operated over the input matrix to produce different feature maps which are then concatenated into a single feature vector. This vector is used to return a score for the triple via a dot product. The returned score is used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction results than previous SOTA models on two current benchmark datasets WN18RR and FB15k-237.

Learning Graph Representation via Frequent Subgraphs. Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh and Dinh Phung. In Proc. of SIAM Int. Conf. on Data Mining (SDM), 2018. SIAM.

 Dual Discriminator Generative Adversarial Nets Tu Dinh Nguyen, Trung Le, Hung Vu and Dinh Phung. In Advances in Neural Information Processing Systems 29 (NIPS), 2017. [ | ] We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial network (GAN). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus it exploits the complementary statistical properties from these divergences to effectively diversify the estimated density in capturing multi-modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; and together with a generator, it also has the analogy of a minimax game, wherein a discriminator rewards high scores for samples from data distribution whilst another discriminator, conversely, favoring data from the generator, and the generator produces data to fool both two discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both KL and reverse KL divergences between data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode collapsing problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN's variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach in generating good quality and diverse samples over baselines, and the capability of our method to scale up to ImageNet database. 
@INPROCEEDINGS { tu_etal_nips17_d2gan,    AUTHOR = { Tu Dinh Nguyen and Trung Le and Hung Vu and Dinh Phung },    TITLE = { Dual Discriminator Generative Adversarial Nets },    BOOKTITLE = { Advances in Neural Information Processing Systems 29 (NIPS) },    YEAR = { 2017 },    ABSTRACT = { We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial network (GAN). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus it exploits the complementary statistical properties from these divergences to effectively diversify the estimated density in capturing multi-modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; and together with a generator, it also has the analogy of a minimax game, wherein a discriminator rewards high scores for samples from data distribution whilst another discriminator, conversely, favoring data from the generator, and the generator produces data to fool both two discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both KL and reverse KL divergences between data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode collapsing problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN's variants in comprehensive qualitative and quantitative evaluations. 
The experimental results demonstrate the competitive and superior performance of our approach in generating good quality and diverse samples over baselines, and the capability of our method to scale up to ImageNet database. },    FILE = { :tu_etal_nips17_d2gan - Dual Discriminator Generative Adversarial Nets.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2017.09.06 },}
 GoGP: Fast Online Regression with Gaussian Processes Trung Le, Khanh Nguyen, Vu Nguyen, Tu Dinh Nguyen and Dinh Phung. In International Conference on Data Mining (ICDM), 2017. One of the most current challenging problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that could scale with massive datasets. Our approach is formulated based on alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that with a good convergence rate our proposed algorithm always produces a (sparse) solution which is close to the true optima to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared with its rivals under online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors. 
@INPROCEEDINGS { le_etal_icdm17_gogp,    AUTHOR = { Trung Le and Khanh Nguyen and Vu Nguyen and Tu Dinh Nguyen and Dinh Phung },    TITLE = { {GoGP}: Fast Online Regression with Gaussian Processes },    BOOKTITLE = { International Conference on Data Mining (ICDM) },    YEAR = { 2017 },    ABSTRACT = { One of the most current challenging problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that could scale with massive datasets. Our approach is formulated based on alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that with a good convergence rate our proposed algorithm always produces a (sparse) solution which is close to the true optima to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared with its rivals under online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors. },    FILE = { :le_etal_icdm17_gogp - GoGP_ Fast Online Regression with Gaussian Processes.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2017.09.01 },}
 Supervised Restricted Boltzmann Machines Tu Dinh Nguyen, Dinh Phung, Viet Huynh and Trung Le. In Proc. of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2017. We propose in this paper the supervised restricted Boltzmann machine (sRBM), a unified framework which combines the versatility of RBM to simultaneously learn the data representation and to perform supervised learning (i.e., a nonlinear classifier or a nonlinear regressor). Unlike the current state-of-the-art classification formulation proposed for RBM in (Larochelle et al., 2012), our model is a hybrid probabilistic graphical model consisting of a distinguished generative component for data representation and a discriminative component for prediction. While the work of (Larochelle et al., 2012) typically incurs no extra difficulty in inference compared with a standard RBM, our discriminative component, modeled as a directed graphical model, renders MCMC-based inference (e.g., Gibbs sampler) very slow and unpractical for use. To this end, we further develop scalable variational inference for the proposed sRBM for both classification and regression cases. Extensive experiments on real-world datasets show that our sRBM achieves better predictive performance than baseline methods. At the same time, our proposed framework yields learned representations which are more discriminative, hence interpretable, than those of its counterparts. Besides, our method is probabilistic and capable of generating meaningful data conditioning on specific classes – a topic which is of current great interest in deep learning aiming at data generation. @INPROCEEDINGS { nguyen_etal_uai17supervised,    AUTHOR = { Tu Dinh Nguyen and Dinh Phung and Viet Huynh and Trung Le },    TITLE = { Supervised Restricted Boltzmann Machines },    BOOKTITLE = { In Proc. 
of the International Conference on Uncertainty in Artificial Intelligence (UAI) },    YEAR = { 2017 },    ABSTRACT = { We propose in this paper the supervised restricted Boltzmann machine (sRBM), a unified framework which combines the versatility of RBM to simultaneously learn the data representation and to perform supervised learning (i.e., a nonlinear classifier or a nonlinear regressor). Unlike the current state-of-the-art classification formulation proposed for RBM in (Larochelle et al., 2012), our model is a hybrid probabilistic graphical model consisting of a distinguished generative component for data representation and a discriminative component for prediction. While the work of (Larochelle et al., 2012) typically incurs no extra difficulty in inference compared with a standard RBM, our discriminative component, modeled as a directed graphical model, renders MCMC-based inference (e.g., Gibbs sampler) very slow and unpractical for use. To this end, we further develop scalable variational inference for the proposed sRBM for both classification and regression cases. Extensive experiments on real-world datasets show that our sRBM achieves better predictive performance than baseline methods. At the same time, our proposed framework yields learned representations which are more discriminative, hence interpretable, than those of its counterparts. Besides, our method is probabilistic and capable of generating meaningful data conditioning on specific classes – a topic which is of current great interest in deep learning aiming at data generation. },    FILE = { :nguyen_etal_uai17supervised - Supervised Restricted Boltzmann Machines.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2017.08.29 },    URL = { http://auai.org/uai2017/proceedings/papers/106.pdf },}
 Multilevel clustering via Wasserstein means Nhat Ho, XuanLong Nguyen, Mikhail Yurochkin, Hung Bui, Viet Huynh and Dinh Phung. In Proc. of ICML (ICML), 2017. We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a large hierarchically structural corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with the Wasserstein distance metric. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. We also establish consistency properties enjoyed by our estimates of both local and global clusters. Finally, we present experiment results with both synthetic and real data to demonstrate the flexibility and scalability of the proposed approach. @INPROCEEDINGS { ho_etal_icml17multilevel,    AUTHOR = { Nhat Ho and XuanLong Nguyen and Mikhail Yurochkin and Hung Bui and Viet Huynh and Dinh Phung },    TITLE = { Multilevel clustering via Wasserstein means },    BOOKTITLE = { Proc. of ICML (ICML) },    YEAR = { 2017 },    ABSTRACT = { We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a large hierarchically structural corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with the Wasserstein distance metric. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. We also establish consistency properties enjoyed by our estimates of both local and global clusters. 
Finally, we present experiment results with both synthetic and real data to demonstrate the flexibility and scalability of the proposed approach. },    FILE = { :ho_etal_icml17multilevel - Multilevel Clustering Via Wasserstein Means.pdf:PDF },    URL = { http://proceedings.mlr.press/v70/ho17a.html },}
 Approximation Vector Machines for Large-scale Online Learning Trung Le, Tu Dinh Nguyen, Vu Nguyen and Dinh Q. Phung. Journal of Machine Learning Research (JMLR), 2017. One of the most challenging problems in kernel online learning is to bound the model size and to promote the model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. In this paper, we propose Approximation Vector Machine (AVM), a model that can simultaneously encourage the sparsity and safeguard its risk in compromising the performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximation and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions including Hinge, smooth Hinge, and Logistic for classification task, and l1, l2, and ϵ-insensitive for regression task. We conducted extensive experiments for classification task in batch and online modes, and regression task in online mode over several benchmark datasets. The results show that our proposed AVM achieved a comparable predictive performance with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM in maintaining the model size. 
@ARTICLE { le_etal_jmlr17approximation,    AUTHOR = { Trung Le and Tu Dinh Nguyen and Vu Nguyen and Dinh Q. Phung },    TITLE = { Approximation Vector Machines for Large-scale Online Learning },    JOURNAL = { Journal of Machine Learning Research (JMLR) },    YEAR = { 2017 },    ABSTRACT = { One of the most challenging problems in kernel online learning is to bound the model size and to promote the model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. In this paper, we propose Approximation Vector Machine (AVM), a model that can simultaneously encourage the sparsity and safeguard its risk in compromising the performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximation and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions including Hinge, smooth Hinge, and Logistic for classification task, and l1, l2, and ϵ-insensitive for regression task. We conducted extensive experiments for classification task in batch and online modes, and regression task in online mode over several benchmark datasets. 
The results show that our proposed AVM achieved a comparable predictive performance with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM in maintaining the model size. },    FILE = { :le_etal_jmlr17approximation - Approximation Vector Machines for Large Scale Online Learning.pdf:PDF },    KEYWORDS = { kernel, online learning, large-scale machine learning, sparsity, big data, core set, stochastic gradient descent, convergence analysis },    URL = { https://arxiv.org/abs/1604.06518 },}
 Discriminative Bayesian Nonparametric Clustering Vu Nguyen, Dinh Phung, Trung Le, Svetha Venkatesh and Hung Bui. In Proc. of International Joint Conference on Artificial Intelligence (IJCAI), 2017. We propose a general framework for discriminative Bayesian nonparametric clustering to promote the inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels resulted from probabilistic discriminative model. This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with the traditional generative BNP models. @INPROCEEDINGS { nguyen_etal_ijcai17discriminative,    AUTHOR = { Vu Nguyen and Dinh Phung and Trung Le and Svetha Venkatesh and Hung Bui },    TITLE = { Discriminative Bayesian Nonparametric Clustering },    BOOKTITLE = { Proc. of International Joint Conference on Artificial Intelligence (IJCAI) },    YEAR = { 2017 },    ABSTRACT = { We propose a general framework for discriminative Bayesian nonparametric clustering to promote the inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels resulted from probabilistic discriminative model. 
This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with the traditional generative BNP models. },    FILE = { :nguyen_etal_ijcai17discriminative - Discriminative Bayesian Nonparametric Clustering.pdf:PDF },    URL = { https://www.ijcai.org/proceedings/2017/355 },}
 Large-scale Online Kernel Learning with Random Feature Reparameterization Tu Dinh Nguyen, Trung Le, Hung Bui and Dinh Phung. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. A typical online kernel learning method faces two fundamental issues: the complexity in dealing with a huge number of observed data points (a.k.a the curse of kernelization) and the difficulty in learning kernel parameters, which are often assumed to be fixed. Random Fourier feature is a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bochner’s theorem, and allows the model to be maintained directly in the random feature space with a fixed dimension, hence the model size remains constant w.r.t. data size. We further introduce in this paper the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning to address both aforementioned challenges. Our initial intuition comes from the so-called ‘reparameterization trick’ [Kingma and Welling, 2014] to lift the source of randomness of Fourier components to another space which can be independently sampled, so that stochastic gradient of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel, and a new tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model under an online learning setting. We then conducted extensive experiments on several large-scale datasets where we demonstrate that our work achieves state-of-the-art performance in both learning efficacy and efficiency. 
@INPROCEEDINGS { tu_etal_ijcai17_rrf,    AUTHOR = { Tu Dinh Nguyen and Trung Le and Hung Bui and Dinh Phung },    TITLE = { Large-scale Online Kernel Learning with Random Feature Reparameterization },    BOOKTITLE = { Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI) },    YEAR = { 2017 },    SERIES = { IJCAI'17 },    ABSTRACT = { A typical online kernel learning method faces two fundamental issues: the complexity in dealing with a huge number of observed data points (a.k.a the curse of kernelization) and the difficulty in learning kernel parameters, which are often assumed to be fixed. Random Fourier feature is a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bochner’s theorem, and allows the model to be maintained directly in the random feature space with a fixed dimension, hence the model size remains constant w.r.t. data size. We further introduce in this paper the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning to address both aforementioned challenges. Our initial intuition comes from the so-called ‘reparameterization trick’ [Kingma and Welling, 2014] to lift the source of randomness of Fourier components to another space which can be independently sampled, so that stochastic gradient of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel, and a new tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model under an online learning setting. We then conducted extensive experiments on several large-scale datasets where we demonstrate that our work achieves state-of-the-art performance in both learning efficacy and efficiency. 
},    FILE = { :tu_etal_ijcai17_rrf - Large Scale Online Kernel Learning with Random Feature Reparameterization.pdf:PDF },    LOCATION = { Melbourne, Australia },    NUMPAGES = { 7 },    URL = { https://www.ijcai.org/proceedings/2017/354 },}
 Model-Based Multiple Instance Learning Ba-Ngu Vo, Dinh Phung, Quang N. Tran and Ba-Tuong Vo. arXiv, Mar. 2017. While Multiple Instance (MI) data are point patterns -- sets or multi-sets of unordered points -- appropriate statistical point pattern models have not been used in MI learning. This article proposes a framework for model-based MI learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed. @ARTICLE { vo_etal_arxiv17modelbased,    AUTHOR = { Ba-Ngu Vo and Dinh Phung and Quang N. Tran and Ba-Tuong Vo },    TITLE = { Model-Based Multiple Instance Learning },    JOURNAL = { arXiv },    YEAR = { 2017 },    MONTH = { Mar. },    ABSTRACT = { While Multiple Instance (MI) data are point patterns -- sets or multi-sets of unordered points -- appropriate statistical point pattern models have not been used in MI learning. This article proposes a framework for model-based MI learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed. },    FILE = { :vo_etal_arxiv17modelbased - Model Based Multiple Instance Learning.pdf:PDF },    KEYWORDS = { Multiple instance learning, point pattern, point process, random finite set, classification, novelty detection, clustering },    URL = { https://arxiv.org/pdf/1703.02155.pdf },}
 Hierarchical semi-Markov conditional random fields for deep recursive sequential data Truyen Tran, Dinh Phung, Hung H. Bui and Svetha Venkatesh. Artificial Intelligence (AIJ), Feb. 2017. (Accepted). We present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of linear-chain conditional random fields to model deep nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We develop numerical scaling procedures that handle the overflow problem. We show that the HSCRF can be reduced to the semi-Markov conditional random fields. Finally, we demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. The HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. @ARTICLE { tran_etal_aij17hierarchical,    AUTHOR = { Truyen Tran and Dinh Phung and Hung H. Bui and Svetha Venkatesh },    TITLE = { Hierarchical semi-Markov conditional random fields for deep recursive sequential data },    JOURNAL = { Artificial Intelligence (AIJ) },    YEAR = { 2017 },    MONTH = { Feb. },    NOTE = { Accepted },    ABSTRACT = { We present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of linear-chain conditional random fields to model deep nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We develop numerical scaling procedures that handle the overflow problem. 
We show that the HSCRF can be reduced to the semi-Markov conditional random fields. Finally, we demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. The HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. },    FILE = { :tran_etal_aij17hierarchical - Hierarchical Semi Markov Conditional Random Fields for Deep Recursive Sequential Data.pdf:PDF },    KEYWORDS = { Deep nested sequential processes, Hierarchical semi-Markov conditional random field, Partial labelling, Constrained inference, Numerical scaling },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2017.02.21 },    URL = { http://www.sciencedirect.com/science/article/pii/S0004370217300231 },}
• See my thesis (chapter 5) for an equivalent directed graphical model, which is the precursor of this work and where I described the Asymmetric Inside-Outside (AIO) algorithm in great detail. A brief version of this for the directed case has also appeared in this AAAI'04 paper. The idea of semi-Markov duration modelling has also been addressed for the directed case in these CVPR05 and AIJ09 papers.
 Column Networks for Collective Classification Trang Pham, Truyen Tran, Dinh Phung and Svetha Venkatesh. In The Thirty-First AAAI Conference on Artificial Intelligence (AAAI), 2017. Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds a great promise to produce a better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged on the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, linear in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all applications, CLN demonstrates a higher accuracy than state-of-the-art rivals. @CONFERENCE { pham_etal_aaai17column,    AUTHOR = { Pham, Trang and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Column Networks for Collective Classification },    BOOKTITLE = { The Thirty-First AAAI Conference on Artificial Intelligence (AAAI) },    YEAR = { 2017 },    ABSTRACT = { Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. 
While it holds a great promise to produce a better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged on the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, linear in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all applications, CLN demonstrates a higher accuracy than state-of-the-art rivals. },    COMMENT = { Accepted },    FILE = { :pham_etal_aaai17column - Column Networks for Collective Classification.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.11.14 },    URL = { https://arxiv.org/abs/1609.04508 },}
 Dual Space Gradient Descent for Online Learning Trung Le, Tu Dinh Nguyen, Vu Nguyen and Dinh Phung. In Advances in Neural Information Processing Systems (NIPS), December 2016. One crucial goal in kernel online learning is to bound the model size. Common approaches employ budget maintenance procedures to restrict the model sizes using removal, projection, or merging strategies. Although projection and merging, in the literature, are known to be the most effective strategies, they demand extensive computation whilst removal strategy fails to retain information of the removed vectors. An alternative way to address the model size problem is to apply random features to approximate the kernel function. This allows the model to be maintained directly in the random feature space, hence effectively resolve the curse of kernelization. However, this approach still suffers from a serious shortcoming as it needs to use a high dimensional random feature space to achieve a sufficiently accurate kernel approximation. Consequently, it leads to a significant increase in the computational cost. To address all of these aforementioned challenges, we present in this paper the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance. Consequently, our approach permits the budget to be maintained in a simple, direct and elegant way while simultaneously mitigating the impact of the dimensionality issue on learning performance. We further provide convergence analysis and extensively conduct experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with the state-of-the-art baselines. 
@CONFERENCE { le_etal_nips16dual,    AUTHOR = { Le, Trung and Nguyen, Tu Dinh and Nguyen, Vu and Phung, Dinh },    TITLE = { Dual Space Gradient Descent for Online Learning },    BOOKTITLE = { Advances in Neural Information Processing Systems (NIPS) },    YEAR = { 2016 },    MONTH = { December },    ABSTRACT = { One crucial goal in kernel online learning is to bound the model size. Common approaches employ budget maintenance procedures to restrict the model sizes using removal, projection, or merging strategies. Although projection and merging, in the literature, are known to be the most effective strategies, they demand extensive computation whilst removal strategy fails to retain information of the removed vectors. An alternative way to address the model size problem is to apply random features to approximate the kernel function. This allows the model to be maintained directly in the random feature space, hence effectively resolve the curse of kernelization. However, this approach still suffers from a serious shortcoming as it needs to use a high dimensional random feature space to achieve a sufficiently accurate kernel approximation. Consequently, it leads to a significant increase in the computational cost. To address all of these aforementioned challenges, we present in this paper the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance. Consequently, our approach permits the budget to be maintained in a simple, direct and elegant way while simultaneously mitigating the impact of the dimensionality issue on learning performance. We further provide convergence analysis and extensively conduct experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with the state-of-the-art baselines. 
},    FILE = { :le_etal_nips16dual - Dual Space Gradient Descent for Online Learning.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.08.16 },    URL = { https://papers.nips.cc/paper/6560-dual-space-gradient-descent-for-online-learning.pdf },} C
 Scalable Nonparametric Bayesian Multilevel Clustering Huynh, V., Phung, D., Venkatesh, S., Nguyen, X.L., Hoffman, M. and Bui, H. In 32nd Conference on Uncertainty in Artificial Intelligence (UAI), June 2016. [ | | pdf] @CONFERENCE { huynh_phung_venkatesh_nguyen_hoffman_bui_uai16scalable,    AUTHOR = { Huynh, V. and Phung, D. and Venkatesh, S. and Nguyen, X.L. and Hoffman, M. and Bui, H. },    TITLE = { Scalable Nonparametric {B}ayesian Multilevel Clustering },    BOOKTITLE = { 32nd Conference on Uncertainty in Artificial Intelligence (UAI) },    YEAR = { 2016 },    MONTH = { June },    FILE = { :huynh_phung_venkatesh_nguyen_hoffman_bui_uai16scalable - Scalable Nonparametric Bayesian Multilevel Clustering.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.05.09 },    URL = { http://auai.org/uai2016/proceedings/papers/262.pdf },} C
 Budgeted Semi-supervised Support Vector Machine Le, Trung, Duong, Phuong, Dinh, Mi, Nguyen, Tu, Nguyen, Vu and Phung, Dinh. In 32nd Conference on Uncertainty in Artificial Intelligence (UAI), June 2016. [ | | pdf] @CONFERENCE { le_duong_dinh_nguyen_nguyen_phung_uai16budgeted,    AUTHOR = { Le, Trung and Duong, Phuong and Dinh, Mi and Nguyen, Tu and Nguyen, Vu and Phung, Dinh },    TITLE = { Budgeted Semi-supervised {S}upport {V}ector {M}achine },    BOOKTITLE = { 32nd Conference on Uncertainty in Artificial Intelligence (UAI) },    YEAR = { 2016 },    MONTH = { June },    FILE = { :le_duong_dinh_nguyen_nguyen_phung_uai16budgeted - Budgeted Semi Supervised Support Vector Machine.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.05.09 },    URL = { http://auai.org/uai2016/proceedings/papers/110.pdf },} C
 Nonparametric Budgeted Stochastic Gradient Descent Le, Trung, Nguyen, Vu, Nguyen, Tu Dinh and Phung, Dinh. In 19th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS), May 2016. [ | | pdf] @CONFERENCE { le_nguyen_phung_aistats16nonparametric,    AUTHOR = { Le, Trung and Nguyen, Vu and Nguyen, Tu Dinh and Phung, Dinh },    TITLE = { Nonparametric Budgeted Stochastic Gradient Descent },    BOOKTITLE = { 19th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS) },    YEAR = { 2016 },    MONTH = { May },    FILE = { :le_nguyen_phung_aistats16nonparametric - Nonparametric Budgeted Stochastic Gradient Descent.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.04.06 },    URL = { http://www.jmlr.org/proceedings/papers/v51/le16.pdf },} C
 One-Pass Logistic Regression for Label-Drift and Large-Scale Classification on Distributed Systems Nguyen, Vu, Nguyen, Tu Dinh, Le, Trung, Phung, Dinh and Venkatesh, Svetha. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 1113-1118, Dec 2016. [ | | pdf | code] Logistic regression (LR) for classification is the workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where the class labels are not known in advance, a problem we term label-drift classification. Label-drift classification problem naturally occurs in many applications, especially in the context of streaming settings where the incoming data may contain samples categorized with new classes that have not been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to their expense of running time. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR) to offer a principled treatment for label-drift and large-scale classifications. To handle large-scale classification for big data, we further extend our OLR to a distributed setting for parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performances of our methods are comparable or better than those of state-of-the-art baselines whilst the execution time is much faster at an order of magnitude. In addition, the OLR and Spark-OLR are invariant to data shuffling and have no hyperparameter to tune that significantly benefits data practitioners and overcomes the curse of big data cross-validation to select optimal hyperparameters. 
@CONFERENCE { nguyen_etal_icdm16onepass,    AUTHOR = { Nguyen, Vu and Nguyen, Tu Dinh and Le, Trung and Phung, Dinh and Venkatesh, Svetha },    TITLE = { One-Pass Logistic Regression for Label-Drift and Large-Scale Classification on Distributed Systems },    BOOKTITLE = { 2016 IEEE 16th International Conference on Data Mining (ICDM) },    YEAR = { 2016 },    PAGES = { 1113-1118 },    MONTH = { Dec },    ABSTRACT = { Logistic regression (LR) for classification is the workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where the class labels are not known in advance, a problem we term label-drift classification. Label-drift classification problem naturally occurs in many applications, especially in the context of streaming settings where the incoming data may contain samples categorized with new classes that have not been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to their expense of running time. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR) to offer a principled treatment for label-drift and large-scale classifications. To handle large-scale classification for big data, we further extend our OLR to a distributed setting for parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performances of our methods are comparable or better than those of state-of-the-art baselines whilst the execution time is much faster at an order of magnitude. In addition, the OLR and Spark-OLR are invariant to data shuffling and have no hyperparameter to tune that significantly benefits data practitioners and overcomes the curse of big data cross-validation to select optimal hyperparameters. 
},    CODE = { https://github.com/ntienvu/ICDM2016_OLR },    DOI = { 10.1109/ICDM.2016.0145 },    FILE = { :nguyen_etal_icdm16onepass - One Pass Logistic Regression for Label Drift and Large Scale Classification on Distributed Systems.pdf:PDF },    KEYWORDS = { Big Data;distributed processing;pattern classification;regression analysis;Big Data cross-validation;Spark-OLR;class labels;data shuffling;distributed systems;execution time;label-drift classification problem;large-scale classification;large-scale datasets;one-pass logistic regression;optimal hyperparameter selection;sparkling OLR;Bayes methods;Big data;Context;Data models;Estimation;Industries;Logistics;Apache Spark;Logistic regression;distributed system;label-drift;large-scale classification },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.09.10 },    URL = { http://ieeexplore.ieee.org/document/7837958/ },} C
 A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested Dirichlet Process Nguyen, T., Nguyen, V., Salim, F.D., Le, D.V. and Phung, D.. Pervasive and Mobile Computing (PMC), 2016. [ | | pdf] Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as a way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to summarize data at multiple levels. We demonstrate our framework on five datasets where the advantages of the proposed approach are validated. @ARTICLE { nguyen_nguyen_flora_le_phung_pmc16simultaneous,    AUTHOR = { Nguyen, T. and Nguyen, V. and Salim, F.D. and Le, D.V. and Phung, D. },    TITLE = { A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested {D}irichlet Process },    JOURNAL = { Pervasive and Mobile Computing (PMC) },    YEAR = { 2016 },    ABSTRACT = { Understanding user contexts and group structures plays a central role in pervasive computing. 
These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as a way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to summarize data at multiple levels. We demonstrate our framework on five datasets where the advantages of the proposed approach are validated. },    DOI = { http://dx.doi.org/10.1016/j.pmcj.2016.08.019 },    FILE = { :nguyen_nguyen_flora_le_phung_pmc16simultaneous - A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested Dirichlet Process.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.08.17 },    URL = { http://www.sciencedirect.com/science/article/pii/S1574119216302097 },} J
 Streaming Variational Inference for Dirichlet Process Mixtures Huynh, V., Phung, D. and Venkatesh, S.. In 7th Asian Conference on Machine Learning (ACML), pages 237-252, Nov. 2015. [ | | pdf] Bayesian nonparametric models are theoretically suitable to learn streaming data due to their complexity relaxation to the volume of observed data. However, most of the existing variational inference algorithms are not applicable to streaming applications since they require truncation on variational distributions. In this paper, we present two truncation-free variational algorithms, one for mix-membership inference called TFVB (truncation-free variational Bayes), and the other for hard clustering inference called TFME (truncation-free maximization expectation). With these algorithms, we further developed a streaming learning framework for the popular Dirichlet process mixture (DPM) models. Our experiments demonstrate the usefulness of our framework in both synthetic and real-world data. @INPROCEEDINGS { huynh_phung_venkatesh_15streaming,    AUTHOR = { Huynh, V. and Phung, D. and Venkatesh, S. },    TITLE = { Streaming Variational Inference for {D}irichlet {P}rocess {M}ixtures },    BOOKTITLE = { 7th Asian Conference on Machine Learning (ACML) },    YEAR = { 2015 },    PAGES = { 237--252 },    MONTH = { Nov. },    ABSTRACT = { Bayesian nonparametric models are theoretically suitable to learn streaming data due to their complexity relaxation to the volume of observed data. However, most of the existing variational inference algorithms are not applicable to streaming applications since they require truncation on variational distributions. In this paper, we present two truncation-free variational algorithms, one for mix-membership inference called TFVB (truncation-free variational Bayes), and the other for hard clustering inference called TFME (truncation-free maximization expectation). 
With these algorithms, we further developed a streaming learning framework for the popular Dirichlet process mixture (DPM) models. Our experiments demonstrate the usefulness of our framework in both synthetic and real-world data. },    FILE = { :huynh_phung_venkatesh_15streaming - Streaming Variational Inference for Dirichlet Process Mixtures.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.04.06 },    URL = { http://www.jmlr.org/proceedings/papers/v45/Huynh15.pdf },} C
 Tensor-variate Restricted Boltzmann Machines Nguyen, Tu, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Proc. of AAAI Conf. on Artificial Intelligence (AAAI), pages 2887-2893, Austin Texas, USA, January 2015. [ | | pdf] Restricted Boltzmann Machines (RBMs) are an important class of latent variable models for representing vector data. An under-explored area is multimode data, where each data point is a matrix or a tensor. Standard RBMs applying to such data would require vectorizing matrices and tensors, thus resulting in unnecessarily high dimensionality and at the same time, destroying the inherent higher-order interaction structures. This paper introduces Tensor-variate Restricted Boltzmann Machines (TvRBMs) which generalize RBMs to capture the multiplicative interaction between data modes and the latent variables. TvRBMs are highly compact in that the number of free parameters grows only linear with the number of modes. We demonstrate the capacity of TvRBMs on three real-world applications: handwritten digit classification, face recognition and EEG-based alcoholic diagnosis. The learnt features of the model are more discriminative than the rivals, resulting in better classification performance. @INPROCEEDINGS { tu_truyen_phung_venkatesh_aaai15,    TITLE = { Tensor-variate Restricted {B}oltzmann Machines },    AUTHOR = { Nguyen, Tu and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },    BOOKTITLE = { Proc. of AAAI Conf. on Artificial Intelligence (AAAI) },    YEAR = { 2015 },    ADDRESS = { Austin Texas, USA },    MONTH = { January },    PAGES = { 2887--2893 },    ABSTRACT = { Restricted Boltzmann Machines (RBMs) are an important class of latent variable models for representing vector data. 
An under-explored area is multimode data, where each data point is a matrix or a tensor. Standard RBMs applying to such data would require vectorizing matrices and tensors, thus resulting in unnecessarily high dimensionality and at the same time, destroying the inherent higher-order interaction structures. This paper introduces Tensor-variate Restricted Boltzmann Machines (TvRBMs) which generalize RBMs to capture the multiplicative interaction between data modes and the latent variables. TvRBMs are highly compact in that the number of free parameters grows only linear with the number of modes. We demonstrate the capacity of TvRBMs on three real-world applications: handwritten digit classification, face recognition and EEG-based alcoholic diagnosis. The learnt features of the model are more discriminative than the rivals, resulting in better classification performance. },    KEYWORDS = { tensor; rbm; restricted boltzmann machine; tvrbm; multiplicative interaction; eeg; },    OWNER = { ngtu },    TIMESTAMP = { 2015.01.29 },    URL = { http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9371 },} C
 Learning Latent Activities from Social Signals with Hierarchical Dirichlet Process Phung, Dinh, Nguyen, T. C., Gupta, S. and Venkatesh, Svetha. In Handbook on Plan, Activity, and Intent Recognition, pages 149-174. Elsevier, 2014. [ | | pdf | code] Understanding human activities is an important research topic, noticeably in assisted living and health monitoring. Beyond simple forms of activity (e.g., RFID event of entering a building), learning latent activities that are more semantically interpretable, such as sitting at a desk, meeting with people or gathering with friends, remains a challenging problem. Supervised learning has been the typical modeling choice in the past. However, this requires labeled training data, is unable to predict never-seen-before activity and fails to adapt to the continuing growth of data over time. In this chapter, we explore Bayesian nonparametric method, in particular the Hierarchical Dirichlet Process, to infer latent activities from sensor data acquired in a pervasive setting. Our framework is unsupervised, requires no labeled data and is able to discover new activities as data grows. We present experiments on extracting movement and interaction activities from sociometric badge signals and show how to use them for detection of sub-communities. Using the popular Reality Mining dataset, we further demonstrate the extraction of co-location activities and use them to automatically infer the structure of social subgroups. @INCOLLECTION { phung_nguyen_gupta_venkatesh_pair14,    TITLE = { Learning Latent Activities from Social Signals with Hierarchical {D}irichlet Process },    AUTHOR = { Phung, Dinh and Nguyen, T. C. and Gupta, S. and Venkatesh, Svetha },    BOOKTITLE = { Handbook on Plan, Activity, and Intent Recognition },    PUBLISHER = { Elsevier },    YEAR = { 2014 },    EDITOR = { Gita Sukthankar and Christopher Geib and David V. Pynadath and Hung Bui and Robert P. 
Goldman },    PAGES = { 149--174 },    ABSTRACT = { Understanding human activities is an important research topic, noticeably in assisted living and health monitoring. Beyond simple forms of activity (e.g., RFID event of entering a building), learning latent activities that are more semantically interpretable, such as sitting at a desk, meeting with people or gathering with friends, remains a challenging problem. Supervised learning has been the typical modeling choice in the past. However, this requires labeled training data, is unable to predict never-seen-before activity and fails to adapt to the continuing growth of data over time. In this chapter, we explore Bayesian nonparametric method, in particular the Hierarchical Dirichlet Process, to infer latent activities from sensor data acquired in a pervasive setting. Our framework is unsupervised, requires no labeled data and is able to discover new activities as data grows. We present experiments on extracting movement and interaction activities from sociometric badge signals and show how to use them for detection of sub-communities. Using the popular Reality Mining dataset, we further demonstrate the extraction of co-location activities and use them to automatically infer the structure of social subgroups. },    CODE = { http://prada-research.net/~dinh/index.php?n=Main.Code#HDP_code },    OWNER = { ctng },    TIMESTAMP = { 2013.07.25 },    URL = { http://prada-research.net/~dinh/uploads/Main/Publications/Phung_etal_pair14.pdf },} BC
 A Random Finite Set Model for Data Clustering Phung, Dinh and Vo, Ba-Ngu. In Proc. of Intl. Conf. on Fusion (FUSION), Salamanca, Spain, July 2014. [ | | pdf] The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as vector, in many applications each datum is not a vector but a point pattern or a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is not available in advance. This paper proposes a new class of models for data clustering that addresses set-valued data as well as unknown number of clusters, using a Dirichlet Process mixture of Poisson random finite sets. We also develop an efficient Markov Chain Monte Carlo posterior inference technique that can learn the number of clusters and mixture parameters automatically from the data. Numerical studies are presented to demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in data. @INPROCEEDINGS { phung_vo_fusion14,    TITLE = { A Random Finite Set Model for Data Clustering },    AUTHOR = { Phung, Dinh and Vo, Ba-Ngu },    BOOKTITLE = { Proc. of Intl. Conf. on Fusion (FUSION) },    YEAR = { 2014 },    ADDRESS = { Salamanca, Spain },    MONTH = { July },    ABSTRACT = { The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as vector, in many applications each datum is not a vector but a point pattern or a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is not available in advance. This paper proposes a new class of models for data clustering that addresses set-valued data as well as unknown number of clusters, using a Dirichlet Process mixture of Poisson random finite sets. 
We also develop an efficient Markov Chain Monte Carlo posterior inference technique that can learn the number of clusters and mixture parameters automatically from the data. Numerical studies are presented to demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in data. },    OWNER = { dinh },    TIMESTAMP = { 2014.05.16 },    URL = { http://prada-research.net/~dinh/uploads/Main/Publications/phung_vo_fusion14.pdf },} C
 Labeled Random Finite Sets and the Bayes Multi-target Tracking Filter Vo, Ba-Ngu, Vo, Ba-Tuong and Phung, Dinh. IEEE Transactions on Signal Processing (TSP), 62(24):6554-6567, 2014. [ | ] @ARTICLE { vo_vo_phung_tsp14,    AUTHOR = { Vo, Ba-Ngu and Vo, Ba-Tuong and Phung, Dinh },    TITLE = { Labeled Random Finite Sets and the Bayes Multi-target Tracking Filter },    JOURNAL = { IEEE Transactions on Signal Processing (TSP) },    YEAR = { 2014 },    VOLUME = { 62 },    NUMBER = { 24 },    PAGES = { 6554--6567 },    FILE = { :vo_vo_phung_tsp14 - Labeled Random Finite Sets and the Bayes Multi Target Tracking Filter.pdf:PDF },    OWNER = { dinh },    TIMESTAMP = { 2014.07.02 },} J
 Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts Vu Nguyen, Phung, Dinh, XuanLong Nguyen, Venkatesh, Svetha and Hung Bui. In Proc. of Intl. Conf. on Machine Learning (ICML), pages 288-296, 2014. [ | ] @INPROCEEDINGS { nguyen_phung_nguyen_venkatesh_bui_icml14,    TITLE = { Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts },    AUTHOR = { Vu Nguyen and Phung, Dinh and XuanLong Nguyen and Venkatesh, Svetha and Hung Bui },    BOOKTITLE = { Proc. of Intl. Conf. on Machine Learning (ICML) },    YEAR = { 2014 },    PAGES = { 288--296 },    OWNER = { tvnguye },    TIMESTAMP = { 2013.12.13 },} C
 Keeping up with Innovation: A Predictive Framework for Modeling Healthcare Data with Evolving Clinical Interventions Sunil Kumar Gupta, Santu Rana, Phung, Dinh and Venkatesh, Svetha. In Proc. of SIAM Intl. Conf. on Data Mining (SDM), pages 235-243, 2014. [ | | pdf] @INPROCEEDINGS { gupta_rana_phung_venkatesh_sdm14,    TITLE = { Keeping up with Innovation: A Predictive Framework for Modeling Healthcare Data with Evolving Clinical Interventions },    AUTHOR = { Sunil Kumar Gupta and Santu Rana and Phung, Dinh and Venkatesh, Svetha },    BOOKTITLE = { Proc. of SIAM Intl. Conf. on Data Mining (SDM) },    YEAR = { 2014 },    PAGES = { 235-243 },    CHAPTER = { 27 },    DOI = { 10.1137/1.9781611973440.27 },    EPRINT = { http://epubs.siam.org/doi/pdf/10.1137/1.9781611973440.27 },    OWNER = { thinng },    TIMESTAMP = { 2015.01.28 },    URL = { http://epubs.siam.org/doi/abs/10.1137/1.9781611973440.27 },} C
 An Integrated Framework for Suicide Risk Prediction Tran, Truyen, Phung, Dinh, Luo, Wei, Harvey, R., Berk, M. and Venkatesh, Svetha. In Proc. of ACM Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD), Chicago, US, 2013. [ | ] @INPROCEEDINGS { tran_phung_luo_harvey_berk_venkatesh_kdd13,    TITLE = { An Integrated Framework for Suicide Risk Prediction },    AUTHOR = { Tran, Truyen and Phung, Dinh and Luo, Wei and Harvey, R. and Berk, M. and Venkatesh, Svetha },    BOOKTITLE = { Proc. of ACM Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD) },    YEAR = { 2013 },    ADDRESS = { Chicago, US },    OWNER = { Dinh },    TIMESTAMP = { 2013.06.07 },} C
 Thurstonian Boltzmann Machines: Learning from Multiple Inequalities Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In International Conference on Machine Learning (ICML), Atlanta, USA, June 16-21 2013. [ | ] We introduce Thurstonian Boltzmann Machines (TBM), a unified architecture that can naturally incorporate a wide range of data inputs at the same time. It is based on the observations that many data types can be considered as being generated from a subset of underlying continuous variables, and each input value signifies several respective inequalities. Thus learning TBM is essentially learning to make sense of a set of inequalities. The TBM supports the following types naturally: Gaussian, intervals, censored, binary, categorical, multi-categorical, ordinal, (in)-complete rank with and without ties. We demonstrate the versatility and capacity of the proposed model on three applications of very different natures namely handwritten digit recognitions, collaborative filtering and complex survey analysis. @INPROCEEDINGS { tran_phung_venkatesh_icml13,    TITLE = { {T}hurstonian {B}oltzmann Machines: Learning from Multiple Inequalities },    AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },    BOOKTITLE = { International Conference on Machine Learning (ICML) },    YEAR = { 2013 },    ADDRESS = { Atlanta, USA },    MONTH = { June 16-21 },    ABSTRACT = { We introduce Thurstonian Boltzmann Machines (TBM), a unified architecture that can naturally incorporate a wide range of data inputs at the same time. It is based on the observations that many data types can be considered as being generated from a subset of underlying continuous variables, and each input value signifies several respective inequalities. Thus learning TBM is essentially learning to make sense of a set of inequalities. 
The TBM supports the following types naturally: Gaussian, intervals, censored, binary, categorical, multi-categorical, ordinal, (in)-complete rank with and without ties. We demonstrate the versatility and capacity of the proposed model on three applications of very different natures namely handwritten digit recognitions, collaborative filtering and complex survey analysis. },    OWNER = { dinh },    TIMESTAMP = { 2013.03.01 },} C
 Factorial Multi-Task Learning : A Bayesian Nonparametric Approach Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. In Proceedings of International Conference on Machine Learning (ICML), Atlanta, USA, June 16-21 2013. [ | ] Multi-task learning is a paradigm shown to improve the performance of related tasks through their joint learning. However, for real-world data, it is usually difficult to assess the task relatedness and joint learning with unrelated tasks may lead to serious performance degradations. To this end, we propose a framework that groups the tasks based on their relatedness in a low dimensional subspace and allows a varying degree of relatedness among tasks by sharing the subspace bases across the groups. This provides the flexibility of no sharing when two sets of tasks are unrelated and partial/total sharing when the tasks are related. Importantly, the number of task-groups and the subspace dimensionality are automatically inferred from the data. This feature keeps the model beyond a specific set of parameters. To realize our framework, we present a novel Bayesian nonparametric prior that extends the traditional hierarchical beta process prior using a Dirichlet process to permit potentially infinite number of child beta processes. We apply our model for multi-task regression and classification applications. Experimental results using several synthetic and real-world datasets show the superiority of our model to other recent state-of-the-art multi-task learning methods. 
@INPROCEEDINGS { gupta_phung_venkatesh_icml13,    TITLE = { Factorial Multi-Task Learning : A Bayesian Nonparametric Approach },    AUTHOR = { Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },    BOOKTITLE = { Proceedings of International Conference on Machine Learning (ICML) },    YEAR = { 2013 },    ADDRESS = { Atlanta, USA },    MONTH = { June 16-21 },    ABSTRACT = { Multi-task learning is a paradigm shown to improve the performance of related tasks through their joint learning. However, for real-world data, it is usually difficult to assess the task relatedness and joint learning with unrelated tasks may lead to serious performance degradations. To this end, we propose a framework that groups the tasks based on their relatedness in a low dimensional subspace and allows a varying degree of relatedness among tasks by sharing the subspace bases across the groups. This provides the flexibility of no sharing when two sets of tasks are unrelated and partial/total sharing when the tasks are related. Importantly, the number of task-groups and the subspace dimensionality are automatically inferred from the data. This feature keeps the model beyond a specific set of parameters. To realize our framework, we present a novel Bayesian nonparametric prior that extends the traditional hierarchical beta process prior using a Dirichlet process to permit potentially infinite number of child beta processes. We apply our model for multi-task regression and classification applications. Experimental results using several synthetic and real-world datasets show the superiority of our model to other recent state-of-the-art multi-task learning methods. },    OWNER = { Dinh },    TIMESTAMP = { 2013.04.16 },} C
 Sparse Subspace Clustering via Group Sparse Coding Saha, B., Pham, D.S., Phung, Dinh and Venkatesh, Svetha. In Proc. of SIAM Intl. Conf. on Data Mining (SDM), pages 130-138, 2013. [ | ] Sparse subspace representation is an emerging and powerful approach for clustering of data, whose generative model is a union of subspaces. Existing sparse subspace representation methods are restricted to the single-task setting, which consequently leads to inefficient computation and sub-optimal performance. To address the current limitation, we propose a novel method that regularizes sparse subspace representation by exploiting the structural sharing between tasks and data points. The first regularizer aims at group level where we seek sparsity between groups but dense within group. The second regularizer models the interactions down to data point level via the well-known graph regularization technique. We also derive simple, provably convergent, and extremely computationally efficient algorithms for solving the proposed group formulations. We evaluate the proposed methods over a wide range of large-scale clustering problems: from challenging health care to image and text clustering benchmarks datasets and show that they outperform state-of-the-art considerably. @INPROCEEDINGS { saha_pham_phung_venkatesh_sdm13,    TITLE = { Sparse Subspace Clustering via Group Sparse Coding },    AUTHOR = { Saha, B. and Pham, D.S. and Phung, Dinh and Venkatesh, Svetha },    BOOKTITLE = { Proc. of SIAM Intl. Conf. on Data Mining (SDM) },    YEAR = { 2013 },    PAGES = { 130-138 },    ABSTRACT = { Sparse subspace representation is an emerging and powerful approach for clustering of data, whose generative model is a union of subspaces. Existing sparse subspace representation methods are restricted to the single-task setting, which consequently leads to inefficient computation and sub-optimal performance. 
To address the current limitation, we propose a novel method that regularizes sparse subspace representation by exploiting the structural sharing between tasks and data points. The first regularizer aims at group level where we seek sparsity between groups but dense within group. The second regularizer models the interactions down to data point level via the well-known graph regularization technique. We also derive simple, provably convergent, and extremely computationally efficient algorithms for solving the proposed group formulations. We evaluate the proposed methods over a wide range of large-scale clustering problems: from challenging health care to image and text clustering benchmarks datasets and show that they outperform state-of-the-art considerably. },    OWNER = { thinng },    TIMESTAMP = { 2013.01.07 },} C
 Bayesian Nonparametric Modelling of Correlated Data Sources and Applications (poster) Phung, Dinh. In International Conference on Bayesian Nonparametrics, Amsterdam, The Netherlands, June 10-14 2013. [ | | code | poster] When one considers realistic multimodal data, covariates are rich, and yet tend to have a natural correlation with one another; for example: tags and their associated multimedia contents; patient's demographic information, medical history and drug usage; social user's profile and friendship network. The presence of rich and naturally correlated covariates calls for the need to model their correlation with nonparametric models, without reverting to making parametric assumptions. This paper presents a full Bayesian nonparametric approach to the problem of jointly clustering data sources and modeling their correlation. In our approach, we view context as distributions over some index space, governed by the topics discovered from the primary data source (content), and model both contents and contexts jointly. We impose a conditional structure in which contents provide the topics, upon which contexts are conditionally distributed. Distributions over topic parameters are modelled according to a Dirichlet process (DP). Stick-breaking representation gives rise to explicit realizations of topic atoms which we use as an indexing mechanism to induce conditional random mixture distributions on the context observation spaces. Loosely speaking, we use a stochastic process, being DP, to conditionally 'index' other stochastic processes. The latter can be designed on any suitable family of stochastic processes to suit modelling needs or data types of contexts (such as Beta or Gaussian processes). Dirichlet process is of course an obvious choice and will be again employed in this work. 
In typical hierarchical Bayesian style, we also provide the model in grouped data setting, where contents and contexts appear in groups (for example, a collection of text documents or images embedded in time or space). Our model can be viewed as a generalization of the hierarchical Dirichlet process (HDP) [2] and the recent nested Dirichlet process (nDP) [1]. We develop an auxiliary conditional Gibbs sampling in which both topic and context atoms are marginalized out. We demonstrate the framework on synthetic datasets and various real large-scale datasets with an emphasis on the application perspective of the models. In particular, we demonstrate a) an application in text modelling for modelling topics which are sensitive to time using the NIPS and PNAS dataset, b) an application of the model in computer vision for inferring local and global movement patterns using the MIT dataset consisting of real video data collected at a traffic scene, c) an application on medical data analysis in which we model latent aspects of diseases, their progression together with the task of re-admission prediction. @INPROCEEDINGS { phung_bnp13,    TITLE = { {B}ayesian Nonparametric Modelling of Correlated Data Sources and Applications (poster) },    AUTHOR = { Phung, Dinh },    BOOKTITLE = { International Conference on Bayesian Nonparametrics },    YEAR = { 2013 },    ADDRESS = { Amsterdam, The Netherlands },    MONTH = { June 10-14 },    ABSTRACT = { When one considers realistic multimodal data, covariates are rich, and yet tend to have a natural correlation with one another; for example: tags and their associated multimedia contents; patient's demographic information, medical history and drug usage; social user's profile and friendship network. The presence of rich and naturally correlated covariates calls for the need to model their correlation with nonparametric models, without reverting to making parametric assumptions. 
This paper presents a full Bayesian nonparametric approach to the problem of jointly clustering data sources and modeling their correlation. In our approach, we view context as distributions over some index space, governed by the topics discovered from the primary data source (content), and model both contents and contexts jointly. We impose a conditional structure in which contents provide the topics, upon which contexts are conditionally distributed. Distributions over topic parameters are modelled according to a Dirichlet process (DP). Stick-breaking representation gives rise to explicit realizations of topic atoms which we use as an indexing mechanism to induce conditional random mixture distributions on the context observation spaces. Loosely speaking, we use a stochastic process, being DP, to conditionally 'index' other stochastic processes. The latter can be designed on any suitable family of stochastic processes to suit modelling needs or data types of contexts (such as Beta or Gaussian processes). Dirichlet process is of course an obvious choice and will be again employed in this work. In typical hierarchical Bayesian style, we also provide the model in grouped data setting, where contents and contexts appear in groups (for example, a collection of text documents or images embedded in time or space). Our model can be viewed as a generalization of the hierarchical Dirichlet process (HDP) [2] and the recent nested Dirichlet process (nDP) [1]. We develop an auxiliary conditional Gibbs sampling in which both topic and context atoms are marginalized out. We demonstrate the framework on synthetic datasets and various real large-scale datasets with an emphasis on the application perspective of the models. 
In particular, we demonstrate a) an application in text modelling for modelling topics which are sensitive to time using the NIPS and PNAS dataset, b) an application of the model in computer vision for inferring local and global movement patterns using the MIT dataset consisting of real video data collected at a traffic scene, c) an application on medical data analysis in which we model latent aspects of diseases, their progression together with the task of re-admission prediction. },    CODE = { http://prada-research.net/~dinh/index.php?n=Main.Code#HDP_code },    OWNER = { dinh },    POSTER = { http://prada-research.net/~dinh/uploads/Main/Publications/A0_poster_BNP13.pdf },    TIMESTAMP = { 2013.03.01 },} C
 Connectivity, Online Social Capital and Mood: A Bayesian Nonparametric Analysis Phung, Dinh, Gupta, S. K., Nguyen, T. and Venkatesh, Svetha. IEEE Transactions on Multimedia (TMM), 15:1316-1325, 2013. [ | | pdf] Social capital indicative of community interaction and support is intrinsically linked to mental health. Increasing online presence is now the norm. Whilst social capital and its impact on social networks have been examined, its underlying connection to emotional response, such as mood, has not been investigated. This paper studies this phenomenon, revisiting the concept of “online social capital” in social media communities using measurable aspects of social participation and social support. We establish the link between online capital derived from social media and mood, demonstrating results for different cohorts of social capital and social connectivity. We use novel Bayesian nonparametric factor analysis to extract the shared and individual factors in mood transition across groups of users of different levels of connectivity, quantifying patterns and degree of mood transitions. Using more than 1.6 million users from LiveJournal, we show quantitatively that groups with lower social capital have fewer positive moods and more negative moods than groups with higher social capital. We show similar effects in mood transitions. We establish a framework of how social media can be used as a barometer for mood. The significance lies in the importance of online social capital to overall mental well-being. In establishing the link between mood and social capital in online communities, this work may suggest the foundation of new systems to monitor online mental well-being. @ARTICLE { phung_gupta_nguyen_venkatesh_tmm13,    TITLE = { Connectivity, Online Social Capital and Mood: A Bayesian Nonparametric Analysis },    AUTHOR = { Phung, Dinh and Gupta, S. K. and Nguyen, T. 
and Venkatesh, Svetha },    JOURNAL = { IEEE Transactions on Multimedia (TMM) },    YEAR = { 2013 },    PAGES = { 1316-1325 },    VOLUME = { 15 },    ABSTRACT = { Social capital indicative of community interaction and support is intrinsically linked to mental health. Increasing online presence is now the norm. Whilst social capital and its impact on social networks have been examined, its underlying connection to emotional response, such as mood, has not been investigated. This paper studies this phenomenon, revisiting the concept of “online social capital” in social media communities using measurable aspects of social participation and social support. We establish the link between online capital derived from social media and mood, demonstrating results for different cohorts of social capital and social connectivity. We use novel Bayesian nonparametric factor analysis to extract the shared and individual factors in mood transition across groups of users of different levels of connectivity, quantifying patterns and degree of mood transitions. Using more than 1.6 million users from LiveJournal, we show quantitatively that groups with lower social capital have fewer positive moods and more negative moods than groups with higher social capital. We show similar effects in mood transitions. We establish a framework of how social media can be used as a barometer for mood. The significance lies in the importance of online social capital to overall mental well-being. In establishing the link between mood and social capital in online communities, this work may suggest the foundation of new systems to monitor online mental well-being. },    ISSN = { 0219-1377 },    LANGUAGE = { English },    TIMESTAMP = { 2013.04.16 },    URL = { http://prada-research.net/~dinh/uploads/Main/HomePage/phung_gupta_nguyen_venkatesh_tmm13.pdf },} J
 Regularized nonnegative shared subspace learning Gupta, Sunil Kumar, Phung, Dinh, Adams, Brett and Venkatesh, Svetha. Data Mining and Knowledge Discovery, 26(1):57-97, 2013. [ | ] @ARTICLE { gupta_phung_adams_venkatesh_dami13,    TITLE = { Regularized nonnegative shared subspace learning },    AUTHOR = { Gupta, Sunil Kumar and Phung, Dinh and Adams, Brett and Venkatesh, Svetha },    JOURNAL = { Data Mining and Knowledge Discovery },    YEAR = { 2013 },    NUMBER = { 1 },    PAGES = { 57--97 },    VOLUME = { 26 },    OWNER = { thinng },    PUBLISHER = { Springer },    TIMESTAMP = { 2015.01.29 },} J
 A Slice Sampler for Restricted Hierarchical Beta Process with Applications to Shared Subspace Learning Gupta, S., Phung, Dinh and Venkatesh, Svetha. In Proc. of Intl. Conf. on Uncertainty in Artificial Intelligence (UAI), pages 316-325, 2012. [ | ] @INPROCEEDINGS { gupta_phung_venkatesh_uai12,    TITLE = { A Slice Sampler for Restricted Hierarchical {B}eta Process with Applications to Shared Subspace Learning },    AUTHOR = { Gupta, S. and Phung, Dinh and Venkatesh, Svetha },    BOOKTITLE = { Proc. of Intl. Conf. on Uncertainty in Artificial Intelligence (UAI) },    YEAR = { 2012 },    PAGES = { 316--325 },    OWNER = { dinh },    TIMESTAMP = { 2012.05.24 },} C
 A Bayesian Nonparametric Joint Factor Model for Learning Shared and Individual Subspaces from Multiple Data Sources Gupta, S., Phung, Dinh and Venkatesh, Svetha. In Proc. of SIAM Intl. Conf. on Data Mining (SDM), pages 200-211, 2012. [ | ] Joint analysis of multiple data sources is becoming increasingly popular in transfer learning, multi-task learning and cross-domain data mining. One promising approach to model the data jointly is through learning the shared and individual factor subspaces. However, performance of this approach depends on the subspace dimensionalities and the level of sharing needs to be specified a priori. To this end, we propose a nonparametric joint factor analysis framework for modeling multiple related data sources. Our model utilizes the hierarchical beta process as a nonparametric prior to automatically infer the number of shared and individual factors. For posterior inference, we provide a Gibbs sampling scheme using auxiliary variables. The effectiveness of the proposed framework is validated through its application on two real world problems -- transfer learning in text and image retrieval. @INPROCEEDINGS { gupta_phung_venkatesh_sdm12,    TITLE = { A {B}ayesian Nonparametric Joint Factor Model for Learning Shared and Individual Subspaces from Multiple Data Sources },    AUTHOR = { Gupta, S. and Phung, Dinh and Venkatesh, Svetha },    BOOKTITLE = { Proc. of SIAM Intl. Conf. on Data Mining (SDM) },    YEAR = { 2012 },    PAGES = { 200--211 },    ABSTRACT = { Joint analysis of multiple data sources is becoming increasingly popular in transfer learning, multi-task learning and cross-domain data mining. One promising approach to model the data jointly is through learning the shared and individual factor subspaces. However, performance of this approach depends on the subspace dimensionalities and the level of sharing needs to be specified a priori. 
To this end, we propose a nonparametric joint factor analysis framework for modeling multiple related data sources. Our model utilizes the hierarchical beta process as a nonparametric prior to automatically infer the number of shared and individual factors. For posterior inference, we provide a Gibbs sampling scheme using auxiliary variables. The effectiveness of the proposed framework is validated through its application on two real world problems -- transfer learning in text and image retrieval. },} C
 A Sequential Decision Approach to Ordinal Preferences in Recommender Systems Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Proc. of AAAI Conf. on Artificial Intelligence (AAAI), pages 676-682, 2012. [ | ] We propose a novel sequential decision approach to modeling ordinal ratings in collaborative filtering problems. The rating process is assumed to start from the lowest level, evaluates against the latent utility at the corresponding level and moves up until a suitable ordinal level is found. Crucial to this generative process is the underlying utility random variables that govern the generation of ratings and their modelling choices. To this end, we make a novel use of the generalised extreme value distributions, which is found to be particularly suitable for our modeling tasks and at the same time, facilitate our inference and learning procedure. The proposed approach is flexible to incorporate features from both the user and the item. We evaluate the proposed framework on three well-known datasets: MovieLens, Dating Agency and Netflix. In all cases, it is demonstrated that the proposed work is competitive against state-of-the-art collaborative filtering methods. @INPROCEEDINGS { truyen_phung_venkatesh_aaai12,    TITLE = { A Sequential Decision Approach to Ordinal Preferences in Recommender Systems },    AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },    BOOKTITLE = { Proc. of AAAI Conf. on Artificial Intelligence (AAAI) },    YEAR = { 2012 },    PAGES = { 676--682 },    ABSTRACT = { We propose a novel sequential decision approach to modeling ordinal ratings in collaborative filtering problems. The rating process is assumed to start from the lowest level, evaluates against the latent utility at the corresponding level and moves up until a suitable ordinal level is found. Crucial to this generative process is the underlying utility random variables that govern the generation of ratings and their modelling choices. 
To this end, we make a novel use of the generalised extreme value distributions, which is found to be particularly suitable for our modeling tasks and at the same time, facilitate our inference and learning procedure. The proposed approach is flexible to incorporate features from both the user and the item. We evaluate the proposed framework on three well-known datasets: MovieLens, Dating Agency and Netflix. In all cases, it is demonstrated that the proposed work is competitive against state-of-the-art collaborative filtering methods. },    TIMESTAMP = { 2012.04.11 },} C
 Improved Subspace Clustering via Exploitation of Spatial Constraints Pham, S., Budhaditya, Saha, Phung, Dinh and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 550-557, 2012. [ | ] We present a novel approach to improving subspace clustering by exploiting the spatial constraints. The new method encourages the sparse solution to be consistent with the spatial geometry of the tracked points, by embedding weights into the sparse formulation. By doing so, we are able to correct sparse representations in a principled manner without introducing much additional computational cost. We discuss alternative ways to treat the missing and corrupted data using the latest theory in robust lasso regression and suggest numerical algorithms to solve the proposed formulation. The experiments on the benchmark Johns Hopkins 155 dataset demonstrate that exploiting spatial constraints significantly improves motion segmentation. @INPROCEEDINGS { pham_budhaditya_phung_venkatesh_cvpr12,    TITLE = { Improved Subspace Clustering via Exploitation of Spatial Constraints },    AUTHOR = { Pham, S. and Budhaditya, Saha and Phung, Dinh and Venkatesh, Svetha },    BOOKTITLE = { Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR) },    YEAR = { 2012 },    PAGES = { 550--557 },    ABSTRACT = { We present a novel approach to improving subspace clustering by exploiting the spatial constraints. The new method encourages the sparse solution to be consistent with the spatial geometry of the tracked points, by embedding weights into the sparse formulation. By doing so, we are able to correct sparse representations in a principled manner without introducing much additional computational cost. We discuss alternative ways to treat the missing and corrupted data using the latest theory in robust lasso regression and suggest numerical algorithms to solve the proposed formulation. 
The experiments on the benchmark Johns Hopkins 155 dataset demonstrate that exploiting spatial constraints significantly improves motion segmentation. },    OWNER = { thinng },    TIMESTAMP = { 2012.04.11 },} C
 Sparse Subspace Representation for Spectral Document Clustering Saha, B., Phung, Dinh, Pham, D.S. and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Data Mining (ICDM), pages 1092-1097, 2012. [ | ] @INPROCEEDINGS { saha_phung_pham_venkatesh_icdm12,    TITLE = { Sparse Subspace Representation for Spectral Document Clustering },    AUTHOR = { Saha, B. and Phung, Dinh and Pham, D.S. and Venkatesh, Svetha },    BOOKTITLE = { Proc. of IEEE Intl. Conf. on Data Mining (ICDM) },    YEAR = { 2012 },    PAGES = { 1092--1097 },    OWNER = { dinh },    TIMESTAMP = { 2012.10.31 },} C
 Detection of Cross-Channel Anomalies Pham, S., Budhaditya, Saha, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 35(1):33-59, 2013. [ | ] The data deluge has created a great challenge for data mining applications wherein the rare topics of interest are often buried in the flood of major headlines. We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual channel anomalies, and are often portent of significant events. Central to this new problem is a development of theoretical foundation and methodology. Using the spectral approach, we propose a two-stage detection method: anomaly detection at a single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single channel anomalies. We also derive the extension of the proposed detection method to an online setting, which automatically adapts to changes in the data over time at low computational complexity using incremental algorithms. Our mathematical analysis shows that our method is likely to reduce the false alarm rate by establishing theoretical results on the reduction of an impurity index. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in large-scale video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large scale data stream analysis. @ARTICLE { pham_budhaditya_phung_venkatesh_kais13,    TITLE = { Detection of Cross-Channel Anomalies },    AUTHOR = { Pham, S. 
and Budhaditya, Saha and Phung, Dinh and Venkatesh, Svetha },    JOURNAL = { Knowledge and Information Systems (KAIS) },    YEAR = { 2013 },    NUMBER = { 1 },    PAGES = { 33--59 },    VOLUME = { 35 },    ABSTRACT = { The data deluge has created a great challenge for data mining applications wherein the rare topics of interest are often buried in the flood of major headlines. We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual channel anomalies, and are often portent of significant events. Central to this new problem is a development of theoretical foundation and methodology. Using the spectral approach, we propose a two-stage detection method: anomaly detection at a single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single channel anomalies. We also derive the extension of the proposed detection method to an online setting, which automatically adapts to changes in the data over time at low computational complexity using incremental algorithms. Our mathematical analysis shows that our method is likely to reduce the false alarm rate by establishing theoretical results on the reduction of an impurity index. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in large-scale video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large scale data stream analysis. },} J
 Detection of Cross-Channel Anomalies From Multiple Data Channels Pham, S., Budhaditya, Saha, Phung, Dinh and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Data Mining (ICDM), Vancouver, Canada, December 2011. [ | ] We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross channel anomalies are common amongst the individual channel anomalies, and are often portent of significant events. Using spectral approaches, we propose a two-stage detection method: anomaly detection at a single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single channel anomalies. Our mathematical analysis shows that our method is likely to reduce the false alarm rate. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself for large scale data stream analysis. @INPROCEEDINGS { pham_budhaditya_phung_venkatesh_icdm11,    TITLE = { Detection of Cross-Channel Anomalies From Multiple Data Channels },    AUTHOR = { Pham, S. and Budhaditya, Saha and Phung, Dinh and Venkatesh, Svetha },    BOOKTITLE = { Proc. of IEEE Intl. Conf. on Data Mining (ICDM) },    YEAR = { 2011 },    ADDRESS = { Vancouver, Canada },    MONTH = { December },    ABSTRACT = { We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross channel anomalies are common amongst the individual channel anomalies, and are often portent of significant events. 
Using spectral approaches, we propose a two-stage detection method: anomaly detection at a single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single channel anomalies. Our mathematical analysis shows that our method is likely to reduce the false alarm rate. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself for large scale data stream analysis. },    COMMENT = { coauthor },    OWNER = { thinng },    TIMESTAMP = { 2012.04.11 },} C
 Probabilistic Models over Ordered Partitions with Applications in Document Ranking and Collaborative Filtering Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Procs. of SIAM Intl. Conf. on Data Mining (SDM), Arizona, USA, April 2011. [ | | pdf] Ranking is an important task for handling a large amount of content. Ideally, training data for supervised ranking would include a complete rank of documents (or other objects such as images or videos) for a particular query. However, this is only possible for small sets of documents. In practice, one often resorts to document rating, in that a subset of documents is assigned with a small number indicating the degree of relevance. This poses a general problem of modelling and learning rank data with ties. In this paper, we propose a probabilistic generative model, that models the process as permutations over partitions. This results in super-exponential combinatorial state space with unknown numbers of partitions and unknown ordering among them. We approach the problem from the discrete choice theory, where subsets are chosen in a stagewise manner, reducing the state space per each stage significantly. Further, we show that with suitable parameterisation, we can still learn the models in linear time. We evaluate the proposed models on two application areas: (i) document ranking with the data from the recently held Yahoo! challenge, and (ii) collaborative filtering with movie data. The results demonstrate that the models are competitive against well-known rivals. @INPROCEEDINGS { truyen_phung_venkatesh_sdm11,    TITLE = { Probabilistic Models over Ordered Partitions with Applications in Document Ranking and Collaborative Filtering },    AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },    BOOKTITLE = { Procs. of SIAM Intl. Conf. on Data Mining (SDM) },    YEAR = { 2011 },    ADDRESS = { Arizona, USA },    MONTH = { April },    ABSTRACT = { Ranking is an important task for handling a large amount of content. 
Ideally, training data for supervised ranking would include a complete rank of documents (or other objects such as images or videos) for a particular query. However, this is only possible for small sets of documents. In practice, one often resorts to document rating, in that a subset of documents is assigned with a small number indicating the degree of relevance. This poses a general problem of modelling and learning rank data with ties. In this paper, we propose a probabilistic generative model, that models the process as permutations over partitions. This results in super-exponential combinatorial state space with unknown numbers of partitions and unknown ordering among them. We approach the problem from the discrete choice theory, where subsets are chosen in a stagewise manner, reducing the state space per each stage significantly. Further, we show that with suitable parameterisation, we can still learn the models in linear time. We evaluate the proposed models on two application areas: (i) document ranking with the data from the recently held Yahoo! challenge, and (ii) collaborative filtering with movie data. The results demonstrate that the models are competitive against well-known rivals. },    COMMENT = { coauthor },    FILE = { :papers\\phung\\truyen_phung_venkatesh_sdm11.pdf:PDF },    OWNER = { 184698H },    TIMESTAMP = { 2011.02.07 },    URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Truyen_etal_sdm11.pdf },} C
 Nonnegative Shared Subspace Learning and Its Application to Social Media Retrieval Gupta, Sunil, Phung, Dinh, Adams, Brett, Tran, Truyen and Venkatesh, Svetha. In Proc. of ACM Intl. Conf. on Knowledge Discovery and Data Mining (SIGKDD), Washington DC, USA, July 2010. [ | | pdf] Although tagging has become increasingly popular in online image and video sharing systems, tags are known to be noisy, ambiguous, incomplete and subjective. These factors can seriously affect the precision of a social tag-based web retrieval system. Therefore improving the precision performance of these social tag-based web retrieval systems has become an increasingly important research topic. To this end, we propose a shared subspace learning framework to leverage a secondary source to improve retrieval performance from a primary dataset. This is achieved by learning a shared subspace between the two sources under a joint Nonnegative Matrix Factorization in which the level of subspace sharing can be explicitly controlled. We derive an efficient algorithm for learning the factorization, analyze its complexity, and provide proof of convergence. We validate the framework on image and video retrieval tasks in which tags from the LabelMe dataset are used to improve image retrieval performance from a Flickr dataset and video retrieval performance from a YouTube dataset. This has implications for how to exploit and transfer knowledge from readily available auxiliary tagging resources to improve another social web retrieval system. Our shared subspace learning framework is applicable to a range of problems where one needs to exploit the strengths existing among multiple and heterogeneous datasets. @INPROCEEDINGS { gupta_phung_adams_truyen_venkatesh_sigkdd10,    TITLE = { Nonnegative Shared Subspace Learning and Its Application to Social Media Retrieval },    AUTHOR = { Gupta, Sunil and Phung, Dinh and Adams, Brett and Tran, Truyen and Venkatesh, Svetha },    BOOKTITLE = { Proc. of ACM Intl. Conf. 
on Knowledge Discovery and Data Mining (SIGKDD) },    YEAR = { 2010 },    ADDRESS = { Washington DC, USA },    MONTH = { July },    ABSTRACT = { Although tagging has become increasingly popular in online image and video sharing systems, tags are known to be noisy, ambiguous, incomplete and subjective. These factors can seriously affect the precision of a social tag-based web retrieval system. Therefore improving the precision performance of these social tag-based web retrieval systems has become an increasingly important research topic. To this end, we propose a shared subspace learning framework to leverage a secondary source to improve retrieval performance from a primary dataset. This is achieved by learning a shared subspace between the two sources under a joint Nonnegative Matrix Factorization in which the level of subspace sharing can be explicitly controlled. We derive an efficient algorithm for learning the factorization, analyze its complexity, and provide proof of convergence. We validate the framework on image and video retrieval tasks in which tags from the LabelMe dataset are used to improve image retrieval performance from a Flickr dataset and video retrieval performance from a YouTube dataset. This has implications for how to exploit and transfer knowledge from readily available auxiliary tagging resources to improve another social web retrieval system. Our shared subspace learning framework is applicable to a range of problems where one needs to exploit the strengths existing among multiple and heterogeneous datasets. },    COMMENT = { coauthor },    FILE = { :papers\\phung\\gupta_phung_adams_truyen_venkatesh_sigkdd10.pdf:PDF },    OWNER = { Dinh Phung },    TIMESTAMP = { 2010.06.29 },    URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Gupta_etal_sigkdd10.pdf },} C
 Efficient duration and hierarchical modeling for human activity recognition Duong, Thi, Phung, Dinh, Bui, Hung and Venkatesh, Svetha. Artificial Intelligence (AIJ), 173(7-8):830-856, 2009. [ | | pdf | code] A challenge in building pervasive and smart spaces is to learn and recognize human activities of daily living (ADLs). In this paper, we address this problem and argue that in dealing with ADLs, it is beneficial to exploit both their typical duration patterns and inherent hierarchical structures. We exploit efficient duration modeling using the novel Coxian distribution to form the Coxian hidden semi-Markov model (CxHSMM) and apply it to the problem of learning and recognizing ADLs with complex temporal dependencies. The Coxian duration model has several advantages over existing duration parameterization using multinomial or exponential family distributions, including its denseness in the space of non-negative distributions, low number of parameters, computational efficiency and the existence of closed-form estimation solutions. Further we combine both hierarchical and duration extensions of the hidden Markov model (HMM) to form the novel switching hidden semi-Markov model (SHSMM), and empirically compare its performance with existing models. The model can learn what an occupant normally does during the day from unsegmented training data and then perform online activity classification, segmentation and abnormality detection. Experimental results show that Coxian modeling outperforms a range of baseline models for the task of activity segmentation. We also achieve a recognition accuracy competitive with the current state-of-the-art multinomial duration model, whilst gaining a significant reduction in computation. Furthermore, cross-validation model selection on the number of phases K in the Coxian indicates that only a small K is required to achieve the optimal performance.
Finally, our models are further tested in a more challenging setting in which the tracking is often lost and the set of activities considerably overlap. With a small amount of labels supplied during training in a partially supervised learning mode, our models are again able to deliver reliable performance, again with a small number of phases, making our proposed framework an attractive choice for activity modeling. @ARTICLE { duong_phung_bui_venkatesh_aij09,    AUTHOR = { Duong, Thi and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },    TITLE = { Efficient duration and hierarchical modeling for human activity recognition },    JOURNAL = { Artificial Intelligence (AIJ) },    YEAR = { 2009 },    VOLUME = { 173 },    NUMBER = { 7-8 },    PAGES = { 830--856 },    ABSTRACT = { A challenge in building pervasive and smart spaces is to learn and recognize human activities of daily living (ADLs). In this paper, we address this problem and argue that in dealing with ADLs, it is beneficial to exploit both their typical duration patterns and inherent hierarchical structures. We exploit efficient duration modeling using the novel Coxian distribution to form the Coxian hidden semi-Markov model (CxHSMM) and apply it to the problem of learning and recognizing ADLs with complex temporal dependencies. The Coxian duration model has several advantages over existing duration parameterization using multinomial or exponential family distributions, including its denseness in the space of non-negative distributions, low number of parameters, computational efficiency and the existence of closed-form estimation solutions. Further we combine both hierarchical and duration extensions of the hidden Markov model (HMM) to form the novel switching hidden semi-Markov model (SHSMM), and empirically compare its performance with existing models. 
The model can learn what an occupant normally does during the day from unsegmented training data and then perform online activity classification, segmentation and abnormality detection. Experimental results show that Coxian modeling outperforms a range of baseline models for the task of activity segmentation. We also achieve a recognition accuracy competitive with the current state-of-the-art multinomial duration model, whilst gaining a significant reduction in computation. Furthermore, cross-validation model selection on the number of phases K in the Coxian indicates that only a small K is required to achieve the optimal performance. Finally, our models are further tested in a more challenging setting in which the tracking is often lost and the set of activities considerably overlap. With a small amount of labels supplied during training in a partially supervised learning mode, our models are again able to deliver reliable performance, again with a small number of phases, making our proposed framework an attractive choice for activity modeling. },    CODE = { https://github.com/DASCIMAL/CxHSMM },    COMMENT = { coauthor },    DOI = { http://dx.doi.org/10.1016/j.artint.2008.12.005 },    FILE = { :duong_phung_bui_venkatesh_aij09 - Efficient Duration and Hierarchical Modeling for Human Activity Recognition.pdf:PDF },    KEYWORDS = { activity, recognition, duration modeling, Coxian, Hidden semi-Markov model, HSMM , smart surveillance },    OWNER = { 184698H },    PUBLISHER = { Elsevier },    TIMESTAMP = { 2010.08.11 },    URL = { http://www.sciencedirect.com/science/article/pii/S0004370208002142 },} J
 MCMC for Hierarchical Semi-Markov Conditional Random Fields Tran, Truyen, Phung, Dinh, Bui, Hung and Venkatesh, Svetha. In Proc. of Workshop on Deep Learning for Speech Recognition and Related Applications, in conjunction with the Neural Information Processing Systems Conference (NIPS), Whistler, BC, Canada, December 2009. [ | ] Deep architecture such as hierarchical semi-Markov models is an important class of models for nested sequential data. Current exact inference schemes either cost cubic time in sequence length, or exponential time in model depth. These costs are prohibitive for large-scale problems with arbitrary length and depth. In this contribution, we propose a new approximation technique that may have the potential to achieve sub-cubic time complexity in length and linear time depth, at the cost of some loss of quality. The idea is based on two well-known methods: Gibbs sampling and Rao-Blackwellisation. We provide some simulation-based evaluation of the quality of the RGBS with respect to run time and sequence length. @INPROCEEDINGS { truyen_phung_bui_venkatesh_nips09,    TITLE = { {MCMC} for Hierarchical Semi-Markov Conditional Random Fields },    AUTHOR = { Tran, Truyen and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },    BOOKTITLE = { Proc. of Workshop on Deep Learning for Speech Recognition and Related Applications, in conjunction with the Neural Information Processing Systems Conference (NIPS) },    YEAR = { 2009 },    ADDRESS = { Whistler, BC, Canada },    MONTH = { December },    ABSTRACT = { Deep architecture such as hierarchical semi-Markov models is an important class of models for nested sequential data. Current exact inference schemes either cost cubic time in sequence length, or exponential time in model depth. These costs are prohibitive for large-scale problems with arbitrary length and depth. 
In this contribution, we propose a new approximation technique that may have the potential to achieve sub-cubic time complexity in length and linear time depth, at the cost of some loss of quality. The idea is based on two well-known methods: Gibbs sampling and Rao-Blackwellisation. We provide some simulation-based evaluation of the quality of the RGBS with respect to run time and sequence length. },    COMMENT = { coauthor },    OWNER = { Dinh Phung },    TIMESTAMP = { 2010.06.29 },} C
 Ordinal Boltzmann Machines for Collaborative Filtering Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Proc. of Intl. Conf. on Uncertainty in Artificial Intelligence (UAI), Montreal, Canada, June 2009. (Runner-up Best Paper Award). [ | | pdf] Collaborative filtering is an effective recommendation technique wherein the preference of an individual can potentially be predicted based on preferences of other members. Early algorithms often relied on the strong locality in the preference data, that is, it is enough to predict preference of a user on a particular item based on a small subset of other users with similar tastes or of other items with similar properties. More recently, dimensionality reduction techniques have proved to be equally competitive, and these are based on the co-occurrence patterns rather than locality. This paper explores and extends a probabilistic model known as Boltzmann Machine for collaborative filtering tasks. It seamlessly integrates both the similarity and co-occurrence in a principled manner. In particular, we study parameterisation options to deal with the ordinal nature of the preferences, and propose a joint modelling of both the user-based and item-based processes. Experiments on moderate and large-scale movie recommendation show that our framework rivals existing well-known methods. @INPROCEEDINGS { truyen_phung_venkatesh_uai09,    AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Ordinal Boltzmann Machines for Collaborative Filtering },    BOOKTITLE = { Proc. of Intl. Conf. on Uncertainty in Artificial Intelligence (UAI) },    YEAR = { 2009 },    ADDRESS = { Montreal, Canada },    MONTH = { June },    NOTE = { Runner-up Best Paper Award },    ABSTRACT = { Collaborative filtering is an effective recommendation technique wherein the preference of an individual can potentially be predicted based on preferences of other members.
Early algorithms often relied on the strong locality in the preference data, that is, it is enough to predict preference of a user on a particular item based on a small subset of other users with similar tastes or of other items with similar properties. More recently, dimensionality reduction techniques have proved to be equally competitive, and these are based on the co-occurrence patterns rather than locality. This paper explores and extends a probabilistic model known as Boltzmann Machine for collaborative filtering tasks. It seamlessly integrates both the similarity and co-occurrence in a principled manner. In particular, we study parameterisation options to deal with the ordinal nature of the preferences, and propose a joint modelling of both the user-based and item-based processes. Experiments on moderate and large-scale movie recommendation show that our framework rivals existing well-known methods. },    COMMENT = { coauthor },    FILE = { :truyen_phung_venkatesh_uai09 - Ordinal Boltzmann Machines for Collaborative Filtering.pdf:PDF },    OWNER = { Dinh Phung },    TIMESTAMP = { 2009.09.22 },    URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Truyen_el_uai09.pdf },} C
 The Hidden Permutation Model and Location-Based Activity Recognition Bui, Hung, Phung, Dinh, Venkatesh, Svetha and Phan, Hai. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 1345-1350, Chicago, USA, July 2008. [ | | pdf] Permutation modeling is challenging because of the combinatorial nature of the problem. However, such modeling is often required in many real-world applications, including activity recognition where subactivities are often permuted and partially ordered. This paper introduces a novel Hidden Permutation Model (HPM) that can learn the partial ordering constraints in permuted state sequences. The HPM is parameterized as an exponential family distribution and is flexible so that it can encode constraints via different feature functions. A chain-flipping Metropolis-Hastings Markov chain Monte Carlo (MCMC) is employed for inference to overcome the O(n!) complexity. Gradient-based maximum likelihood parameter learning is presented for two cases when the permutation is known and when it is hidden. The HPM is evaluated using both simulated and real data from a location-based activity recognition domain. Experimental results indicate that the HPM performs far better than other baseline models, including the naive Bayes classifier, the HMM classifier, and Kirshner's multinomial permutation model. Our presented HPM is generic and can potentially be utilized in any problem where the modeling of permuted states from noisy data is needed.
@INPROCEEDINGS { bui_phung_venkatesh_phan_aaai08,    TITLE = { The Hidden Permutation Model and Location-Based Activity Recognition },    AUTHOR = { Bui, Hung and Phung, Dinh and Venkatesh, Svetha and Phan, Hai },    BOOKTITLE = { Proceedings of the National Conference on Artificial Intelligence (AAAI) },    YEAR = { 2008 },    ADDRESS = { Chicago, USA },    MONTH = { July },    PAGES = { 1345--1350 },    VOLUME = { 8 },    ABSTRACT = { Permutation modeling is challenging because of the combinatorial nature of the problem. However, such modeling is often required in many real-world applications, including activity recognition where subactivities are often permuted and partially ordered. This paper introduces a novel Hidden Permutation Model (HPM) that can learn the partial ordering constraints in permuted state sequences. The HPM is parameterized as an exponential family distribution and is flexible so that it can encode constraints via different feature functions. A chain-flipping Metropolis-Hastings Markov chain Monte Carlo (MCMC) is employed for inference to overcome the O(n!) complexity. Gradient-based maximum likelihood parameter learning is presented for two cases when the permutation is known and when it is hidden. The HPM is evaluated using both simulated and real data from a location-based activity recognition domain. Experimental results indicate that the HPM performs far better than other baseline models, including the naive Bayes classifier, the HMM classifier, and Kirshner's multinomial permutation model. Our presented HPM is generic and can potentially be utilized in any problem where the modeling of permuted states from noisy data is needed. },    FILE = { :papers\\phung\\bui_phung_venkatesh_phan_aaai08.pdf:PDF },    OWNER = { 184698H },    TIMESTAMP = { 2010.08.11 },    URL = { http://www.aaai.org/Papers/AAAI/2008/AAAI08-213.pdf },} C
 Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data Tran, Truyen, Phung, Dinh, Bui, Hung and Venkatesh, Svetha. Advances in Neural Information Processing (NIPS), December 2008. [ | | ] Inspired by the hierarchical hidden Markov models (HHMM), we present the hierarchical conditional random field (HCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We demonstrate the HCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. @ARTICLE { truyen_phung_bui_venkatesh_nips08,    TITLE = { Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data },    AUTHOR = { Tran, Truyen and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },    JOURNAL = { Advances in Neural Information Processing (NIPS) },    YEAR = { 2008 },    MONTH = { December },    ABSTRACT = { Inspired by the hierarchical hidden Markov models (HHMM), we present the hierarchical conditional random field (HCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. 
We demonstrate the HCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. },    ADDRESS = { Vancouver, Canada },    OWNER = { 184698H },    TIMESTAMP = { 2010.08.11 },    URL = { 2008/conferences/truyen_phung_bui_venkatesh_nips08.pdf },} J
 AdaBoost.MRF: Boosted Markov Random Forests and Application to Multilevel Activity Recognition Tran, Truyen, Phung, Dinh, Bui, Hung and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 1686-1693, New York, USA, June 2006. [ | ] Activity recognition is an important issue in building intelligent monitoring systems. We address the recognition of multilevel activities in this paper via a conditional Markov random field (MRF), known as the dynamic conditional random field (DCRF). Parameter estimation in general MRFs using maximum likelihood is known to be computationally challenging (except for extreme cases), and thus we propose an efficient boosting-based algorithm AdaBoost.MRF for this task. Distinct from most existing work, our algorithm can handle hidden variables (missing labels) and is particularly attractive for smarthouse domains where reliable labels are often sparsely observed. Furthermore, our method works exclusively on trees and thus is guaranteed to converge. We apply the AdaBoost.MRF algorithm to a home video surveillance application and demonstrate its efficacy. @INPROCEEDINGS { truyen_phung_bui_venkatesh_cvpr06,    TITLE = { {AdaBoost.MRF}: Boosted {M}arkov Random Forests and Application to Multilevel Activity Recognition },    AUTHOR = { Tran, Truyen and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },    BOOKTITLE = { Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR) },    YEAR = { 2006 },    ADDRESS = { New York, USA },    MONTH = { June },    PAGES = { 1686-1693 },    ABSTRACT = { Activity recognition is an important issue in building intelligent monitoring systems. We address the recognition of multilevel activities in this paper via a conditional Markov random field (MRF), known as the dynamic conditional random field (DCRF).
Parameter estimation in general MRFs using maximum likelihood is known to be computationally challenging (except for extreme cases), and thus we propose an efficient boosting-based algorithm AdaBoost.MRF for this task. Distinct from most existing work, our algorithm can handle hidden variables (missing labels) and is particularly attractive for smarthouse domains where reliable labels are often sparsely observed. Furthermore, our method works exclusively on trees and thus is guaranteed to converge. We apply the AdaBoost.MRF algorithm to a home video surveillance application and demonstrate its efficacy. },    COMMENT = { coauthor },    OWNER = { 184698H },    TIMESTAMP = { 2010.08.11 },} C
 Activity Recognition and Abnormality Detection with the Switching Hidden Semi-Markov Model Duong, Thi, Bui, Hung, Phung, Dinh and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 838-845, San Diego, 20-26 June 2005. [ | ] This paper addresses the problem of learning and recognizing human activities of daily living (ADL), which is an important research issue in building a pervasive and smart environment. In dealing with ADL, we argue that it is beneficial to exploit both the inherent hierarchical organization of the activities and their typical duration. To this end, we introduce the Switching Hidden Semi-Markov Model (S-HSMM), a two-layered extension of the hidden semi-Markov model (HSMM) for the modeling task. Activities are modeled in the S-HSMM in two ways: the bottom layer represents atomic activities and their duration using HSMMs; the top layer represents a sequence of high-level activities where each high-level activity is made of a sequence of atomic activities. We consider two methods for modeling duration: the classic explicit duration model using multinomial distribution, and the novel use of the discrete Coxian distribution. In addition, we propose an effective scheme to detect abnormality without the need for training on abnormal data. Experimental results show that the S-HSMM performs better than existing models including the flat HSMM and the hierarchical hidden Markov model in both classification and abnormality detection tasks, alleviating the need for presegmented training data. Furthermore, our discrete Coxian duration model yields better computation time and generalization error than the classic explicit duration model. @INPROCEEDINGS { duong_bui_phung_venkatesh_cvpr05,    TITLE = { Activity Recognition and Abnormality Detection with the {S}witching {H}idden {S}emi-{M}arkov {M}odel },    AUTHOR = { Duong, Thi and Bui, Hung and Phung, Dinh and Venkatesh, Svetha },    BOOKTITLE = { Proc.
of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR) },    YEAR = { 2005 },    ADDRESS = { San Diego },    MONTH = { 20-26 June },    PAGES = { 838--845 },    PUBLISHER = { IEEE Computer Society },    VOLUME = { 1 },    ABSTRACT = { This paper addresses the problem of learning and recognizing human activities of daily living (ADL), which is an important research issue in building a pervasive and smart environment. In dealing with ADL, we argue that it is beneficial to exploit both the inherent hierarchical organization of the activities and their typical duration. To this end, we introduce the Switching Hidden Semi-Markov Model (S-HSMM), a two-layered extension of the hidden semi-Markov model (HSMM) for the modeling task. Activities are modeled in the S-HSMM in two ways: the bottom layer represents atomic activities and their duration using HSMMs; the top layer represents a sequence of high-level activities where each high-level activity is made of a sequence of atomic activities. We consider two methods for modeling duration: the classic explicit duration model using multinomial distribution, and the novel use of the discrete Coxian distribution. In addition, we propose an effective scheme to detect abnormality without the need for training on abnormal data. Experimental results show that the S-HSMM performs better than existing models including the flat HSMM and the hierarchical hidden Markov model in both classification and abnormality detection tasks, alleviating the need for presegmented training data. Furthermore, our discrete Coxian duration model yields better computation time and generalization error than the classic explicit duration model. },    KEYWORDS = { Activity Recognition, Abnormality detection, semi-Markov, hierarchical HSMM },    OWNER = { 184698H },    TIMESTAMP = { 2010.08.11 },} C
 Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Model Nguyen, N., Phung, Dinh, Bui, Hung and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 955-960, San Diego, 2005. [ | ] Directly modeling the inherent hierarchy and shared structures of human behaviors, we present an application of the hierarchical hidden Markov model (HHMM) for the problem of activity recognition. We argue that to robustly model and recognize complex human activities, it is crucial to exploit both the natural hierarchical decomposition and shared semantics embedded in the movement trajectories. To this end, we propose the use of the HHMM, a rich stochastic model that has been recently extended to handle shared structures, for representing and recognizing a set of complex indoor activities. Furthermore, in the need of real-time recognition, we propose a Rao-Blackwellised particle filter (RBPF) that efficiently computes the filtering distribution at a constant time complexity for each new observation arrival. The main contributions of this paper lie in the application of the shared-structure HHMM, the estimation of the model's parameters at all levels simultaneously, and a construction of an RBPF approximate inference scheme. The experimental results in a real-world environment have confirmed our belief that directly modeling shared structures not only reduces computational cost, but also improves recognition accuracy when compared with the tree HHMM and the flat HMM. @INPROCEEDINGS { nguyen_phung_bui_venkatesh_cvpr05,    TITLE = { Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Model },    AUTHOR = { Nguyen, N. and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },    BOOKTITLE = { Proc. of IEEE Intl. Conf.
on Computer Vision and Pattern Recognition (CVPR) },    YEAR = { 2005 },    ADDRESS = { San Diego },    PAGES = { 955--960 },    PUBLISHER = { IEEE Computer Society },    VOLUME = { 1 },    ABSTRACT = { Directly modeling the inherent hierarchy and shared structures of human behaviors, we present an application of the hierarchical hidden Markov model (HHMM) for the problem of activity recognition. We argue that to robustly model and recognize complex human activities, it is crucial to exploit both the natural hierarchical decomposition and shared semantics embedded in the movement trajectories. To this end, we propose the use of the HHMM, a rich stochastic model that has been recently extended to handle shared structures, for representing and recognizing a set of complex indoor activities. Furthermore, in the need of real-time recognition, we propose a Rao-Blackwellised particle filter (RBPF) that efficiently computes the filtering distribution at a constant time complexity for each new observation arrival. The main contributions of this paper lie in the application of the shared-structure HHMM, the estimation of the model's parameters at all levels simultaneously, and a construction of an RBPF approximate inference scheme. The experimental results in a real-world environment have confirmed our belief that directly modeling shared structures not only reduces computational cost, but also improves recognition accuracy when compared with the tree HHMM and the flat HMM. },    OWNER = { 184698H },    TIMESTAMP = { 2010.08.11 },} C
 Topic Transition Detection Using Hierarchical Hidden Markov and Semi-Markov Models Phung, Dinh, Duong, Thi, Bui, Hung and Venkatesh, Svetha. In Proc. of ACM Intl. Conf. on Multimedia (ACM-MM), Singapore, 6--11 Nov. 2005. [ | ] In this paper we introduce a probabilistic framework to exploit hierarchy, structure sharing and duration information for topic transition detection in videos. Our probabilistic detection framework is a combination of a shot classification step and a detection phase using hierarchical probabilistic models. We consider two models in this paper: the extended Hierarchical Hidden Markov Model (HHMM) and the Coxian Switching Hidden semi-Markov Model (S-HSMM) because they allow the natural decomposition of semantics in videos, including shared structures, to be modeled directly, and thus enable efficient inference and reduce the sample complexity in learning. Additionally, the S-HSMM allows the duration information to be incorporated, consequently the modeling of long-term dependencies in videos is enriched through both hierarchical and duration modeling. Furthermore, the use of Coxian distribution in the S-HSMM makes it tractable to deal with long sequences in video. Our experimentation of the proposed framework on twelve educational and training videos shows that both models outperform the baseline cases (flat HMM and HSMM) and performances reported in earlier work in topic detection. The superior performance of the S-HSMM over the HHMM verifies our belief that the duration information is an important factor in video content modeling. @INPROCEEDINGS { phung_duong_bui_venkatesh_acmmm05,    TITLE = { Topic Transition Detection Using Hierarchical Hidden Markov and Semi-Markov Models },    AUTHOR = { Phung, Dinh and Duong, Thi and Bui, Hung and Venkatesh, Svetha },    BOOKTITLE = { Proc. of ACM Intl. Conf. on Multimedia (ACM-MM) },    YEAR = { 2005 },    ADDRESS = { Singapore },    MONTH = { 6--11 Nov. 
},    ABSTRACT = { In this paper we introduce a probabilistic framework to exploit hierarchy, structure sharing and duration information for topic transition detection in videos. Our probabilistic detection framework is a combination of a shot classification step and a detection phase using hierarchical probabilistic models. We consider two models in this paper: the extended Hierarchical Hidden Markov Model (HHMM) and the Coxian Switching Hidden semi-Markov Model (S-HSMM) because they allow the natural decomposition of semantics in videos, including shared structures, to be modeled directly, and thus enable efficient inference and reduce the sample complexity in learning. Additionally, the S-HSMM allows the duration information to be incorporated, consequently the modeling of long-term dependencies in videos is enriched through both hierarchical and duration modeling. Furthermore, the use of Coxian distribution in the S-HSMM makes it tractable to deal with long sequences in video. Our experimentation of the proposed framework on twelve educational and training videos shows that both models outperform the baseline cases (flat HMM and HSMM) and performances reported in earlier work in topic detection. The superior performance of the S-HSMM over the HHMM verifies our belief that the duration information is an important factor in video content modeling. },    OWNER = { 184698H },    TIMESTAMP = { 2010.08.11 },} C
 Hierarchical Hidden Markov Models with General State Hierarchy Bui, Hung, Phung, Dinh and Venkatesh, Svetha. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 324-329, San Jose, California, USA, 2004. [ | | pdf] The hierarchical hidden Markov model (HHMM) is an extension of the hidden Markov model to include a hierarchy of the hidden states. This form of hierarchical modeling has been found useful in applications such as handwritten recognition, behavior recognition, video indexing, and text retrieval. Nevertheless, the state hierarchy in the original HHMM is restricted to a tree structure. This prohibits two different states from having the same child, and thus does not allow for sharing of common substructures in the model. In this paper, we present a general HHMM in which the state hierarchy can be a lattice allowing arbitrary sharing of substructures. Furthermore, we provide a method for numerical scaling to avoid underflow, an important issue in dealing with long observation sequences. We demonstrate the working of our method in a simulated environment where a hierarchical behavioral model is automatically learned and later used for recognition. @INPROCEEDINGS { bui_phung_venkatesh_aaai04,    TITLE = { Hierarchical Hidden Markov Models with General State Hierarchy },    AUTHOR = { Bui, Hung and Phung, Dinh and Venkatesh, Svetha },    BOOKTITLE = { Proceedings of the National Conference on Artificial Intelligence (AAAI) },    YEAR = { 2004 },    ADDRESS = { San Jose, California, USA },    EDITOR = { McGuinness, Deborah L. and Ferguson, George },    PAGES = { 324--329 },    PUBLISHER = { MIT Press },    ABSTRACT = { The hierarchical hidden Markov model (HHMM) is an extension of the hidden Markov model to include a hierarchy of the hidden states. This form of hierarchical modeling has been found useful in applications such as handwritten recognition, behavior recognition, video indexing, and text retrieval. 
Nevertheless, the state hierarchy in the original HHMM is restricted to a tree structure. This prohibits two different states from having the same child, and thus does not allow for sharing of common substructures in the model. In this paper, we present a general HHMM in which the state hierarchy can be a lattice allowing arbitrary sharing of substructures. Furthermore, we provide a method for numerical scaling to avoid underflow, an important issue in dealing with long observation sequences. We demonstrate the working of our method in a simulated environment where a hierarchical behavioral model is automatically learned and later used for recognition. },    FILE = { :papers\\phung\\bui_phung_venkatesh_aaai04.pdf:PDF },    GROUP = { Statistics, Hierarchical Hidden Markov Models (HMM,HHMM) },    OWNER = { 184698H },    TIMESTAMP = { 2010.08.11 },    URL = { http://www.aaai.org/Papers/AAAI/2004/AAAI04-052.pdf },} C
 2019
 Learning Generative Adversarial Networks from Multiple Data Sources Trung Le, Quan Hoang, Hung Vu, Tu Dinh Nguyen, Hung Bui and Dinh Phung. In Proc. of the 28th Int. Joint Conf. on Artificial Intelligence (IJCAI), aug 2019. [ | ] Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. For this problem, we enrich both GANs’ formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN’s effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair. @INPROCEEDINGS { le_etal_ijcai19_learningGAN,    AUTHOR = { Trung Le and Quan Hoang and Hung Vu and Tu Dinh Nguyen and Hung Bui and Dinh Phung },    TITLE = { Learning Generative Adversarial Networks from Multiple Data Sources },    BOOKTITLE = { Proc. of the 28th Int. Joint Conf. on Artificial Intelligence (IJCAI) },    YEAR = { 2019 },    MONTH = { aug },    ABSTRACT = { Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. 
For this problem, we enrich both GANs’ formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN’s effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair. },} C
 Three-Player Wasserstein GAN via Amortised Duality Nhan Dam, Quan Hoang, Trung Le, Tu Dinh Nguyen, Hung Bui and Dinh Phung. In Proc. of the 28th Int. Joint Conf. on Artificial Intelligence (IJCAI), aug 2019. [ | ] We propose a new formulation for learning generative adversarial networks (GANs) using optimal transport cost (the general form of Wasserstein distance) as the objective criterion to measure the dissimilarity between target distribution and learned distribution. Our formulation is based on the general form of the Kantorovich duality which is applicable to optimal transport with a wide range of cost functions that are not necessarily a metric. To make optimising this duality form amenable to gradient-based methods, we employ a function that acts as an amortised optimiser for the innermost optimisation problem. Interestingly, the amortised optimiser can be viewed as a mover since it strategically shifts around data points. The resulting formulation is a sequential min-max-min game with 3 players: the generator, the critic, and the mover where the new player, the mover, attempts to fool the critic by shifting the data around. Despite involving three players, we demonstrate that our proposed formulation can be solved reasonably effectively via a simple alternative gradient learning strategy. Compared with the existing Lipschitz-constrained formulations of Wasserstein GAN on CIFAR-10, our model yields significantly better diversity scores than weight clipping and comparable performance to gradient penalty method. @INPROCEEDINGS { dam_etal_ijcai19_3pwgan,    AUTHOR = { Nhan Dam and Quan Hoang and Trung Le and Tu Dinh Nguyen and Hung Bui and Dinh Phung },    TITLE = { Three-Player {W}asserstein {GAN} via Amortised Duality },    BOOKTITLE = { Proc. of the 28th Int. Joint Conf. 
on Artificial Intelligence (IJCAI) },    YEAR = { 2019 },    MONTH = { aug },    ABSTRACT = { We propose a new formulation for learning generative adversarial networks (GANs) using optimal transport cost (the general form of Wasserstein distance) as the objective criterion to measure the dissimilarity between target distribution and learned distribution. Our formulation is based on the general form of the Kantorovich duality which is applicable to optimal transport with a wide range of cost functions that are not necessarily a metric. To make optimising this duality form amenable to gradient-based methods, we employ a function that acts as an amortised optimiser for the innermost optimisation problem. Interestingly, the amortised optimiser can be viewed as a mover since it strategically shifts around data points. The resulting formulation is a sequential min-max-min game with 3 players: the generator, the critic, and the mover where the new player, the mover, attempts to fool the critic by shifting the data around. Despite involving three players, we demonstrate that our proposed formulation can be solved reasonably effectively via a simple alternative gradient learning strategy. Compared with the existing Lipschitz-constrained formulations of Wasserstein GAN on CIFAR-10, our model yields significantly better diversity scores than weight clipping and comparable performance to gradient penalty method. },} C
 Learning How to Active Learn by Dreaming Thuy-Trang Vu, Ming Liu, Dinh Phung and Gholamreza Haffari. In Proc. of Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy, jul 2019. [ | ] @INPROCEEDINGS { vu_etal_acl19_learning,    AUTHOR = { Thuy-Trang Vu and Ming Liu and Dinh Phung and Gholamreza Haffari },    TITLE = { Learning How to Active Learn by Dreaming },    BOOKTITLE = { Proc. of Annual Meeting of the Association for Computational Linguistics (ACL) },    YEAR = { 2019 },    ADDRESS = { Florence, Italy },    MONTH = { jul },} C
 Deep Domain Adaptation for Vulnerable Code Function Identification Van Nguyen, Trung Le, Tue Le, Khanh Nguyen, Olivier DeVel, Paul Montague, Lizhen Qu and Dinh Phung. In Int. Joint Conf. on Neural Networks (IJCNN), 2019. [ | ] Due to the ubiquity of computer software, software vulnerability detection (SVD) has become crucial in the software industry and in the field of computer security. Two significant issues in SVD arise when using machine learning, namely: i) how to learn automatic features that can help improve the predictive performance of vulnerability detection and ii) how to overcome the scarcity of labeled vulnerabilities in projects that require the laborious labeling of code by software security experts. In this paper, we address these two crucial concerns by proposing a novel architecture which leverages deep domain adaptation with automatic feature learning for software vulnerability identification. Based on this architecture, we keep the principles and reapply the state-of-the-art deep domain adaptation methods to indicate that deep domain adaptation for SVD is plausible and promising. Moreover, we further propose a novel method named Semi-supervised Code Domain Adaptation Network (SCDAN) that can efficiently utilize and exploit information carried in unlabeled target data by considering them as the unlabeled portion in a semi-supervised learning context. The proposed SCDAN method enforces the clustering assumption, which is a key principle in semi-supervised learning. The experimental results using six real-world software project datasets show that our SCDAN method and the baselines using our architecture have better predictive performance by a wide margin compared with the Deep Code Network (VulDeePecker) method without domain adaptation. Also, the proposed SCDAN significantly outperforms DIRT-T, which to the best of our knowledge is currently the state-of-the-art method in deep domain adaptation, as well as other baselines. 
@INPROCEEDINGS { van_etal_ijcnn19_deepdomain,    AUTHOR = { Van Nguyen and Trung Le and Tue Le and Khanh Nguyen and Olivier DeVel and Paul Montague and Lizhen Qu and Dinh Phung },    TITLE = { Deep Domain Adaptation for Vulnerable Code Function Identification },    BOOKTITLE = { Int. Joint Conf. on Neural Networks (IJCNN) },    YEAR = { 2019 },    ABSTRACT = { Due to the ubiquity of computer software, software vulnerability detection (SVD) has become crucial in the software industry and in the field of computer security. Two significant issues in SVD arise when using machine learning, namely: i) how to learn automatic features that can help improve the predictive performance of vulnerability detection and ii) how to overcome the scarcity of labeled vulnerabilities in projects that require the laborious labeling of code by software security experts. In this paper, we address these two crucial concerns by proposing a novel architecture which leverages deep domain adaptation with automatic feature learning for software vulnerability identification. Based on this architecture, we keep the principles and reapply the state-of-the-art deep domain adaptation methods to indicate that deep domain adaptation for SVD is plausible and promising. Moreover, we further propose a novel method named Semi-supervised Code Domain Adaptation Network (SCDAN) that can efficiently utilize and exploit information carried in unlabeled target data by considering them as the unlabeled portion in a semi-supervised learning context. The proposed SCDAN method enforces the clustering assumption, which is a key principle in semi-supervised learning. The experimental results using six real-world software project datasets show that our SCDAN method and the baselines using our architecture have better predictive performance by a wide margin compared with the Deep Code Network (VulDeePecker) method without domain adaptation. 
Also, the proposed SCDAN significantly outperforms DIRT-T, which to the best of our knowledge is currently the state-of-the-art method in deep domain adaptation, as well as other baselines. },    FILE = { :van_etal_ijcnn19_deepdomain - Deep Domain Adaptation for Vulnerable Code Function Identification.pdf:PDF },} C
 A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization Dai Quoc Nguyen, Thanh Vu, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL), Minneapolis, USA, jun 2019. [ | | pdf] In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). Our CapsE represents each triple as a 3-column matrix where each column vector represents the embedding of an element in the triple. This 3-column matrix is then fed to a convolution layer where multiple filters are operated to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains state-of-the-art link prediction results for knowledge graph completion on two benchmark datasets: WN18RR and FB15k-237, and outperforms strong search personalization baselines on SEARCH17 dataset. @INPROCEEDINGS { nguyen_etal_naaclhtl19_acapsule,    AUTHOR = { Dai Quoc Nguyen and Thanh Vu and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung },    TITLE = { A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization },    BOOKTITLE = { Proc. of Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL) },    YEAR = { 2019 },    ADDRESS = { Minneapolis, USA },    MONTH = { jun },    ABSTRACT = { In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). 
Our CapsE represents each triple as a 3-column matrix where each column vector represents the embedding of an element in the triple. This 3-column matrix is then fed to a convolution layer where multiple filters are operated to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains state-of-the-art link prediction results for knowledge graph completion on two benchmark datasets: WN18RR and FB15k-237, and outperforms strong search personalization baselines on SEARCH17 dataset. },    FILE = { :nguyen_etal_naaclhtl19_acapsule - A Capsule Network Based Embedding Model for Knowledge Graph Completion and Search Personalization.pdf:PDF },    URL = { https://arxiv.org/abs/1808.04122 },} C
 Probabilistic Multilevel Clustering via Composite Transportation Distance Viet Huynh, Nhat Ho, Dinh Phung and Michael I. Jordan. In Proc. of Int. Conf. on Artificial Intelligence and Statistics (AISTATS), Okinawa, Japan, apr 2019. [ | | pdf] We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence. Our method involves solving a joint optimization problem over spaces of probability measures to simultaneously discover grouping structures within groups and among groups. By exploiting the connection of our method to the problem of finding composite transportation barycenters, we develop fast and efficient optimization algorithms even for potentially large-scale multilevel datasets. Finally, we present experimental results with both synthetic and real data to demonstrate the efficiency and scalability of the proposed approach. @INPROCEEDINGS { ho_etal_aistats19_probabilistic,    AUTHOR = { Viet Huynh and Nhat Ho and Dinh Phung and Michael I. Jordan },    TITLE = { Probabilistic Multilevel Clustering via Composite Transportation Distance },    BOOKTITLE = { Proc. of Int. Conf. on Artificial Intelligence and Statistics (AISTATS) },    YEAR = { 2019 },    ADDRESS = { Okinawa, Japan },    MONTH = { apr },    ABSTRACT = { We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence. Our method involves solving a joint optimization problem over spaces of probability measures to simultaneously discover grouping structures within groups and among groups. 
By exploiting the connection of our method to the problem of finding composite transportation barycenters, we develop fast and efficient optimization algorithms even for potentially large-scale multilevel datasets. Finally, we present experimental results with both synthetic and real data to demonstrate the efficiency and scalability of the proposed approach. },    FILE = { :ho_etal_aistats19_probabilistic - Probabilistic Multilevel Clustering Via Composite Transportation Distance.pdf:PDF },    JOURNAL = { In Proc. of Int. Conf. on Artificial Intelligence and Statistics (AISTATS) },    URL = { https://arxiv.org/abs/1810.11911 },} C
 Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection Tue Le, Tuan Nguyen, Trung Le, Dinh Phung, Paul Montague, Olivier De Vel and Lizhen Qu. In International Conference on Learning Representations (ICLR), 2019. [ | | pdf] @INPROCEEDINGS { le_etal_iclr18_maximal,    AUTHOR = { Tue Le and Tuan Nguyen and Trung Le and Dinh Phung and Paul Montague and Olivier De Vel and Lizhen Qu },    TITLE = { Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection },    BOOKTITLE = { International Conference on Learning Representations (ICLR) },    YEAR = { 2019 },    FILE = { :le_etal_iclr18_maximal - Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection.pdf:PDF },    URL = { https://openreview.net/forum?id=ByloIiCqYQ },} C
 Robust Anomaly Detection in Videos using Multilevel Representations Hung Vu, Tu Dinh Nguyen, Trung Le, Wei Luo and Dinh Phung. In Proceedings of the Thirty-third AAAI Conference on Artificial Intelligence (AAAI), Honolulu, USA, 2019. [ | | pdf] @INPROCEEDINGS { vu_etal_aaai19_robustanomaly,    AUTHOR = { Hung Vu and Tu Dinh Nguyen and Trung Le and Wei Luo and Dinh Phung },    TITLE = { Robust Anomaly Detection in Videos using Multilevel Representations },    BOOKTITLE = { Proceedings of the Thirty-third AAAI Conference on Artificial Intelligence (AAAI) },    YEAR = { 2019 },    ADDRESS = { Honolulu, USA },    FILE = { :vu_etal_aaai19_robustanomaly - Robust Anomaly Detection in Videos Using Multilevel Representations.pdf:PDF },    GROUPS = { Anomaly Detection },    URL = { https://github.com/SeaOtter/vad_gan },} C
 Usefulness of Wearable Cameras as a Tool to Enhance Chronic Disease Self-Management: Scoping Review Ralph Maddison, Susie Cartledge, Michelle Rogerson, Nicole Sylvia Goedhart, Tarveen Ragbir Singh, Christopher Neil, Dinh Phung and Kylie Ball. JMIR Mhealth Uhealth, 7(1):e10371, Jan 2019. [ | | pdf] Background: Self-management is a critical component of chronic disease management and can include a host of activities, such as adhering to prescribed medications, undertaking daily care activities, managing dietary intake and body weight, and proactively contacting medical practitioners. The rise of technologies (mobile phones, wearable cameras) for health care use offers potential support for people to better manage their disease in collaboration with their treating health professionals. Wearable cameras can be used to provide rich contextual data and insight into everyday activities and aid in recall. This information can then be used to prompt memory recall or guide the development of interventions to support self-management. Application of wearable cameras to better understand and augment self-management by people with chronic disease has yet to be investigated. Objective: The objective of our review was to ascertain the scope of the literature on the use of wearable cameras for self-management by people with chronic disease and to determine the potential of wearable cameras to assist people to better manage their disease. Methods: We conducted a scoping review, which involved a comprehensive electronic literature search of 9 databases in July 2017. The search strategy focused on studies that used wearable cameras to capture one or more modifiable lifestyle risk factors associated with chronic disease or to capture typical self-management behaviors, or studies that involved a chronic disease population. We then categorized and described included studies according to their characteristics (eg, behaviors measured, study design or type, characteristics of the sample). 
Results: We identified 31 studies: 25 studies involved primary or secondary data analysis, and 6 were review, discussion, or descriptive articles. Wearable cameras were predominantly used to capture dietary intake, physical activity, activities of daily living, and sedentary behavior. Populations studied were predominantly healthy volunteers, school students, and sports people, with only 1 study examining an intervention using wearable cameras for people with an acquired brain injury. Most studies highlighted technical or ethical issues associated with using wearable cameras, many of which were overcome. Conclusions: This scoping review highlighted the potential of wearable cameras to capture health-related behaviors and risk factors of chronic disease, such as diet, exercise, and sedentary behaviors. Data collected from wearable cameras can be used as an adjunct to traditional data collection methods such as self-reported diaries in addition to providing valuable contextual information. While most studies to date have focused on healthy populations, wearable cameras offer promise to better understand self-management of chronic disease and its context. @ARTICLE { maddison_etal_jmir19_usefulness,    AUTHOR = { Ralph Maddison and Susie Cartledge and Michelle Rogerson and Nicole Sylvia Goedhart and Tarveen Ragbir Singh and Christopher Neil and Dinh Phung and Kylie Ball },    TITLE = { Usefulness of Wearable Cameras as a Tool to Enhance Chronic Disease Self-Management: Scoping Review },    JOURNAL = { JMIR Mhealth Uhealth },    YEAR = { 2019 },    VOLUME = { 7 },    NUMBER = { 1 },    PAGES = { e10371 },    MONTH = { Jan },    ISSN = { 2291-5222 },    ABSTRACT = { Background: Self-management is a critical component of chronic disease management and can include a host of activities, such as adhering to prescribed medications, undertaking daily care activities, managing dietary intake and body weight, and proactively contacting medical practitioners. 
The rise of technologies (mobile phones, wearable cameras) for health care use offers potential support for people to better manage their disease in collaboration with their treating health professionals. Wearable cameras can be used to provide rich contextual data and insight into everyday activities and aid in recall. This information can then be used to prompt memory recall or guide the development of interventions to support self-management. Application of wearable cameras to better understand and augment self-management by people with chronic disease has yet to be investigated. Objective: The objective of our review was to ascertain the scope of the literature on the use of wearable cameras for self-management by people with chronic disease and to determine the potential of wearable cameras to assist people to better manage their disease. Methods: We conducted a scoping review, which involved a comprehensive electronic literature search of 9 databases in July 2017. The search strategy focused on studies that used wearable cameras to capture one or more modifiable lifestyle risk factors associated with chronic disease or to capture typical self-management behaviors, or studies that involved a chronic disease population. We then categorized and described included studies according to their characteristics (eg, behaviors measured, study design or type, characteristics of the sample). Results: We identified 31 studies: 25 studies involved primary or secondary data analysis, and 6 were review, discussion, or descriptive articles. Wearable cameras were predominantly used to capture dietary intake, physical activity, activities of daily living, and sedentary behavior. Populations studied were predominantly healthy volunteers, school students, and sports people, with only 1 study examining an intervention using wearable cameras for people with an acquired brain injury. 
Most studies highlighted technical or ethical issues associated with using wearable cameras, many of which were overcome. Conclusions: This scoping review highlighted the potential of wearable cameras to capture health-related behaviors and risk factors of chronic disease, such as diet, exercise, and sedentary behaviors. Data collected from wearable cameras can be used as an adjunct to traditional data collection methods such as self-reported diaries in addition to providing valuable contextual information. While most studies to date have focused on healthy populations, wearable cameras offer promise to better understand self-management of chronic disease and its context. },    DAY = { 03 },    DOI = { 10.2196/10371 },    FILE = { :ralph_etal_jmir19_usefulness - Usefulness of Wearable Cameras As a Tool to Enhance Chronic Disease Self Management_ Scoping Review.pdf:PDF },    KEYWORDS = { eHealth; review; cameras; life-logging; lifestyle behavior; chronic disease },    URL = { https://mhealth.jmir.org/2019/1/e10371/ },} J
 2018
 Model-Based Learning for Point Pattern Data Ba-Ngu Vo, Nhan Dam, Dinh Phung, Quang N. Tran and Ba-Tuong Vo. Pattern Recognition (PR), 84:136-151, 2018. [ | | pdf] This article proposes a framework for model-based point pattern learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed. @ARTICLE { vo_etal_pr18_modelbased,    AUTHOR = { Ba-Ngu Vo and Nhan Dam and Dinh Phung and Quang N. Tran and Ba-Tuong Vo },    TITLE = { Model-Based Learning for Point Pattern Data },    JOURNAL = { Pattern Recognition (PR) },    YEAR = { 2018 },    VOLUME = { 84 },    PAGES = { 136--151 },    ISSN = { 0031-3203 },    ABSTRACT = { This article proposes a framework for model-based point pattern learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed. },    DOI = { https://doi.org/10.1016/j.patcog.2018.07.008 },    FILE = { :vo_etal_pr18_modelbased - Model Based Learning for Point Pattern Data.pdf:PDF },    KEYWORDS = { Point pattern, Point process, Random finite set, Multiple instance learning, Classification, Novelty detection, Clustering },    PUBLISHER = { Elsevier },    URL = { http://www.sciencedirect.com/science/article/pii/S0031320318302395 },} J
 Robust Bayesian Kernel Machine via Stein Variational Gradient Descent for Big Data Khanh Nguyen, Trung Le, Tu Nguyen, Geoff Webb and Dinh Phung. In Proc. of the 24th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), London, UK, aug 2018. [ | ] Kernel methods are powerful supervised machine learning models for their strong generalization ability, especially on limited data to effectively generalize on unseen data. However, most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization, making them infeasible to apply to large-scale datasets. This issue is exacerbated when kernel methods are used in conjunction with a grid search to tune their kernel parameters and hyperparameters which brings in the question of model robustness when applied to real datasets. In this paper, we propose a robust Bayesian Kernel Machine (BKM) – a Bayesian kernel machine that exploits the strengths of both the Bayesian modelling and kernel methods. A key challenge for such a formulation is the need for an efficient learning algorithm. To this end, we successfully extended the recent Stein variational theory for Bayesian inference for our proposed model, resulting in fast and efficient learning and prediction algorithms. Importantly our proposed BKM is resilient to the curse of kernelization, hence making it applicable to large-scale datasets and robust to parameter tuning, avoiding the associated expense and potential pitfalls with current practice of parameter tuning. Our extensive experimental results on 12 benchmark datasets show that our BKM without tuning any parameter can achieve comparable predictive performance with the state-of-the-art LIBSVM and significantly outperforms other baselines, while obtaining a significant speedup in terms of the total training time compared with its rivals. 
@INPROCEEDINGS { nguyen_etal_kdd18_robustbayesian,    AUTHOR = { Khanh Nguyen and Trung Le and Tu Nguyen and Geoff Webb and Dinh Phung },    TITLE = { Robust Bayesian Kernel Machine via Stein Variational Gradient Descent for Big Data },    BOOKTITLE = { Proc. of the 24th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD) },    YEAR = { 2018 },    ADDRESS = { London, UK },    MONTH = { aug },    PUBLISHER = { ACM },    ABSTRACT = { Kernel methods are powerful supervised machine learning models for their strong generalization ability, especially on limited data to effectively generalize on unseen data. However, most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization, making them infeasible to apply to large-scale datasets. This issue is exacerbated when kernel methods are used in conjunction with a grid search to tune their kernel parameters and hyperparameters which brings in the question of model robustness when applied to real datasets. In this paper, we propose a robust Bayesian Kernel Machine (BKM) – a Bayesian kernel machine that exploits the strengths of both the Bayesian modelling and kernel methods. A key challenge for such a formulation is the need for an efficient learning algorithm. To this end, we successfully extended the recent Stein variational theory for Bayesian inference for our proposed model, resulting in fast and efficient learning and prediction algorithms. Importantly our proposed BKM is resilient to the curse of kernelization, hence making it applicable to large-scale datasets and robust to parameter tuning, avoiding the associated expense and potential pitfalls with current practice of parameter tuning. 
Our extensive experimental results on 12 benchmark datasets show that our BKM without tuning any parameter can achieve comparable predictive performance with the state-of-the-art LIBSVM and significantly outperforms other baselines, while obtaining a significant speedup in terms of the total training time compared with its rivals. },    FILE = { :nguyen_etal_kdd18_robustbayesian - Robust Bayesian Kernel Machine Via Stein Variational Gradient Descent for Big Data.pdf:PDF },} C
 MGAN: Training Generative Adversarial Nets with Multiple Generators Quan Hoang, Tu Dinh Nguyen, Trung Le and Dinh Phung. In International Conference on Learning Representations (ICLR), 2018. [ | | pdf] We propose in this paper a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapsing problem and delivering state-of-the-art results. A minimax formulation was able to establish among a classifier, a discriminator, and a set of generators in a similar spirit with GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them will be randomly selected as final output similar to the mechanism of a probabilistic mixture model. We term our method Mixture Generative Adversarial Nets (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators’ distributions and the empirical data distribution is minimal, whilst the JSD among generators’ distributions is maximal, hence effectively avoiding the mode collapsing problem. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. 
We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by the generators. @INPROCEEDINGS { hoang_etal_iclr18_mgan,    AUTHOR = { Quan Hoang and Tu Dinh Nguyen and Trung Le and Dinh Phung },    TITLE = { {MGAN}: Training Generative Adversarial Nets with Multiple Generators },    BOOKTITLE = { International Conference on Learning Representations (ICLR) },    YEAR = { 2018 },    ABSTRACT = { We propose in this paper a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapsing problem and delivering state-of-the-art results. A minimax formulation was able to establish among a classifier, a discriminator, and a set of generators in a similar spirit with GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them will be randomly selected as final output similar to the mechanism of a probabilistic mixture model. We term our method Mixture Generative Adversarial Nets (MGAN). 
We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators’ distributions and the empirical data distribution is minimal, whilst the JSD among generators’ distributions is maximal, hence effectively avoiding the mode collapsing problem. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by the generators. },    FILE = { :hoang_etal_iclr18_mgan - MGAN_ Training Generative Adversarial Nets with Multiple Generators.pdf:PDF },    URL = { https://openreview.net/forum?id=rkmu5b0a- },} C
 Geometric enclosing networks Trung Le, Hung Vu, Tu Dinh Nguyen and Dinh Phung. In Proc. of the 27th Int. Joint Conf. on Artificial Intelligence (IJCAI), jul 2018. [ | ] Training model to generate data has increasingly attracted research attention and become important in modern world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of minimal enclosing ball to train a generator G(z) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring data generated are also lying on the data manifold learned from training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely simple and easy-to-control optimization formulation, avoidance of mode collapsing and efficiently learn data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthesis and real-world datasets to illustrate the behaviors, strength and weakness of our proposed GEN, in particular its ability to handle multi-modal data and quality of generated data. @INPROCEEDINGS { le_etal_ijcai18_geometric,    AUTHOR = { Trung Le and Hung Vu and Tu Dinh Nguyen and Dinh Phung },    TITLE = { Geometric enclosing networks },    BOOKTITLE = { Proc. of the 27th Int. Joint Conf. on Artificial Intelligence (IJCAI) },    YEAR = { 2018 },    MONTH = { jul },    ABSTRACT = { Training model to generate data has increasingly attracted research attention and become important in modern world applications. 
We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of minimal enclosing ball to train a generator G(z) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring data generated are also lying on the data manifold learned from training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely simple and easy-to-control optimization formulation, avoidance of mode collapsing and efficiently learn data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthesis and real-world datasets to illustrate the behaviors, strength and weakness of our proposed GEN, in particular its ability to handle multi-modal data and quality of generated data. },    FILE = { :le_etal_ijcai18_geometric - Geometric Enclosing Networks.pdf:PDF },} C
 A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2018. [ | | pdf] We introduce a novel embedding method for knowledge base completion task. Our approach advances state-of-the-art (SOTA) by employing a convolutional neural network (CNN) for the task which can capture global relationships and transitional characteristics. We represent each triple (head entity, relation, tail entity) as a 3-column matrix which is the input for the convolution layer. Different filters having a same shape of 1x3 are operated over the input matrix to produce different feature maps which are then concatenated into a single feature vector. This vector is used to return a score for the triple via a dot product. The returned score is used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction results than previous SOTA models on two current benchmark datasets WN18RR and FB15k-237. @INPROCEEDINGS { nguyen_etal_naacl18_anovelembedding,    AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung },    TITLE = { A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network },    BOOKTITLE = { Proc. of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL) },    YEAR = { 2018 },    ABSTRACT = { We introduce a novel embedding method for knowledge base completion task. Our approach advances state-of-the-art (SOTA) by employing a convolutional neural network (CNN) for the task which can capture global relationships and transitional characteristics. 
We represent each triple (head entity, relation, tail entity) as a 3-column matrix which is the input for the convolution layer. Different filters having a same shape of 1x3 are operated over the input matrix to produce different feature maps which are then concatenated into a single feature vector. This vector is used to return a score for the triple via a dot product. The returned score is used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction results than previous SOTA models on two current benchmark datasets WN18RR and FB15k-237. },    FILE = { :nguyen_etal_naacl18_anovelembedding - A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network.pdf:PDF },    URL = { https://arxiv.org/abs/1712.02121 },} C
 Text Generation with Deep Variational GAN Mahmoud Hossam, Trung Le, Michael Papasimeon, Viet Huynh and Dinh Phung. In 32nd Neural Information Processing System (NIPS) Workshop on Bayesian Deep Learning, 2018. [ | ] Generating realistic sequences is a central task in many machine learning applications. There has been considerable recent progress on building deep generative models for sequence generation tasks. However, the issue of mode-collapsing remains a main issue for the current models. In this paper we propose a GAN-based generic framework to address the problem of mode-collapse in a principled approach. We change the standard GAN objective to maximize a variational lower-bound of the log-likelihood while minimizing the Jensen-Shannon divergence between data and model distributions. We experiment our model with text generation task and show that it can generate realistic text with high diversity. @INPROCEEDINGS { hossam_etal_bdl18_textgeneration,    AUTHOR = { Mahmoud Hossam and Trung Le and Michael Papasimeon and Viet Huynh and Dinh Phung },    TITLE = { Text Generation with Deep Variational {GAN} },    BOOKTITLE = { 32nd Neural Information Processing System (NIPS) Workshop on Bayesian Deep Learning },    YEAR = { 2018 },    ABSTRACT = { Generating realistic sequences is a central task in many machine learning applications. There has been considerable recent progress on building deep generative models for sequence generation tasks. However, the issue of mode-collapsing remains a main issue for the current models. In this paper we propose a GAN-based generic framework to address the problem of mode-collapse in a principled approach. We change the standard GAN objective to maximize a variational lower-bound of the log-likelihood while minimizing the Jensen-Shannon divergence between data and model distributions. We experiment our model with text generation task and show that it can generate realistic text with high diversity. 
},    FILE = { :hossam_etal_bdl18_textgeneration - Text Generation with Deep Variational GAN.pdf:PDF },} C
 Batch-normalized Deep Boltzmann Machines Hung Vu, Tu Dinh Nguyen, Trung Le, Wei Luo and Dinh Phung. In Proceedings of Asian Conference on Machine Learning (ACML), Beijing, China, 2018. [ | ] @INPROCEEDINGS { vu_etal_acml18_batchnormalized,    AUTHOR = { Hung Vu and Tu Dinh Nguyen and Trung Le and Wei Luo and Dinh Phung },    TITLE = { Batch-normalized Deep {Boltzmann} Machines },    BOOKTITLE = { Proceedings of Asian Conference on Machine Learning (ACML) },    YEAR = { 2018 },    ADDRESS = { Beijing, China },    OWNER = { hungv },    TIMESTAMP = { 2018.03.22 },} C
 Clustering Induced Kernel Learning Khanh Nguyen, Nhan Dam, Trung Le, Tu Dinh Nguyen and Dinh Phung. In Proc. of the 10th Asian Conference on Machine Learning (ACML), pages 129-144, 14-16 Nov 2018. [ | | pdf] Learning rich and expressive kernel functions is a challenging task in kernel-based supervised learning. Multiple kernel learning (MKL) approach addresses this problem by combining a mixed variety of kernels and letting the optimization solver choose the most appropriate combination. However, most of existing methods are parametric in the sense that they require a predefined list of kernels. Hence, there appears a substantial trade-off between computation and the modeling risk of not being able to explore more expressive and suitable kernel functions. Moreover, current existing approaches to combine kernels cannot exploit clustering structure carried in data, especially when data are heterogeneous. In this work, we present a new framework that leverages Bayesian nonparametric models (i.e., automatically grow kernel functions) with multiple kernel learning to develop a new framework that enjoys the nonparametric flavor in the context of multiple kernel learning. In particular, we propose Clustering Induced Kernel Learning (CIK) method that can automatically discover clustering structure from the data and train a single kernel machine to fit data in each discovered cluster simultaneously. The outcome of our proposed method includes both clustering analysis and multiple kernel classifier for a given dataset. We conduct extensive experiments on several benchmark datasets. The experimental results show that our method can improve classification and clustering performance when datasets have complex clustering structure with different preferred kernels. 
@INPROCEEDINGS { nguyen_etal_acml18_clustering,    AUTHOR = { Nguyen, Khanh and Dam, Nhan and Le, Trung and Nguyen, {Tu Dinh} and Phung, Dinh },    TITLE = { Clustering Induced Kernel Learning },    BOOKTITLE = { Proc. of the 10th Asian Conference on Machine Learning (ACML) },    YEAR = { 2018 },    EDITOR = { Zhu, Jun and Takeuchi, Ichiro },    VOLUME = { 95 },    SERIES = { Proceedings of Machine Learning Research },    PAGES = { 129--144 },    MONTH = { 14--16 Nov },    PUBLISHER = { PMLR },    ABSTRACT = { Learning rich and expressive kernel functions is a challenging task in kernel-based supervised learning. Multiple kernel learning (MKL) approach addresses this problem by combining a mixed variety of kernels and letting the optimization solver choose the most appropriate combination. However, most of existing methods are parametric in the sense that they require a predefined list of kernels. Hence, there appears a substantial trade-off between computation and the modeling risk of not being able to explore more expressive and suitable kernel functions. Moreover, current existing approaches to combine kernels cannot exploit clustering structure carried in data, especially when data are heterogeneous. In this work, we present a new framework that leverages Bayesian nonparametric models (i.e., automatically grow kernel functions) with multiple kernel learning to develop a new framework that enjoys the nonparametric flavor in the context of multiple kernel learning. In particular, we propose \emph{Clustering Induced Kernel Learning} (CIK) method that can automatically discover clustering structure from the data and train a single kernel machine to fit data in each discovered cluster simultaneously. The outcome of our proposed method includes both clustering analysis and multiple kernel classifier for a given dataset. We conduct extensive experiments on several benchmark datasets. 
The experimental results show that our method can improve classification and clustering performance when datasets have complex clustering structure with different preferred kernels. },    FILE = { :nguyen_etal_acml18_clustering - Clustering Induced Kernel Learning.pdf:PDF;nguyen18a.pdf:http\://proceedings.mlr.press/v95/nguyen18a/nguyen18a.pdf:PDF },    URL = { http://proceedings.mlr.press/v95/nguyen18a.html },} C
 LTARM: A novel temporal association rule mining method to understand toxicities in a routine cancer treatment Dang Nguyen, Wei Luo, Dinh Phung and Svetha Venkatesh. Knowledge-Based Systems, 2018. [ | ] Cancer is a worldwide problem and one of the leading causes of death. Increasing prevalence of cancer, particularly in developing countries, demands better understandings of the effectiveness and adverse consequences of different cancer treatment regimes in real patient populations. Current understandings of cancer treatment toxicities are often derived from either “clean” patient cohorts or coarse population statistics. Thus, it is difficult to get up-to-date and local assessments of treatment toxicities for specific cancer centers. To address these problems, we propose a novel and efficient method for discovering toxicity progression patterns in the form of temporal association rules (TARs). A temporal association rule is defined as a rule where the diagnosis codes in the right hand side (e.g., a combination of toxicities/complications) are temporally occurred after the diagnosis codes in the left hand side (e.g., a particular type of cancer treatment). Our method develops a lattice structure to efficiently discover TARs. More specifically, the lattice structure is first constructed to store all frequent diagnosis codes in the dataset. It is then traversed using the paternity relations among nodes to generate TARs. Our extensive experiments show the effectiveness of the proposed method in discovering major toxicity patterns in comparison with the temporal comorbidity analysis. In addition, our method significantly outperforms existing methods for mining TARs in terms of runtime. 
@ARTICLE { nguyen_kbs18_ltarm,    AUTHOR = { Dang Nguyen and Wei Luo and Dinh Phung and Svetha Venkatesh },    TITLE = { {LTARM}: A novel temporal association rule mining method to understand toxicities in a routine cancer treatment },    JOURNAL = { Knowledge-Based Systems },    YEAR = { 2018 },    ABSTRACT = { Cancer is a worldwide problem and one of the leading causes of death. Increasing prevalence of cancer, particularly in developing countries, demands better understandings of the effectiveness and adverse consequences of different cancer treatment regimes in real patient populations. Current understandings of cancer treatment toxicities are often derived from either “clean” patient cohorts or coarse population statistics. Thus, it is difficult to get up-to-date and local assessments of treatment toxicities for specific cancer centers. To address these problems, we propose a novel and efficient method for discovering toxicity progression patterns in the form of temporal association rules (TARs). A temporal association rule is defined as a rule where the diagnosis codes in the right hand side (e.g., a combination of toxicities/complications) are temporally occurred after the diagnosis codes in the left hand side (e.g., a particular type of cancer treatment). Our method develops a lattice structure to efficiently discover TARs. More specifically, the lattice structure is first constructed to store all frequent diagnosis codes in the dataset. It is then traversed using the paternity relations among nodes to generate TARs. Our extensive experiments show the effectiveness of the proposed method in discovering major toxicity patterns in comparison with the temporal comorbidity analysis. In addition, our method significantly outperforms existing methods for mining TARs in terms of runtime. 
},    DOI = { https://doi.org/10.1016/j.knosys.2018.07.031 },    FILE = { :nguyen_kbs18_ltarm - LTARM_ a Novel Temporal Association Rule Mining Method to Understand Toxicities in a Routine Cancer Treatment.pdf:PDF },} J
 Jointly predicting affective and mental health scores using deep neural networks of visual cues on the Web Hung Nguyen, Van Nguyen, Thin Nguyen, Mark Larsen, Bridianne O'Dea, Duc Thanh Nguyen, Trung Le, Dinh Phung, Svetha Venkatesh and Helen Christensen. In Proc. of the Int. Conf. on Web Information Systems Engineering (WISE), Springer, 2018. [ | ] Despite the range of studies examining the relationship between mental health and social media data, not all prior studies have validated the social media markers against “ground truth”, or validated psychiatric information, in general community samples. Instead, researchers have approximated psychiatric diagnosis using user statements such as “I have been diagnosed as X”. Without “ground truth”, the value of predictive algorithms is highly questionable and potentially harmful. In addition, for social media data, whilst linguistic features have been widely identified as strong markers of mental health disorders, little is known about non-textual features on their links with the disorders. The current work is a longitudinal study during which participants’ mental health data, consisting of depression and anxiety scores, were collected fortnightly with a validated, diagnostic, clinical measure. Also, datasets with labels relevant to mental health scores, such as emotional scores, are also employed to improve the performance in prediction of mental health scores. This work introduces a deep neural network-based method integrating sub-networks on predicting affective scores and mental health outcomes from images. Experimental results have shown that in the both predictions of emotion and mental health scores, (1) deep features majorly outperform handcrafted ones and (2) the proposed network achieves better performance compared with separate networks. 
@INCOLLECTION { nguyen_etal_wise18_jointly,    AUTHOR = { Hung Nguyen and Van Nguyen and Thin Nguyen and Mark Larsen and Bridianne O'Dea and Duc Thanh Nguyen and Trung Le and Dinh Phung and Svetha Venkatesh and Helen Christensen },    TITLE = { Jointly predicting affective and mental health scores using deep neural networks of visual cues on the Web },    BOOKTITLE = { Proc. of the Int. Conf. on Web Information Systems Engineering (WISE) },    PUBLISHER = { Springer },    YEAR = { 2018 },    SERIES = { Lecture Notes in Computer Science },    ABSTRACT = { Despite the range of studies examining the relationship between mental health and social media data, not all prior studies have validated the social media markers against “ground truth”, or validated psychiatric information, in general community samples. Instead, researchers have approximated psychiatric diagnosis using user statements such as “I have been diagnosed as X”. Without “ground truth”, the value of predictive algorithms is highly questionable and potentially harmful. In addition, for social media data, whilst linguistic features have been widely identified as strong markers of mental health disorders, little is known about non-textual features on their links with the disorders. The current work is a longitudinal study during which participants’ mental health data, consisting of depression and anxiety scores, were collected fortnightly with a validated, diagnostic, clinical measure. Also, datasets with labels relevant to mental health scores, such as emotional scores, are also employed to improve the performance in prediction of mental health scores. This work introduces a deep neural network-based method integrating sub-networks on predicting affective scores and mental health outcomes from images. 
Experimental results have shown that in the both predictions of emotion and mental health scores, (1) deep features majorly outperform handcrafted ones and (2) the proposed network achieves better performance compared with separate networks. },    FILE = { :nguyen_etal_wise18_jointly - Jointly Predicting Affective and Mental Health Scores Using Deep Neural Networks of Visual Cues on the Web.pdf:PDF },    LANGUAGE = { English },    OWNER = { thinng },    TIMESTAMP = { 2017.08.28 },} BC
 Learning Graph Representation via Frequent Subgraphs Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh and Dinh Phung. In Proc. of SIAM Int. Conf. on Data Mining (SDM), 2018. [ | ] @INPROCEEDINGS { nguyen_etal_sdm18_learning,    AUTHOR = { Dang Nguyen and Wei Luo and Tu Dinh Nguyen and Svetha Venkatesh and Dinh Phung },    TITLE = { Learning Graph Representation via Frequent Subgraphs },    BOOKTITLE = { Proc. of SIAM Int. Conf. on Data Mining (SDM) },    YEAR = { 2018 },    PUBLISHER = { SIAM },    FILE = { :nguyen_etal_sdm18_learning - Learning Graph Representation Via Frequent Subgraphs.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2018.01.12 },} C
 Sqn2Vec: Learning Sequence Representation via Sequential Patterns with a Gap Constraint Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh and Dinh Phung. In ECML-PKDD, 2018. (Runner-up Best Student Machine Learning Paper Award). [ | ] When learning sequence representations, traditional pattern-based methods often suffer from the data sparsity and high-dimensionality problems while recent neural embedding methods often fail on sequential datasets with a small vocabulary. To address these disadvantages, we propose an unsupervised method (named Sqn2Vec) which first leverages sequential patterns (SPs) to increase the vocabulary size and then learns low-dimensional continuous vectors for sequences via a neural embedding model. Moreover, our method enforces a gap constraint among symbols in sequences to obtain meaningful and discriminative SPs. Consequently, Sqn2Vec produces significantly better sequence representations than a comprehensive list of state-of-the-art baselines, particularly on sequential datasets with a relatively small vocabulary. We demonstrate the superior performance of Sqn2Vec in several machine learning tasks including sequence classification, clustering, and visualization. @INPROCEEDINGS { nguyen_etal_ecml18_sqn2vec,    AUTHOR = { Dang Nguyen and Wei Luo and Tu Dinh Nguyen and Svetha Venkatesh and Dinh Phung },    TITLE = { {Sqn2Vec}: Learning Sequence Representation via Sequential Patterns with a Gap Constraint },    BOOKTITLE = { ECML-PKDD },    YEAR = { 2018 },    NOTE = { Runner-up Best Student Machine Learning Paper Award },    ABSTRACT = { When learning sequence representations, traditional pattern-based methods often suffer from the data sparsity and high-dimensionality problems while recent neural embedding methods often fail on sequential datasets with a small vocabulary. 
To address these disadvantages, we propose an unsupervised method (named Sqn2Vec) which first leverages sequential patterns (SPs) to increase the vocabulary size and then learns low-dimensional continuous vectors for sequences via a neural embedding model. Moreover, our method enforces a gap constraint among symbols in sequences to obtain meaningful and discriminative SPs. Consequently, Sqn2Vec produces significantly better sequence representations than a comprehensive list of state-of-the-art baselines, particularly on sequential datasets with a relatively small vocabulary. We demonstrate the superior performance of Sqn2Vec in several machine learning tasks including sequence classification, clustering, and visualization. },    FILE = { :nguyen_etal_ecml18_sqn2vec - Sqn2Vec_ Learning Sequence Representation Via Sequential Patterns with a Gap Constraint.pdf:PDF },} C
 A Convolutional Neural Network-based Model for Knowledge Base Completion and Its Application to Search Personalization Dai Quoc Nguyen, Dat Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung. Semantic Web journal (SWJ), 2018. [ | | pdf] In this paper, we propose a novel embedding model, named ConvKB, for knowledge base completion. Our model ConvKB advances state-of-the-art models by employing a convolutional neural network, so that it can capture global relationships and transitional characteristics between entities and relations in knowledge bases. In ConvKB, each triple (head entity, relation, tail entity) is represented as a 3-column matrix where each column vector represents a triple element. This 3-column matrix is then fed to a convolution layer where multiple filters are operated on the matrix to generate different feature maps. These feature maps are then concatenated into a single feature vector representing the input triple. The feature vector is multiplied with a weight vector via a dot product to return a score. This score is then used to predict whether the triple is valid or not. Experiments show that ConvKB obtains better link prediction and triple classification results than previous state-of-the-art models on benchmark datasets WN18RR, FB15k-237, WN11 and FB13. We further apply our ConvKB to search personalization problem which aims to tailor the search results to each specific user based on the user's personal interests and preferences. In particular, we model the potential relationship between the submitted query, the user and the search result (i.e., document) as a triple (query, user, document) on which the ConvKB is able to work. Experimental results on query logs from a commercial web search engine show that ConvKB achieves better performances than the standard ranker as well as up-to-date search personalization baselines. 
@ARTICLE { nguyen_etal_swj18_convolutional,    AUTHOR = { Dai Quoc Nguyen and Dat Quoc Nguyen and Tu Dinh Nguyen and Dinh Phung },    TITLE = { A Convolutional Neural Network-based Model for Knowledge Base Completion and Its Application to Search Personalization },    JOURNAL = { Semantic Web journal (SWJ) },    YEAR = { 2018 },    ABSTRACT = { In this paper, we propose a novel embedding model, named ConvKB, for knowledge base completion. Our model ConvKB advances state-of-the-art models by employing a convolutional neural network, so that it can capture global relationships and transitional characteristics between entities and relations in knowledge bases. In ConvKB, each triple (head entity, relation, tail entity) is represented as a 3-column matrix where each column vector represents a triple element. This 3-column matrix is then fed to a convolution layer where multiple filters are operated on the matrix to generate different feature maps. These feature maps are then concatenated into a single feature vector representing the input triple. The feature vector is multiplied with a weight vector via a dot product to return a score. This score is then used to predict whether the triple is valid or not. Experiments show that ConvKB obtains better link prediction and triple classification results than previous state-of-the-art models on benchmark datasets WN18RR, FB15k-237, WN11 and FB13. We further apply our ConvKB to search personalization problem which aims to tailor the search results to each specific user based on the user's personal interests and preferences. In particular, we model the potential relationship between the submitted query, the user and the search result (i.e., document) as a triple \textit{(query, user, document)} on which the ConvKB is able to work. Experimental results on query logs from a commercial web search engine show that ConvKB achieves better performances than the standard ranker as well as up-to-date search personalization baselines. 
},    FILE = { :nguyen_etal_swj18_convolutional - A Convolutional Neural Network Based Model for Knowledge Base Completion and Its Application to Search Personalization.pdf:PDF },    URL = { http://www.semantic-web-journal.net/system/files/swj1867.pdf },} J
 GoGP: Scalable Geometric-based Gaussian Process for Online Regression Trung Le, Khanh Nguyen, Vu Nguyen, Tu Dinh Nguyen and Dinh Phung. Knowledge and Information Systems (KAIS), may 2018. [ | ] One of the most current challenging problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that could scale with massive datasets. Our approach is formulated based on alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that with a good convergence rate our proposed algorithm always produces a (sparse) solution which is close to the true optima to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared with its rivals under online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors. 
@ARTICLE { le_etal_kais18_gogp,    AUTHOR = { Trung Le and Khanh Nguyen and Vu Nguyen and Tu Dinh Nguyen and Dinh Phung },    TITLE = { {GoGP}: Scalable Geometric-based Gaussian Process for Online Regression },    JOURNAL = { Knowledge and Information Systems (KAIS) },    YEAR = { 2018 },    MONTH = { may },    ABSTRACT = { One of the most challenging current problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that could scale with massive datasets. Our approach is formulated based on an alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that with a good convergence rate our proposed algorithm always produces a (sparse) solution which is close to the true optima to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for the online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared with its rivals under the online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors. },    FILE = { :le_etal_kais18_gogp - GoGP_ Scalable Geometric Based Gaussian Process for Online Regression.pdf:PDF },} J
 Effective Identification of Similar Patients through Sequential Matching over ICD Code Embedding Dang Nguyen, Wei Luo, Svetha Venkatesh and Dinh Phung. Journal of Medical Systems (JMS), 2018. [ | ] Evidence-based medicine often involves the identification of patients with similar conditions, which are often captured in ICD code sequences. With no satisfying prior solutions for matching ICD-10 code sequences, this paper presents a method which effectively captures the clinical similarity among routine patients who have multiple comorbidities and complex care needs. Our method leverages the recent progress in representation learning of individual ICD-10 codes, and it explicitly uses the sequential order of codes for matching. Empirical evaluation on a state-wide cancer data collection shows that our proposed method achieves significantly higher matching performance compared with state-of-the-art methods ignoring the sequential order. Our method better identifies similar patients in a number of clinical outcomes including readmission and mortality outlook. Although this paper focuses on ICD-10 diagnosis code sequences, our method can be adapted to work with other codified sequence data. @ARTICLE { nguyen_etal_jms18_effective,    AUTHOR = { Dang Nguyen and Wei Luo and Svetha Venkatesh and Dinh Phung },    TITLE = { Effective Identification of Similar Patients through Sequential Matching over ICD Code Embedding },    JOURNAL = { Journal of Medical Systems (JMS) },    YEAR = { 2018 },    ABSTRACT = { Evidence-based medicine often involves the identification of patients with similar conditions, which are often captured in ICD code sequences. With no satisfying prior solutions for matching ICD-10 code sequences, this paper presents a method which effectively captures the clinical similarity among routine patients who have multiple comorbidities and complex care needs. 
Our method leverages the recent progress in representation learning of individual ICD-10 codes, and it explicitly uses the sequential order of codes for matching. Empirical evaluation on a state-wide cancer data collection shows that our proposed method achieves significantly higher matching performance compared with state-of-the-art methods ignoring the sequential order. Our method better identifies similar patients in a number of clinical outcomes including readmission and mortality outlook. Although this paper focuses on ICD-10 diagnosis code sequences, our method can be adapted to work with other codified sequence data. },    FILE = { :nguyen_etal_jms18_effective - Effective Identification of Similar Patients through Sequential Matching Over ICD Code Embedding.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2018.03.29 },} J
 Bayesian Multi-Hyperplane Machine for Pattern Recognition Khanh Nguyen, Trung Le, Tu Nguyen and Dinh Phung. In Proc. of the 24th International Conference on Pattern Recognition (ICPR), Beijing, China, aug 2018. [ | ] The existing multi-hyperplane machine approach deals with high-dimensional and complex datasets by approximating the input data region using a parametric mixture of hyperplanes. Consequently, this approach requires an excessively time-consuming parameter search to find the set of optimal hyper-parameters. Another serious drawback of this approach is that it is often suboptimal since the optimal choice for the hyper-parameter is likely to lie outside the searching space due to the space discretization step required in grid search. To address these challenges, we propose in this paper BAyesian Multi-hyperplane Machine (BAMM). Our approach departs from a Bayesian perspective, and aims to construct an alternative probabilistic view in such a way that its maximum a-posteriori (MAP) estimation reduces exactly to the original optimization problem of a multi-hyperplane machine. This view allows us to endow prior distributions over hyper-parameters and augment auxiliary variables to efficiently infer model parameters and hyper-parameters via a Markov chain Monte Carlo (MCMC) method. We then employ a Stochastic Gradient Descent (SGD) framework to scale our model up with ever-growing large datasets. Extensive experiments demonstrate the capability of our proposed method in learning the optimal model without using any parameter tuning, and in achieving comparable accuracies compared with the state-of-the-art baselines; in the meantime our model can seamlessly handle large-scale datasets. @INPROCEEDINGS { nguyen_etal_icpr18_bayesian,    AUTHOR = { Khanh Nguyen and Trung Le and Tu Nguyen and Dinh Phung },    TITLE = { Bayesian Multi-Hyperplane Machine for Pattern Recognition },    BOOKTITLE = { Proc. 
of the 24th International Conference on Pattern Recognition (ICPR) },    YEAR = { 2018 },    ADDRESS = { Beijing, China },    MONTH = { aug },    ABSTRACT = { The existing multi-hyperplane machine approach deals with high-dimensional and complex datasets by approximating the input data region using a parametric mixture of hyperplanes. Consequently, this approach requires an excessively time-consuming parameter search to find the set of optimal hyper-parameters. Another serious drawback of this approach is that it is often suboptimal since the optimal choice for the hyper-parameter is likely to lie outside the searching space due to the space discretization step required in grid search. To address these challenges, we propose in this paper BAyesian Multi-hyperplane Machine (BAMM). Our approach departs from a Bayesian perspective, and aims to construct an alternative probabilistic view in such a way that its maximum a-posteriori (MAP) estimation reduces exactly to the original optimization problem of a multi-hyperplane machine. This view allows us to endow prior distributions over hyper-parameters and augment auxiliary variables to efficiently infer model parameters and hyper-parameters via a Markov chain Monte Carlo (MCMC) method. We then employ a Stochastic Gradient Descent (SGD) framework to scale our model up with ever-growing large datasets. Extensive experiments demonstrate the capability of our proposed method in learning the optimal model without using any parameter tuning, and in achieving comparable accuracies compared with the state-of-the-art baselines; in the meantime our model can seamlessly handle large-scale datasets. },    FILE = { :nguyen_etal_icpr18_bayesian - Bayesian Multi Hyperplane Machine for Pattern Recognition.pdf:PDF },} C
 2017
 Dual Discriminator Generative Adversarial Nets Tu Dinh Nguyen, Trung Le, Hung Vu and Dinh Phung. In Advances in Neural Information Processing Systems 30 (NIPS), 2017. [ | ] We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial network (GAN). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus it exploits the complementary statistical properties from these divergences to effectively diversify the estimated density in capturing multi-modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; and together with a generator, it also has the analogy of a minimax game, wherein one discriminator rewards high scores for samples from the data distribution whilst the other, conversely, favors data from the generator, and the generator produces data to fool both discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both KL and reverse KL divergences between data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode collapsing problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach in generating good quality and diverse samples over baselines, and the capability of our method to scale up to ImageNet database. 
@INPROCEEDINGS { tu_etal_nips17_d2gan,    AUTHOR = { Tu Dinh Nguyen and Trung Le and Hung Vu and Dinh Phung },    TITLE = { Dual Discriminator Generative Adversarial Nets },    BOOKTITLE = { Advances in Neural Information Processing Systems 30 (NIPS) },    YEAR = { 2017 },    ABSTRACT = { We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial network (GAN). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus it exploits the complementary statistical properties from these divergences to effectively diversify the estimated density in capturing multi-modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; and together with a generator, it also has the analogy of a minimax game, wherein one discriminator rewards high scores for samples from the data distribution whilst the other, conversely, favors data from the generator, and the generator produces data to fool both discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both KL and reverse KL divergences between data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode collapsing problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN variants in comprehensive qualitative and quantitative evaluations. 
The experimental results demonstrate the competitive and superior performance of our approach in generating good quality and diverse samples over baselines, and the capability of our method to scale up to ImageNet database. },    FILE = { :tu_etal_nips17_d2gan - Dual Discriminator Generative Adversarial Nets.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2017.09.06 },} C
 GoGP: Fast Online Regression with Gaussian Processes Trung Le, Khanh Nguyen, Vu Nguyen, Tu Dinh Nguyen and Dinh Phung. In International Conference on Data Mining (ICDM), 2017. [ | ] One of the most challenging current problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that could scale with massive datasets. Our approach is formulated based on an alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that with a good convergence rate our proposed algorithm always produces a (sparse) solution which is close to the true optima to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for the online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared with its rivals under the online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors. 
@INPROCEEDINGS { le_etal_icdm17_gogp,    AUTHOR = { Trung Le and Khanh Nguyen and Vu Nguyen and Tu Dinh Nguyen and Dinh Phung },    TITLE = { {GoGP}: Fast Online Regression with Gaussian Processes },    BOOKTITLE = { International Conference on Data Mining (ICDM) },    YEAR = { 2017 },    ABSTRACT = { One of the most challenging current problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that could scale with massive datasets. Our approach is formulated based on an alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that with a good convergence rate our proposed algorithm always produces a (sparse) solution which is close to the true optima to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for the online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared with its rivals under the online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors. },    FILE = { :le_etal_icdm17_gogp - GoGP_ Fast Online Regression with Gaussian Processes.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2017.09.01 },} C
 Supervised Restricted Boltzmann Machines Tu Dinh Nguyen, Dinh Phung, Viet Huynh and Trung Le. In Proc. of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2017. [ | | pdf] We propose in this paper the supervised restricted Boltzmann machine (sRBM), a unified framework which combines the versatility of RBM to simultaneously learn the data representation and to perform supervised learning (i.e., a nonlinear classifier or a nonlinear regressor). Unlike the current state-of-the-art classification formulation proposed for RBM in (Larochelle et al., 2012), our model is a hybrid probabilistic graphical model consisting of a distinguished generative component for data representation and a discriminative component for prediction. While the work of (Larochelle et al., 2012) typically incurs no extra difficulty in inference compared with a standard RBM, our discriminative component, modeled as a directed graphical model, renders MCMC-based inference (e.g., Gibbs sampler) very slow and impractical for use. To this end, we further develop scalable variational inference for the proposed sRBM for both classification and regression cases. Extensive experiments on real-world datasets show that our sRBM achieves better predictive performance than baseline methods. At the same time, our proposed framework yields learned representations which are more discriminative, hence interpretable, than those of its counterparts. Besides, our method is probabilistic and capable of generating meaningful data conditioning on specific classes – a topic which is of current great interest in deep learning aiming at data generation. @INPROCEEDINGS { nguyen_etal_uai17supervised,    AUTHOR = { Tu Dinh Nguyen and Dinh Phung and Viet Huynh and Trung Le },    TITLE = { Supervised Restricted Boltzmann Machines },    BOOKTITLE = { Proc. 
of the International Conference on Uncertainty in Artificial Intelligence (UAI) },    YEAR = { 2017 },    ABSTRACT = { We propose in this paper the supervised restricted Boltzmann machine (sRBM), a unified framework which combines the versatility of RBM to simultaneously learn the data representation and to perform supervised learning (i.e., a nonlinear classifier or a nonlinear regressor). Unlike the current state-of-the-art classification formulation proposed for RBM in (Larochelle et al., 2012), our model is a hybrid probabilistic graphical model consisting of a distinguished generative component for data representation and a discriminative component for prediction. While the work of (Larochelle et al., 2012) typically incurs no extra difficulty in inference compared with a standard RBM, our discriminative component, modeled as a directed graphical model, renders MCMC-based inference (e.g., Gibbs sampler) very slow and impractical for use. To this end, we further develop scalable variational inference for the proposed sRBM for both classification and regression cases. Extensive experiments on real-world datasets show that our sRBM achieves better predictive performance than baseline methods. At the same time, our proposed framework yields learned representations which are more discriminative, hence interpretable, than those of its counterparts. Besides, our method is probabilistic and capable of generating meaningful data conditioning on specific classes – a topic which is of current great interest in deep learning aiming at data generation. },    FILE = { :nguyen_etal_uai17supervised - Supervised Restricted Boltzmann Machines.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2017.08.29 },    URL = { http://auai.org/uai2017/proceedings/papers/106.pdf },} C
 Multilevel clustering via Wasserstein means Nhat Ho, XuanLong Nguyen, Mikhail Yurochkin, Hung Bui, Viet Huynh and Dinh Phung. In Proc. of ICML (ICML), 2017. [ | | pdf] We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a large hierarchically structural corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with the Wasserstein distance metric. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. We also establish consistency properties enjoyed by our estimates of both local and global clusters. Finally, we present experiment results with both synthetic and real data to demonstrate the flexibility and scalability of the proposed approach. @INPROCEEDINGS { ho_etal_icml17multilevel,    AUTHOR = { Nhat Ho and XuanLong Nguyen and Mikhail Yurochkin and Hung Bui and Viet Huynh and Dinh Phung },    TITLE = { Multilevel clustering via Wasserstein means },    BOOKTITLE = { Proc. of ICML (ICML) },    YEAR = { 2017 },    ABSTRACT = { We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a large hierarchically structural corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with the Wasserstein distance metric. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. We also establish consistency properties enjoyed by our estimates of both local and global clusters. 
Finally, we present experiment results with both synthetic and real data to demonstrate the flexibility and scalability of the proposed approach. },    FILE = { :ho_etal_icml17multilevel - Multilevel Clustering Via Wasserstein Means.pdf:PDF },    URL = { http://proceedings.mlr.press/v70/ho17a.html },} C
 Approximation Vector Machines for Large-scale Online Learning Trung Le, Tu Dinh Nguyen, Vu Nguyen and Dinh Q. Phung. Journal of Machine Learning Research (JMLR), 2017. [ | | pdf] One of the most challenging problems in kernel online learning is to bound the model size and to promote the model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. In this paper, we propose Approximation Vector Machine (AVM), a model that can simultaneously encourage the sparsity and safeguard its risk in compromising the performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximation and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions including Hinge, smooth Hinge, and Logistic for classification task, and l1, l2, and ϵ-insensitive for regression task. We conducted extensive experiments for classification task in batch and online modes, and regression task in online mode over several benchmark datasets. The results show that our proposed AVM achieved a comparable predictive performance with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM in maintaining the model size. 
@ARTICLE { le_etal_jmlr17approximation,    AUTHOR = { Trung Le and Tu Dinh Nguyen and Vu Nguyen and Dinh Q. Phung },    TITLE = { Approximation Vector Machines for Large-scale Online Learning },    JOURNAL = { Journal of Machine Learning Research (JMLR) },    YEAR = { 2017 },    ABSTRACT = { One of the most challenging problems in kernel online learning is to bound the model size and to promote the model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. In this paper, we propose Approximation Vector Machine (AVM), a model that can simultaneously encourage the sparsity and safeguard its risk in compromising the performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximation and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions including Hinge, smooth Hinge, and Logistic for classification task, and l1, l2, and ϵ-insensitive for regression task. We conducted extensive experiments for classification task in batch and online modes, and regression task in online mode over several benchmark datasets. 
The results show that our proposed AVM achieved a comparable predictive performance with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM in maintaining the model size. },    FILE = { :le_etal_jmlr17approximation - Approximation Vector Machines for Large Scale Online Learning.pdf:PDF },    KEYWORDS = { kernel, online learning, large-scale machine learning, sparsity, big data, core set, stochastic gradient descent, convergence analysis },    URL = { https://arxiv.org/abs/1604.06518 },} J
 Discriminative Bayesian Nonparametric Clustering Vu Nguyen, Dinh Phung, Trung Le, Svetha Venkatesh and Hung Bui. In Proc. of International Joint Conference on Artificial Intelligence (IJCAI), 2017. [ | | pdf] We propose a general framework for discriminative Bayesian nonparametric clustering to promote the inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels resulting from a probabilistic discriminative model. This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with the traditional generative BNP models. @INPROCEEDINGS { nguyen_etal_ijcai17discriminative,    AUTHOR = { Vu Nguyen and Dinh Phung and Trung Le and Svetha Venkatesh and Hung Bui },    TITLE = { Discriminative Bayesian Nonparametric Clustering },    BOOKTITLE = { Proc. of International Joint Conference on Artificial Intelligence (IJCAI) },    YEAR = { 2017 },    ABSTRACT = { We propose a general framework for discriminative Bayesian nonparametric clustering to promote the inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels resulting from a probabilistic discriminative model. 
This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with the traditional generative BNP models. },    FILE = { :nguyen_etal_ijcai17discriminative - Discriminative Bayesian Nonparametric Clustering.pdf:PDF },    URL = { https://www.ijcai.org/proceedings/2017/355 },} C
 Large-scale Online Kernel Learning with Random Feature Reparameterization Tu Dinh Nguyen, Trung Le, Hung Bui and Dinh Phung. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. [ | | pdf] A typical online kernel learning method faces two fundamental issues: the complexity in dealing with a huge number of observed data points (a.k.a. the curse of kernelization) and the difficulty in learning kernel parameters, which are often assumed to be fixed. Random Fourier feature is a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bochner’s theorem, and allows the model to be maintained directly in the random feature space with a fixed dimension, hence the model size remains constant w.r.t. data size. We further introduce in this paper the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning to address both aforementioned challenges. Our initial intuition comes from the so-called ‘reparameterization trick’ [Kingma and Welling, 2014] to lift the source of randomness of Fourier components to another space which can be independently sampled, so that stochastic gradient of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel, and a new tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model under an online learning setting. We then conducted extensive experiments on several large-scale datasets where we demonstrate that our work achieves state-of-the-art performance in both learning efficacy and efficiency. 
@INPROCEEDINGS { tu_etal_ijcai17_rrf,    AUTHOR = { Tu Dinh Nguyen and Trung Le and Hung Bui and Dinh Phung },    TITLE = { Large-scale Online Kernel Learning with Random Feature Reparameterization },    BOOKTITLE = { Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI) },    YEAR = { 2017 },    SERIES = { IJCAI'17 },    ABSTRACT = { A typical online kernel learning method faces two fundamental issues: the complexity in dealing with a huge number of observed data points (a.k.a. the curse of kernelization) and the difficulty in learning kernel parameters, which are often assumed to be fixed. Random Fourier feature is a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bochner’s theorem, and allows the model to be maintained directly in the random feature space with a fixed dimension, hence the model size remains constant w.r.t. data size. We further introduce in this paper the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning to address both aforementioned challenges. Our initial intuition comes from the so-called ‘reparameterization trick’ [Kingma and Welling, 2014] to lift the source of randomness of Fourier components to another space which can be independently sampled, so that stochastic gradient of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel, and a new tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model under an online learning setting. We then conducted extensive experiments on several large-scale datasets where we demonstrate that our work achieves state-of-the-art performance in both learning efficacy and efficiency. 
},    FILE = { :tu_etal_ijcai17_rrf - Large Scale Online Kernel Learning with Random Feature Reparameterization.pdf:PDF },    LOCATION = { Melbourne, Australia },    NUMPAGES = { 7 },    URL = { https://www.ijcai.org/proceedings/2017/354 },} C
 Column Networks for Collective Classification Pham, Trang, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In The Thirty-First AAAI Conference on Artificial Intelligence (AAAI), 2017. [ | | pdf] Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds a great promise to produce a better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged on the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, linear in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all applications, CLN demonstrates a higher accuracy than state-of-the-art rivals. @CONFERENCE { pham_etal_aaai17column,    AUTHOR = { Pham, Trang and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Column Networks for Collective Classification },    BOOKTITLE = { The Thirty-First AAAI Conference on Artificial Intelligence (AAAI) },    YEAR = { 2017 },    ABSTRACT = { Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. 
While it holds a great promise to produce a better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged on the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, linear in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all applications, CLN demonstrates a higher accuracy than state-of-the-art rivals. },    COMMENT = { Accepted },    FILE = { :pham_etal_aaai17column - Column Networks for Collective Classification.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.11.14 },    URL = { https://arxiv.org/abs/1609.04508 },} C
 Forward-Backward Smoothing for Hidden Markov Models of Point Pattern Data Nhan Dam, Dinh Phung, Ba-Ngu Vo and Viet Huynh. In The 4th International Conference on Data Science and Advanced Analytics (DSAA), Tokyo, Japan, 2017. [ | ] @INPROCEEDINGS { dam_etal_dsaa17forward,    AUTHOR = { Nhan Dam and Dinh Phung and Ba-Ngu Vo and Viet Huynh },    TITLE = { Forward-Backward Smoothing for Hidden Markov Models of Point Pattern Data },    BOOKTITLE = { The 4th International Conference on Data Science and Advanced Analytics (DSAA) },    YEAR = { 2017 },    ADDRESS = { Tokyo, Japan },    FILE = { :dam_etal_dsaa17forward - Forward Backward Smoothing for Hidden Markov Models of Point Pattern Data.pdf:PDF },    OWNER = { ndam },    TIMESTAMP = { 2017.08.28 },} C
 Animal Recognition and Identification with Deep Convolutional Neural Networks for Automated Wildlife Monitoring Hung Nguyen, Sarah J. Maclagan, Tu Dinh Nguyen, Thin Nguyen, Paul Flemons, Kylie Andrews, Euan G. Ritchie and Dinh Phung. In Proceedings of the IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2017. [ | ] Efficient and reliable monitoring of wild animals in their natural habitats is essential to inform conservation and management decisions. Automatic covert cameras or “camera traps” are becoming an increasingly popular tool for wildlife monitoring due to their effectiveness and reliability in collecting data of wildlife unobtrusively, continuously and in large volume. However, processing such a large volume of images and videos captured from camera traps manually is extremely expensive, time-consuming and also monotonous. This presents a major obstacle to scientists and ecologists to monitor wildlife in an open environment. Leveraging on recent advances in deep learning techniques in computer vision, we propose in this paper a framework to build automated animal recognition in the wild, aiming at an automated wildlife monitoring system. In particular, we use a single-labeled dataset from the Wildlife Spotter project, done by citizen scientists, and the state-of-the-art deep convolutional neural network architectures, to train a computational system capable of filtering animal images and identifying species automatically. Our experimental results achieved an accuracy of 96.6% for the task of detecting images containing animals, and 90.4% for identifying the three most common species among the set of images of wild animals taken in South-central Victoria, Australia, demonstrating the feasibility of building fully automated wildlife observation. 
This, in turn, can therefore speed up research findings, construct more efficient citizen science-based monitoring systems and subsequent management decisions, having the potential to make significant impacts to the world of ecology and trap camera images analysis. @INPROCEEDINGS { hung_etal_dsaa17animal,    AUTHOR = { Hung Nguyen and Sarah J. Maclagan and Tu Dinh Nguyen and Thin Nguyen and Paul Flemons and Kylie Andrews and Euan G. Ritchie and Dinh Phung },    TITLE = { Animal Recognition and Identification with Deep Convolutional Neural Networks for Automated Wildlife Monitoring },    BOOKTITLE = { Proceedings of the IEEE International Conference on Data Science and Advanced Analytics (DSAA) },    YEAR = { 2017 },    ABSTRACT = { Efficient and reliable monitoring of wild animals in their natural habitats is essential to inform conservation and management decisions. Automatic covert cameras or “camera traps” are becoming an increasingly popular tool for wildlife monitoring due to their effectiveness and reliability in collecting data of wildlife unobtrusively, continuously and in large volume. However, processing such a large volume of images and videos captured from camera traps manually is extremely expensive, time-consuming and also monotonous. This presents a major obstacle to scientists and ecologists to monitor wildlife in an open environment. Leveraging on recent advances in deep learning techniques in computer vision, we propose in this paper a framework to build automated animal recognition in the wild, aiming at an automated wildlife monitoring system. In particular, we use a single-labeled dataset from the Wildlife Spotter project, done by citizen scientists, and the state-of-the-art deep convolutional neural network architectures, to train a computational system capable of filtering animal images and identifying species automatically. 
Our experimental results achieved an accuracy of 96.6% for the task of detecting images containing animals, and 90.4% for identifying the three most common species among the set of images of wild animals taken in South-central Victoria, Australia, demonstrating the feasibility of building fully automated wildlife observation. This, in turn, can therefore speed up research findings, construct more efficient citizen science-based monitoring systems and subsequent management decisions, having the potential to make significant impacts to the world of ecology and trap camera images analysis. },    FILE = { :hung_etal_dsaa17animal - Animal Recognition and Identification with Deep Convolutional Neural Networks for Automated Wildlife Monitoring.pdf:PDF },    OWNER = { hung },    TIMESTAMP = { 2017.08.28 },} C
 Prediction of Population Health Indices from Social Media using Kernel-based Textual and Temporal Features Nguyen, Thin, Nguyen, Duc Thanh, Larsen, Mark E., O'Dea, Bridianne, Yearwood, John, Phung, Dinh, Venkatesh, Svetha and Christensen, Helen. In Proceedings of the International Conference on World Wide Web (WWW), 2017. [ | | pdf] From 1984, the US has annually conducted the Behavioral Risk Factor Surveillance System (BRFSS) surveys to capture either health behaviors, such as drinking or smoking, or health outcomes, including mental, physical, and generic health, of the population. Although this kind of information at a population level, such as US counties, is important for local governments to identify local needs, traditional datasets may take years to collate and to become publicly available. Geocoded social media data can provide an alternative reflection of local health trends. In this work, to predict the percentage of adults in a county reporting “insufficient sleep”, a health behavior, and, at the same time, their health outcomes, novel textual and temporal features are proposed. The proposed textual features are defined at mid-level and can be applied on top of various low-level textual features. They are computed via kernel functions on underlying features and encode the relationships between individual underlying features over a population. To further enrich the predictive ability of the health indices, the textual features are augmented with temporal information. We evaluated the proposed features and compared them with existing features using a dataset collected from the BRFSS. Experimental results show that the combination of kernel-based textual features and temporal information predict well both the health behavior (with best performance at rho=0.82) and health outcomes (with best performance at rho=0.78), demonstrating the capability of social media data in prediction of population health indices. 
The results also show that our proposed features gained higher correlation coefficients than did the existing ones, increasing the correlation coefficient by up to 0.16, suggesting the potential of the approach in a wide spectrum of applications on data analytics at population levels. @INPROCEEDINGS { nguyen_etal_www17prediction,    AUTHOR = { Nguyen, Thin and Nguyen, Duc Thanh and Larsen, Mark E. and O'Dea, Bridianne and Yearwood, John and Phung, Dinh and Venkatesh, Svetha and Christensen, Helen },    TITLE = { Prediction of Population Health Indices from Social Media using Kernel-based Textual and Temporal Features },    BOOKTITLE = { Proceedings of the International Conference on World Wide Web (WWW) },    YEAR = { 2017 },    ABSTRACT = { From 1984, the US has annually conducted the Behavioral Risk Factor Surveillance System (BRFSS) surveys to capture either health behaviors, such as drinking or smoking, or health outcomes, including mental, physical, and generic health, of the population. Although this kind of information at a population level, such as US counties, is important for local governments to identify local needs, traditional datasets may take years to collate and to become publicly available. Geocoded social media data can provide an alternative reflection of local health trends. In this work, to predict the percentage of adults in a county reporting “insufficient sleep”, a health behavior, and, at the same time, their health outcomes, novel textual and temporal features are proposed. The proposed textual features are defined at mid-level and can be applied on top of various low-level textual features. They are computed via kernel functions on underlying features and encode the relationships between individual underlying features over a population. To further enrich the predictive ability of the health indices, the textual features are augmented with temporal information. 
We evaluated the proposed features and compared them with existing features using a dataset collected from the BRFSS. Experimental results show that the combination of kernel-based textual features and temporal information predict well both the health behavior (with best performance at rho=0.82) and health outcomes (with best performance at rho=0.78), demonstrating the capability of social media data in prediction of population health indices. The results also show that our proposed features gained higher correlation coefficients than did the existing ones, increasing the correlation coefficient by up to 0.16, suggesting the potential of the approach in a wide spectrum of applications on data analytics at population levels. },    FILE = { :nguyen_etal_www17prediction - Prediction of Population Health Indices from Social Media Using Kernel Based Textual and Temporal Features.pdf:PDF },    OWNER = { thinng },    TIMESTAMP = { 2017.03.25 },    URL = { http://dl.acm.org/citation.cfm?id=3054136 },} C
 Latent Sentiment Topic Modelling and Nonparametric Discovery of Online Mental Health-related Communities Bo Dao, Thin Nguyen, Svetha Venkatesh and Dinh Phung. International Journal of Data Science and Analytics, 2017. [ | ] Social media are an online means of interaction among individuals. People are increasingly using social media, especially online communities, to discuss health concerns and seek support. Understanding topics, sentiment, and structures of these communities informs important aspects of health-related conditions. There has been growing research interest in analyzing online mental health communities; however analysis of these communities with health concerns has been limited. This paper investigates and identifies latent meta-groups of online communities with and without mental health-related conditions including depression and autism. Large datasets from online communities were crawled. We analyse both sentiment-based, psycholinguistic-based and topic-based features from blog posts made by members of these online communities. The work focuses on using nonparametric methods to infer latent topics automatically from the corpus of affective words in the blog posts. The visualization of the discovered meta-communities in their use of latent topics shows a difference between the groups. This presents evidence of the emotion-bearing difference in online mental health-related communities, suggesting a possible angle for support and intervention. The methodology might offer potential machine learning techniques for research and practice in psychiatry. @ARTICLE { Dao_etal_17Latent,    AUTHOR = { Bo Dao and Thin Nguyen and Svetha Venkatesh and Dinh Phung },    TITLE = { Latent Sentiment Topic Modelling and Nonparametric Discovery of Online Mental Health-related Communities },    JOURNAL = { International Journal of Data Science and Analytics },    YEAR = { 2017 },    ABSTRACT = { Social media are an online means of interaction among individuals. 
People are increasingly using social media, especially online communities, to discuss health concerns and seek support. Understanding topics, sentiment, and structures of these communities informs important aspects of health-related conditions. There has been growing research interest in analyzing online mental health communities; however analysis of these communities with health concerns has been limited. This paper investigates and identifies latent meta-groups of online communities with and without mental health-related conditions including depression and autism. Large datasets from online communities were crawled. We analyse both sentiment-based, psycholinguistic-based and topic-based features from blog posts made by members of these online communities. The work focuses on using nonparametric methods to infer latent topics automatically from the corpus of affective words in the blog posts. The visualization of the discovered meta-communities in their use of latent topics shows a difference between the groups. This presents evidence of the emotion-bearing difference in online mental health-related communities, suggesting a possible angle for support and intervention. The methodology might offer potential machine learning techniques for research and practice in psychiatry. },    FILE = { :Dao_etal_17Latent - Latent Sentiment Topic Modelling and Nonparametric Discovery of Online Mental Health Related Communities.pdf:PDF },    OWNER = { thinng },    TIMESTAMP = { 2017.08.31 },} J
 Academia versus social media: A psycho-linguistic analysis Thin Nguyen, Svetha Venkatesh and Dinh Phung. Journal of Computational Science, 2017. [ | ] Publication pressure has influenced the way scientists report their experimental results. Recently it has been found that scientific outcomes have been exaggerated or distorted (spin) to hopefully be published. Apart from investigating the content to look for spins, language styles have proven to be good traces. For example, the use of words in emotion lexicons has been used to interpret exaggeration and overstatement in academia. This work adapts a data-driven approach to explore a comprehensive set of psycho-linguistic features for a large corpus of PubMed papers published for the last four decades. The language features for other media – online encyclopedia (Wikipedia), online diaries (web-logs), online forums (Reddit), and micro-blogs (Twitter) – are also extracted. Several binary classifications are employed to discover linguistic predictors of scientific abstracts versus other media as well as strong predictors of scientific articles in different cohorts of impact factors and author affiliations. Trends of language styles expressed in scientific articles over the course of 40 years have also been discovered, providing the evolution of academic writing for the period of time. The study demonstrates advances in lightning-fast cluster computing on dealing with large scale data, consisting of 5.8 terabytes of data containing 3.6 billion records from all the media. The good performance of the advanced cluster computing framework suggests the potential of pattern recognition in data at scale. 
@ARTICLE { Nguyen_etal_17Academia,    AUTHOR = { Thin Nguyen and Svetha Venkatesh and Dinh Phung },    TITLE = { Academia versus social media: A psycho-linguistic analysis },    JOURNAL = { Journal of Computational Science },    YEAR = { 2017 },    VOLUME = { 0 },    NUMBER = { 0 },    PAGES = { 0-0 },    ABSTRACT = { Publication pressure has influenced the way scientists report their experimental results. Recently it has been found that scientific outcomes have been exaggerated or distorted (spin) to hopefully be published. Apart from investigating the content to look for spins, language styles have proven to be good traces. For example, the use of words in emotion lexicons has been used to interpret exaggeration and overstatement in academia. This work adapts a data-driven approach to explore a comprehensive set of psycho-linguistic features for a large corpus of PubMed papers published for the last four decades. The language features for other media – online encyclopedia (Wikipedia), online diaries (web-logs), online forums (Reddit), and micro-blogs (Twitter) – are also extracted. Several binary classifications are employed to discover linguistic predictors of scientific abstracts versus other media as well as strong predictors of scientific articles in different cohorts of impact factors and author affiliations. Trends of language styles expressed in scientific articles over the course of 40 years have also been discovered, providing the evolution of academic writing for the period of time. The study demonstrates advances in lightning-fast cluster computing on dealing with large scale data, consisting of 5.8 terabytes of data containing 3.6 billion records from all the media. The good performance of the advanced cluster computing framework suggests the potential of pattern recognition in data at scale. 
},    FILE = { :Nguyen_etal_17Academia - Academia Versus Social Media_ a Psycho Linguistic Analysis.pdf:PDF },    OWNER = { thinng },    TIMESTAMP = { 2017.08.28 },} J
 Estimating support scores of autism communities in large-scale Web information systems Thin Nguyen, Hung Nguyen, Svetha Venkatesh and Dinh Phung. In Proceedings of the International Conference on Web Information Systems Engineering (WISE), Springer, 2017. [ | ] Individuals with Autism Spectrum Disorder (ASD) have been shown to prefer communication at a socio-spatial distance. So while rarely found in the real world, autism communities are popular in Web-based forums, convenient for people with ASD to seek and share health related information. Reddit is one such avenue for people of common interest to connect, forming communities of specific interest, namely subreddits. This work aims to estimate support scores provided by a popular subreddit interested in ASD – www.reddit.com/r/aspergers. The scores were measured in both the quantities and qualities of the conversations in the forum, including conversational involvement, emotional, and informational support. The support scores of the subreddit Aspergers were compared with those of an average subreddit derived from entire Reddit, represented by two big corpora of approximately 200 million Reddit posts and 1.66 billion Reddit comments. The ASD subreddit was found to be a supportive community, having far higher support scores than did the average subreddit. Apache Spark, an advanced cluster computing framework, is employed to speed up processing of the large corpora. Scalable machine learning techniques implemented in Spark help discriminate the content made in Aspergers versus other subreddits and automatically discover linguistic predictors of ASD within minutes, providing timely reports. 
@INCOLLECTION { Nguyen_etal_17Estimating,    AUTHOR = { Thin Nguyen and Hung Nguyen and Svetha Venkatesh and Dinh Phung },    TITLE = { Estimating support scores of autism communities in large-scale Web information systems },    BOOKTITLE = { Proceedings of the International Conference on Web Information Systems Engineering (WISE) },    PUBLISHER = { Springer },    YEAR = { 2017 },    SERIES = { Lecture Notes in Computer Science },    ABSTRACT = { Individuals with Autism Spectrum Disorder (ASD) have been shown to prefer communication at a socio-spatial distance. So while rarely found in the real world, autism communities are popular in Web-based forums, convenient for people with ASD to seek and share health related information. Reddit is one such avenue for people of common interest to connect, forming communities of specific interest, namely subreddits. This work aims to estimate support scores provided by a popular subreddit interested in ASD – www.reddit.com/r/aspergers. The scores were measured in both the quantities and qualities of the conversations in the forum, including conversational involvement, emotional, and informational support. The support scores of the subreddit Aspergers were compared with those of an average subreddit derived from entire Reddit, represented by two big corpora of approximately 200 million Reddit posts and 1.66 billion Reddit comments. The ASD subreddit was found to be a supportive community, having far higher support scores than did the average subreddit. Apache Spark, an advanced cluster computing framework, is employed to speed up processing of the large corpora. Scalable machine learning techniques implemented in Spark help discriminate the content made in Aspergers versus other subreddits and automatically discover linguistic predictors of ASD within minutes, providing timely reports. 
},    FILE = { :Nguyen_etal_17Estimating - Estimating Support Scores of Autism Communities in Large Scale Web Information Systems.pdf:PDF },    LANGUAGE = { English },    OWNER = { thinng },    TIMESTAMP = { 2017.08.28 },} BC
 Kernel-based features for predicting population health indices from geocoded social media data Thin Nguyen, Mark E. Larsen, Bridianne O'Dea, Duc Thanh Nguyen, John Yearwood, Dinh Phung, Svetha Venkatesh and Helen Christensen. Decision Support Systems, 2017. [ | | pdf] When using tweets to predict population health index, due to the large scale of data, an aggregation of tweets by population has been a popular practice in learning features to characterize the population. This would alleviate the computational cost for extracting features on each individual tweet. On the other hand, much information on the population could be lost as the distribution of textual features of a population could be important for identifying the health index of that population. In addition, there could be relationships between features and those relationships could also convey predictive information of the health index. In this paper, we propose mid-level features namely kernel-based features for prediction of health indices of populations from social media data. The kernel-based features are extracted on the distributions of textual features over population tweets and encode the relationships between individual textual features in a kernel function. We implemented our features using three different kernel functions and applied them for two case studies of population health prediction: across-year prediction and across-county prediction. The kernel-based features were evaluated and compared with existing features on a dataset collected from the Behavioral Risk Factor Surveillance System dataset. Experimental results show that the kernel-based features gained significantly higher prediction performance than existing techniques, by up to 16.3%, suggesting the potential and applicability of the proposed features in a wide spectrum of applications on data analytics at population levels. @ARTICLE { Nguyen_etal_17Kernel,    AUTHOR = { Thin Nguyen and Mark E. 
Larsen and Bridianne O'Dea and Duc Thanh Nguyen and John Yearwood and Dinh Phung and Svetha Venkatesh and Helen Christensen },    TITLE = { Kernel-based features for predicting population health indices from geocoded social media data },    JOURNAL = { Decision Support Systems },    YEAR = { 2017 },    VOLUME = { 0 },    NUMBER = { 0 },    PAGES = { 1-34 },    ABSTRACT = { When using tweets to predict population health index, due to the large scale of data, an aggregation of tweets by population has been a popular practice in learning features to characterize the population. This would alleviate the computational cost for extracting features on each individual tweet. On the other hand, much information on the population could be lost as the distribution of textual features of a population could be important for identifying the health index of that population. In addition, there could be relationships between features and those relationships could also convey predictive information of the health index. In this paper, we propose mid-level features namely kernel-based features for prediction of health indices of populations from social media data. The kernel-based features are extracted on the distributions of textual features over population tweets and encode the relationships between individual textual features in a kernel function. We implemented our features using three different kernel functions and applied them for two case studies of population health prediction: across-year prediction and across-county prediction. The kernel-based features were evaluated and compared with existing features on a dataset collected from the Behavioral Risk Factor Surveillance System dataset. Experimental results show that the kernel-based features gained significantly higher prediction performance than existing techniques, by up to 16.3%, suggesting the potential and applicability of the proposed features in a wide spectrum of applications on data analytics at population levels. 
},    FILE = { :Nguyen_etal_17Kernel - Kernel Based Features for Predicting Population Health Indices from Geocoded Social Media Data.pdf:PDF },    OWNER = { thinng },    TIMESTAMP = { 2017.07.01 },    URL = { http://www.sciencedirect.com/science/article/pii/S0167923617301227 },} J
 Estimation of the prevalence of adverse drug reactions from social media Thin Nguyen, Mark Larsen, Bridianne O'Dea, Dinh Phung, Svetha Venkatesh and Helen Christensen. International Journal of Medical Informatics (IJMI), 2017. [ | | pdf] This work aims to estimate the degree of adverse drug reactions (ADR) for psychiatric medications from social media, including Twitter, Reddit, and LiveJournal. Advances in lightning-fast cluster computing were employed to process large scale data, consisting of 6.4 terabytes of data containing 3.8 billion records from all the media. Rates of ADR were quantified using the SIDER database of drugs and side-effects, and an estimated ADR rate was based on the prevalence of discussion in the social media corpora. Agreement between these measures for a sample of ten popular psychiatric drugs was evaluated using the Pearson correlation coefficient, r, with values between 0.08 and 0.50. Word2vec, a novel neural learning framework, was utilized to improve the coverage of variants of ADR terms in the unstructured text by identifying syntactically or semantically similar terms. Improved correlation coefficients, between 0.29 and 0.59, demonstrate the capability of advanced techniques in machine learning to aid in the discovery of meaningful patterns from medical data, and social media data, at scale. @ARTICLE { nguyen_etal_jmi17estimation,    AUTHOR = { Thin Nguyen and Mark Larsen and Bridianne O'Dea and Dinh Phung and Svetha Venkatesh and Helen Christensen },    TITLE = { Estimation of the prevalence of adverse drug reactions from social media },    JOURNAL = { International Journal of Medical Informatics (IJMI) },    YEAR = { 2017 },    PAGES = { 1--17 },    ABSTRACT = { This work aims to estimate the degree of adverse drug reactions (ADR) for psychiatric medications from social media, including Twitter, Reddit, and LiveJournal. 
Advances in lightning-fast cluster computing were employed to process large scale data, consisting of 6.4 terabytes of data containing 3.8 billion records from all the media. Rates of ADR were quantified using the SIDER database of drugs and side-effects, and an estimated ADR rate was based on the prevalence of discussion in the social media corpora. Agreement between these measures for a sample of ten popular psychiatric drugs was evaluated using the Pearson correlation coefficient, r, with values between 0.08 and 0.50. Word2vec, a novel neural learning framework, was utilized to improve the coverage of variants of ADR terms in the unstructured text by identifying syntactically or semantically similar terms. Improved correlation coefficients, between 0.29 and 0.59, demonstrate the capability of advanced techniques in machine learning to aid in the discovery of meaningful patterns from medical data, and social media data, at scale. },    FILE = { :nguyen_etal_jmi17estimation - Estimation of the Prevalence of Adverse Drug Reactions from Social Media.pdf:PDF },    URL = { http://www.sciencedirect.com/science/article/pii/S1386505617300746 },} J
 Model-Based Multiple Instance Learning Ba-Ngu Vo, Dinh Phung, Quang N. Tran and Ba-Tuong Vo. arXiv, Mar. 2017. [ | | pdf] While Multiple Instance (MI) data are point patterns -- sets or multi-sets of unordered points -- appropriate statistical point pattern models have not been used in MI learning. This article proposes a framework for model-based MI learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed. @ARTICLE { vo_etal_arxiv17modelbased,    AUTHOR = { Ba-Ngu Vo and Dinh Phung and Quang N. Tran and Ba-Tuong Vo },    TITLE = { Model-Based Multiple Instance Learning },    JOURNAL = { arXiv },    YEAR = { 2017 },    MONTH = { Mar. },    ABSTRACT = { While Multiple Instance (MI) data are point patterns -- sets or multi-sets of unordered points -- appropriate statistical point pattern models have not been used in MI learning. This article proposes a framework for model-based MI learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed. },    FILE = { :vo_etal_arxiv17modelbased - Model Based Multiple Instance Learning.pdf:PDF },    KEYWORDS = { Multiple instance learning, point pattern, point process, random finite set, classification, novelty detection, clustering },    URL = { https://arxiv.org/pdf/1703.02155.pdf },} J
 Hierarchical semi-Markov conditional random fields for deep recursive sequential data Truyen Tran, Dinh Phung, Hung H. Bui and Svetha Venkatesh. Artificial Intelligence (AIJ), Feb. 2017. (Accepted). [ | | pdf] We present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of linear-chain conditional random fields to model deep nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We develop numerical scaling procedures that handle the overflow problem. We show that the HSCRF can be reduced to the semi-Markov conditional random fields. Finally, we demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. The HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. @ARTICLE { tran_etal_aij17hierarchical,    AUTHOR = { Truyen Tran and Dinh Phung and Hung H. Bui and Svetha Venkatesh },    TITLE = { Hierarchical semi-Markov conditional random fields for deep recursive sequential data },    JOURNAL = { Artificial Intelligence (AIJ) },    YEAR = { 2017 },    MONTH = { Feb. },    NOTE = { Accepted },    ABSTRACT = { We present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of linear-chain conditional random fields to model deep nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We develop numerical scaling procedures that handle the overflow problem. 
We show that the HSCRF can be reduced to the semi-Markov conditional random fields. Finally, we demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. The HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. },    FILE = { :tran_etal_aij17hierarchical - Hierarchical Semi Markov Conditional Random Fields for Deep Recursive Sequential Data.pdf:PDF },    KEYWORDS = { Deep nested sequential processes, Hierarchical semi-Markov conditional random field, Partial labelling, Constrained inference, Numerical scaling },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2017.02.21 },    URL = { http://www.sciencedirect.com/science/article/pii/S0004370217300231 },} J
• See my thesis (chapter 5) for an equivalent directed graphical model, which is the precursor of this work and where I described the Asymmetric Inside-Outside (AIO) algorithm in great detail. A brief version of this for the directed case also appeared in this AAAI'04 paper. The idea of semi-Markov duration modelling has also been addressed for the directed case in these CVPR05 and AIJ09 papers.
 Streaming Clustering with Bayesian Nonparametric Models Viet Huynh and Dinh Phung. Neurocomputing, 2017. [ | | pdf] Bayesian nonparametric (BNP) models are theoretically suitable for learning streaming data due to their complexity relaxation to growing data observed over time. There is a rich body of literature on developing efficient approximate methods for posterior inferences in BNP models, typically dominated by MCMC. However, very limited work has addressed posterior inference in a streaming fashion, which is important to fully realize the potential of BNP models applied to real-world tasks. The main challenge resides in developing a one-pass posterior update that is consistent with the data streamed over time (i.e., data is scanned only once), which general MCMC methods fail to address. On the other hand, Dirichlet process-based mixture models are the most fundamental building blocks in the field of BNP. To this end, we develop in this paper a class of variational methods suitable for posterior inference of the Dirichlet process mixture (DPM) models where both the posterior update and data are presented in a streaming setting. We first propose new methods to advance existing variational based inference approaches for BNP to allow the variational distributions to grow over time, hence overcoming an important limitation of current methods in imposing parametric, truncated restrictions on the variational distributions. This results in two new methods, namely truncation-free variational Bayes (TFVB) and truncation-free maximization expectation (TFME), where the latter further supports hard clustering. These inference methods form the foundation for our streaming inference algorithm where we further adapt the recent Streaming Variational Bayes proposed in [1] to our task. 
To demonstrate our framework for real-world tasks whose datasets are often heterogeneous, we develop one more theoretical extension for our model to handle assorted data where each observation consists of different data types. Our experiments with automatically learning the number of clusters demonstrate the comparable inference capability of our framework in comparison with truncated version variational inference algorithms for both synthetic and real-world datasets. Moreover, an evaluation of streaming learning algorithms with text corpora reveals both quantitative and qualitative efficacy of the algorithms on clustering documents. @ARTICLE { huynh_phung_neuro17streaming,    AUTHOR = { Viet Huynh and Dinh Phung },    TITLE = { Streaming Clustering with Bayesian Nonparametric Models },    JOURNAL = { Neurocomputing },    YEAR = { 2017 },    ISSN = { 0925-2312 },    ABSTRACT = { Bayesian nonparametric (BNP) models are theoretically suitable for learning streaming data due to their complexity relaxation to growing data observed over time. There is a rich body of literature on developing efficient approximate methods for posterior inferences in BNP models, typically dominated by MCMC. However, very limited work has addressed posterior inference in a streaming fashion, which is important to fully realize the potential of BNP models applied to real-world tasks. The main challenge resides in developing a one-pass posterior update that is consistent with the data streamed over time (i.e., data is scanned only once), which general MCMC methods fail to address. On the other hand, Dirichlet process-based mixture models are the most fundamental building blocks in the field of BNP. To this end, we develop in this paper a class of variational methods suitable for posterior inference of the Dirichlet process mixture (DPM) models where both the posterior update and data are presented in a streaming setting. 
We first propose new methods to advance existing variational based inference approaches for BNP to allow the variational distributions to grow over time, hence overcoming an important limitation of current methods in imposing parametric, truncated restrictions on the variational distributions. This results in two new methods, namely truncation-free variational Bayes (TFVB) and truncation-free maximization expectation (TFME), where the latter further supports hard clustering. These inference methods form the foundation for our streaming inference algorithm where we further adapt the recent Streaming Variational Bayes proposed in [1] to our task. To demonstrate our framework for real-world tasks whose datasets are often heterogeneous, we develop one more theoretical extension for our model to handle assorted data where each observation consists of different data types. Our experiments with automatically learning the number of clusters demonstrate the comparable inference capability of our framework in comparison with truncated version variational inference algorithms for both synthetic and real-world datasets. Moreover, an evaluation of streaming learning algorithms with text corpora reveals both quantitative and qualitative efficacy of the algorithms on clustering documents. },    FILE = { :huynh_phung_neuro17streaming - Streaming Clustering with Bayesian Nonparametric Models.pdf:PDF },    KEYWORDS = { streaming learning, Bayesian nonparametric, variational Bayes inference, Dirichlet process, Dirichlet process mixtures, heterogeneous data sources },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2017.02.18 },    URL = { http://www.sciencedirect.com/science/article/pii/S0925231217304253 },} J
 Effective Sparse Imputation of Patient Conditions in Electronic Medical Records for Emergency Risk Predictions Budhaditya Saha, Sunil Gupta, Dinh Phung and Svetha Venkatesh. Knowledge and Information Systems (KAIS), 2017. [ | | pdf] Electronic Medical Records (EMR) are being increasingly used for “risk” prediction. By “risks”, we denote outcomes such as emergency presentation, readmission, the length of hospitalizations etc. However, EMR data analysis is complicated by missing entries. There are two reasons - the “primary reason for admission” is included in EMR, but the comorbidities (other chronic diseases) are left uncoded, and, many zero values in the data are accurate, reflecting that a patient has not accessed medical facilities. A key challenge is to deal with the peculiarities of this data - unlike many other datasets, EMR is sparse, reflecting the fact that patients have some, but not all diseases. We propose a novel model to fill in these missing values and use the new representation for prediction of key hospital events. To “fill-in” missing values, we represent the feature-patient matrix as a product of two low-rank factors, preserving the sparsity property in the product. Intuitively, the product regularization allows sparse imputation of patient conditions reflecting common comorbidities across patients. We develop a scalable optimization algorithm based on Block coordinate descent method to find an optimal solution. We evaluate the proposed framework on two real-world EMR cohorts: Cancer (7000 admissions) and Acute Myocardial Infarction (2652 admissions). Our result shows that the AUC for 3 months emergency presentation prediction is improved significantly from (0.729 to 0.741) for Cancer data and (0.699 to 0.723) for AMI data. Similarly, AUC for 3 months emergency admission prediction improves from (0.730 to 0.752) for Cancer data and (0.682 to 0.724) for AMI data. We also extend the proposed method to a supervised model for predicting multiple related risk outcomes (e.g. 
emergency presentations and admissions in hospital over 3, 6 and 12 months period) in an integrated framework. The supervised model consistently outperforms state-of-the-art baseline methods. @ARTICLE { budhaditya_gupta_phung_venkatesh_kais17effective,    AUTHOR = { Budhaditya Saha and Sunil Gupta and Dinh Phung and Svetha Venkatesh },    TITLE = { Effective Sparse Imputation of Patient Conditions in Electronic Medical Records for Emergency Risk Predictions },    JOURNAL = { Knowledge and Information Systems (KAIS) },    YEAR = { 2017 },    ABSTRACT = { Electronic Medical Records (EMR) are being increasingly used for “risk” prediction. By “risks”, we denote outcomes such as emergency presentation, readmission, the length of hospitalizations etc. However, EMR data analysis is complicated by missing entries. There are two reasons - the “primary reason for admission” is included in EMR, but the comorbidities (other chronic diseases) are left uncoded, and, many zero values in the data are accurate, reflecting that a patient has not accessed medical facilities. A key challenge is to deal with the peculiarities of this data - unlike many other datasets, EMR is sparse, reflecting the fact that patients have some, but not all diseases. We propose a novel model to fill in these missing values and use the new representation for prediction of key hospital events. To “fill-in” missing values, we represent the feature-patient matrix as a product of two low-rank factors, preserving the sparsity property in the product. Intuitively, the product regularization allows sparse imputation of patient conditions reflecting common comorbidities across patients. We develop a scalable optimization algorithm based on Block coordinate descent method to find an optimal solution. 
We evaluate the proposed framework on two real-world EMR cohorts: Cancer (7000 admissions) and Acute Myocardial Infarction (2652 admissions). Our result shows that the AUC for 3 months emergency presentation prediction is improved significantly from (0.729 to 0.741) for Cancer data and (0.699 to 0.723) for AMI data. Similarly, AUC for 3 months emergency admission prediction improves from (0.730 to 0.752) for Cancer data and (0.682 to 0.724) for AMI data. We also extend the proposed method to a supervised model for predicting multiple related risk outcomes (e.g. emergency presentations and admissions in hospital over 3, 6 and 12 months period) in an integrated framework. The supervised model consistently outperforms state-of-the-art baseline methods. },    FILE = { :budhaditya_gupta_phung_venkatesh_kais17effective - Effective Sparse Imputation of Patient Conditions in Electronic Medical Records for Emergency Risk Predictions.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.05.17 },    URL = { https://link.springer.com/article/10.1007/s10115-017-1038-0 },} J
 Energy-Based Localized Anomaly Detection in Video Surveillance Hung Vu, Tu Dinh Nguyen, Anthony Travers, Svetha Venkatesh and Dinh Phung. In The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Jeju, South Korea, May 23-26 2017. (Best Application Paper Award). [ | | pdf] Automated detection of abnormal events in video surveillance is an important task in research and practical applications. This is, however, a challenging problem due to the growing collection of data without the knowledge of what is to be defined as “abnormal”, and the expensive feature engineering procedure. In this paper we introduce a unified framework for anomaly detection in video based on the restricted Boltzmann machine (RBM), a recent powerful method for unsupervised learning and representation learning. Our proposed system works directly on the image pixels rather than hand-crafted features; it learns new representations for data in a completely unsupervised manner without the need for labels, and then reconstructs the data to recognize the locations of abnormal events based on the reconstruction errors. More importantly, our approach can be deployed in both offline and streaming settings, in which the trained parameters of the model are fixed in the offline setting and updated incrementally as video data arrive in the streaming setting. Experiments on three public benchmark video datasets show that our proposed method can detect and localize the abnormalities at pixel level with better accuracy than the baselines, and achieve competitive performance compared with state-of-the-art approaches. Moreover, as RBM belongs to a wider class of deep generative models, our framework lays the groundwork towards a more powerful deep unsupervised abnormality detection framework. 
@INPROCEEDINGS { vu_etal_pakdd17energy,    AUTHOR = { Hung Vu and Tu Dinh Nguyen and Anthony Travers and Svetha Venkatesh and Dinh Phung },    TITLE = { Energy-Based Localized Anomaly Detection in Video Surveillance },    BOOKTITLE = { The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },    YEAR = { 2017 },    EDITOR = { Jinho Kim, Kyuseok Shim, Longbing Cao, Jae-Gil Lee, Xuemin Lin, Yang-Sae Moon },    ADDRESS = { Jeju, South Korea },    MONTH = { May 23-26 },    NOTE = { Best Application Paper Award },    ABSTRACT = { Automated detection of abnormal events in video surveillance is an important task in research and practical applications. This is, however, a challenging problem due to the growing collection of data without the knowledge of what is to be defined as “abnormal”, and the expensive feature engineering procedure. In this paper we introduce a unified framework for anomaly detection in video based on the restricted Boltzmann machine (RBM), a recent powerful method for unsupervised learning and representation learning. Our proposed system works directly on the image pixels rather than hand-crafted features; it learns new representations for data in a completely unsupervised manner without the need for labels, and then reconstructs the data to recognize the locations of abnormal events based on the reconstruction errors. More importantly, our approach can be deployed in both offline and streaming settings, in which the trained parameters of the model are fixed in the offline setting and updated incrementally as video data arrive in the streaming setting. Experiments on three public benchmark video datasets show that our proposed method can detect and localize the abnormalities at pixel level with better accuracy than the baselines, and achieve competitive performance compared with state-of-the-art approaches. 
Moreover, as RBM belongs to a wider class of deep generative models, our framework lays the groundwork towards a more powerful deep unsupervised abnormality detection framework. },    FILE = { :vu_etal_pakdd17energy - Energy Based Localized Anomaly Detection in Video Surveillance.pdf:PDF },    OWNER = { hungv },    TIMESTAMP = { 2017.01.31 },    URL = { https://link.springer.com/chapter/10.1007/978-3-319-57454-7_50 },} C
 2016
 One-Pass Logistic Regression for Label-Drift and Large-Scale Classification on Distributed Systems Nguyen, Vu, Nguyen, Tu Dinh, Le, Trung, Phung, Dinh and Venkatesh, Svetha. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 1113-1118, Dec 2016. [ | | pdf | code] Logistic regression (LR) for classification is the workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where the class labels are not known in advance, a problem we term label-drift classification. Label-drift classification problem naturally occurs in many applications, especially in the context of streaming settings where the incoming data may contain samples categorized with new classes that have not been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to their expense of running time. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR) to offer a principled treatment for label-drift and large-scale classifications. To handle large-scale classification for big data, we further extend our OLR to a distributed setting for parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performances of our methods are comparable or better than those of state-of-the-art baselines whilst the execution time is much faster at an order of magnitude. In addition, the OLR and Spark-OLR are invariant to data shuffling and have no hyperparameter to tune that significantly benefits data practitioners and overcomes the curse of big data cross-validation to select optimal hyperparameters. 
@CONFERENCE { nguyen_etal_icdm16onepass,    AUTHOR = { Nguyen, Vu and Nguyen, Tu Dinh and Le, Trung and Phung, Dinh and Venkatesh, Svetha },    TITLE = { One-Pass Logistic Regression for Label-Drift and Large-Scale Classification on Distributed Systems },    BOOKTITLE = { 2016 IEEE 16th International Conference on Data Mining (ICDM) },    YEAR = { 2016 },    PAGES = { 1113-1118 },    MONTH = { Dec },    ABSTRACT = { Logistic regression (LR) for classification is the workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where the class labels are not known in advance, a problem we term label-drift classification. Label-drift classification problem naturally occurs in many applications, especially in the context of streaming settings where the incoming data may contain samples categorized with new classes that have not been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to their expense of running time. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR) to offer a principled treatment for label-drift and large-scale classifications. To handle large-scale classification for big data, we further extend our OLR to a distributed setting for parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performances of our methods are comparable or better than those of state-of-the-art baselines whilst the execution time is much faster at an order of magnitude. In addition, the OLR and Spark-OLR are invariant to data shuffling and have no hyperparameter to tune that significantly benefits data practitioners and overcomes the curse of big data cross-validation to select optimal hyperparameters. 
},    CODE = { https://github.com/ntienvu/ICDM2016_OLR },    DOI = { 10.1109/ICDM.2016.0145 },    FILE = { :nguyen_etal_icdm16onepass - One Pass Logistic Regression for Label Drift and Large Scale Classification on Distributed Systems.pdf:PDF },    KEYWORDS = { Big Data;distributed processing;pattern classification;regression analysis;Big Data cross-validation;Spark-OLR;class labels;data shuffling;distributed systems;execution time;label-drift classification problem;large-scale classification;large-scale datasets;one-pass logistic regression;optimal hyperparameter selection;sparkling OLR;Bayes methods;Big data;Context;Data models;Estimation;Industries;Logistics;Apache Spark;Logistic regression;distributed system;label-drift;large-scale classification },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.09.10 },    URL = { http://ieeexplore.ieee.org/document/7837958/ },} C
 Dual Space Gradient Descent for Online Learning Le, Trung, Nguyen, Tu Dinh, Nguyen, Vu and Phung, Dinh. In Advances in Neural Information Processing (NIPS), December 2016. [ | | pdf] One crucial goal in kernel online learning is to bound the model size. Common approaches employ budget maintenance procedures to restrict the model sizes using removal, projection, or merging strategies. Although projection and merging, in the literature, are known to be the most effective strategies, they demand extensive computation whilst the removal strategy fails to retain information of the removed vectors. An alternative way to address the model size problem is to apply random features to approximate the kernel function. This allows the model to be maintained directly in the random feature space, hence effectively resolving the curse of kernelization. However, this approach still suffers from a serious shortcoming as it needs to use a high dimensional random feature space to achieve a sufficiently accurate kernel approximation. Consequently, it leads to a significant increase in the computational cost. To address all of these aforementioned challenges, we present in this paper the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance. Consequently, our approach permits the budget to be maintained in a simple, direct and elegant way while simultaneously mitigating the impact of the dimensionality issue on learning performance. We further provide convergence analysis and extensively conduct experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with the state-of-the-art baselines. 
@CONFERENCE { le_etal_nips16dual,    AUTHOR = { Le, Trung and Nguyen, Tu Dinh and Nguyen, Vu and Phung, Dinh },    TITLE = { Dual Space Gradient Descent for Online Learning },    BOOKTITLE = { Advances in Neural Information Processing (NIPS) },    YEAR = { 2016 },    MONTH = { December },    ABSTRACT = { One crucial goal in kernel online learning is to bound the model size. Common approaches employ budget maintenance procedures to restrict the model sizes using removal, projection, or merging strategies. Although projection and merging, in the literature, are known to be the most effective strategies, they demand extensive computation whilst the removal strategy fails to retain information of the removed vectors. An alternative way to address the model size problem is to apply random features to approximate the kernel function. This allows the model to be maintained directly in the random feature space, hence effectively resolving the curse of kernelization. However, this approach still suffers from a serious shortcoming as it needs to use a high dimensional random feature space to achieve a sufficiently accurate kernel approximation. Consequently, it leads to a significant increase in the computational cost. To address all of these aforementioned challenges, we present in this paper the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance. Consequently, our approach permits the budget to be maintained in a simple, direct and elegant way while simultaneously mitigating the impact of the dimensionality issue on learning performance. We further provide convergence analysis and extensively conduct experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with the state-of-the-art baselines. 
},    FILE = { :le_etal_nips16dual - Dual Space Gradient Descent for Online Learning.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.08.16 },    URL = { https://papers.nips.cc/paper/6560-dual-space-gradient-descent-for-online-learning.pdf },} C
 Scalable Nonparametric Bayesian Multilevel Clustering Huynh, V., Phung, D., Venkatesh, S., Nguyen, X.L, Hoffman, M. and Bui, H. In 32nd Conference on Uncertainty in Artificial Intelligence (UAI), June 2016. [ | | pdf] @CONFERENCE { huynh_phung_venkatesh_nguyen_hoffman_bui_uai16scalable,    AUTHOR = { Huynh, V. and Phung, D. and Venkatesh, S. and Nguyen, X.L and Hoffman, M. and Bui, H. },    TITLE = { Scalable Nonparametric {B}ayesian Multilevel Clustering },    BOOKTITLE = { 32nd Conference on Uncertainty in Artificial Intelligence (UAI) },    YEAR = { 2016 },    MONTH = { June },    FILE = { :huynh_phung_venkatesh_nguyen_hoffman_bui_uai16scalable - Scalable Nonparametric Bayesian Multilevel Clustering.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.05.09 },    URL = { http://auai.org/uai2016/proceedings/papers/262.pdf },} C
 Budgeted Semi-supervised Support Vector Machine Le, Trung, Duong, Phuong, Dinh, Mi, Nguyen, Tu, Nguyen, Vu and Phung, Dinh. In 32nd Conference on Uncertainty in Artificial Intelligence (UAI), June 2016. [ | | pdf] @CONFERENCE { le_duong_dinh_nguyen_nguyen_phung_uai16budgeted,    AUTHOR = { Le, Trung and Duong, Phuong and Dinh, Mi and Nguyen, Tu and Nguyen, Vu and Phung, Dinh },    TITLE = { Budgeted Semi-supervised {S}upport {V}ector {M}achine },    BOOKTITLE = { 32nd Conference on Uncertainty in Artificial Intelligence (UAI) },    YEAR = { 2016 },    MONTH = { June },    FILE = { :le_duong_dinh_nguyen_nguyen_phung_uai16budgeted - Budgeted Semi Supervised Support Vector Machine.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.05.09 },    URL = { http://auai.org/uai2016/proceedings/papers/110.pdf },} C
 Nonparametric Budgeted Stochastic Gradient Descent Le, Trung, Nguyen, Vu, Nguyen, Tu Dinh and Phung, Dinh. In 19th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS), May 2016. [ | | pdf] @CONFERENCE { le_nguyen_phung_aistats16nonparametric,    AUTHOR = { Le, Trung and Nguyen, Vu and Nguyen, Tu Dinh and Phung, Dinh },    TITLE = { Nonparametric Budgeted Stochastic Gradient Descent },    BOOKTITLE = { 19th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS) },    YEAR = { 2016 },    MONTH = { May },    FILE = { :le_nguyen_phung_aistats16nonparametric - Nonparametric Budgeted Stochastic Gradient Descent.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.04.06 },    URL = { http://www.jmlr.org/proceedings/papers/v51/le16.pdf },} C
 Introduction: special issue of selected papers from ACML 2014 Li, Hang, Phung, Dinh, Cao, Tru, Ho, Tu-Bao and Zhou, Zhi-Hua, editors. Volume 103 of Machine Learning, Springer, May 2016. [ | | pdf] @PROCEEDINGS { li_phung_cao_ho_zhou_acml14_selectedpapers,    TITLE = { Introduction: special issue of selected papers from {ACML} 2014 },    YEAR = { 2016 },    EDITOR = { Li, Hang and Phung, Dinh and Cao, Tru and Ho, Tu-Bao and Zhou, Zhi-Hua },    VOLUME = { 103 },    NUMBER = { 2 },    SERIES = { Machine Learning },    PUBLISHER = { Springer },    MONTH = { May },    DOI = { 10.1007/s10994-016-5549-9 },    FILE = { :li_phung_cao_ho_zhou_acml14_selectedpapers - Introduction_ Special Issue of Selected Papers from ACML 2014.pdf:PDF },    ISSN = { 1573-0565 },    JOURNAL = { Machine Learning },    OWNER = { Thanh-Binh Nguyen },    PAGES = { 1--3 },    TIMESTAMP = { 2016.04.11 },    URL = { http://dx.doi.org/10.1007/s10994-016-5549-9 },} P
 Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View Luo, Wei, Phung, Dinh, Tran, Truyen, Gupta, Sunil, Rana, Santu, Karmakar, Chandan, Shilton, Alistair, Yearwood, John, Dimitrova, Nevenka, Ho, Bao Tu, Venkatesh, Svetha and Berk, Michael. J Med Internet Res, 18(12):e323, Dec 2016. [ | | pdf] Background: As more and more researchers are turning to big data for new opportunities of biomedical discoveries, machine learning models, as the backbone of big data analysis, are mentioned more often in biomedical journals. However, owing to the inherent complexity of machine learning methods, they are prone to misuse. Because of the flexibility in specifying machine learning models, the results are often insufficiently reported in research articles, hindering reliable assessment of model validity and consistent interpretation of model outputs. Objective: To attain a set of guidelines on the use of machine learning predictive models within clinical settings to make sure the models are correctly applied and sufficiently reported so that true discoveries can be distinguished from random coincidence. Methods: A multidisciplinary panel of machine learning experts, clinicians, and traditional statisticians were interviewed, using an iterative process in accordance with the Delphi method. Results: The process produced a set of guidelines that consists of (1) a list of reporting items to be included in a research article and (2) a set of practical sequential steps for developing predictive models. Conclusions: A set of guidelines was generated to enable correct application of machine learning models and consistent reporting of model specifications and results in biomedical research. We believe that such guidelines will accelerate the adoption of big data analysis, particularly with machine learning methods, in the biomedical research community. 
@ARTICLE { Luo_etal_jmir16guidelines,    AUTHOR = { Luo, Wei and Phung, Dinh and Tran, Truyen and Gupta, Sunil and Rana, Santu and Karmakar, Chandan and Shilton, Alistair and Yearwood, John and Dimitrova, Nevenka and Ho, Bao Tu and Venkatesh, Svetha and Berk, Michael },    TITLE = { Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View },    JOURNAL = { J Med Internet Res },    YEAR = { 2016 },    VOLUME = { 18 },    NUMBER = { 12 },    PAGES = { e323 },    MONTH = { Dec },    ABSTRACT = { Background: As more and more researchers are turning to big data for new opportunities of biomedical discoveries, machine learning models, as the backbone of big data analysis, are mentioned more often in biomedical journals. However, owing to the inherent complexity of machine learning methods, they are prone to misuse. Because of the flexibility in specifying machine learning models, the results are often insufficiently reported in research articles, hindering reliable assessment of model validity and consistent interpretation of model outputs. Objective: To attain a set of guidelines on the use of machine learning predictive models within clinical settings to make sure the models are correctly applied and sufficiently reported so that true discoveries can be distinguished from random coincidence. Methods: A multidisciplinary panel of machine learning experts, clinicians, and traditional statisticians were interviewed, using an iterative process in accordance with the Delphi method. Results: The process produced a set of guidelines that consists of (1) a list of reporting items to be included in a research article and (2) a set of practical sequential steps for developing predictive models. Conclusions: A set of guidelines was generated to enable correct application of machine learning models and consistent reporting of model specifications and results in biomedical research. 
We believe that such guidelines will accelerate the adoption of big data analysis, particularly with machine learning methods, in the biomedical research community. },    DAY = { 16 },    DOI = { 10.2196/jmir.5870 },    FILE = { :Luo_etal_jmir16guidelines - Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research_ a Multidisciplinary View.pdf:PDF },    KEYWORDS = { machine learning, clinical prediction rule, guideline },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.12.21 },    URL = { http://www.jmir.org/2016/12/e323/ },} J
 Data Clustering Using Side Information Dependent Chinese Restaurant Processes Li, Cheng, Rana, Santu, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 47(2):463-488, May 2016. [ | | pdf] Side information, or auxiliary information associated with documents or image content, provides hints for clustering. We propose a new model, side information dependent Chinese restaurant process, which exploits side information in a Bayesian nonparametric model to improve data clustering. We introduce side information into the framework of distance dependent Chinese restaurant process using a robust decay function to handle noisy side information. The threshold parameter of the decay function is updated automatically in the Gibbs sampling process. A fast inference algorithm is proposed. We evaluate our approach on four datasets: Cora, 20 Newsgroups, NUS-WIDE and one medical dataset. Types of side information explored in this paper include citations, authors, tags, keywords and auxiliary clinical information. The comparison with the state-of-the-art approaches based on standard performance measures (NMI, F1) clearly shows the superiority of our approach. @ARTICLE { li_rana_phung_venkatesh_kais16,    AUTHOR = { Li, Cheng and Rana, Santu and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Data Clustering Using Side Information Dependent {C}hinese Restaurant Processes },    JOURNAL = { Knowledge and Information Systems (KAIS) },    YEAR = { 2016 },    VOLUME = { 47 },    NUMBER = { 2 },    PAGES = { 463--488 },    MONTH = { May },    ABSTRACT = { Side information, or auxiliary information associated with documents or image content, provides hints for clustering. We propose a new model, side information dependent Chinese restaurant process, which exploits side information in a Bayesian nonparametric model to improve data clustering. 
We introduce side information into the framework of distance dependent Chinese restaurant process using a robust decay function to handle noisy side information. The threshold parameter of the decay function is updated automatically in the Gibbs sampling process. A fast inference algorithm is proposed. We evaluate our approach on four datasets: Cora, 20 Newsgroups, NUS-WIDE and one medical dataset. Types of side information explored in this paper include citations, authors, tags, keywords and auxiliary clinical information. The comparison with the state-of-the-art approaches based on standard performance measures (NMI, F1) clearly shows the superiority of our approach. },    DOI = { 10.1007/s10115-015-0834-7 },    FILE = { :li_rana_phung_venkatesh_kais16 - Data Clustering Using Side Information Dependent Chinese Restaurant Processes.pdf:PDF },    KEYWORDS = { Side information Similarity Data clustering Bayesian nonparametric models },    OWNER = { Dinh },    TIMESTAMP = { 2015.03.02 },    URL = { http://link.springer.com/article/10.1007/s10115-015-0834-7 },} J
 Multiple Kernel Learning with Data Augmentation Nguyen, Khanh, Le, Trung, Nguyen, Vu, Nguyen, Tu Dinh and Phung, Dinh. In 8th Asian Conference on Machine Learning (ACML), Nov. 2016. [ | ] @CONFERENCE { nguyen_etal_acml16multiple,    AUTHOR = { Nguyen, Khanh and Le, Trung and Nguyen, Vu and Nguyen, Tu Dinh and Phung, Dinh },    TITLE = { Multiple Kernel Learning with Data Augmentation },    BOOKTITLE = { 8th Asian Conference on Machine Learning (ACML) },    YEAR = { 2016 },    MONTH = { Nov. },    FILE = { :nguyen_etal_acml16multiple - Multiple Kernel Learning with Data Augmentation.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.07.13 },} C
 Exceptional Contrast Set Mining: Moving Beyond the Deluge of the Obvious Nguyen, Dang, Luo, Wei, Phung, Dinh and Venkatesh, Svetha. In Advances in Artificial Intelligence, pages 455-468. Springer, 2016. [ | | pdf] Data scientists, with access to fast growing data and computing power, constantly look for algorithms with greater detection power to discover “novel” knowledge. But more often than not, their algorithms give them too many outputs that are either highly speculative or simply confirming what the domain experts already know. To escape this dilemma, we need algorithms that move beyond the obvious association analyses and leverage domain analytic objectives (aka. KPIs) to look for higher order connections. We propose a new technique Exceptional Contrast Set Mining that first gathers a succinct collection of affirmative contrast sets based on the principle of redundant information elimination. Then it discovers exceptional contrast sets that contradict the affirmative contrast sets. The algorithm has been successfully applied to several analytic consulting projects. In particular, during an analysis of a state-wide cancer registry, it discovered a surprising regional difference in breast cancer screening. @INCOLLECTION { nguyen_etal_ai16exceptional,    AUTHOR = { Nguyen, Dang and Luo, Wei and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Exceptional Contrast Set Mining: Moving Beyond the Deluge of the Obvious },    BOOKTITLE = { Advances in Artificial Intelligence },    PUBLISHER = { Springer },    YEAR = { 2016 },    VOLUME = { 9992 },    PAGES = { 455--468 },    ABSTRACT = { Data scientists, with access to fast growing data and computing power, constantly look for algorithms with greater detection power to discover “novel” knowledge. But more often than not, their algorithms give them too many outputs that are either highly speculative or simply confirming what the domain experts already know. 
To escape this dilemma, we need algorithms that move beyond the obvious association analyses and leverage domain analytic objectives (aka. KPIs) to look for higher order connections. We propose a new technique Exceptional Contrast Set Mining that first gathers a succinct collection of affirmative contrast sets based on the principle of redundant information elimination. Then it discovers exceptional contrast sets that contradict the affirmative contrast sets. The algorithm has been successfully applied to several analytic consulting projects. In particular, during an analysis of a state-wide cancer registry, it discovered a surprising regional difference in breast cancer screening. },    FILE = { :nguyen_etal_ai16exceptional - Exceptional Contrast Set Mining_ Moving beyond the Deluge of the Obvious.pdf:PDF },    GROUPS = { Contrast Set Mining },    ORGANIZATION = { Springer },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2017.01.05 },    URL = { http://link.springer.com/chapter/10.1007/978-3-319-50127-7_39 },} BC
 SECC: Simultaneous extraction of context and community from pervasive signals Nguyen, T., Nguyen, V., Salim, F.D. and Phung, D.. In IEEE Intl. Conf. on Pervasive Computing and Communications (PERCOM), pages 1-9, March 2016. [ | | pdf] Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as the way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to explain data at multiple levels. We demonstrate our framework on three public datasets where the advantages of the proposed approach are validated. @INPROCEEDINGS { nguyen_nguyen_salim_phung_percom16secc,    AUTHOR = { Nguyen, T. and Nguyen, V. and Salim, F.D. and Phung, D. },    TITLE = { {SECC}: Simultaneous extraction of context and community from pervasive signals },    BOOKTITLE = { IEEE Intl. Conf. on Pervasive Computing and Communications (PERCOM) },    YEAR = { 2016 },    PAGES = { 1-9 },    MONTH = { March },    ABSTRACT = { Understanding user contexts and group structures plays a central role in pervasive computing. 
These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as the way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to explain data at multiple levels. We demonstrate our framework on three public datasets where the advantages of the proposed approach are validated. },    DOI = { 10.1109/PERCOM.2016.7456501 },    FILE = { :nguyen_nguyen_salim_phung_percom16secc - SECC_ Simultaneous Extraction of Context and Community from Pervasive Signals.pdf:PDF },    KEYWORDS = { Bluetooth;Context;Context modeling;Data mining;Data models;Feature extraction;Mixture models },    URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7456501 },} C
 Nonparametric discovery of movement patterns from accelerometer signals Nguyen, T., Gupta, S., Venkatesh, S. and Phung, D.. Pattern Recognition Letters, 70(C):52-58, Jan. 2016. [ | | pdf] Monitoring daily physical activity plays an important role in disease prevention and intervention. This paper proposes an approach to monitor the body movement intensity levels from accelerometer data. We collect the data using the accelerometer in a realistic setting without any supervision. The ground-truth of activities is provided by the participants themselves using an experience sampling application running on their mobile phones. We compute a novel feature that has a strong correlation with the movement intensity. We use the hierarchical Dirichlet process (HDP) model to detect the activity levels from this feature. Consisting of Bayesian nonparametric priors over the parameters the model can infer the number of levels automatically. By demonstrating the approach on the publicly available USC-HAD dataset that includes ground-truth activity labels, we show a strong correlation between the discovered activity levels and the movement intensity of the activities. This correlation is further confirmed using our newly collected dataset. We further use the extracted patterns as features for clustering and classifying the activity sequences to improve performance. @ARTICLE { nguyen_gupta_venkatesh_phung_pr16nonparametric,    AUTHOR = { Nguyen, T. and Gupta, S. and Venkatesh, S. and Phung, D. },    TITLE = { Nonparametric discovery of movement patterns from accelerometer signals },    JOURNAL = { Pattern Recognition Letters },    YEAR = { 2016 },    VOLUME = { 70 },    NUMBER = { C },    PAGES = { 52--58 },    MONTH = { Jan. },    ISSN = { 0167-8655 },    ABSTRACT = { Monitoring daily physical activity plays an important role in disease prevention and intervention. This paper proposes an approach to monitor the body movement intensity levels from accelerometer data. 
We collect the data using the accelerometer in a realistic setting without any supervision. The ground-truth of activities is provided by the participants themselves using an experience sampling application running on their mobile phones. We compute a novel feature that has a strong correlation with the movement intensity. We use the hierarchical Dirichlet process (HDP) model to detect the activity levels from this feature. Consisting of Bayesian nonparametric priors over the parameters the model can infer the number of levels automatically. By demonstrating the approach on the publicly available USC-HAD dataset that includes ground-truth activity labels, we show a strong correlation between the discovered activity levels and the movement intensity of the activities. This correlation is further confirmed using our newly collected dataset. We further use the extracted patterns as features for clustering and classifying the activity sequences to improve performance. },    DOI = { http://dx.doi.org/10.1016/j.patrec.2015.11.003 },    FILE = { :nguyen_gupta_venkatesh_phung_pr16nonparametric - Nonparametric Discovery of Movement Patterns from Accelerometer Signals.pdf:PDF },    KEYWORDS = { Accelerometer, Activity recognition, Bayesian nonparametric, Dirichlet process, Movement intensity },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.05.10 },    URL = { http://www.sciencedirect.com/science/article/pii/S016786551500389X },} J
 Preterm Birth Prediction: Stable Selection of Interpretable Rules from High Dimensional Data Tran, Truyen, Luo, Wei, Phung, Dinh, Morris, Jonathan, Rickard, Kristen and Venkatesh, Svetha. In Proceedings of the 1st Machine Learning for Healthcare Conference, pages 164-177, 2016. (Accepted). [ | | pdf] Preterm births occur at an alarming rate of 10-15%. Preemies have a higher risk of infant mortality, developmental retardation and long-term disabilities. Predicting preterm birth is difficult, even for the most experienced clinicians. The most well-designed clinical study thus far reaches a modest sensitivity of 18.2–24.2% at specificity of 28.6–33.3%. We take a different approach by exploiting databases of normal hospital operations. Our aims are twofold: (i) to derive an easy-to-use, interpretable prediction rule with quantified uncertainties, and (ii) to construct accurate classifiers for preterm birth prediction. Our approach is to automatically generate and select from hundreds (if not thousands) of possible predictors using stability-aware techniques. Derived from a large database of 15,814 women, our simplified prediction rule with only 10 items has sensitivity of 62.3% at specificity of 81.5%. @INPROCEEDINGS { tran_etal_mlhc16pretern,    AUTHOR = { Tran, Truyen and Luo, Wei and Phung, Dinh and Morris, Jonathan and Rickard, Kristen and Venkatesh, Svetha },    TITLE = { Preterm Birth Prediction: Stable Selection of Interpretable Rules from High Dimensional Data },    BOOKTITLE = { Proceedings of the 1st Machine Learning for Healthcare Conference },    YEAR = { 2016 },    EDITOR = { Finale Doshi-Velez, Jim Fackler, David Kale, Byron Wallace, Jenna Wiens },    VOLUME = { 56 },    SERIES = { JMLR Workshop and Conference Proceedings },    PAGES = { 164--177 },    PUBLISHER = { JMLR },    NOTE = { Accepted },    ABSTRACT = { Preterm births occur at an alarming rate of 10-15%. 
Preemies have a higher risk of infant mortality, developmental retardation and long-term disabilities. Predicting preterm birth is difficult, even for the most experienced clinicians. The most well-designed clinical study thus far reaches a modest sensitivity of 18.2–24.2% at specificity of 28.6–33.3%. We take a different approach by exploiting databases of normal hospital operations. Our aims are twofold: (i) to derive an easy-to-use, interpretable prediction rule with quantified uncertainties, and (ii) to construct accurate classifiers for preterm birth prediction. Our approach is to automatically generate and select from hundreds (if not thousands) of possible predictors using stability-aware techniques. Derived from a large database of 15,814 women, our simplified prediction rule with only 10 items has sensitivity of 62.3% at specificity of 81.5%. },    FILE = { :tran_etal_mlhc16pretern - Preterm Birth Prediction_ Stable Selection of Interpretable Rules from High Dimensional Data.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.11.02 },    URL = { http://jmlr.org/proceedings/papers/v56/Tran16.html },} C
 Computer Assisted Autism Interventions for India Vellanki, Pratibha, Greenhill, Stewart, Duong, Thi, Phung, Dinh, Venkatesh, Svetha, Godwin, Jayashree, Achary, Kishna V. and Varkey, Blessin. In Proceedings of the 28th Australian Conference on Computer-Human Interaction, pages 618-622, New York, NY, USA, 2016. [ | | pdf] @INPROCEEDINGS { vellanki_etal_ozchi16computer,    AUTHOR = { Vellanki, Pratibha and Greenhill, Stewart and Duong, Thi and Phung, Dinh and Venkatesh, Svetha and Godwin, Jayashree and Achary, Kishna V. and Varkey, Blessin },    TITLE = { Computer Assisted Autism Interventions for {I}ndia },    BOOKTITLE = { Proceedings of the 28th Australian Conference on Computer-Human Interaction },    YEAR = { 2016 },    SERIES = { OzCHI '16 },    PAGES = { 618--622 },    ADDRESS = { New York, NY, USA },    PUBLISHER = { ACM },    ACMID = { 3011007 },    DOI = { 10.1145/3010915.3011007 },    FILE = { :vellanki_etal_ozchi16computer - Computer Assisted Autism Interventions for India.pdf:PDF },    ISBN = { 978-1-4503-4618-4 },    KEYWORDS = { Hindi, India, assistive technology, autism, early intervention, translation },    LOCATION = { Launceston, Tasmania, Australia },    NUMPAGES = { 5 },    URL = { http://doi.acm.org/10.1145/3010915.3011007 },} C
 A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested Dirichlet Process Nguyen, T., Nguyen, V., Salim, F.D., Le, D.V. and Phung, D.. Pervasive and Mobile Computing (PMC), 2016. [ | | pdf] Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as a way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to summarize data at multiple levels. We demonstrate our framework on five datasets where the advantages of the proposed approach are validated. @ARTICLE { nguyen_nguyen_flora_le_phung_pmc16simultaneous,    AUTHOR = { Nguyen, T. and Nguyen, V. and Salim, F.D. and Le, D.V. and Phung, D. },    TITLE = { A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested {D}irichlet Process },    JOURNAL = { Pervasive and Mobile Computing (PMC) },    YEAR = { 2016 },    ABSTRACT = { Understanding user contexts and group structures plays a central role in pervasive computing. 
These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as a way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to summarize data at multiple levels. We demonstrate our framework on five datasets where the advantages of the proposed approach are validated. },    DOI = { http://dx.doi.org/10.1016/j.pmcj.2016.08.019 },    FILE = { :nguyen_nguyen_flora_le_phung_pmc16simultaneous - A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested Dirichlet Process.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.08.17 },    URL = { http://www.sciencedirect.com/science/article/pii/S1574119216302097 },} J
 Nonparametric discovery and analysis of learning patterns and autism subgroups from therapeutic data Vellanki, Pratibha, Duong, Thi, Gupta, Sunil, Venkatesh, Svetha and Phung, Dinh. Knowledge and Information Systems (KAIS), 2016. [ | | pdf] The spectrum nature and heterogeneity within autism spectrum disorders (ASD) pose as a challenge for treatment. Personalisation of syllabus for children with ASD can improve the efficacy of learning by adjusting the number of opportunities and deciding the course of syllabus. We research the data-motivated approach in an attempt to disentangle this heterogeneity for personalisation of syllabus. With the help of technology and a structured syllabus, collecting data while a child with ASD masters the skills is made possible. The performance data collected are, however, growing and contain missing elements based on the pace and the course each child takes while navigating through the syllabus. Bayesian nonparametric methods are known for automatically discovering the number of latent components and their parameters when the model involves higher complexity. We propose a nonparametric Bayesian matrix factorisation model that discovers learning patterns and the way participants associate with them. Our model is built upon the linear Poisson gamma model (LPGM) with an Indian buffet process prior and extended to incorporate data with missing elements. In this paper, for the first time we have presented learning patterns deduced automatically from data mining and machine learning methods using intervention data recorded for over 500 children with ASD. We compare the results with non-negative matrix factorisation and K-means, which being parametric, not only require us to specify the number of learning patterns in advance, but also do not have a principled approach to deal with missing data. The F1 score observed over varying degree of similarity measure (Jaccard Index) suggests that LPGM yields the best outcome. 
By observing these patterns with additional knowledge regarding the syllabus it may be possible to observe the progress and dynamically modify the syllabus for improved learning. @ARTICLE { vellanki_etal_kis16nonparametric,    AUTHOR = { Vellanki, Pratibha and Duong, Thi and Gupta, Sunil and Venkatesh, Svetha and Phung, Dinh },    TITLE = { Nonparametric discovery and analysis of learning patterns and autism subgroups from therapeutic data },    JOURNAL = { Knowledge and Information Systems (KAIS) },    YEAR = { 2016 },    PAGES = { 1--31 },    ISSN = { 0219-3116 },    ABSTRACT = { The spectrum nature and heterogeneity within autism spectrum disorders (ASD) pose as a challenge for treatment. Personalisation of syllabus for children with ASD can improve the efficacy of learning by adjusting the number of opportunities and deciding the course of syllabus. We research the data-motivated approach in an attempt to disentangle this heterogeneity for personalisation of syllabus. With the help of technology and a structured syllabus, collecting data while a child with ASD masters the skills is made possible. The performance data collected are, however, growing and contain missing elements based on the pace and the course each child takes while navigating through the syllabus. Bayesian nonparametric methods are known for automatically discovering the number of latent components and their parameters when the model involves higher complexity. We propose a nonparametric Bayesian matrix factorisation model that discovers learning patterns and the way participants associate with them. Our model is built upon the linear Poisson gamma model (LPGM) with an Indian buffet process prior and extended to incorporate data with missing elements. In this paper, for the first time we have presented learning patterns deduced automatically from data mining and machine learning methods using intervention data recorded for over 500 children with ASD. 
We compare the results with non-negative matrix factorisation and K-means, which being parametric, not only require us to specify the number of learning patterns in advance, but also do not have a principled approach to deal with missing data. The F1 score observed over varying degree of similarity measure (Jaccard Index) suggests that LPGM yields the best outcome. By observing these patterns with additional knowledge regarding the syllabus it may be possible to observe the progress and dynamically modify the syllabus for improved learning. },    DOI = { 10.1007/s10115-016-0971-7 },    FILE = { :vellanki_etal_kis16nonparametric - Nonparametric Discovery and Analysis of Learning Patterns and Autism Subgroups from Therapeutic Data.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.08.02 },    URL = { http://dx.doi.org/10.1007/s10115-016-0971-7 },} J
 Forecasting Daily Patient Outflow From a Ward Having No Real-Time Clinical Data Gopakumar, Shivapratap, Tran, Truyen, Luo, Wei, Phung, Dinh and Venkatesh, Svetha. JMIR Med Inform, 4(3):e25, Jul 2016. [ | | pdf] Objective: Our study investigates different models to forecast the total number of next-day discharges from an open ward having no real-time clinical data. Methods: We compared 5 popular regression algorithms to model total next-day discharges: (1) autoregressive integrated moving average (ARIMA), (2) the autoregressive moving average with exogenous variables (ARMAX), (3) k-nearest neighbor regression, (4) random forest regression, and (5) support vector regression. Although the autoregressive integrated moving average model relied on past 3-month discharges, nearest neighbor forecasting used median of similar discharges in the past in estimating next-day discharge. In addition, the ARMAX model used the day of the week and number of patients currently in ward as exogenous variables. For the random forest and support vector regression models, we designed a predictor set of 20 patient features and 88 ward-level features. Results: Our data consisted of 12,141 patient visits over 1826 days. Forecasting quality was measured using mean forecast error, mean absolute error, symmetric mean absolute percentage error, and root mean square error. When compared with a moving average prediction model, all 5 models demonstrated superior performance with the random forests achieving 22.7% improvement in mean absolute error, for all days in the year 2014. Conclusions: In the absence of clinical information, our study recommends using patient-level and ward-level data in predicting next-day discharges. Random forest and support vector regression models are able to use all available features from such data, resulting in superior performance over traditional autoregressive methods. 
An intelligent estimate of available beds in wards plays a crucial role in relieving access block in emergency departments. @ARTICLE { gopakumar_etal_jmir16forecasting,    AUTHOR = { Gopakumar, Shivapratap and Tran, Truyen and Luo, Wei and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Forecasting Daily Patient Outflow From a Ward Having No Real-Time Clinical Data },    JOURNAL = { JMIR Med Inform },    YEAR = { 2016 },    VOLUME = { 4 },    NUMBER = { 3 },    PAGES = { e25 },    MONTH = { Jul },    ABSTRACT = { Objective: Our study investigates different models to forecast the total number of next-day discharges from an open ward having no real-time clinical data. Methods: We compared 5 popular regression algorithms to model total next-day discharges: (1) autoregressive integrated moving average (ARIMA), (2) the autoregressive moving average with exogenous variables (ARMAX), (3) k-nearest neighbor regression, (4) random forest regression, and (5) support vector regression. Although the autoregressive integrated moving average model relied on past 3-month discharges, nearest neighbor forecasting used median of similar discharges in the past in estimating next-day discharge. In addition, the ARMAX model used the day of the week and number of patients currently in ward as exogenous variables. For the random forest and support vector regression models, we designed a predictor set of 20 patient features and 88 ward-level features. Results: Our data consisted of 12,141 patient visits over 1826 days. Forecasting quality was measured using mean forecast error, mean absolute error, symmetric mean absolute percentage error, and root mean square error. When compared with a moving average prediction model, all 5 models demonstrated superior performance with the random forests achieving 22.7\% improvement in mean absolute error, for all days in the year 2014. 
Conclusions: In the absence of clinical information, our study recommends using patient-level and ward-level data in predicting next-day discharges. Random forest and support vector regression models are able to use all available features from such data, resulting in superior performance over traditional autoregressive methods. An intelligent estimate of available beds in wards plays a crucial role in relieving access block in emergency departments. },    DAY = { 21 },    DOI = { 10.2196/medinform.5650 },    FILE = { :gopakumar_etal_jmir16forecasting - Forecasting Daily Patient Outflow from a Ward Having No Real Time Clinical Data.pdf:PDF },    KEYWORDS = { patient flow },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.08.02 },    URL = { http://medinform.jmir.org/2016/3/e25/ },} J
 Control Matching via Discharge Code Sequences Nguyen, Dang, Luo, Wei, Phung, Dinh and Venkatesh, Svetha. In Machine Learning for Health @ NIPS 2016, 2016. [ | ] In this paper, we consider the patient similarity matching problem over a cancer cohort of more than 220,000 patients. Our approach first leverages the Word2Vec framework to embed ICD codes into a vector-valued representation. We then propose a sequential algorithm for case-control matching on this representation space of diagnosis codes. The novel practice of applying the sequential matching on the vector representation lifted the matching accuracy measured through multiple clinical outcomes. We reported the results on a large-scale dataset to demonstrate the effectiveness of our method. For such a large dataset where most clinical information has been codified, the new method is particularly relevant. @CONFERENCE { nguyen_etal_mlh16control,    AUTHOR = { Nguyen, Dang and Luo, Wei and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Control Matching via Discharge Code Sequences },    BOOKTITLE = { Machine Learning for Health @ NIPS 2016 },    YEAR = { 2016 },    ABSTRACT = { In this paper, we consider the patient similarity matching problem over a cancer cohort of more than 220,000 patients. Our approach first leverages the Word2Vec framework to embed ICD codes into a vector-valued representation. We then propose a sequential algorithm for case-control matching on this representation space of diagnosis codes. The novel practice of applying the sequential matching on the vector representation lifted the matching accuracy measured through multiple clinical outcomes. We reported the results on a large-scale dataset to demonstrate the effectiveness of our method. For such a large dataset where most clinical information has been codified, the new method is particularly relevant. 
},    FILE = { :nguyen_etal_mlh16control - Control Matching Via Discharge Code Sequences.pdf:PDF },    JOURNAL = { arXiv preprint arXiv:1612.01812 },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2017.02.06 },} C
 Effect of Social Capital on Emotion, Language Style and Latent Topics in Online Depression Community Dao, Bo, Nguyen, Thin, Venkatesh, Svetha and Phung, Dinh. In 12th IEEE-RIVF Intl. Conf. on Computing and Communication Technologies, Nov. 2016. (Best Runner-up Student Paper Award). [ | ] Social capital is linked to mental illness. It has been proposed that higher social capital is associated with better mental well-being in both individuals and groups in offline settings. However, in online settings, the association between online social capital and mental health conditions has not yet been explored. Social media offer us a rich opportunity to determine the link between social capital and aspects of mental wellbeing. In this paper, we examine how social capital, based on levels of social connectivity of bloggers, can be connected to aspects of depression in individuals and the online depression community. We explore apparent properties of textual contents, including expressed emotions, language styles and latent topics, of a large corpus of blog posts, to analyze the aspect of social capital in the community. Using data collected from the online LiveJournal depression community, we apply both statistical tests and machine learning approaches to examine how predictive factors vary between low and high social capital groups. Significant differences are found between low and high social capital groups when characterized by a set of latent topics, language features derived from blog posts, suggesting discriminative features, proved to be useful in the classification task. This shows that linguistic styles are better predictors than latent topics as features. The findings indicate the potential of using social media as a sensor for monitoring mental well-being in online settings. 
@CONFERENCE { dao_etal_rivf16effect,    AUTHOR = { Dao, Bo and Nguyen, Thin and Venkatesh, Svetha and Phung, Dinh },    TITLE = { Effect of Social Capital on Emotion, Language Style and Latent Topics in Online Depression Community },    BOOKTITLE = { 12th IEEE-RIVF Intl. Conf. on Computing and Communication Technologies },    YEAR = { 2016 },    MONTH = { Nov. },    NOTE = { Best Runner-up Student Paper Award },    ABSTRACT = { Social capital is linked to mental illness. It has been proposed that higher social capital is associated with better mental well-being in both individuals and groups in offline settings. However, in online settings, the association between online social capital and mental health conditions has not yet been explored. Social media offer us a rich opportunity to determine the link between social capital and aspects of mental wellbeing. In this paper, we examine how social capital, based on levels of social connectivity of bloggers, can be connected to aspects of depression in individuals and the online depression community. We explore apparent properties of textual contents, including expressed emotions, language styles and latent topics, of a large corpus of blog posts, to analyze the aspect of social capital in the community. Using data collected from the online LiveJournal depression community, we apply both statistical tests and machine learning approaches to examine how predictive factors vary between low and high social capital groups. Significant differences are found between low and high social capital groups when characterized by a set of latent topics, language features derived from blog posts, suggesting discriminative features, proved to be useful in the classification task. This shows that linguistic styles are better predictors than latent topics as features. The findings indicate the potential of using social media as a sensor for monitoring mental well-being in online settings. 
},    FILE = { :dao_etal_rivf16effect - Effect of Social Capital on Emotion, Language Style and Latent Topics in Online Depression Community.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.09.10 },} C
 MCNC: Multi-channel Nonparametric Clustering from Heterogeneous Data Nguyen, T-B., Nguyen, V., Venkatesh, S. and Phung, D.. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. (Finalist Best IBM Track 1 Student Paper Award). [ | ] Bayesian nonparametric (BNP) models have recently become popular due to their flexibility in identifying the unknown number of clusters. However, they have difficulties handling heterogeneous data from multiple sources. Existing BNP works either treat each of these sources independently -- hence do not benefit from the correlating information between them, or require to specify data sources as primary or context channels. In this paper, we present a BNP framework, termed MCNC, which has the ability to (1) discover co-patterns from multiple sources; (2) explore multi-channel data simultaneously and equally; (3) automatically identify a suitable number of patterns from data; and (4) handle missing data. The key idea is to tweak the base measure of a BNP model being a product-space. We demonstrate our framework on synthetic and real-world datasets to discover the identity--location--time (a.k.a who--where--when) patterns in two settings of complete and missing data. The experimental results highlight the effectiveness of our MCNC in both cases of complete and missing data. @CONFERENCE { nguyen_nguyen_venkatesh_phung_icpr16mcnc,    AUTHOR = { Nguyen, T-B. and Nguyen, V. and Venkatesh, S. and Phung, D. },    TITLE = { {MCNC}: Multi-channel Nonparametric Clustering from Heterogeneous Data },    BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },    YEAR = { 2016 },    MONTH = { Dec. },    NOTE = { Finalist Best IBM Track 1 Student Paper Award },    ABSTRACT = { Bayesian nonparametric (BNP) models have recently become popular due to their flexibility in identifying the unknown number of clusters. However, they have difficulties handling heterogeneous data from multiple sources. 
Existing BNP works either treat each of these sources independently -- hence do not benefit from the correlating information between them, or require to specify data sources as primary or context channels. In this paper, we present a BNP framework, termed MCNC, which has the ability to (1) discover co-patterns from multiple sources; (2) explore multi-channel data simultaneously and equally; (3) automatically identify a suitable number of patterns from data; and (4) handle missing data. The key idea is to tweak the base measure of a BNP model being a product-space. We demonstrate our framework on synthetic and real-world datasets to discover the identity--location--time (a.k.a who--where--when) patterns in two settings of complete and missing data. The experimental results highlight the effectiveness of our MCNC in both cases of complete and missing data. },    FILE = { :nguyen_nguyen_venkatesh_phung_icpr16mcnc - MCNC_ Multi Channel Nonparametric Clustering from Heterogeneous Data.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.07.13 },} C
 Stable Clinical Prediction using Graph Support Vector Machines Kamkar, Iman, Gupta, Sunil, Li, Cheng, Phung, Dinh and Venkatesh, Svetha. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ] @CONFERENCE { kamkar_gupta_li_phung_venkatesh_icpr16stable,    AUTHOR = { Kamkar, Iman and Gupta, Sunil and Li, Cheng and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Stable Clinical Prediction using Graph {S}upport {V}ector {M}achines },    BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },    YEAR = { 2016 },    MONTH = { Dec. },    FILE = { :kamkar_gupta_li_phung_venkatesh_icpr16stable - Stable Clinical Prediction Using Graph Support Vector Machines.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.07.13 },} C
 Distributed Data Augmented Support Vector Machine on Spark Nguyen, Tu, Nguyen, Vu, Le, Trung and Phung, Dinh. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ] @CONFERENCE { nguyen_nguyen_le_phung_icpr16distributed,    AUTHOR = { Nguyen, Tu and Nguyen, Vu and Le, Trung and Phung, Dinh },    TITLE = { Distributed Data Augmented {S}upport {V}ector {M}achine on {S}park },    BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },    YEAR = { 2016 },    MONTH = { Dec. },    FILE = { :nguyen_nguyen_le_phung_icpr16distributed - Distributed Data Augmented Support Vector Machine on Spark.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.07.13 },} C
 Faster Training of Very Deep Networks via p-Norm Gates Pham, Trang, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ] @CONFERENCE { pham_tran_phung_venkatesh_icpr16faster,    AUTHOR = { Pham, Trang and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Faster Training of Very Deep Networks via p-Norm Gates },    BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },    YEAR = { 2016 },    MONTH = { Dec. },    FILE = { :pham_tran_phung_venkatesh_icpr16faster - Faster Training of Very Deep Networks Via P Norm Gates.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.07.13 },} C
 Transfer Learning for Rare Cancer Problems via Discriminative Sparse Gaussian Graphical Model Saha, Budhaditya, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ] @CONFERENCE { budhaditya_gupta_phung_venkatesh_icpr16transfer,    AUTHOR = { Saha, Budhaditya and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Transfer Learning for Rare Cancer Problems via Discriminative Sparse {G}aussian Graphical Model },    BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },    YEAR = { 2016 },    MONTH = { Dec. },    FILE = { :budhaditya_gupta_phung_venkatesh_icpr16transfer - Transfer Learning for Rare Cancer Problems Via Discriminative Sparse Gaussian Graphical Model.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.07.13 },} C
 Model-based Classification and Novelty Detection For Point Pattern Data Vo, Ba-Ngu, Tran, Nhat-Quang, Phung, Dinh and Vo, Ba-Tuong. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ] @CONFERENCE { vo_tran_phung_vo_icpr16model,    AUTHOR = { Vo, Ba-Ngu and Tran, Nhat-Quang and Phung, Dinh and Vo, Ba-Tuong },    TITLE = { Model-based Classification and Novelty Detection For Point Pattern Data },    BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },    YEAR = { 2016 },    MONTH = { Dec. },    FILE = { :vo_tran_phung_vo_icpr16model - Model Based Classification and Novelty Detection for Point Pattern Data.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.07.13 },} C
 Clustering For Point Pattern Data Tran, Nhat-Quang, Vo, Ba-Ngu, Phung, Dinh and Vo, Ba-Tuong. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ] @CONFERENCE { tran_vo_phung_vo_icpr16clustering,    AUTHOR = { Tran, Nhat-Quang and Vo, Ba-Ngu and Phung, Dinh and Vo, Ba-Tuong },    TITLE = { Clustering For Point Pattern Data },    BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },    YEAR = { 2016 },    MONTH = { Dec. },    FILE = { :tran_vo_phung_vo_icpr16clustering - Clustering for Point Pattern Data.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.07.13 },} C
 Discriminative cues for different stages of smoking cessation in online community Nguyen, Thin, Borland, Ron, Yearwood, John, Yong, Hua, Venkatesh, Svetha and Phung, Dinh. In 17th Intl. Conf. on Web Information Systems Engineering (WISE), Nov. 2016. [ | ] Smoking is the largest single cause of premature mortality, being responsible for about six million deaths annually worldwide. Most smokers want to quit, but many have problems. The Internet enables people interested in quitting smoking to connect with others via online communities; however, the characteristics of these discussions are not well understood. This work aims to explore the textual cues of an online community interested in quitting smoking: www.reddit.com/r/stopsmoking -- “a place for redditors to motivate each other to quit smoking”. A large corpus of data was crawled including thousand posts made by thousand users within the community. Four subgroups of posts based on the cessation days of abstainers were defined: S0: within the first week, S1: within the first month (excluding cohort S0), S2: from second month to one year, and S3: beyond one year. Psycho-linguistic features and content topics were extracted from the posts and analysed. Machine learning techniques were used to discriminate the online conversations in the first week S0 from the other subgroups. Topics and psycho-linguistic features were found to be highly valid predictors of the subgroups. Clear discrimination between linguistic features and topics, alongside good predictive power is an important step in understanding social media and its use in studies of smoking and other addictions in online settings. @INPROCEEDINGS { nguyen_etal_wise16discriminative,    AUTHOR = { Nguyen, Thin and Borland, Ron and Yearwood, John and Yong, Hua and Venkatesh, Svetha and Phung, Dinh },    TITLE = { Discriminative cues for different stages of smoking cessation in online community },    BOOKTITLE = { 17th Intl. Conf. 
on Web Information Systems Engineering (WISE) },    YEAR = { 2016 },    SERIES = { Lecture Notes in Computer Science },    MONTH = { Nov. },    PUBLISHER = { Springer International Publishing },    ABSTRACT = { Smoking is the largest single cause of premature mortality, being responsible for about six million deaths annually worldwide. Most smokers want to quit, but many have problems. The Internet enables people interested in quitting smoking to connect with others via online communities; however, the characteristics of these discussions are not well understood. This work aims to explore the textual cues of an online community interested in quitting smoking: www.reddit.com/r/stopsmoking -- “a place for redditors to motivate each other to quit smoking”. A large corpus of data was crawled including thousand posts made by thousand users within the community. Four subgroups of posts based on the cessation days of abstainers were defined: S0: within the first week, S1: within the first month (excluding cohort S0), S2: from second month to one year, and S3: beyond one year. Psycho-linguistic features and content topics were extracted from the posts and analysed. Machine learning techniques were used to discriminate the online conversations in the first week S0 from the other subgroups. Topics and psycho-linguistic features were found to be highly valid predictors of the subgroups. Clear discrimination between linguistic features and topics, alongside good predictive power is an important step in understanding social media and its use in studies of smoking and other addictions in online settings. },    FILE = { :nguyen_etal_wise16discriminative - Discriminative Cues for Different Stages of Smoking Cessation in Online Community.pdf:PDF },    LANGUAGE = { English },    OWNER = { thinng },    TIMESTAMP = { 2016.07.14 },} C
 Large-scale stylistic analysis of formality in academia and social media Nguyen, Thin, Venkatesh, Svetha and Phung, Dinh. In 17th Intl. Conf. on Web Information Systems Engineering (WISE), Nov. 2016. [ | ] The dictum 'publish or perish' has influenced the way scientists present research results as to get published, including exaggeration and overstatement of research findings. This behavior gives rise to patterns of language use in academia. For example, recently it has been found that the proportion of positive words has risen in the content of scientific articles over the last 40 years, which probably shows the tendency in scientists to exaggerate and overstate their research results. The practice may deviate from impersonal and formal style of academic writing. In this study the degree of formality in scientific articles is investigated through a corpus of 14 million PubMed abstracts. Three aspects of stylistic features are explored: expressing emotional information, using first person pronouns to refer to the authors, and mixing English varieties. The aspects are compared with those of online user-generated media, including online encyclopedias, web-logs, forums, and micro-blogs. Trends of these stylistic features in scientific publications for the last four decades are also discovered. Advances in cluster computing are employed to process large scale data, with 5.8 terabytes and 3.6 billion data points from all the media. The results suggest the potential of pattern recognition in data at scale. @INPROCEEDINGS { nguyen_etal_wise16LargeScale,    AUTHOR = { Nguyen, Thin and Venkatesh, Svetha and Phung, Dinh },    TITLE = { Large-scale stylistic analysis of formality in academia and social media },    BOOKTITLE = { 17th Intl. Conf. on Web Information Systems Engineering (WISE) },    YEAR = { 2016 },    SERIES = { Lecture Notes in Computer Science },    MONTH = { Nov. 
},    PUBLISHER = { Springer International Publishing },    ABSTRACT = { The dictum 'publish or perish' has influenced the way scientists present research results as to get published, including exaggeration and overstatement of research findings. This behavior gives rise to patterns of language use in academia. For example, recently it has been found that the proportion of positive words has risen in the content of scientific articles over the last 40 years, which probably shows the tendency in scientists to exaggerate and overstate their research results. The practice may deviate from impersonal and formal style of academic writing. In this study the degree of formality in scientific articles is investigated through a corpus of 14 million PubMed abstracts. Three aspects of stylistic features are explored: expressing emotional information, using first person pronouns to refer to the authors, and mixing English varieties. The aspects are compared with those of online user-generated media, including online encyclopedias, web-logs, forums, and micro-blogs. Trends of these stylistic features in scientific publications for the last four decades are also discovered. Advances in cluster computing are employed to process large scale data, with 5.8 terabytes and 3.6 billion data points from all the media. The results suggest the potential of pattern recognition in data at scale. },    FILE = { :nguyen_etal_wise16LargeScale - Large Scale Stylistic Analysis of Formality in Academia and Social Media.pdf:PDF },    LANGUAGE = { English },    OWNER = { thinng },    TIMESTAMP = { 2016.07.14 },} C
 Learning Multifaceted Latent Activities from Heterogeneous Mobile Data Nguyen, Thanh-Binh, Nguyen, Vu, Nguyen, Thuong, Venkatesh, Svetha, Kumar, Mohan and Phung, Dinh. In 3rd Intl. Conf. on Data Science and Advanced Analytics (DSAA), Oct. 2016. [ | ] Inferring abstract contexts and activities from heterogeneous data is vital to context-aware ubiquitous applications but still remains one of the most challenging problems. Recent advances in Bayesian nonparametric machine learning, in particular the theory of topic models based on Hierarchical Dirichlet Process (HDP), have provided an elegant solution towards these challenges. However, none of the existing methods has addressed the problem of inferring latent multifaceted activities and contexts from heterogeneous data sources such as those collected from mobile devices. In this paper, we extend the original HDP to model heterogeneous data using a richer structure of the base measure being a product-space. The proposed model, called product-space HDP (PS-HDP), naturally handles the heterogeneous data from multiple sources and identifies the unknown number of latent structures in a principled way. Although this framework is generic, our current work primarily focuses on inferring (latent) threefold activities of who-when-where simultaneously, which corresponds to inducing activities from data collected for identity, location and time. We demonstrate our model on synthetic data as well as on a real-world dataset – the StudentLife dataset. We report results and provide analysis on the discovered activities and patterns to demonstrate the merit of the model. We also quantitatively evaluate the performance of PS-HDP model using standard metrics including F1-score, NMI, RI, purity, and compare them with well-known existing baseline methods. 
@INPROCEEDINGS { nguyen_etal_dsaa16learning,    AUTHOR = { Nguyen, Thanh-Binh and Nguyen, Vu and Nguyen, Thuong and Venkatesh, Svetha and Kumar, Mohan and Phung, Dinh },    TITLE = { Learning Multifaceted Latent Activities from Heterogeneous Mobile Data },    BOOKTITLE = { 3rd Intl. Conf. on Data Science and Advanced Analytics (DSAA) },    YEAR = { 2016 },    MONTH = { Oct. },    ABSTRACT = { Inferring abstract contexts and activities from heterogeneous data is vital to context-aware ubiquitous applications but still remains one of the most challenging problems. Recent advances in Bayesian nonparametric machine learning, in particular the theory of topic models based on Hierarchical Dirichlet Process (HDP), have provided an elegant solution towards these challenges. However, none of the existing methods has addressed the problem of inferring latent multifaceted activities and contexts from heterogeneous data sources such as those collected from mobile devices. In this paper, we extend the original HDP to model heterogeneous data using a richer structure of the base measure being a product-space. The proposed model, called product-space HDP (PS-HDP), naturally handles the heterogeneous data from multiple sources and identifies the unknown number of latent structures in a principled way. Although this framework is generic, our current work primarily focuses on inferring (latent) threefold activities of who-when-where simultaneously, which corresponds to inducing activities from data collected for identity, location and time. We demonstrate our model on synthetic data as well as on a real-world dataset – the StudentLife dataset. We report results and provide analysis on the discovered activities and patterns to demonstrate the merit of the model. We also quantitatively evaluate the performance of PS-HDP model using standard metrics including F1-score, NMI, RI, purity, and compare them with well-known existing baseline methods. 
},    FILE = { :nguyen_etal_dsaa16learning - Learning Multifaceted Latent Activities from Heterogeneous Mobile Data.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.08.01 },} C
 Analysing the History of Autism Spectrum Disorder using Topic Models Beykikhoshk, Adham, Arandjelović, Ognjen, Venkatesh, Svetha and Phung, Dinh. In 3rd Intl. Conf. on Data Science and Advanced Analytics (DSAA), Oct. 2016. [ | ] We describe a novel framework for the discovery of underlying topics of a longitudinal collection of scholarly data, and the tracking of their lifetime and popularity over time. Unlike the social media or news data, as the topic nuances in science result in new scientific directions to emerge, a new approach to model the longitudinal literature data is using topics which remain identifiable over the course of time. Current studies either disregard the time dimension or treat it as an exchangeable covariate when they fix the topics over time or do not share the topics over epochs when they model the time naturally. We address these issues by adopting a non-parametric Bayesian approach. We assume the data is partially exchangeable and divide it into consecutive epochs. Then, by fixing the topics in a recurrent Chinese restaurant franchise, we impose a static topical structure on the corpus such that they are shared across epochs and the documents within epochs. We demonstrate the effectiveness of the proposed framework on a collection of medical literature related to autism spectrum disorder. We collect a large corpus of publications and carefully examine two important research issues of the domain as case studies. Moreover, we make the results of our experiment and the source code of the model freely available to aid other researchers by analysing the results or applying the model to their data collections. @INPROCEEDINGS { beykikhoshk_etal_dsaa16analysing,    AUTHOR = { Beykikhoshk, Adham and Arandjelovi\'{c}, Ognjen and Venkatesh, Svetha and Phung, Dinh },    TITLE = { Analysing the History of Autism Spectrum Disorder using Topic Models },    BOOKTITLE = { 3rd Intl. Conf. 
on Data Science and Advanced Analytics (DSAA) },    YEAR = { 2016 },    MONTH = { Oct. },    ABSTRACT = { We describe a novel framework for the discovery of underlying topics of a longitudinal collection of scholarly data, and the tracking of their lifetime and popularity over time. Unlike the social media or news data, as the topic nuances in science result in new scientific directions to emerge, a new approach to model the longitudinal literature data is using topics which remain identifiable over the course of time. Current studies either disregard the time dimension or treat it as an exchangeable covariate when they fix the topics over time or do not share the topics over epochs when they model the time naturally. We address these issues by adopting a non-parametric Bayesian approach. We assume the data is partially exchangeable and divide it into consecutive epochs. Then, by fixing the topics in a recurrent Chinese restaurant franchise, we impose a static topical structure on the corpus such that they are shared across epochs and the documents within epochs. We demonstrate the effectiveness of the proposed framework on a collection of medical literature related to autism spectrum disorder. We collect a large corpus of publications and carefully examine two important research issues of the domain as case studies. Moreover, we make the results of our experiment and the source code of the model freely available to aid other researchers by analysing the results or applying the model to their data collections. },    FILE = { :beykikhoshk_etal_dsaa16analysing - Analysing the History of Autism Spectrum Disorder Using Topic Models.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.08.01 },} C
 A Framework for Mixed-type Multi-outcome Prediction with Applications in Healthcare Saha, Budhaditya, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. IEEE Journal of Biomedical and Health Informatics (JBHI), July 2016. [ | ] @ARTICLE { budhaditya_gupta_phung_venkatesh_jbhi16framework,    AUTHOR = { Saha, Budhaditya and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },    TITLE = { A Framework for Mixed-type Multi-outcome Prediction with Applications in Healthcare },    JOURNAL = { IEEE Journal of Biomedical and Health Informatics (JBHI) },    YEAR = { 2016 },    MONTH = { July },    FILE = { :budhaditya_gupta_phung_venkatesh_jbhi16framework - A Framework for Mixed Type Multi Outcome Prediction with Applications in Healthcare.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.07.13 },} J
 Discovering Latent Affective Transitions among Individuals in Online Mental Health-related Communities. Dao, Bo, Thin Nguyen, Venkatesh, Svetha and Phung, Dinh. In IEEE Intl. Conf. on Multimedia and Expo (ICME), Seattle, USA, July 2016. [ | ] The discovery of latent affective patterns of individuals with affective disorders will potentially enhance the diagnosis and treatment of mental disorders. This paper studies the phenomena of affective transitions among individuals in online mental health communities. We apply non-negative matrix factorization model to extract the common and individual factors of affective transitions across groups of individuals in different levels of affective disorders. We examine the latent patterns of emotional transitions and investigate the effects of emotional transitions across the cohorts. We establish a novel framework of utilizing social media as sensors of mood and emotional transitions. This work might suggest the base of new systems to screen individuals and communities at high risks of mental health problems in online settings. @INPROCEEDINGS { dao_nguyen_venkatesh_phung_icme16,    AUTHOR = { Dao, Bo and Thin Nguyen and Venkatesh, Svetha and Phung, Dinh },    TITLE = { Discovering Latent Affective Transitions among Individuals in Online Mental Health-related Communities. },    BOOKTITLE = { IEEE Intl. Conf. on Multimedia and Expo (ICME) },    YEAR = { 2016 },    ADDRESS = { Seattle, USA },    MONTH = { July },    PUBLISHER = { IEEE },    ABSTRACT = { The discovery of latent affective patterns of individuals with affective disorders will potentially enhance the diagnosis and treatment of mental disorders. This paper studies the phenomena of affective transitions among individuals in online mental health communities. We apply non-negative matrix factorization model to extract the common and individual factors of affective transitions across groups of individuals in different levels of affective disorders. 
We examine the latent patterns of emotional transitions and investigate the effects of emotional transitions across the cohorts. We establish a novel framework of utilizing social media as sensors of mood and emotional transitions. This work might suggest the base of new systems to screen individuals and communities at high risks of mental health problems in online settings. },    FILE = { :dao_nguyen_venkatesh_phung_icme16 - Discovering Latent Affective Transitions among Individuals in Online Mental Health­related Communities..pdf:PDF },    OWNER = { dbdao },    TIMESTAMP = { 2016.03.20 },} C
 Hierarchical Bayesian nonparametric models for knowledge discovery from electronic medical records Li, Cheng, Rana, Santu, Phung, Dinh and Venkatesh, Svetha. Knowledge-Based Systems (KBS), 99(1):168 - 182, May 2016. [ | | pdf] Electronic Medical Record (EMR) has established itself as a valuable resource for large scale analysis of health data. A hospital EMR dataset typically consists of medical records of hospitalized patients. A medical record contains diagnostic information (diagnosis codes), procedures performed (procedure codes) and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and hierarchical Dirichlet process (HDP), can be employed to discover disease topics from EMR data by treating patients as documents and diagnosis codes as words. This topic modeling helps to understand the constitution of patient diseases and offers a tool for better planning of treatment. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically-coherent disease topics. We are motivated by the fact that diagnosis codes are connected in the form of ICD-10 tree structure which presents semantic relationships between codes. We exploit a decay function to incorporate distances between words at the bottom level of wddCRF. Efficient inference is derived for the wddCRF by using MCMC technique. Furthermore, since procedure codes are often correlated with diagnosis codes, we develop the correspondence wddCRF (Corr-wddCRF) to explore conditional relationships of procedure codes for a given disease pattern. Efficient collapsed Gibbs sampling is derived for the Corr-wddCRF. We evaluate the proposed models on two real-world medical datasets – PolyVascular disease and Acute Myocardial Infarction disease. 
We demonstrate that the Corr-wddCRF model discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that using features from the Corr-wddCRF outperforms the baselines on 14-days readmission prediction. Beside these, the prediction for procedure codes based on the Corr-wddCRF also shows considerable accuracy. @ARTICLE { li_rana_phung_venkatesh_kbs16hierarchical,    AUTHOR = { Li, Cheng and Rana, Santu and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Hierarchical {B}ayesian nonparametric models for knowledge discovery from electronic medical records },    JOURNAL = { Knowledge-Based Systems (KBS) },    YEAR = { 2016 },    VOLUME = { 99 },    NUMBER = { 1 },    PAGES = { 168 - 182 },    MONTH = { May },    ISSN = { 0950-7051 },    ABSTRACT = { Electronic Medical Record (EMR) has established itself as a valuable resource for large scale analysis of health data. A hospital \{EMR\} dataset typically consists of medical records of hospitalized patients. A medical record contains diagnostic information (diagnosis codes), procedures performed (procedure codes) and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and hierarchical Dirichlet process (HDP), can be employed to discover disease topics from \{EMR\} data by treating patients as documents and diagnosis codes as words. This topic modeling helps to understand the constitution of patient diseases and offers a tool for better planning of treatment. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically-coherent disease topics. We are motivated by the fact that diagnosis codes are connected in the form of ICD-10 tree structure which presents semantic relationships between codes. 
We exploit a decay function to incorporate distances between words at the bottom level of wddCRF. Efficient inference is derived for the wddCRF by using \{MCMC\} technique. Furthermore, since procedure codes are often correlated with diagnosis codes, we develop the correspondence wddCRF (Corr-wddCRF) to explore conditional relationships of procedure codes for a given disease pattern. Efficient collapsed Gibbs sampling is derived for the Corr-wddCRF. We evaluate the proposed models on two real-world medical datasets – PolyVascular disease and Acute Myocardial Infarction disease. We demonstrate that the Corr-wddCRF model discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that using features from the Corr-wddCRF outperforms the baselines on 14-days readmission prediction. Beside these, the prediction for procedure codes based on the Corr-wddCRF also shows considerable accuracy. },    DOI = { http://dx.doi.org/10.1016/j.knosys.2016.02.005 },    FILE = { :li_rana_phung_venkatesh_kbs16hierarchical - Hierarchical Bayesian Nonparametric Models for Knowledge Discovery from Electronic Medical Records.pdf:PDF },    KEYWORDS = { Bayesian nonparametric models; Correspondence models; Word distances; Disease topics; Readmission prediction; Procedure codes prediction },    URL = { http://www.sciencedirect.com/science/article/pii/S0950705116000836 },} J
 Learning Multi-faceted Activities from Heterogeneous Data with the Product Space Hierarchical Dirichlet Processes Nguyen, T-B., Nguyen, V., Venkatesh, S. and Phung, D. In 3rd PAKDD Workshop on Machine Learning for Sensory Data Analysis (MLSDA), pages 128-140, April 2016. [ | ] The hierarchical Dirichlet process (HDP) was originally designed and evaluated for a single data channel. In this paper we enhanced its ability to model heterogeneous data using a richer structure for the base measure, namely a product space. The enhanced model, called Product Space HDP (PS-HDP), can (1) simultaneously model heterogeneous data from multiple sources in a Bayesian nonparametric framework, hence inheriting its strengths and advantages, including the ability to automatically grow the model complexity, and (2) discover multilevel latent structures from data to result in different types of topics/latent structures that can be explained jointly. We experimented with the MDC dataset, a large real-world dataset collected from mobile phones. Our goal was to discover identity--location--time (a.k.a. who-where-when) patterns at different levels (globally for all groups and locally for each group). We provided analysis on the activities and patterns learned from our model, visualized, compared and contrasted with the ground-truth to demonstrate the merit of the proposed framework. We further quantitatively evaluated and reported its performance using standard metrics including F1-score, NMI, RI, and purity. We also compared the performance of the PS-HDP model with those of popular existing clustering methods (including K-Means, NNMF, GMM, DP-Means, and AP). Lastly, we demonstrate the ability of the model to learn activities with missing data, a common problem encountered in pervasive and ubiquitous computing applications. @INPROCEEDINGS { nguyen_nguyen_venkatesh_phung_mlsda16learning,    AUTHOR = { Nguyen, T-B. and Nguyen, V. and Venkatesh, S. and Phung, D. 
},    TITLE = { Learning Multi-faceted Activities from Heterogeneous Data with the Product Space Hierarchical {D}irichlet Processes },    BOOKTITLE = { 3rd PAKDD Workshop on Machine Learning for Sensory Data Analysis (MLSDA) },    YEAR = { 2016 },    PAGES = { 128--140 },    MONTH = { April },    ABSTRACT = { The hierarchical Dirichlet process (HDP) was originally designed and evaluated for a single data channel. In this paper we enhanced its ability to model heterogeneous data using a richer structure for the base measure, namely a product space. The enhanced model, called Product Space HDP (PS-HDP), can (1) simultaneously model heterogeneous data from multiple sources in a Bayesian nonparametric framework, hence inheriting its strengths and advantages, including the ability to automatically grow the model complexity, and (2) discover multilevel latent structures from data to result in different types of topics/latent structures that can be explained jointly. We experimented with the MDC dataset, a large real-world dataset collected from mobile phones. Our goal was to discover identity--location--time (a.k.a. who-where-when) patterns at different levels (globally for all groups and locally for each group). We provided analysis on the activities and patterns learned from our model, visualized, compared and contrasted with the ground-truth to demonstrate the merit of the proposed framework. We further quantitatively evaluated and reported its performance using standard metrics including F1-score, NMI, RI, and purity. We also compared the performance of the PS-HDP model with those of popular existing clustering methods (including K-Means, NNMF, GMM, DP-Means, and AP). Lastly, we demonstrate the ability of the model to learn activities with missing data, a common problem encountered in pervasive and ubiquitous computing applications. 
},    FILE = { :nguyen_nguyen_venkatesh_phung_mlsda16learning - Learning Multi Faceted Activities from Heterogeneous Data with the Product Space Hierarchical Dirichlet Processes.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.04.06 },} C
 Neural Choice by Elimination via Highway Networks Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In 5th PAKDD Workshop on Biologically Inspired Data Mining Techniques, April 2016. [ | ] We introduce Neural Choice by Elimination, a new framework that integrates deep neural networks into probabilistic sequential choice models for learning to rank. Given a set of items to choose from, the elimination strategy starts with the whole item set and iteratively eliminates the least worthy item in the remaining subset. We prove that the choice by elimination is equivalent to marginalizing out the random Gompertz latent utilities. Coupled with the choice model is the recently introduced Neural Highway Networks for approximating arbitrarily complex rank functions. We evaluate the proposed framework on a large-scale public dataset with over 425K items, drawn from the Yahoo! learning to rank challenge. It is demonstrated that the proposed method is competitive against state-of-the-art learning to rank methods. @INPROCEEDINGS { tran_phung_venkatesh_bmd16neural,    AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Neural Choice by Elimination via Highway Networks },    BOOKTITLE = { 5th PAKDD Workshop on Biologically Inspired Data Mining Techniques },    YEAR = { 2016 },    MONTH = { April },    ABSTRACT = { We introduce Neural Choice by Elimination, a new framework that integrates deep neural networks into probabilistic sequential choice models for learning to rank. Given a set of items to choose from, the elimination strategy starts with the whole item set and iteratively eliminates the least worthy item in the remaining subset. We prove that the choice by elimination is equivalent to marginalizing out the random Gompertz latent utilities. Coupled with the choice model is the recently introduced Neural Highway Networks for approximating arbitrarily complex rank functions. 
We evaluate the proposed framework on a large-scale public dataset with over 425K items, drawn from the Yahoo! learning to rank challenge. It is demonstrated that the proposed method is competitive against state-of-the-art learning to rank methods. },    FILE = { :tran_phung_venkatesh_bmd16neural - Neural Choice by Elimination Via Highway Networks.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.04.06 },} C
 DeepCare: A Deep Dynamic Memory Model for Predictive Medicine Pham, Trang, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD), pages 30-41, April 2016. [ | | pdf] Personalized predictive medicine necessitates modeling of patient illness and care processes, which inherently have long-term temporal dependencies. Healthcare observations, recorded in electronic medical records, are episodic and irregular in time. We introduce DeepCare, a deep dynamic neural network that reads medical records and predicts future medical outcomes. At the data level, DeepCare models patient health state trajectories with explicit memory of illness. Built on Long Short-Term Memory (LSTM), DeepCare introduces time parameterizations to handle irregular timing by moderating the forgetting and consolidation of illness memory. DeepCare also incorporates medical interventions that change the course of illness and shape future medical risk. Moving up to the health state level, historical and present health states are then aggregated through multiscale temporal pooling, before passing through a neural network that estimates future outcomes. We demonstrate the efficacy of DeepCare for disease progression modeling and readmission prediction in diabetes, a chronic disease with large economic burden. The results show improved modeling and risk prediction accuracy. 
@CONFERENCE { pham_tran_phung_venkatesh_pakdd16deepcare,    AUTHOR = { Pham, Trang and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },    TITLE = { {DeepCare}: A Deep Dynamic Memory Model for Predictive Medicine },    BOOKTITLE = { Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD) },    YEAR = { 2016 },    VOLUME = { 9652 },    SERIES = { Lecture Notes in Computer Science },    PAGES = { 30--41 },    MONTH = { April },    PUBLISHER = { Springer International Publishing },    ABSTRACT = { Personalized predictive medicine necessitates modeling of patient illness and care processes, which inherently have long-term temporal dependencies. Healthcare observations, recorded in electronic medical records, are episodic and irregular in time. We introduce DeepCare, a deep dynamic neural network that reads medical records and predicts future medical outcomes. At the data level, DeepCare models patient health state trajectories with explicit memory of illness. Built on Long Short-Term Memory (LSTM), DeepCare introduces time parameterizations to handle irregular timing by moderating the forgetting and consolidation of illness memory. DeepCare also incorporates medical interventions that change the course of illness and shape future medical risk. Moving up to the health state level, historical and present health states are then aggregated through multiscale temporal pooling, before passing through a neural network that estimates future outcomes. We demonstrate the efficacy of DeepCare for disease progression modeling and readmission prediction in diabetes, a chronic disease with large economic burden. The results show improved modeling and risk prediction accuracy. 
},    DOI = { 10.1007/978-3-319-31750-2_3 },    FILE = { :pham_tran_phung_venkatesh_pakdd16deepcare - DeepCare_ a Deep Dynamic Memory Model for Predictive Medicine.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.04.06 },    URL = { http://link.springer.com/chapter/10.1007/978-3-319-31750-2_3 },} C
 Sparse Adaptive Multi-Hyperplane Machine Nguyen, Khanh, Le, Trung, Nguyen, Vu and Phung, Dinh. In Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD), pages 27-39, April 2016. [ | | pdf] The Adaptive Multiple-hyperplane Machine (AMM) was recently proposed to deal with large-scale datasets. However, it has no principle to tune the complexity and sparsity levels of the solution. Addressing the sparsity is important to improve learning generalization, prediction accuracy and computational speedup. In this paper, we employ the max-margin principle and sparse approach to propose a new Sparse AMM (SAMM). We solve the new optimization objective function with stochastic gradient descent (SGD). Besides inheriting the good features of SGD-based learning method and the original AMM, our proposed Sparse AMM provides machinery and flexibility to tune the complexity and sparsity of the solution, making it possible to avoid overfitting and underfitting. We validate our approach on several large benchmark datasets. We show that with the ability to control sparsity, the proposed Sparse AMM yields superior classification accuracy to the original AMM while simultaneously achieving computational speedup. @CONFERENCE { nguyen_le_nguyen_phung_pakdd16sparse,    AUTHOR = { Nguyen, Khanh and Le, Trung and Nguyen, Vu and Phung, Dinh },    TITLE = { Sparse Adaptive Multi-Hyperplane Machine },    BOOKTITLE = { Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD) },    YEAR = { 2016 },    VOLUME = { 9651 },    SERIES = { Lecture Notes in Computer Science },    PAGES = { 27--39 },    MONTH = { April },    PUBLISHER = { Springer International Publishing },    ABSTRACT = { The Adaptive Multiple-hyperplane Machine (AMM) was recently proposed to deal with large-scale datasets. However, it has no principle to tune the complexity and sparsity levels of the solution. 
Addressing the sparsity is important to improve learning generalization, prediction accuracy and computational speedup. In this paper, we employ the max-margin principle and sparse approach to propose a new Sparse AMM (SAMM). We solve the new optimization objective function with stochastic gradient descent (SGD). Besides inheriting the good features of SGD-based learning method and the original AMM, our proposed Sparse AMM provides machinery and flexibility to tune the complexity and sparsity of the solution, making it possible to avoid overfitting and underfitting. We validate our approach on several large benchmark datasets. We show that with the ability to control sparsity, the proposed Sparse AMM yields superior classification accuracy to the original AMM while simultaneously achieving computational speedup. },    DOI = { 10.1007/978-3-319-31753-3_3 },    FILE = { :nguyen_le_nguyen_phung_pakdd16sparse - Sparse Adaptive Multi Hyperplane Machine.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.04.06 },    URL = { http://link.springer.com/chapter/10.1007/978-3-319-31753-3_3 },} C
 Toxicity Prediction in Cancer Using Multiple Instance Learning in a Multi-Task Framework Li, Cheng, Gupta, Sunil, Rana, Santu, Luo, Wei, Venkatesh, Svetha, Ashley, David and Phung, Dinh. In Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD), pages 152-164, April 2016. [ | | pdf] Treatments of cancer cause severe side effects called toxicities. Reduction of such effects is crucial in cancer care. To impact care, we need to predict toxicities at fortnightly intervals. This toxicity data differs from traditional time series data as toxicities can be caused by one treatment on a given day alone, and thus it is necessary to consider the effect of the singular data vector causing toxicity. We model the data before prediction points using multiple instance learning, where each bag is composed of multiple instances associated with daily treatments and patient-specific attributes, such as chemotherapy, radiotherapy, age and cancer types. We then formulate a Bayesian multi-task framework to enhance toxicity prediction at each prediction point. The use of the prior allows factors to be shared across task predictors. Our proposed method simultaneously captures the heterogeneity of daily treatments and performs toxicity prediction at different prediction points. Our method was evaluated on a real-world dataset of more than 2000 cancer patients and achieved better prediction accuracy in terms of AUC than the state-of-the-art baselines. 
@CONFERENCE { li_gupta_rana_luo_venkatesh_ashley_phung_pakdd16toxicity,    AUTHOR = { Li, Cheng and Gupta, Sunil and Rana, Santu and Luo, Wei and Venkatesh, Svetha and Ashley, David and Phung, Dinh },    TITLE = { Toxicity Prediction in Cancer Using Multiple Instance Learning in a Multi-Task Framework },    BOOKTITLE = { Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD) },    YEAR = { 2016 },    PAGES = { 152--164 },    MONTH = { April },    PUBLISHER = { Springer },    ABSTRACT = { Treatments of cancer cause severe side effects called toxicities. Reduction of such effects is crucial in cancer care. To impact care, we need to predict toxicities at fortnightly intervals. This toxicity data differs from traditional time series data as toxicities can be caused by one treatment on a given day alone, and thus it is necessary to consider the effect of the singular data vector causing toxicity. We model the data before prediction points using multiple instance learning, where each bag is composed of multiple instances associated with daily treatments and patient-specific attributes, such as chemotherapy, radiotherapy, age and cancer types. We then formulate a Bayesian multi-task framework to enhance toxicity prediction at each prediction point. The use of the prior allows factors to be shared across task predictors. Our proposed method simultaneously captures the heterogeneity of daily treatments and performs toxicity prediction at different prediction points. Our method was evaluated on a real-world dataset of more than 2000 cancer patients and achieved better prediction accuracy in terms of AUC than the state-of-the-art baselines. 
},    DOI = { 10.1007/978-3-319-31753-3_13 },    FILE = { :li_gupta_rana_luo_venkatesh_ashley_phung_pakdd16toxicity - Toxicity Prediction in Cancer Using Multiple Instance Learning in a Multi Task Framework.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.04.06 },    URL = { http://link.springer.com/chapter/10.1007/978-3-319-31753-3_13 },} C
 Modelling Human Preferences for Ranking and Collaborative Filtering: A Probabilistic Ordered Partition Approach Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 47(1):157-188, April 2016. [ | | pdf] @ARTICLE { tran_phung_venkatesh_kais16,    AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Modelling Human Preferences for Ranking and Collaborative Filtering: A Probabilistic Ordered Partition Approach },    JOURNAL = { Knowledge and Information Systems (KAIS) },    YEAR = { 2016 },    VOLUME = { 47 },    NUMBER = { 1 },    PAGES = { 157--188 },    MONTH = { April },    DOI = { 10.1007/s10115-015-0840-9 },    FILE = { :tran_phung_venkatesh_kais16 - Modelling Human Preferences for Ranking and Collaborative Filtering_ a Probabilistic Ordered Partition Approach.pdf:PDF },    KEYWORDS = { Preference learning; Learning-to-rank; Collaborative filtering; Probabilistic ordered partition model; Set-based ranking; Probabilistic reasoning },    OWNER = { Dinh },    TIMESTAMP = { 2015.03.02 },    URL = { http://link.springer.com/article/10.1007%2Fs10115-015-0840-9 },} J
 Consistency of the Health of the Nation Outcome Scales (HoNOS) at inpatient-to-community transition Luo, Wei, Harvey, Richard, Tran, Truyen, Phung, Dinh, Venkatesh, Svetha and Connor, Jason P. BMJ open, 6(4):e010732, April 2016. [ | | pdf] Objectives: The Health of the Nation Outcome Scales (HoNOS) are mandated outcome-measures in many mental-health jurisdictions. When HoNOS are used in different care settings, it is important to assess if setting-specific bias exists. This article examines the consistency of HoNOS in a sample of psychiatric patients transitioned from acute inpatient care and community centres. Setting: A regional mental health service with both acute and community facilities. Participants: 111 psychiatric patients were transferred from inpatient care to community care from 2012 to 2014. Their HoNOS scores were extracted from a clinical database; each inpatient-discharge assessment was followed by a community-intake assessment, with the median period between assessments being 4 days (range 0–14). Assessor experience and professional background were recorded. Primary and secondary outcome measures: The differences in HoNOS between inpatient-discharge and community-intake were assessed with Pearson correlation, Cohen's κ and effect size. Results: Inpatient-discharge HoNOS was on average lower than community-intake HoNOS. The average HoNOS was 8.05 at discharge (median 7, range 1–22), and 12.16 at intake (median 12, range 1–25), an average increase of 4.11 (SD 6.97). Pearson correlation between two total scores was 0.073 (95% CI −0.095 to 0.238) and Cohen's κ was 0.02 (95% CI −0.02 to 0.06). Differences did not appear to depend on assessor experience or professional background. Conclusions: Systematic change in the HoNOS occurs at inpatient-to-community transition. Some caution should be exercised in making direct comparisons between inpatient HoNOS and community HoNOS scores. 
@ARTICLE { luo_harvey_tran_phung_venkatesh_connor_bmj16consistency,    AUTHOR = { Luo, Wei and Harvey, Richard and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha and Connor, Jason P },    TITLE = { Consistency of the Health of the Nation Outcome Scales ({HoNOS}) at inpatient-to-community transition },    JOURNAL = { BMJ open },    YEAR = { 2016 },    VOLUME = { 6 },    NUMBER = { 4 },    PAGES = { e010732 },    MONTH = { April },    ABSTRACT = { Objectives: The Health of the Nation Outcome Scales (HoNOS) are mandated outcome-measures in many mental-health jurisdictions. When HoNOS are used in different care settings, it is important to assess if setting-specific bias exists. This article examines the consistency of HoNOS in a sample of psychiatric patients transitioned from acute inpatient care and community centres. Setting: A regional mental health service with both acute and community facilities. Participants: 111 psychiatric patients were transferred from inpatient care to community care from 2012 to 2014. Their HoNOS scores were extracted from a clinical database; each inpatient-discharge assessment was followed by a community-intake assessment, with the median period between assessments being 4 days (range 0–14). Assessor experience and professional background were recorded. Primary and secondary outcome measures: The differences in HoNOS between inpatient-discharge and community-intake were assessed with Pearson correlation, Cohen's κ and effect size. Results: Inpatient-discharge HoNOS was on average lower than community-intake HoNOS. The average HoNOS was 8.05 at discharge (median 7, range 1–22), and 12.16 at intake (median 12, range 1–25), an average increase of 4.11 (SD 6.97). Pearson correlation between two total scores was 0.073 (95% CI −0.095 to 0.238) and Cohen's κ was 0.02 (95% CI −0.02 to 0.06). 
Differences did not appear to depend on assessor experience or professional background. Conclusions: Systematic change in the HoNOS occurs at inpatient-to-community transition. Some caution should be exercised in making direct comparisons between inpatient HoNOS and community HoNOS scores. },    DOI = { 10.1136/bmjopen-2015-010732 },    FILE = { :luo_harvey_tran_phung_venkatesh_connor_bmj16consistency - Consistency of the Health of the Nation Outcome Scales (HoNOS) at Inpatient to Community Transition.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    PUBLISHER = { British Medical Journal Publishing Group },    TIMESTAMP = { 2016.05.10 },    URL = { http://bmjopen.bmj.com/content/6/4/e010732.full },} J
 A Framework for Classifying Online Mental Health Related Communities with an Interest in Depression Saha, Budhaditya, Nguyen, Thin, Phung, Dinh and Venkatesh, Svetha. IEEE Journal of Biomedical and Health Informatics (JBHI), PP(99):1-1, March 2016. [ | | pdf] Mental illness has a deep impact on individuals, families, and by extension, society as a whole. Social networks allow individuals with mental disorders to communicate with other sufferers via online communities, providing an invaluable resource for studies on textual signs of psychological health problems. Mental disorders often occur in combinations, e.g., a patient with an anxiety disorder may also develop depression. This co-occurring mental health condition provides the focus for our work on classifying online communities with an interest in depression. For this, we have crawled a large body of 620,000 posts made by 80,000 users in 247 online communities. We have extracted the topics and psycho-linguistic features expressed in the posts, using these as inputs to our model. Following a machine learning technique, we have formulated a joint modelling framework in order to classify mental health-related co-occurring online communities from these features. Finally, we performed empirical validation of the model on the crawled dataset where our model outperforms recent state-of-the-art baselines. @ARTICLE { budhaditya_nguyen_phung_venkatesh_bhi16framework,    AUTHOR = { Saha, Budhaditya and Nguyen, Thin and Phung, Dinh and Venkatesh, Svetha },    TITLE = { A Framework for Classifying Online Mental Health Related Communities with an Interest in Depression },    JOURNAL = { IEEE Journal of Biomedical and Health Informatics (JBHI) },    YEAR = { 2016 },    VOLUME = { PP },    NUMBER = { 99 },    PAGES = { 1-1 },    MONTH = { March },    ISSN = { 2168-2194 },    ABSTRACT = { Mental illness has a deep impact on individuals, families, and by extension, society as a whole. 
Social networks allow individuals with mental disorders to communicate with other sufferers via online communities, providing an invaluable resource for studies on textual signs of psychological health problems. Mental disorders often occur in combinations, e.g., a patient with an anxiety disorder may also develop depression. This co-occurring mental health condition provides the focus for our work on classifying online communities with an interest in depression. For this, we have crawled a large body of 620,000 posts made by 80,000 users in 247 online communities. We have extracted the topics and psycho-linguistic features expressed in the posts, using these as inputs to our model. Following a machine learning technique, we have formulated a joint modelling framework in order to classify mental health-related co-occurring online communities from these features. Finally, we performed empirical validation of the model on the crawled dataset where our model outperforms recent state-of-the-art baselines. },    DOI = { 10.1109/JBHI.2016.2543741 },    FILE = { :budhaditya_nguyen_phung_venkatesh_bhi16framework - A Framework for Classifying Online Mental Health Related Communities with an Interest in Depression.pdf:PDF },    KEYWORDS = { Blogs;Correlation;Covariance matrices;Feature extraction;Informatics;Media;Pragmatics },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.04.06 },    URL = { http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7436759&tag=1 },} J
 A new transfer learning framework with application to model-agnostic multi-task learning Gupta, Sunil, Rana, Santu, Saha, Budhaditya, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), February 2016. [ | | pdf] Learning from a small number of examples is a challenging problem in machine learning. An effective way to improve the performance is through exploiting knowledge from other related tasks. Multi-task learning (MTL) is one such useful paradigm that aims to improve the performance through jointly modeling multiple related tasks. Although there exist numerous classification or regression models in the machine learning literature, most of the MTL models are built around ridge or logistic regression. A limited number of works propose multi-task extensions of techniques such as support vector machines and Gaussian processes. However, all these MTL models are tied to specific classification or regression algorithms and there is no single MTL algorithm that can be used at a meta level for any given learning algorithm. Addressing this problem, we propose a generic, model-agnostic joint modeling framework that can take any classification or regression algorithm of a practitioner's choice (standard or custom-built) and build its MTL variant. The key observation that drives our framework is that due to the small number of examples, the estimates of task parameters are usually poor, and we show that this leads to an under-estimation of task relatedness between any two tasks with high probability. We derive an algorithm that brings the tasks closer to their true relatedness by improving the estimates of task parameters. This is achieved by appropriate sharing of data across tasks. We provide the detailed theoretical underpinning of the algorithm. 
Through our experiments with both synthetic and real datasets, we demonstrate that the multi-task variants of several classifiers/regressors (logistic regression, support vector machine, K-nearest neighbor, Random Forest, ridge regression, support vector regression) convincingly outperform their single-task counterparts. We also show that the proposed model performs comparably to or better than many state-of-the-art MTL and transfer learning baselines. @ARTICLE { gupta_rana_budhaditya_phung_venkatesh_kais16newtransfer,    AUTHOR = { Gupta, Sunil and Rana, Santu and Saha, Budhaditya and Phung, Dinh and Venkatesh, Svetha },    TITLE = { A new transfer learning framework with application to model-agnostic multi-task learning },    JOURNAL = { Knowledge and Information Systems (KAIS) },    YEAR = { 2016 },    PAGES = { 1--41 },    MONTH = { February },    ISSN = { 0219-3116 },    ABSTRACT = { Learning from a small number of examples is a challenging problem in machine learning. An effective way to improve the performance is through exploiting knowledge from other related tasks. Multi-task learning (MTL) is one such useful paradigm that aims to improve the performance through jointly modeling multiple related tasks. Although there exist numerous classification or regression models in the machine learning literature, most of the MTL models are built around ridge or logistic regression. A limited number of works propose multi-task extensions of techniques such as support vector machines and Gaussian processes. However, all these MTL models are tied to specific classification or regression algorithms and there is no single MTL algorithm that can be used at a meta level for any given learning algorithm. Addressing this problem, we propose a generic, model-agnostic joint modeling framework that can take any classification or regression algorithm of a practitioner's choice (standard or custom-built) and build its MTL variant. 
The key observation that drives our framework is that due to the small number of examples, the estimates of task parameters are usually poor, and we show that this leads to an under-estimation of task relatedness between any two tasks with high probability. We derive an algorithm that brings the tasks closer to their true relatedness by improving the estimates of task parameters. This is achieved by appropriate sharing of data across tasks. We provide the detailed theoretical underpinning of the algorithm. Through our experiments with both synthetic and real datasets, we demonstrate that the multi-task variants of several classifiers/regressors (logistic regression, support vector machine, K-nearest neighbor, Random Forest, ridge regression, support vector regression) convincingly outperform their single-task counterparts. We also show that the proposed model performs comparably to or better than many state-of-the-art MTL and transfer learning baselines. },    DOI = { 10.1007/s10115-016-0926-z },    FILE = { :gupta_rana_budhaditya_phung_venkatesh_kais16newtransfer - A New Transfer Learning Framework with Application to Model Agnostic Multi Task Learning.pdf:PDF },    KEYWORDS = { Multi-task learning; Model-agnostic framework; Meta algorithm; Classification; Regression },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.05.10 },    URL = { http://dx.doi.org/10.1007/s10115-016-0926-z },} J
 Multiple Task Transfer Learning with Small Sample Sizes Saha, Budhaditya, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 46(2):315-342, Feb. 2016. [ | | pdf] Prognosis, such as predicting mortality, is common in medicine. When confronted with small numbers of samples, as in rare medical conditions, the task is challenging. We propose a framework for classification with data with small numbers of samples. Conceptually our solution is a hybrid of multi-task and transfer learning, employing data samples from source tasks as in transfer learning, but considering all tasks together as in multi-task learning. Each task is modelled jointly with other related tasks by directly augmenting the data from other tasks. The degree of augmentation depends on the task relatedness and is estimated directly from the data. We apply the model on three diverse real-world datasets (healthcare data, handwritten digit data and face data) and show that our method outperforms several state-of-the-art multi-task learning baselines. We extend the model for online multi-task learning where the model parameters are incrementally updated given new data or new tasks. The novelty of our method lies in offering a hybrid multi-task/transfer learning model to exploit sharing across tasks at the data-level and joint parameter learning. @ARTICLE { budhaditya_gupta_venkatesh_phung_kais16multiple,    AUTHOR = { Saha, Budhaditya and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Multiple Task Transfer Learning with Small Sample Sizes },    JOURNAL = { Knowledge and Information Systems (KAIS) },    YEAR = { 2016 },    VOLUME = { 46 },    NUMBER = { 2 },    PAGES = { 315--342 },    MONTH = { Feb. },    ABSTRACT = { Prognosis, such as predicting mortality, is common in medicine. When confronted with small numbers of samples, as in rare medical conditions, the task is challenging. We propose a framework for classification with data with small numbers of samples. 
Conceptually our solution is a hybrid of multi-task and transfer learning, employing data samples from source tasks as in transfer learning, but considering all tasks together as in multi-task learning. Each task is modelled jointly with other related tasks by directly augmenting the data from other tasks. The degree of augmentation depends on the task relatedness and is estimated directly from the data. We apply the model on three diverse real-world datasets (healthcare data, handwritten digit data and face data) and show that our method outperforms several state-of-the-art multi-task learning baselines. We extend the model for online multi-task learning where the model parameters are incrementally updated given new data or new tasks. The novelty of our method lies in offering a hybrid multi-task/transfer learning model to exploit sharing across tasks at the data-level and joint parameter learning. },    DOI = { 10.1007/s10115-015-0821-z },    FILE = { :budhaditya_gupta_venkatesh_phung_kais16multiple - Multiple Task Transfer Learning with Small Sample Sizes.pdf:PDF },    KEYWORDS = { Multi-task Transfer learning Optimization Healthcare Data mining Statistical analysis },    OWNER = { dinh },    TIMESTAMP = { 2015.06.10 },    URL = { http://link.springer.com/article/10.1007/s10115-015-0821-z },} J
 Stabilizing L1-norm Prediction Models by Supervised Feature Grouping Kamkar, Iman, Gupta, Sunil Kumar, Phung, Dinh and Venkatesh, Svetha. Journal of Biomedical Informatics (JBI), 59(C):149-168, Feb. 2016. [ | | pdf] Emerging Electronic Medical Records (EMRs) have reformed the modern healthcare. These records have great potential to be used for building clinical prediction models. However, a problem in using them is their high dimensionality. Since a lot of information may not be relevant for prediction, the underlying complexity of the prediction models may not be high. A popular way to deal with this problem is to employ feature selection. Lasso and l1-norm based feature selection methods have shown promising results. But, in presence of correlated features, these methods select features that change considerably with small changes in data. This prevents clinicians to obtain a stable feature set, which is crucial for clinical decision making. Grouping correlated variables together can improve the stability of feature selection, however, such grouping is usually not known and needs to be estimated for optimal performance. Addressing this problem, we propose a new model that can simultaneously learn the grouping of correlated features and perform stable feature selection. We formulate the model as a constrained optimization problem and provide an efficient solution with guaranteed convergence. Our experiments with both synthetic and real-world datasets show that the proposed model is significantly more stable than Lasso and many existing state-of-the-art shrinkage and classification methods. We further show that in terms of prediction performance, the proposed method consistently outperforms Lasso and other baselines. Our model can be used for selecting stable risk factors for a variety of healthcare problems, so it can assist clinicians toward accurate decision making. 
@ARTICLE { kamkar_gupta_phung_venkatesh_16stabilizing,    AUTHOR = { Kamkar, Iman and Gupta, Sunil Kumar and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Stabilizing L1-norm Prediction Models by Supervised Feature Grouping },    JOURNAL = { Journal of Biomedical Informatics (JBI) },    YEAR = { 2016 },    VOLUME = { 59 },    NUMBER = { C },    PAGES = { 149--168 },    MONTH = { Feb. },    ISSN = { 1532-0464 },    ABSTRACT = { Emerging Electronic Medical Records (EMRs) have reformed the modern healthcare. These records have great potential to be used for building clinical prediction models. However, a problem in using them is their high dimensionality. Since a lot of information may not be relevant for prediction, the underlying complexity of the prediction models may not be high. A popular way to deal with this problem is to employ feature selection. Lasso and l1-norm based feature selection methods have shown promising results. But, in presence of correlated features, these methods select features that change considerably with small changes in data. This prevents clinicians to obtain a stable feature set, which is crucial for clinical decision making. Grouping correlated variables together can improve the stability of feature selection, however, such grouping is usually not known and needs to be estimated for optimal performance. Addressing this problem, we propose a new model that can simultaneously learn the grouping of correlated features and perform stable feature selection. We formulate the model as a constrained optimization problem and provide an efficient solution with guaranteed convergence. Our experiments with both synthetic and real-world datasets show that the proposed model is significantly more stable than Lasso and many existing state-of-the-art shrinkage and classification methods. We further show that in terms of prediction performance, the proposed method consistently outperforms Lasso and other baselines. 
Our model can be used for selecting stable risk factors for a variety of healthcare problems, so it can assist clinicians toward accurate decision making. },    DOI = { 10.1016/j.jbi.2015.11.012 },    FILE = { :kamkar_gupta_phung_venkatesh_16stabilizing - Stabilizing L1 Norm Prediction Models by Supervised Feature Grouping.pdf:PDF },    KEYWORDS = { Feature selection, Lasso, Stability, Supervised feature grouping },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.04.06 },    URL = { http://www.sciencedirect.com/science/article/pii/S1532046415002804 },} J
 Graph-induced restricted Boltzmann machines for document modeling Nguyen, Tu Dinh, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. Information Sciences, 328(C):60-75, Jan. 2016. [ | | pdf] Discovering knowledge from unstructured texts is a central theme in data mining and machine learning. We focus on fast discovery of thematic structures from a corpus. Our approach is based on a versatile probabilistic formulation – the restricted Boltzmann machine (RBM) – where the underlying graphical model is an undirected bipartite graph. Inference is efficient – document representation can be computed with a single matrix projection, making RBMs suitable for massive text corpora available today. Standard RBMs, however, operate on bag-of-words assumption, ignoring the inherent underlying relational structures among words. This results in less coherent word thematic grouping. We introduce graph-based regularization schemes that exploit the linguistic structures, which in turn can be constructed from either corpus statistics or domain knowledge. We demonstrate that the proposed technique improves the group coherence, facilitates visualization, provides means for estimation of intrinsic dimensionality, reduces overfitting, and possibly leads to better classification accuracy. @ARTICLE { nguyen_tran_phung_venkatesh_jis16graph,    AUTHOR = { Nguyen, Tu Dinh and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Graph-induced restricted {B}oltzmann machines for document modeling },    JOURNAL = { Information Sciences },    YEAR = { 2016 },    VOLUME = { 328 },    NUMBER = { C },    PAGES = { 60--75 },    MONTH = { Jan. },    ABSTRACT = { Discovering knowledge from unstructured texts is a central theme in data mining and machine learning. We focus on fast discovery of thematic structures from a corpus. 
Our approach is based on a versatile probabilistic formulation – the restricted Boltzmann machine (RBM) – where the underlying graphical model is an undirected bipartite graph. Inference is efficient – document representation can be computed with a single matrix projection, making RBMs suitable for massive text corpora available today. Standard RBMs, however, operate on bag-of-words assumption, ignoring the inherent underlying relational structures among words. This results in less coherent word thematic grouping. We introduce graph-based regularization schemes that exploit the linguistic structures, which in turn can be constructed from either corpus statistics or domain knowledge. We demonstrate that the proposed technique improves the group coherence, facilitates visualization, provides means for estimation of intrinsic dimensionality, reduces overfitting, and possibly leads to better classification accuracy. },    DOI = { 10.1016/j.ins.2015.08.023 },    FILE = { :nguyen_tran_phung_venkatesh_jis16graph - Graph Induced Restricted Boltzmann Machines for Document Modeling.pdf:PDF },    KEYWORDS = { Document modeling, Feature group discovery, Restricted Boltzmann machine, Topic coherence, Word graphs },    OWNER = { dinh },    PUBLISHER = { Elsevier },    TIMESTAMP = { 2015.09.16 },    URL = { http://dx.doi.org/10.1016/j.ins.2015.08.023 },} J
 2015
 Differentiating sub-groups of online depression-related communities using textual cues Nguyen, Thin, O'Dea, Bridianne, Larsen, Mark, Phung, Dinh, Venkatesh, Svetha and Christensen, Helen. In Intl. Conf. on Web Information Systems Engineering (WISE), pages 216-224, Dec. 2015. [ | | pdf] Depression is a highly prevalent mental illness and is a comorbidity of other mental and behavioural disorders. The Internet allows individuals who are depressed or caring for those who are depressed, to connect with others via online communities; however, the characteristics of these online conversations and the language styles of those interested in depression have not yet been fully explored. This work aims to explore the textual cues of online communities interested in depression. A random sample of 5,000 blog posts was crawled. Five groupings were identified: depression, bipolar, self-harm, grief, and suicide. Independent variables included psycholinguistic processes and content topics extracted from the posts. Machine learning techniques were used to discriminate messages posted in the depression sub-group from the others. Good predictive validity in depression classification using topics and psycholinguistic clues as features was found. Clear discrimination between writing styles and content, with good predictive power is an important step in understanding social media and its use in mental health. @INPROCEEDINGS { nguyen_odea_larsen_phung_venkatesh_christensen_wise15differentiating,    AUTHOR = { Nguyen, Thin and O'Dea, Bridianne and Larsen, Mark and Phung, Dinh and Venkatesh, Svetha and Christensen, Helen },    TITLE = { Differentiating sub-groups of online depression-related communities using textual cues },    BOOKTITLE = { Intl. Conf. on Web Information Systems Engineering (WISE) },    YEAR = { 2015 },    VOLUME = { 9419 },    SERIES = { Lecture Notes in Computer Science },    PAGES = { 216--224 },    MONTH = { Dec. 
},    PUBLISHER = { Springer },    ABSTRACT = { Depression is a highly prevalent mental illness and is a comorbidity of other mental and behavioural disorders. The Internet allows individuals who are depressed or caring for those who are depressed, to connect with others via online communities; however, the characteristics of these online conversations and the language styles of those interested in depression have not yet been fully explored. This work aims to explore the textual cues of online communities interested in depression. A random sample of 5,000 blog posts was crawled. Five groupings were identified: depression, bipolar, self-harm, grief, and suicide. Independent variables included psycholinguistic processes and content topics extracted from the posts. Machine learning techniques were used to discriminate messages posted in the depression sub-group from the others. Good predictive validity in depression classification using topics and psycholinguistic clues as features was found. Clear discrimination between writing styles and content, with good predictive power is an important step in understanding social media and its use in mental health. },    DOI = { 10.1007/978-3-319-26187-4_17 },    FILE = { :nguyen_odea_larsen_phung_venkatesh_christensen_wise15differentiating - Differentiating Sub Groups of Online Depression Related Communities Using Textual Cues.pdf:PDF },    ISBN = { 978-3-319-11748-5 },    KEYWORDS = { Web community; Feature extraction; Textual cues; Online depression },    LANGUAGE = { English },    OWNER = { thinng },    TIMESTAMP = { 2015.09.16 },    URL = { http://link.springer.com/chapter/10.1007/978-3-319-26187-4_17 },} C
 Using Twitter to learn about the autism community Beykikhoshk, Adham, Arandjelovi{\'c}, Ognjen, Phung, Dinh, Venkatesh, Svetha and Caelli, Terry. IEEE/ACM Intl. Conf. on Advances in Social Network Analysis and Mining (ASONAM), 5(1):1-17, December 2015. [ | | pdf] Considering the raising socio-economic burden of autism spectrum disorder (ASD), timely and evidence-driven public policy decision-making and communication of the latest guidelines pertaining to the treatment and management of the disorder is crucial. Yet evidence suggests that policy makers and medical practitioners do not always have a good understanding of the practices and relevant beliefs of ASD-afflicted individuals' carers who often follow questionable recommendations and adopt advice poorly supported by scientific data. The key goal of the present work is to explore the idea that Twitter, as a highly popular platform for information exchange, could be used as a data-mining source to learn about the population affected by ASD---their behaviour, concerns, needs, etc. To this end, using a large data set of over 11 million harvested tweets as the basis for our investigation, we describe a series of experiments which examine a range of linguistic and semantic aspects of messages posted by individuals interested in ASD. Our findings, the first of their nature in the published scientific literature, strongly motivate additional research on this topic and present a methodological basis for further work. @ARTICLE { beykikhoshk_arandjelovic_phung_venkatesh_caelli_snaam15using,    AUTHOR = { Beykikhoshk, Adham and Arandjelovi{\'c}, Ognjen and Phung, Dinh and Venkatesh, Svetha and Caelli, Terry },    TITLE = { Using {T}witter to learn about the autism community },    JOURNAL = { IEEE/ACM Intl. Conf. 
on Advances in Social Network Analysis and Mining (ASONAM) },    YEAR = { 2015 },    VOLUME = { 5 },    NUMBER = { 1 },    PAGES = { 1--17 },    MONTH = { December },    ABSTRACT = { Considering the raising socio-economic burden of autism spectrum disorder (ASD), timely and evidence-driven public policy decision-making and communication of the latest guidelines pertaining to the treatment and management of the disorder is crucial. Yet evidence suggests that policy makers and medical practitioners do not always have a good understanding of the practices and relevant beliefs of ASD-afflicted individuals' carers who often follow questionable recommendations and adopt advice poorly supported by scientific data. The key goal of the present work is to explore the idea that Twitter, as a highly popular platform for information exchange, could be used as a data-mining source to learn about the population affected by ASD---their behaviour, concerns, needs, etc. To this end, using a large data set of over 11 million harvested tweets as the basis for our investigation, we describe a series of experiments which examine a range of linguistic and semantic aspects of messages posted by individuals interested in ASD. Our findings, the first of their nature in the published scientific literature, strongly motivate additional research on this topic and present a methodological basis for further work. },    DOI = { 10.1007/s13278-015-0261-5 },    FILE = { :beykikhoshk_arandjelovic_phung_venkatesh_caelli_snaam15using - Using Twitter to Learn about the Autism Community.pdf:PDF },    KEYWORDS = { Social media Big data Asperger’s Mental health Health care Public health ASD },    OWNER = { dinh },    PUBLISHER = { Springer Vienna },    TIMESTAMP = { 2015.06.10 },    URL = { http://dx.doi.org/10.1007/s13278-015-0261-5 },} J
 Learning Entry Profiles of Children with Autism from Multivariate Treatment Information Using Restricted Boltzmann Machines Vellanki, Pratibha, Phung, Dinh, Duong, Thi and Venkatesh, Svetha. In Trends and Applications in Knowledge Discovery and Data Mining, pages 245-257, Cham, Nov. 2015. [ | | pdf] @INPROCEEDINGS { vellanki_phung_duong_venkatesh_pakdd2015learning,    AUTHOR = { Vellanki, Pratibha and Phung, Dinh and Duong, Thi and Venkatesh, Svetha },    TITLE = { Learning Entry Profiles of Children with Autism from Multivariate Treatment Information Using Restricted {B}oltzmann Machines },    BOOKTITLE = { Trends and Applications in Knowledge Discovery and Data Mining },    YEAR = { 2015 },    VOLUME = { 9441 },    SERIES = { Lecture Notes in Computer Science },    PAGES = { 245--257 },    ADDRESS = { Cham },    MONTH = { Nov. },    PUBLISHER = { Springer },    DOI = { 10.1007/978-3-319-25660-3_21 },    FILE = { :vellanki_phung_duong_venkatesh_pakdd2015learning - Learning Entry Profiles of Children with Autism from Multivariate Treatment Information Using Restricted Boltzmann Machines.pdf:PDF },    ISBN = { 978-3-319-25660-3 },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.05.21 },    URL = { http://dx.doi.org/10.1007/978-3-319-25660-3_21 },} C
 Multi-View Subspace Clustering for Face Images Zhang, Xin, Phung, Dinh, Venkatesh, Svetha, Pham, Duc-Son and Liu, Wanquan. In Intl. Conf. on Digital Image Computing: Techniques and Applications (DICTA), pages 1-7, Nov. 2015. [ | | pdf] In many real-world computer vision applications, such as multi-camera surveillance, the objects of interest are captured by visual sensors concurrently, resulting in multi-view data. These views usually provide complementary information to each other. One recent and powerful computer vision method for clustering is sparse subspace clustering (SSC); however, it was not designed for multi-view data, which break down its linear separability assumption. To integrate complementary information between views, multi-view clustering algorithms are required to improve the clustering performance. In this paper, we propose a novel multi-view subspace clustering by searching for an unified latent structure as a global affinity matrix in subspace clustering. Due to the integration of affinity matrices for each view, this global affinity matrix can best represent the relationship between clusters. This could help us achieve better performance on face clustering. We derive a provably convergent algorithm based on the alternating direction method of multipliers (ADMM) framework, which is computationally efficient, to solve the formulation. We demonstrate that this formulation outperforms other alternatives based on state-of-the-arts on challenging multi-view face datasets. @INPROCEEDINGS { zhang_phung_venkatesh_pham_liu_dicta15multiview,    AUTHOR = { Zhang, Xin and Phung, Dinh and Venkatesh, Svetha and Pham, Duc-Son and Liu, Wanquan },    TITLE = { Multi-View Subspace Clustering for Face Images },    BOOKTITLE = { Intl. Conf. on Digital Image Computing: Techniques and Applications (DICTA) },    YEAR = { 2015 },    PAGES = { 1-7 },    MONTH = { Nov. 
},    ABSTRACT = { In many real-world computer vision applications, such as multi-camera surveillance, the objects of interest are captured by visual sensors concurrently, resulting in multi-view data. These views usually provide complementary information to each other. One recent and powerful computer vision method for clustering is sparse subspace clustering (SSC); however, it was not designed for multi-view data, which break down its linear separability assumption. To integrate complementary information between views, multi-view clustering algorithms are required to improve the clustering performance. In this paper, we propose a novel multi-view subspace clustering by searching for an unified latent structure as a global affinity matrix in subspace clustering. Due to the integration of affinity matrices for each view, this global affinity matrix can best represent the relationship between clusters. This could help us achieve better performance on face clustering. We derive a provably convergent algorithm based on the alternating direction method of multipliers (ADMM) framework, which is computationally efficient, to solve the formulation. We demonstrate that this formulation outperforms other alternatives based on state-of-the-arts on challenging multi-view face datasets. 
},    DOI = { 10.1109/DICTA.2015.7371289 },    FILE = { :zhang_phung_venkatesh_pham_liu_dicta15multiview - Multi View Subspace Clustering for Face Images.pdf:PDF },    KEYWORDS = { computer vision;face recognition;pattern clustering;ADMM framework;SSC;affinity matrices;alternating direction method;computer vision applications;computer vision method;convergent algorithm;face clustering;face images;global affinity matrix;latent structure;linear separability assumption;multicamera surveillance;multipliers;multiview data;multiview face datasets;multiview subspace clustering algorithms;sparse subspace clustering performance;visual sensors;Cameras;Clustering algorithms;Computer vision;Face;Loss measurement;Matrix decomposition;Sparse matrices },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.05.21 },    URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7371289 },} C
 Streaming Variational Inference for Dirichlet Process Mixtures Huynh, V., Phung, D. and Venkatesh, S. In 7th Asian Conference on Machine Learning (ACML), pages 237-252, Nov. 2015. [ | | pdf] Bayesian nonparametric models are theoretically suitable to learn streaming data due to their complexity relaxation to the volume of observed data. However, most of the existing variational inference algorithms are not applicable to streaming applications since they require truncation on variational distributions. In this paper, we present two truncation-free variational algorithms, one for mix-membership inference called TFVB (truncation-free variational Bayes), and the other for hard clustering inference called TFME (truncation-free maximization expectation). With these algorithms, we further developed a streaming learning framework for the popular Dirichlet process mixture (DPM) models. Our experiments demonstrate the usefulness of our framework in both synthetic and real-world data. @INPROCEEDINGS { huynh_phung_venkatesh_15streaming,    AUTHOR = { Huynh, V. and Phung, D. and Venkatesh, S. },    TITLE = { Streaming Variational Inference for {D}irichlet {P}rocess {M}ixtures },    BOOKTITLE = { 7th Asian Conference on Machine Learning (ACML) },    YEAR = { 2015 },    PAGES = { 237--252 },    MONTH = { Nov. },    ABSTRACT = { Bayesian nonparametric models are theoretically suitable to learn streaming data due to their complexity relaxation to the volume of observed data. However, most of the existing variational inference algorithms are not applicable to streaming applications since they require truncation on variational distributions. In this paper, we present two truncation-free variational algorithms, one for mix-membership inference called TFVB (truncation-free variational Bayes), and the other for hard clustering inference called TFME (truncation-free maximization expectation). 
With these algorithms, we further developed a streaming learning framework for the popular Dirichlet process mixture (DPM) models. Our experiments demonstrate the usefulness of our framework in both synthetic and real-world data. },    FILE = { :huynh_phung_venkatesh_15streaming - Streaming Variational Inference for Dirichlet Process Mixtures.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.04.06 },    URL = { http://www.jmlr.org/proceedings/papers/v45/Huynh15.pdf },} C
 Understanding toxicities and complications of cancer treatment: A data mining approach Nguyen, Dang, Luo, Wei, Phung, Dinh and Venkatesh, Svetha. In 28th Australasian Joint Conference on Artificial Intelligence (AI), pages 431-443, Nov 2015. [ | | pdf] @INPROCEEDINGS { nguyen_luo_phung_venkatesh_ai15understanding,    AUTHOR = { Nguyen, Dang and Luo, Wei and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Understanding toxicities and complications of cancer treatment: A data mining approach },    BOOKTITLE = { 28th Australasian Joint Conference on Artificial Intelligence (AI) },    YEAR = { 2015 },    EDITOR = { Pfahringer, Bernhard and Renz, Jochen },    VOLUME = { 9457 },    SERIES = { Lecture Notes in Computer Science },    PAGES = { 431--443 },    MONTH = { Nov },    PUBLISHER = { Springer International Publishing },    DOI = { 10.1007/978-3-319-26350-2_38 },    FILE = { :nguyen_luo_phung_venkatesh_ai15understanding - Understanding Toxicities and Complications of Cancer Treatment_ a Data Mining Approach.pdf:PDF },    LOCATION = { Canberra, ACT, Australia },    OWNER = { ngdang },    TIMESTAMP = { 2015.09.15 },    URL = { http://dx.doi.org/10.1007/978-3-319-26350-2_38 },} C
 Stable Feature Selection with Support Vector Machines Kamkar, Iman, Gupta, Sunil Kumar, Phung, Dinh and Venkatesh, Svetha. In 28th Australasian Joint Conference on Artificial Intelligence (AI), pages 298-308, Cham, Nov. 2015. [ | | pdf] The support vector machine (SVM) is a popular method for classification, well known for finding the maximum-margin hyperplane. Combining SVM with l1-norm penalty further enables it to simultaneously perform feature selection and margin maximization within a single framework. However, l1-norm SVM shows instability in selecting features in presence of correlated features. We propose a new method to increase the stability of l1-norm SVM by encouraging similarities between feature weights based on feature correlations, which is captured via a feature covariance matrix. Our proposed method can capture both positive and negative correlations between features. We formulate the model as a convex optimization problem and propose a solution based on alternating minimization. Using both synthetic and real-world datasets, we show that our model achieves better stability and classification accuracy compared to several state-of-the-art regularized classification methods. @INPROCEEDINGS { kamkar_gupta_phung_venkatesh_ai15stable,    AUTHOR = { Kamkar, Iman and Gupta, Sunil Kumar and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Stable Feature Selection with {S}upport {V}ector {M}achines },    BOOKTITLE = { 28th Australasian Joint Conference on Artificial Intelligence (AI) },    YEAR = { 2015 },    EDITOR = { Pfahringer, Bernhard and Renz, Jochen },    VOLUME = { 9457 },    SERIES = { Lecture Notes in Computer Science },    PAGES = { 298--308 },    ADDRESS = { Cham },    MONTH = { Nov. },    PUBLISHER = { Springer International Publishing },    ABSTRACT = { The support vector machine (SVM) is a popular method for classification, well known for finding the maximum-margin hyperplane. 
Combining SVM with l1-norm penalty further enables it to simultaneously perform feature selection and margin maximization within a single framework. However, l1-norm SVM shows instability in selecting features in presence of correlated features. We propose a new method to increase the stability of l1-norm SVM by encouraging similarities between feature weights based on feature correlations, which is captured via a feature covariance matrix. Our proposed method can capture both positive and negative correlations between features. We formulate the model as a convex optimization problem and propose a solution based on alternating minimization. Using both synthetic and real-world datasets, we show that our model achieves better stability and classification accuracy compared to several state-of-the-art regularized classification methods. },    DOI = { 10.1007/978-3-319-26350-2_26 },    FILE = { :kamkar_gupta_phung_venkatesh_ai15stable - Stable Feature Selection with Support Vector Machines.pdf:PDF },    ISBN = { 978-3-319-26350-2 },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.05.21 },    URL = { http://dx.doi.org/10.1007/978-3-319-26350-2_26 },} C
 Exploiting Feature Relationships Towards Stable Feature Selection Kamkar, Iman, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. In Intl. Conf. on Data Science and Advanced Analytics (DSAA), pages 1-10, Paris, France, Oct. 2015. [ | | pdf] Feature selection is an important step in building predictive models for most real-world problems. One of the popular methods in feature selection is Lasso. However, it shows instability in selecting features when dealing with correlated features. In this work, we propose a new method that aims to increase the stability of Lasso by encouraging similarities between features based on their relatedness, which is captured via a feature covariance matrix. Besides modeling positive feature correlations, our method can also identify negative correlations between features. We propose a convex formulation for our model along with an alternating optimization algorithm that can learn the weights of the features as well as the relationship between them. Using both synthetic and real-world data, we show that the proposed method is more stable than Lasso and many state-of-the-art shrinkage and feature selection methods. Also, its predictive performance is comparable to other methods. @INPROCEEDINGS { kamkar_gupta_phung_venkatesh_dsaa15,    AUTHOR = { Kamkar, Iman and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Exploiting Feature Relationships Towards Stable Feature Selection },    BOOKTITLE = { Intl. Conf. on Data Science and Advanced Analytics (DSAA) },    YEAR = { 2015 },    PAGES = { 1--10 },    ADDRESS = { Paris, France },    MONTH = { Oct. },    ABSTRACT = { Feature selection is an important step in building predictive models for most real-world problems. One of the popular methods in feature selection is Lasso. However, it shows instability in selecting features when dealing with correlated features. 
In this work, we propose a new method that aims to increase the stability of Lasso by encouraging similarities between features based on their relatedness, which is captured via a feature covariance matrix. Besides modeling positive feature correlations, our method can also identify negative correlations between features. We propose a convex formulation for our model along with an alternating optimization algorithm that can learn the weights of the features as well as the relationship between them. Using both synthetic and real-world data, we show that the proposed method is more stable than Lasso and many state-of-the-art shrinkage and feature selection methods. Also, its predictive performance is comparable to other methods. },    DOI = { 10.1109/DSAA.2015.7344859 },    FILE = { :kamkar_gupta_phung_venkatesh_dsaa15 - Exploiting Feature Relationships Towards Stable Feature Selection.pdf:PDF },    KEYWORDS = { convex programming;covariance matrices;feature selection;Lasso stability;convex formulation;correlated feature;feature covariance matrix;feature relationship;feature selection method;negative correlation;optimization algorithm;positive feature correlation;predictive model;real-world data;shrinkage;stable feature selection;synthetic data;Correlation;Covariance matrices;Linear programming;Optimization;Predictive models;Stability criteria;Correlated features;Lasso;Prediction;Stability },    OWNER = { ikamkar },    TIMESTAMP = { 2015.09.16 },    URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7344859 },} C
 Nonparametric Discovery of Online Mental Health-Related Communities Dao, Bo, Nguyen, Thin, Venkatesh, Svetha and Phung, Dinh. In Intl. Conf. on Data Science and Advanced Analytics (DSAA), pages 1-10, Paris, France, Oct. 2015. [ | | pdf] @INPROCEEDINGS { dao_nguyen_venkatesh_phung_dsaa15,    AUTHOR = { Dao, Bo and Nguyen, Thin and Venkatesh, Svetha and Phung, Dinh },    TITLE = { Nonparametric Discovery of Online Mental Health-Related Communities },    BOOKTITLE = { Intl. Conf. on Data Science and Advanced Analytics (DSAA) },    YEAR = { 2015 },    PAGES = { 1--10 },    ADDRESS = { Paris, France },    MONTH = { Oct. },    PUBLISHER = { IEEE },    DOI = { 10.1109/DSAA.2015.7344841 },    FILE = { :dao_nguyen_venkatesh_phung_dsaa15 - Nonparametric Discovery of Online Mental Health Related Communities.pdf:PDF },    KEYWORDS = { cognition;health care;nonparametric statistics;pattern clustering;social networking (online);cognitive dynamics;mood swings patterns;nonparametric clustering;nonparametric discovery;nonparametric topic modelling;online communities;online mental health-related communities;social media;Autism;Blogs;Media;Mood;Sentiment analysis;Variable speed drives;Mental Health;Moods and Emotion;Nonparametric Discovery;Online Communities;Social Media;Topics },    OWNER = { dbdao },    TIMESTAMP = { 2015.07.23 },    URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7344841 },} C
 Mixed-norm sparse representation for multi view face recognition Zhang, Xin, Pham, Duc-Son, Venkatesh, Svetha, Liu, Wanquan and Phung, Dinh. Pattern Recognition, 48(9):2935-2946, Sep. 2015. [ | | pdf] @ARTICLE { zhang_pham_venkatesh_liu_phung_pr15mixed,    AUTHOR = { Zhang, Xin and Pham, Duc-Son and Venkatesh, Svetha and Liu, Wanquan and Phung, Dinh },    TITLE = { Mixed-norm sparse representation for multi view face recognition },    JOURNAL = { Pattern Recognition },    YEAR = { 2015 },    VOLUME = { 48 },    NUMBER = { 9 },    PAGES = { 2935--2946 },    MONTH = { Sep. },    DOI = { 10.1016/j.patcog.2015.02.022 },    FILE = { :zhang_pham_venkatesh_liu_phung_pr15mixed - Mixed Norm Sparse Representation for Multi View Face Recognition.pdf:PDF },    KEYWORDS = { ADMM, Convex optimization, Group sparse representation, Joint dynamic sparse representation classification, Multi-pose face recognition, Multi-task learning, Robust face recognition, Sparse representation classification, Unsupervised learning },    OWNER = { dinh },    PUBLISHER = { Pergamon },    TIMESTAMP = { 2015.09.16 },    URL = { http://dl.acm.org/citation.cfm?id=2792197 },} J
 Overcoming Data Scarcity of Twitter: Using Tweets As Bootstrap with Application to Autism-Related Topic Content Analysis Beykikhoshk, Adham, Arandjelović, Ognjen, Phung, Dinh and Venkatesh, Svetha. In IEEE/ACM Intl. Conf. on Advances in Social Networks Analysis and Mining (ASONAM), pages 1354-1361, New York, NY, USA, Aug. 2015. [ | | pdf] Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140-character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags. @INPROCEEDINGS { beykikhoshk_arandjelovic_phung_venkatesh_asonam15overcoming,    AUTHOR = { Beykikhoshk, Adham and Arandjelovi\'{c}, Ognjen and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Overcoming Data Scarcity of {T}witter: Using Tweets As Bootstrap with Application to Autism-Related Topic Content Analysis },    BOOKTITLE = { IEEE/ACM Intl. Conf. on Advances in Social Networks Analysis and Mining (ASONAM) },    YEAR = { 2015 },    SERIES = { ASONAM '15 },    PAGES = { 1354--1361 },    ADDRESS = { New York, NY, USA },    MONTH = { Aug. 
},    PUBLISHER = { ACM },    ABSTRACT = { Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140-character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags. },    ACMID = { 2808908 },    DOI = { 10.1145/2808797.2808908 },    FILE = { :beykikhoshk_arandjelovic_phung_venkatesh_asonam15overcoming - Overcoming Data Scarcity of Twitter_ Using Tweets As Bootstrap with Application to Autism Related Topic Content Analysis.pdf:PDF },    ISBN = { 978-1-4503-3854-7 },    LOCATION = { Paris, France },    NUMPAGES = { 8 },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.05.21 },    URL = { http://doi.acm.org/10.1145/2808797.2808908 },} C
 Autism Blogs: Expressed Emotion, Language Styles and Concerns in Personal and Community Settings Nguyen, Thin, Duong, Thi, Venkatesh, Svetha and Phung, Dinh. IEEE Transactions on Affective Computing (TAC), 6(3):312-323, July 2015. [ | | pdf] The Internet has provided an ever increasingly popular platform for individuals to voice their thoughts, and like-minded people to share stories. This unintentionally leaves characteristics of individuals and communities, which are often difficult to collect in traditional studies. Individuals with autism are such a case, in which the Internet could facilitate even more communication given its social-spatial distance being a characteristic preference for individuals with autism. Previous studies examined the traces left in the posts of online autism communities (Autism) in comparison with other online communities (Control). This work further investigates these online populations through the contents of not only their posts but also their comments. We first compare the Autism and Control blogs based on three features: topics, language styles and affective information. The autism groups are then further examined, based on the same three features, by looking at their personal (Personal) and community (Community) blogs separately. Machine learning and statistical methods are used to discriminate blog contents in both cases. All three features are found to be significantly different between Autism and Control, and between autism Personal and Community. These features also show good indicative power in prediction of autism blogs in both personal and community settings. 
@ARTICLE { nguyen_duong_venkatesh_phung_tac15,    AUTHOR = { Nguyen, Thin and Duong, Thi and Venkatesh, Svetha and Phung, Dinh },    TITLE = { Autism Blogs: Expressed Emotion, Language Styles and Concerns in Personal and Community Settings },    JOURNAL = { IEEE Transactions on Affective Computing (TAC) },    YEAR = { 2015 },    VOLUME = { 6 },    NUMBER = { 3 },    PAGES = { 312--323 },    MONTH = { July },    ISSN = { 1949-3045 },    ABSTRACT = { The Internet has provided an ever increasingly popular platform for individuals to voice their thoughts, and like-minded people to share stories. This unintentionally leaves characteristics of individuals and communities, which are often difficult to collect in traditional studies. Individuals with autism are such a case, in which the Internet could facilitate even more communication given its social-spatial distance being a characteristic preference for individuals with autism. Previous studies examined the traces left in the posts of online autism communities (Autism) in comparison with other online communities (Control). This work further investigates these online populations through the contents of not only their posts but also their comments. We first compare the Autism and Control blogs based on three features: topics, language styles and affective information. The autism groups are then further examined, based on the same three features, by looking at their personal (Personal) and community (Community) blogs separately. Machine learning and statistical methods are used to discriminate blog contents in both cases. All three features are found to be significantly different between Autism and Control, and between autism Personal and Community. These features also show good indicative power in prediction of autism blogs in both personal and community settings. 
},    DOI = { 10.1109/TAFFC.2015.2400912 },    FILE = { :nguyen_duong_venkatesh_phung_tac15 - Autism Blogs_ Expressed Emotion, Language Styles and Concerns in Personal and Community Settings.pdf:PDF },    KEYWORDS = { Web sites;human factors;learning (artificial intelligence);statistical analysis;Internet;affective information;autism blogs;blog content discrimination;community setting;control blogs;language styles;machine learning;online autism communities;personal setting;social-spatial distance;statistical methods;topics;Autism;Blogs;Communities;Educational institutions;Feature extraction;Sociology;Variable speed drives;Affective norms;affective norms;autism;language styles;psychological health;topics },    OWNER = { thinng },    TIMESTAMP = { 2015.01.28 },    URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7034996 },} J
 Stabilized Sparse Ordinal Regression for Medical Risk Stratification Tran, Truyen, Phung, Dinh, Luo, Wei and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 43(3):555-582, June 2015. [ | | pdf] The recent wide adoption of Electronic Medical Records (EMR) presents great opportunities and challenges for data mining. The EMR data is largely temporal, often noisy, irregular and high dimensional. This paper constructs a novel ordinal regression framework for predicting medical risk stratification from EMR. First, a conceptual view of EMR as a temporal image is constructed to extract a diverse set of features. Second, ordinal modeling is applied for predicting cumulative or progressive risk. The challenges are building a transparent predictive model that works with a large number of weakly predictive features, and at the same time, is stable against resampling variations. Our solution employs sparsity methods that are stabilized through domain-specific feature interaction networks. We introduce two indices that measure the model stability against data resampling. Feature networks are used to generate two multivariate Gaussian priors with sparse precision matrices (the Laplacian and Random Walk). We apply the framework on a large short-term suicide risk prediction problem and demonstrate that our methods outperform clinicians by a large margin, discover suicide risk factors that conform with mental health knowledge, and produce models with enhanced stability. 
@ARTICLE { tran_phung_luo_venkatesh_kais15stabilized,    AUTHOR = { Tran, Truyen and Phung, Dinh and Luo, Wei and Venkatesh, Svetha },    TITLE = { Stabilized Sparse Ordinal Regression for Medical Risk Stratification },    JOURNAL = { Knowledge and Information Systems (KAIS) },    YEAR = { 2015 },    VOLUME = { 43 },    NUMBER = { 3 },    PAGES = { 555--582 },    MONTH = { June },    ABSTRACT = { The recent wide adoption of Electronic Medical Records (EMR) presents great opportunities and challenges for data mining. The EMR data is largely temporal, often noisy, irregular and high dimensional. This paper constructs a novel ordinal regression framework for predicting medical risk stratification from EMR. First, a conceptual view of EMR as a temporal image is constructed to extract a diverse set of features. Second, ordinal modeling is applied for predicting cumulative or progressive risk. The challenges are building a transparent predictive model that works with a large number of weakly predictive features, and at the same time, is stable against resampling variations. Our solution employs sparsity methods that are stabilized through domain-specific feature interaction networks. We introduce two indices that measure the model stability against data resampling. Feature networks are used to generate two multivariate Gaussian priors with sparse precision matrices (the Laplacian and Random Walk). We apply the framework on a large short-term suicide risk prediction problem and demonstrate that our methods outperform clinicians by a large margin, discover suicide risk factors that conform with mental health knowledge, and produce models with enhanced stability. 
},    DOI = { 10.1007/s10115-014-0740-4 },    FILE = { :Tran2015_Article_StabilizedSparseOrdinalRegress.pdf:PDF },    KEYWORDS = { Medical risk stratification Sparse ordinal regression Stability Feature graph Electronic medical record },    OWNER = { dinh },    TIMESTAMP = { 2014.01.28 },    URL = { http://link.springer.com/article/10.1007%2Fs10115-014-0740-4 },} J
 A predictive framework for modeling healthcare data with evolving clinical interventions Rana, Santu, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. The ASA Data Science Journal Statistical Analysis and Data Mining, 8(3):162-182, June 2015. [ | | pdf] Medical interventions critically determine clinical outcomes. But prediction models either ignore interventions or dilute impact by building a single prediction rule by amalgamating interventions with other features. One rule across all interventions may not capture differential effects. Also, interventions change with time as innovations are made, requiring prediction models to evolve over time. To address these gaps, we propose a prediction framework that explicitly models interventions by extracting a set of latent intervention groups through a Hierarchical Dirichlet Process (HDP) mixture. Data are split in temporal windows and for each window, a separate distribution over the intervention groups is learnt. This ensures that the model evolves with changing interventions. The outcome is modeled as conditional, on both the latent grouping and the patients' condition, through a Bayesian logistic regression. Learning distributions for each time-window result in an over-complex model when interventions do not change in every time-window. We show that by replacing HDP with a dynamic HDP prior, a more compact set of distributions can be learnt. Experiments performed on two hospital datasets demonstrate the superiority of our framework over many existing clinical and traditional prediction frameworks. 
@ARTICLE { rana_gupta_phung_venkatesh_sdm15predictive,    AUTHOR = { Rana, Santu and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },    TITLE = { A predictive framework for modeling healthcare data with evolving clinical interventions },    JOURNAL = { The ASA Data Science Journal Statistical Analysis and Data Mining },    YEAR = { 2015 },    VOLUME = { 8 },    NUMBER = { 3 },    PAGES = { 162--182 },    MONTH = { June },    ABSTRACT = { Medical interventions critically determine clinical outcomes. But prediction models either ignore interventions or dilute impact by building a single prediction rule by amalgamating interventions with other features. One rule across all interventions may not capture differential effects. Also, interventions change with time as innovations are made, requiring prediction models to evolve over time. To address these gaps, we propose a prediction framework that explicitly models interventions by extracting a set of latent intervention groups through a Hierarchical Dirichlet Process (HDP) mixture. Data are split in temporal windows and for each window, a separate distribution over the intervention groups is learnt. This ensures that the model evolves with changing interventions. The outcome is modeled as conditional, on both the latent grouping and the patients' condition, through a Bayesian logistic regression. Learning distributions for each time-window result in an over-complex model when interventions do not change in every time-window. We show that by replacing HDP with a dynamic HDP prior, a more compact set of distributions can be learnt. Experiments performed on two hospital datasets demonstrate the superiority of our framework over many existing clinical and traditional prediction frameworks. 
},    DOI = { 10.1002/sam.11262 },    FILE = { :rana_gupta_phung_venkatesh_sdm15predictive - A Predictive Framework for Modeling Healthcare Data with Evolving Clinical Interventions.pdf:PDF },    KEYWORDS = { data mining, machine learning, healthcare data modeling },    OWNER = { dinh },    PUBLISHER = { Wiley Subscription Services, Inc., A Wiley Company },    TIMESTAMP = { 2015.06.10 },    URL = { http://dx.doi.org/10.1002/sam.11262 },} J
 Stabilizing High-Dimensional Prediction Models Using Feature Graphs Gopakumar, Shivapratap, Tran, Truyen, Nguyen, Tu, Phung, Dinh and Venkatesh, Svetha. IEEE Journal of Biomedical and Health Informatics (JBHI), 19(3):1044-1052, May 2015. [ | | pdf] We investigate feature stability in the context of clinical prognosis derived from high-dimensional electronic medical records. To reduce variance in the selected features that are predictive, we introduce Laplacian-based regularization into a regression model. The Laplacian is derived on a feature graph that captures both the temporal and hierarchic relations between hospital events, diseases, and interventions. Using a cohort of patients with heart failure, we demonstrate better feature stability and goodness-of-fit through feature graph stabilization. @ARTICLE { gopakumar_tran_nguyen_phung_venkatesh_bhi15stabilizing,    AUTHOR = { Gopakumar, Shivapratap and Tran, Truyen and Nguyen, Tu and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Stabilizing High-Dimensional Prediction Models Using Feature Graphs },    JOURNAL = { IEEE Journal of Biomedical and Health Informatics (JBHI) },    YEAR = { 2015 },    VOLUME = { 19 },    NUMBER = { 3 },    PAGES = { 1044--1052 },    MONTH = { May },    ISSN = { 2168-2194 },    ABSTRACT = { We investigate feature stability in the context of clinical prognosis derived from high-dimensional electronic medical records. To reduce variance in the selected features that are predictive, we introduce Laplacian-based regularization into a regression model. The Laplacian is derived on a feature graph that captures both the temporal and hierarchic relations between hospital events, diseases, and interventions. Using a cohort of patients with heart failure, we demonstrate better feature stability and goodness-of-fit through feature graph stabilization. 
},    DOI = { 10.1109/JBHI.2014.2353031 },    FILE = { :gopakumar_tran_nguyen_phung_venkatesh_bhi15stabilizing - Stabilizing High Dimensional Prediction Models Using Feature Graphs.pdf:PDF },    KEYWORDS = { Laplace equations;cardiology;diseases;electronic health records;feature selection;graphs;medical diagnostic computing;regression analysis;Laplacian-based regularization;clinical prognosis;diseases;feature graph stabilization;goodness-of-fit;heart failure;hierarchic relations;high-dimensional electronic medical records;hospital events;interventions;regression model;selected features;stabilizing high-dimensional prediction models;temporal relations;Data models;Feature extraction;Heart;Indexes;Predictive models;Stability criteria;Biomedical computing;electronic medical records;predictive models;stability },    OWNER = { thinng },    TIMESTAMP = { 2015.01.29 },    URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6887285 },} J
 A Bayesian Nonparametric Approach to Multilevel Regression Nguyen, V., Phung, D., Venkatesh, S. and Bui, H.H. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 330-342, May 2015. [ | | pdf] Regression is at the cornerstone of statistical analysis. Multilevel regression, on the other hand, receives little research attention, though it is prevalent in economics, biostatistics and healthcare to name a few. We present a Bayesian nonparametric framework for multilevel regression where individuals including observations and outcomes are organized into groups. Furthermore, our approach exploits additional group-specific context observations: we use a Dirichlet Process with a product-space base measure in a nested structure to model the group-level context distribution and the regression distribution, accommodating the multilevel structure of the data. The proposed model simultaneously partitions groups into clusters and performs regression. We provide a collapsed Gibbs sampler for posterior inference. We perform extensive experiments on econometric panel data and healthcare longitudinal data to demonstrate the effectiveness of the proposed model. @INPROCEEDINGS { nguyen_phung_venkatesh_bui_pakdd15,    AUTHOR = { Nguyen, V. and Phung, D. and Venkatesh, S. and Bui, H.H. },    TITLE = { A {B}ayesian Nonparametric Approach to Multilevel Regression },    BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },    YEAR = { 2015 },    PAGES = { 330--342 },    MONTH = { May },    ABSTRACT = { Regression is at the cornerstone of statistical analysis. Multilevel regression, on the other hand, receives little research attention, though it is prevalent in economics, biostatistics and healthcare to name a few. We present a Bayesian nonparametric framework for multilevel regression where individuals including observations and outcomes are organized into groups. 
Furthermore, our approach exploits additional group-specific context observations: we use a Dirichlet Process with a product-space base measure in a nested structure to model the group-level context distribution and the regression distribution, accommodating the multilevel structure of the data. The proposed model simultaneously partitions groups into clusters and performs regression. We provide a collapsed Gibbs sampler for posterior inference. We perform extensive experiments on econometric panel data and healthcare longitudinal data to demonstrate the effectiveness of the proposed model. },    DOI = { 10.1007/978-3-319-18038-0_26 },    FILE = { :nguyen_phung_venkatesh_bui_pakdd15 - A Bayesian Nonparametric Approach to Multilevel Regression.pdf:PDF },    OWNER = { Dinh },    TIMESTAMP = { 2015.02.08 },    URL = { http://link.springer.com/chapter/10.1007%2F978-3-319-18038-0_26 },} C
 Hierarchical Dirichlet Process for Tracking Complex Topical Structure Evolution and Its Application to Autism Research Literature Beykikhoshk, Adham, Arandjelović, Ognjen, Venkatesh, Svetha and Phung, Dinh. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 550-562, Ho Chi Minh City, Vietnam, May 2015. [ | | pdf] In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work our model does not impose a prior on the rate at which documents are added to the corpus nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, splitting and merging. The power of the proposed framework is demonstrated on the medical literature corpus concerned with the autism spectrum disorder (ASD) – an increasingly important research subject of significant social and healthcare importance. In addition to the collected ASD literature corpus which we made freely available, our contributions also include two free online tools we built as aids to ASD researchers. These can be used for semantically meaningful navigation and searching, as well as knowledge discovery from this large and rapidly growing corpus of literature. 
@INPROCEEDINGS { beykikhoshk_arandjelovic_venkatesh_phung_pakdd15,    AUTHOR = { Beykikhoshk, Adham and Arandjelovi{\'{c}}, Ognjen and Venkatesh, Svetha and Phung, Dinh },    TITLE = { Hierarchical {D}irichlet Process for Tracking Complex Topical Structure Evolution and Its Application to Autism Research Literature },    BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },    YEAR = { 2015 },    EDITOR = { Cao, Tru and Lim, Ee-Peng and Zhou, Zhi-Hua and Ho, Tu-Bao and Cheung, David and Motoda, Hiroshi },    VOLUME = { 9077 },    SERIES = { Lecture Notes in Computer Science },    PAGES = { 550--562 },    ADDRESS = { Ho Chi Minh City, Vietnam },    MONTH = { May },    PUBLISHER = { Springer International Publishing },    ABSTRACT = { In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work our model does not impose a prior on the rate at which documents are added to the corpus nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, splitting and merging. The power of the proposed framework is demonstrated on the medical literature corpus concerned with the autism spectrum disorder (ASD) – an increasingly important research subject of significant social and healthcare importance. In addition to the collected ASD literature corpus which we made freely available, our contributions also include two free online tools we built as aids to ASD researchers. 
These can be used for semantically meaningful navigation and searching, as well as knowledge discovery from this large and rapidly growing corpus of literature. },    DOI = { 10.1007/978-3-319-18038-0_43 },    FILE = { :beykikhoshk_arandjelovic_venkatesh_phung_pakdd15 - Hierarchical Dirichlet Process for Tracking Complex Topical Structure Evolution and Its Application to Autism Research Literature.pdf:PDF },    OWNER = { Dinh },    TIMESTAMP = { 2015.02.08 },    URL = { http://dx.doi.org/10.1007/978-3-319-18038-0_43 },} C
 Stabilizing Sparse Cox Model using Statistic and Semantic Structures in Electronic Medical Records Gopakumar, Shivapratap, Nguyen, Tu Dinh, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 331-343, Ho Chi Minh City, Vietnam, May 2015. (Runner-up Best Student Paper Award). [ | | pdf] Stability in clinical prediction models is crucial for transferability between studies, yet has received little attention. The problem is paramount in high dimensional data, which invites sparse models with feature selection capability. We introduce an effective method to stabilize sparse Cox model of time-to-events using statistical and semantic structures inherent in Electronic Medical Records (EMR). Model estimation is stabilized using three feature graphs built from (i) Jaccard similarity among features (ii) aggregation of Jaccard similarity graph and a recently introduced semantic EMR graph (iii) Jaccard similarity among features transferred from a related cohort. Our experiments are conducted on two real world hospital datasets: a heart failure cohort and a diabetes cohort. On two stability measures – the Consistency index and signal-to-noise ratio (SNR) – the use of our proposed methods significantly increased feature stability when compared with the baselines. 
@INPROCEEDINGS { gopakumar_nguyen_tran_phung_venkatesh_pakdd15,    AUTHOR = { Gopakumar, Shivapratap and Nguyen, Tu Dinh and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Stabilizing Sparse {C}ox Model using Statistic and Semantic Structures in Electronic Medical Records },    BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },    YEAR = { 2015 },    EDITOR = { Cao, Tru and Lim, Ee-Peng and Zhou, Zhi-Hua and Ho, Tu-Bao and Cheung, David and Motoda, Hiroshi },    VOLUME = { 9078 },    SERIES = { Lecture Notes in Computer Science },    PAGES = { 331--343 },    ADDRESS = { Ho Chi Minh City, Vietnam },    MONTH = { May },    PUBLISHER = { Springer International Publishing },    NOTE = { Runner-up Best Student Paper Award },    ABSTRACT = { Stability in clinical prediction models is crucial for transferability between studies, yet has received little attention. The problem is paramount in high dimensional data, which invites sparse models with feature selection capability. We introduce an effective method to stabilize sparse Cox model of time-to-events using statistical and semantic structures inherent in Electronic Medical Records (EMR). Model estimation is stabilized using three feature graphs built from (i) Jaccard similarity among features (ii) aggregation of Jaccard similarity graph and a recently introduced semantic EMR graph (iii) Jaccard similarity among features transferred from a related cohort. Our experiments are conducted on two real world hospital datasets: a heart failure cohort and a diabetes cohort. On two stability measures – the Consistency index and signal-to-noise ratio (SNR) – the use of our proposed methods significantly increased feature stability when compared with the baselines. 
},    DOI = { 10.1007/978-3-319-18032-8_26 },    FILE = { :gopakumar_nguyen_tran_phung_venkatesh_pakdd15 - Stabilizing Sparse Cox Model Using Statistic and Semantic Structures in Electronic Medical Records.pdf:PDF },    OWNER = { Dinh },    TIMESTAMP = { 2015.02.08 },    URL = { http://link.springer.com/chapter/10.1007%2F978-3-319-18032-8_26 },} C
 Collaborating Differently on Different Topics: A Multi-Relational Approach to Multi-Task Learning Gupta, Sunil Kumar, Rana, Santu, Phung, Dinh and Venkatesh, Svetha. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 303-316, Ho Chi Minh City, Vietnam, May 2015. (Best Paper Award). [ | | pdf] Multi-task learning offers a way to benefit from synergy of multiple related prediction tasks via their joint modeling. Current multi-task techniques model related tasks jointly, assuming that the tasks share the same relationship across features uniformly. This assumption is seldom true as tasks may be related across some features but not others. Addressing this problem, we propose a new multi-task learning model that learns separate task relationships along different features. This added flexibility allows our model to have a finer and differential level of control in joint modeling of tasks along different features. We formulate the model as an optimization problem and provide an efficient, iterative solution. We illustrate the behavior of the proposed model using a synthetic dataset where we induce varied feature-dependent task relationships: positive relationship, negative relationship, no relationship. Using four real datasets, we evaluate the effectiveness of the proposed model for many multi-task regression and classification problems, and demonstrate its superiority over other state-of-the-art multi-task learning models. 
@INPROCEEDINGS { gupta_rana_phung_venkatesh_pakdd15,    AUTHOR = { Gupta, Sunil Kumar and Rana, Santu and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Collaborating Differently on Different Topics: A Multi-Relational Approach to Multi-Task Learning },    BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },    YEAR = { 2015 },    EDITOR = { Cao, Tru and Lim, Ee-Peng and Zhou, Zhi-Hua and Ho, Tu-Bao and Cheung, David and Motoda, Hiroshi },    VOLUME = { 9077 },    PAGES = { 303--316 },    ADDRESS = { Ho Chi Minh City, Vietnam },    MONTH = { May },    PUBLISHER = { Springer International Publishing },    NOTE = { Best Paper Award },    ABSTRACT = { Multi-task learning offers a way to benefit from synergy of multiple related prediction tasks via their joint modeling. Current multi-task techniques model related tasks jointly, assuming that the tasks share the same relationship across features uniformly. This assumption is seldom true as tasks may be related across some features but not others. Addressing this problem, we propose a new multi-task learning model that learns separate task relationships along different features. This added flexibility allows our model to have a finer and differential level of control in joint modeling of tasks along different features. We formulate the model as an optimization problem and provide an efficient, iterative solution. We illustrate the behavior of the proposed model using a synthetic dataset where we induce varied feature-dependent task relationships: positive relationship, negative relationship, no relationship. Using four real datasets, we evaluate the effectiveness of the proposed model for many multi-task regression and classification problems, and demonstrate its superiority over other state-of-the-art multi-task learning models. 
},    DOI = { 10.1007/978-3-319-18038-0_24 },    FILE = { :gupta_rana_phung_venkatesh_pakdd15 - Collaborating Differently on Different Topics_ a Multi Relational Approach to Multi Task Learning.pdf:PDF },    OWNER = { Dinh },    TIMESTAMP = { 2015.02.08 },    URL = { http://link.springer.com/chapter/10.1007/978-3-319-18038-0_24 },} C
 Learning Conditional Latent Structures from Multiple Data Sources Huynh, V., Phung, D., Nguyen, X.L., Venkatesh, S. and Bui, H.H.. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 343-354, May 2015. [ | | pdf] Data usually present in heterogeneous sources. When dealing with multiple data sources, existing models often treat them independently and thus can not explicitly model the correlation structures among data sources. To address this problem, we propose a full Bayesian nonparametric approach to model correlation structures among multiple and heterogeneous datasets. The proposed framework, first, induces mixture distribution over primary data source using hierarchical Dirichlet processes (HDP). Once conditioned on each atom (group) discovered in previous step, context data sources are mutually independent and each is generated from hierarchical Dirichlet processes. In each specific application, which covariates constitute content or context(s) is determined by the nature of data. We also derive the efficient inference and exploit the conditional independence structure to propose (conditional) parallel Gibbs sampling scheme. We demonstrate our model to address the problem of latent activities discovery in pervasive computing using mobile data. We show the advantage of utilizing multiple data sources in terms of exploratory analysis as well as quantitative clustering performance. @INPROCEEDINGS { huynh_phung_nguyen_venkatesh_bui_pakdd15,    AUTHOR = { Huynh, V. and Phung, D. and Nguyen, X.L. and Venkatesh, S. and Bui, H.H. },    TITLE = { Learning Conditional Latent Structures from Multiple Data Sources },    BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },    YEAR = { 2015 },    PAGES = { 343--354 },    MONTH = { May },    ABSTRACT = { Data usually present in heterogeneous sources. 
When dealing with multiple data sources, existing models often treat them independently and thus can not explicitly model the correlation structures among data sources. To address this problem, we propose a full Bayesian nonparametric approach to model correlation structures among multiple and heterogeneous datasets. The proposed framework, first, induces mixture distribution over primary data source using hierarchical Dirichlet processes (HDP). Once conditioned on each atom (group) discovered in previous step, context data sources are mutually independent and each is generated from hierarchical Dirichlet processes. In each specific application, which covariates constitute content or context(s) is determined by the nature of data. We also derive the efficient inference and exploit the conditional independence structure to propose (conditional) parallel Gibbs sampling scheme. We demonstrate our model to address the problem of latent activities discovery in pervasive computing using mobile data. We show the advantage of utilizing multiple data sources in terms of exploratory analysis as well as quantitative clustering performance. },    DOI = { 10.1007/978-3-319-18038-0_27 },    FILE = { :huynh_phung_nguyen_venkatesh_bui_pakdd15 - Learning Conditional Latent Structures from Multiple Data Sources.pdf:PDF },    OWNER = { Dinh },    TIMESTAMP = { 2015.02.08 },    URL = { http://link.springer.com/chapter/10.1007/978-3-319-18038-0_27 },} C
 Fast One-Class Support Vector Machine for Novelty Detection Le, Trung, Phung, Dinh, Nguyen, Khanh and Venkatesh, Svetha. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 189-200, Ho Chi Minh City, Vietnam, May 2015. [ | | pdf] Novelty detection arises as an important learning task in several applications. Kernel-based approach to novelty detection has been widely used due to its theoretical rigor and elegance of geometric interpretation. However, computational complexity is a major obstacle in this approach. In this paper, leveraging on the cutting-plane framework with the well-known One-Class Support Vector Machine, we present a new solution that can scale up seamlessly with data. The first solution is exact and linear when viewed through the cutting-plane; the second employed a sampling strategy that remarkably has a constant computational complexity defined relatively to the probability of approximation accuracy. Several datasets are benchmarked to demonstrate the credibility of our framework. @INPROCEEDINGS { le_phung_nguyen_venkatesh_pakdd15,    AUTHOR = { Le, Trung and Phung, Dinh and Nguyen, Khanh and Venkatesh, Svetha },    TITLE = { Fast {O}ne-{C}lass {S}upport {V}ector {M}achine for Novelty Detection },    BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },    YEAR = { 2015 },    EDITOR = { Cao, Tru and Lim, Ee-Peng and Zhou, Zhi-Hua and Ho, Tu-Bao and Cheung, David and Motoda, Hiroshi },    VOLUME = { 9078 },    SERIES = { Lecture Notes in Computer Science },    PAGES = { 189--200 },    ADDRESS = { Ho Chi Minh City, Vietnam },    MONTH = { May },    PUBLISHER = { Springer International Publishing },    ABSTRACT = { Novelty detection arises as an important learning task in several applications. Kernel-based approach to novelty detection has been widely used due to its theoretical rigor and elegance of geometric interpretation. 
However, computational complexity is a major obstacle in this approach. In this paper, leveraging on the cutting-plane framework with the well-known One-Class Support Vector Machine, we present a new solution that can scale up seamlessly with data. The first solution is exact and linear when viewed through the cutting-plane; the second employed a sampling strategy that remarkably has a constant computational complexity defined relatively to the probability of approximation accuracy. Several datasets are benchmarked to demonstrate the credibility of our framework. },    DOI = { 10.1007/978-3-319-18032-8_15 },    FILE = { :le_phung_nguyen_venkatesh_pakdd15 - Fast One Class Support Vector Machine for Novelty Detection.pdf:PDF },    KEYWORDS = { One-class Support Vector Machine, Novelty detection, Large-scale dataset },    OWNER = { Dinh },    TIMESTAMP = { 2015.02.08 },    URL = { http://link.springer.com/chapter/10.1007/978-3-319-18032-8_15 },} C
 Small-Variance Asymptotics for Bayesian Nonparametric Models with Constraints Li, C., Rana, S., Phung, D. and Venkatesh, S.. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 92-105, May 2015. [ | | pdf] The users often have additional knowledge when Bayesian nonparametric models (BNP) are employed, e.g. for clustering there may be prior knowledge that some of the data instances should be in the same cluster (must-link constraint) or in different clusters (cannot-link constraint), and similarly for topic modeling some words should be grouped together or separately because of an underlying semantic. This can be achieved by imposing appropriate sampling probabilities based on such constraints. However, the traditional inference technique of BNP models via Gibbs sampling is time consuming and is not scalable for large data. Variational approximations are faster but many times they do not offer good solutions. Addressing this we present a small-variance asymptotic analysis of the MAP estimates of BNP models with constraints. We derive the objective function for Dirichlet process mixture model with constraints and devise a simple and efficient K-means type algorithm. We further extend the small-variance analysis to hierarchical BNP models with constraints and devise a similar simple objective function. Experiments on synthetic and real data sets demonstrate the efficiency and effectiveness of our algorithms. @INPROCEEDINGS { li_rana_phung_venkatesh_pakdd15,    AUTHOR = { Li, C. and Rana, S. and Phung, D. and Venkatesh, S. },    TITLE = { Small-Variance Asymptotics for {B}ayesian Nonparametric Models with Constraints },    BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },    YEAR = { 2015 },    PAGES = { 92--105 },    MONTH = { May },    ABSTRACT = { The users often have additional knowledge when Bayesian nonparametric models (BNP) are employed, e.g. 
for clustering there may be prior knowledge that some of the data instances should be in the same cluster (must-link constraint) or in different clusters (cannot-link constraint), and similarly for topic modeling some words should be grouped together or separately because of an underlying semantic. This can be achieved by imposing appropriate sampling probabilities based on such constraints. However, the traditional inference technique of BNP models via Gibbs sampling is time consuming and is not scalable for large data. Variational approximations are faster but many times they do not offer good solutions. Addressing this we present a small-variance asymptotic analysis of the MAP estimates of BNP models with constraints. We derive the objective function for Dirichlet process mixture model with constraints and devise a simple and efficient K-means type algorithm. We further extend the small-variance analysis to hierarchical BNP models with constraints and devise a similar simple objective function. Experiments on synthetic and real data sets demonstrate the efficiency and effectiveness of our algorithms. },    DOI = { 10.1007/978-3-319-18032-8_8 },    FILE = { :li_rana_phung_venkatesh_pakdd15 - Small Variance Asymptotics for Bayesian Nonparametric Models with Constraints.pdf:PDF },    OWNER = { Dinh },    TIMESTAMP = { 2015.02.08 },    URL = { http://link.springer.com/chapter/10.1007/978-3-319-18032-8_8 },} C
 Is Demography Destiny? Application of Machine Learning Techniques to Accurately Predict Population Health Outcomes from a Minimal Demographic Dataset Luo, Wei, Nguyen, Thin, Nichols, Melanie, Tran, Truyen, Rana, Santu, Gupta, Sunil, Phung, Dinh, Venkatesh, Svetha and Allender, Steve. PLOS ONE, 10(5):1-13, May 2015. [ | | pdf] For years, we have relied on population surveys to keep track of regional public health statistics, including the prevalence of non-communicable diseases. Because of the cost and limitations of such surveys, we often do not have the up-to-date data on health outcomes of a region. In this paper, we examined the feasibility of inferring regional health outcomes from socio-demographic data that are widely available and timely updated through national censuses and community surveys. Using data for 50 American states (excluding Washington DC) from 2007 to 2012, we constructed a machine-learning model to predict the prevalence of six non-communicable disease (NCD) outcomes (four NCDs and two major clinical risk factors), based on population socio-demographic characteristics from the American Community Survey. We found that regional prevalence estimates for non-communicable diseases can be reasonably predicted. The predictions were highly correlated with the observed data, in both the states included in the derivation model (median correlation 0.88) and those excluded from the development for use as a completely separated validation sample (median correlation 0.85), demonstrating that the model had sufficient external validity to make good predictions, based on demographics alone, for areas not included in the model development. This highlights both the utility of this sophisticated approach to model development, and the vital importance of simple socio-demographic characteristics as both indicators and determinants of chronic disease. 
@ARTICLE { luo_nguyen_nichols_tran_rana_gupta_phung_venkatesh_allender_pone15demography,    AUTHOR = { Luo, Wei and Nguyen, Thin and Nichols, Melanie and Tran, Truyen and Rana, Santu and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha and Allender, Steve },    TITLE = { Is Demography Destiny? Application of Machine Learning Techniques to Accurately Predict Population Health Outcomes from a Minimal Demographic Dataset },    JOURNAL = { PLOS ONE },    YEAR = { 2015 },    VOLUME = { 10 },    NUMBER = { 5 },    PAGES = { 1-13 },    MONTH = { May },    ABSTRACT = { For years, we have relied on population surveys to keep track of regional public health statistics, including the prevalence of non-communicable diseases. Because of the cost and limitations of such surveys, we often do not have the up-to-date data on health outcomes of a region. In this paper, we examined the feasibility of inferring regional health outcomes from socio-demographic data that are widely available and timely updated through national censuses and community surveys. Using data for 50 American states (excluding Washington DC) from 2007 to 2012, we constructed a machine-learning model to predict the prevalence of six non-communicable disease (NCD) outcomes (four NCDs and two major clinical risk factors), based on population socio-demographic characteristics from the American Community Survey. We found that regional prevalence estimates for non-communicable diseases can be reasonably predicted. The predictions were highly correlated with the observed data, in both the states included in the derivation model (median correlation 0.88) and those excluded from the development for use as a completely separated validation sample (median correlation 0.85), demonstrating that the model had sufficient external validity to make good predictions, based on demographics alone, for areas not included in the model development. 
This highlights both the utility of this sophisticated approach to model development, and the vital importance of simple socio-demographic characteristics as both indicators and determinants of chronic disease. },    DOI = { 10.1371/journal.pone.0125602 },    FILE = { :luo_nguyen_nichols_tran_rana_gupta_phung_venkatesh_allender_pone15demography - Is Demography Destiny.pdf:PDF },    OWNER = { dinh },    TIMESTAMP = { 2015.06.10 },    URL = { http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0125602 },} J
 What shall I share and with Whom? - A Multi-Task Learning Formulation using Multi-Faceted Task Relationships Gupta, Sunil, Rana, Santu, Phung, Dinh and Venkatesh, Svetha. In SIAM Intl. Conf. on Data Mining (SDM), pages 703-711, Vancouver, Canada, May 2015. [ | | pdf] Multi-task learning is a learning paradigm that improves the performance of "related" tasks through their joint learning. To do this each task answers the question "Which other task should I share with?" This task relatedness can be complex - a task may be related to one set of tasks based on one subset of features and to other tasks based on other subsets. Existing multi-task learning methods do not explicitly model this reality, learning a single-faceted task relationship over all the features. This degrades performance by forcing a task to become similar to other tasks even on their unrelated features. Addressing this gap, we propose a novel multi-task learning model that learns multi-faceted task relationship, allowing tasks to collaborate differentially on different feature subsets. This is achieved by simultaneously learning a low dimensional subspace for task parameters and inducing task groups over each latent subspace basis using a novel combination of L_{1} and pairwise L_{\infty} norms. Further, our model can induce grouping across both positively and negatively related tasks, which helps towards exploiting knowledge from all types of related tasks. We validate our model on two synthetic and five real datasets, and show significant performance improvements over several state-of-the-art multi-task learning techniques. Thus our model effectively answers for each task: What shall I share and with whom? @INPROCEEDINGS { gupta_rana_phung_venkatesh_sdm15,    AUTHOR = { Gupta, Sunil and Rana, Santu and Phung, Dinh and Venkatesh, Svetha },    TITLE = { What shall I share and with Whom? - A Multi-Task Learning Formulation using Multi-Faceted Task Relationships },    BOOKTITLE = { SIAM Intl. Conf. 
on Data Mining (SDM) },    YEAR = { 2015 },    PAGES = { 703-711 },    ADDRESS = { Vancouver, Canada },    MONTH = { May },    ABSTRACT = { Multi-task learning is a learning paradigm that improves the performance of "related" tasks through their joint learning. To do this each task answers the question "Which other task should I share with?" This task relatedness can be complex - a task may be related to one set of tasks based on one subset of features and to other tasks based on other subsets. Existing multi-task learning methods do not explicitly model this reality, learning a single-faceted task relationship over all the features. This degrades performance by forcing a task to become similar to other tasks even on their unrelated features. Addressing this gap, we propose a novel multi-task learning model that learns multi-faceted task relationship, allowing tasks to collaborate differentially on different feature subsets. This is achieved by simultaneously learning a low dimensional subspace for task parameters and inducing task groups over each latent subspace basis using a novel combination of L_{1} and pairwise L_{\infty} norms. Further, our model can induce grouping across both positively and negatively related tasks, which helps towards exploiting knowledge from all types of related tasks. We validate our model on two synthetic and five real datasets, and show significant performance improvements over several state-of-the-art multi-task learning techniques. Thus our model effectively answers for each task: What shall I share and with whom? },    DOI = { 10.1137/1.9781611974010.79 },    FILE = { :gupta_rana_phung_venkatesh_sdm15 - What Shall I Share and with Whom_ a Multi Task Learning Formulation Using Multi Faceted Task Relationships.pdf:PDF },    OWNER = { thinng },    TIMESTAMP = { 2015.09.16 },    URL = { http://epubs.siam.org/doi/abs/10.1137/1.9781611974010.79 },} C
 Learning vector representation of medical objects via EMR-driven nonnegative restricted Boltzmann machines Tran, Truyen, Nguyen, Tu, Phung, Dinh and Venkatesh, Svetha. Journal of Biomedical Informatics (JBI), 54:96-105, April 2015. [ | | pdf] Electronic medical record (EMR) offers promises for novel analytics. However, manual feature engineering from EMR is labor intensive because EMR is complex – it contains temporal, mixed-type and multimodal data packed in irregular episodes. We present a computational framework to harness EMR with minimal human supervision via restricted Boltzmann machine (RBM). The framework derives a new representation of medical objects by embedding them in a low-dimensional vector space. This new representation facilitates algebraic and statistical manipulations such as projection onto 2D plane (thereby offering intuitive visualization), object grouping (hence enabling automated phenotyping), and risk stratification. To enhance model interpretability, we introduced two constraints into model parameters: (a) nonnegative coefficients, and (b) structural smoothness. These result in a novel model called eNRBM (EMR-driven nonnegative RBM). We demonstrate the capability of the eNRBM on a cohort of 7578 mental health patients under suicide risk assessment. The derived representation not only shows clinically meaningful feature grouping but also facilitates short-term risk stratification. The F-scores, 0.21 for moderate-risk and 0.36 for high-risk, are significantly higher than those obtained by clinicians and competitive with the results obtained by support vector machines. 
@ARTICLE { tran_nguyen_phung_venkatesh_bi15learning,    AUTHOR = { Tran, Truyen and Nguyen, Tu and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Learning vector representation of medical objects via {EMR}-driven nonnegative restricted {B}oltzmann machines },    JOURNAL = { Journal of Biomedical Informatics (JBI) },    YEAR = { 2015 },    VOLUME = { 54 },    PAGES = { 96--105 },    MONTH = { April },    ABSTRACT = { Electronic medical record (EMR) offers promises for novel analytics. However, manual feature engineering from {EMR} is labor intensive because {EMR} is complex – it contains temporal, mixed-type and multimodal data packed in irregular episodes. We present a computational framework to harness {EMR} with minimal human supervision via restricted Boltzmann machine (RBM). The framework derives a new representation of medical objects by embedding them in a low-dimensional vector space. This new representation facilitates algebraic and statistical manipulations such as projection onto 2D plane (thereby offering intuitive visualization), object grouping (hence enabling automated phenotyping), and risk stratification. To enhance model interpretability, we introduced two constraints into model parameters: (a) nonnegative coefficients, and (b) structural smoothness. These result in a novel model called eNRBM (EMR-driven nonnegative RBM). We demonstrate the capability of the eNRBM on a cohort of 7578 mental health patients under suicide risk assessment. The derived representation not only shows clinically meaningful feature grouping but also facilitates short-term risk stratification. The F-scores, 0.21 for moderate-risk and 0.36 for high-risk, are significantly higher than those obtained by clinicians and competitive with the results obtained by support vector machines. 
},    DOI = { 10.1016/j.jbi.2015.01.012 },    FILE = { :tran_nguyen_phung_venkatesh_bi15learning - Learning Vector Representation of Medical Objects Via EMR Driven Nonnegative Restricted Boltzmann Machines.pdf:PDF },    KEYWORDS = { Electronic medical records, Vector representation, Medical objects embedding, Feature grouping, Suicide risk stratification },    TIMESTAMP = { 2015.01.29 },    URL = { http://www.sciencedirect.com/science/article/pii/S1532046415000143 },} J
 Topic Model Kernel Classification With Probabilistically Reduced Features Nguyen, Vu, Phung, Dinh and Venkatesh, Svetha. Journal of Data Science (JDS), 13(2):323-340, April 2015. [ | | pdf] Probabilistic topic models have become a standard in modern machine learning to deal with a wide range of applications. Representing data by dimensional reduction of mixture proportion extracted from topic models is not only richer in semantics interpretation, but could also be informative for classification tasks. In this paper, we describe the Topic Model Kernel (TMK), a topic-based kernel for Support Vector Machine classification on data being processed by probabilistic topic models. The applicability of our proposed kernel is demonstrated in several classification tasks with real world datasets. TMK outperforms existing kernels on the distributional features and gives comparative results on nonprobabilistic data types. @ARTICLE { nguyen_phung_venkatesh_jds15,    AUTHOR = { Nguyen, Vu and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Topic Model Kernel Classification With Probabilistically Reduced Features },    JOURNAL = { Journal of Data Science (JDS) },    YEAR = { 2015 },    VOLUME = { 13 },    NUMBER = { 2 },    PAGES = { 323-340 },    MONTH = { April },    ABSTRACT = { Probabilistic topic models have become a standard in modern machine learning to deal with a wide range of applications. Representing data by dimensional reduction of mixture proportion extracted from topic models is not only richer in semantics interpretation, but could also be informative for classification tasks. In this paper, we describe the Topic Model Kernel (TMK), a topic-based kernel for Support Vector Machine classification on data being processed by probabilistic topic models. The applicability of our proposed kernel is demonstrated in several classification tasks with real world datasets. 
TMK outperforms existing kernels on the distributional features and gives comparative results on nonprobabilistic data types. },    FILE = { :nguyen_phung_venkatesh_jds15 - Topic Model Kernel Classification with Probabilistically Reduced Features.pdf:PDF },    KEYWORDS = { Topic Models, Bayesian Nonparametric, Support Vector Machine, Kernel Method, Classification, Dimensionality Reduction },    OWNER = { thinng },    TIMESTAMP = { 2015.01.28 },    URL = { http://www.jds-online.com/file_download/496/6-new.pdf },} J
 Bayesian Nonparametric Approaches to Abnormality Detection in Video Surveillance Nguyen, Vu, Phung, Dinh, Pham, Duc-Son and Venkatesh, Svetha. Annals of Data Science (AoDS), 2(1):21-41, March 2015. [ | | pdf] In data science, anomaly detection is the process of identifying the items, events or observations which do not conform to expected patterns in a dataset. As widely acknowledged in the computer vision community and security management, discovering suspicious events is the key issue for abnormal detection in video surveillance. The important steps in identifying such events include stream data segmentation and hidden patterns discovery. However, the crucial challenge in stream data segmentation and hidden patterns discovery are the number of coherent segments in surveillance stream and the number of traffic patterns are unknown and hard to specify. Therefore, in this paper we revisit the abnormality detection problem through the lens of Bayesian nonparametric (BNP) and develop a novel usage of BNP methods for this problem. In particular, we employ the Infinite Hidden Markov Model and Bayesian Nonparametric Factor Analysis for stream data segmentation and pattern discovery. In addition, we introduce an interactive system allowing users to inspect and browse suspicious events. @ARTICLE { nguyen_phung_pham_venkatesh_aods15bayesian,    AUTHOR = { Nguyen, Vu and Phung, Dinh and Pham, Duc-Son and Venkatesh, Svetha },    TITLE = { {B}ayesian Nonparametric Approaches to Abnormality Detection in Video Surveillance },    JOURNAL = { Annals of Data Science (AoDS) },    YEAR = { 2015 },    VOLUME = { 2 },    NUMBER = { 1 },    PAGES = { 21--41 },    MONTH = { March },    ABSTRACT = { In data science, anomaly detection is the process of identifying the items, events or observations which do not conform to expected patterns in a dataset. 
As widely acknowledged in the computer vision community and security management, discovering suspicious events is the key issue for abnormal detection in video surveillance. The important steps in identifying such events include stream data segmentation and hidden patterns discovery. However, the crucial challenge in stream data segmentation and hidden patterns discovery are the number of coherent segments in surveillance stream and the number of traffic patterns are unknown and hard to specify. Therefore, in this paper we revisit the abnormality detection problem through the lens of Bayesian nonparametric (BNP) and develop a novel usage of BNP methods for this problem. In particular, we employ the Infinite Hidden Markov Model and Bayesian Nonparametric Factor Analysis for stream data segmentation and pattern discovery. In addition, we introduce an interactive system allowing users to inspect and browse suspicious events. },    DOI = { 10.1007/s40745-015-0030-3 },    FILE = { :nguyen_phung_pham_venkatesh_aods15bayesian - Bayesian Nonparametric Approaches to Abnormality Detection in Video Surveillance.pdf:PDF },    KEYWORDS = { Abnormal detection Bayesian nonparametric User interface Multilevel data structure Video segmentation Spatio-temporal browsing },    OWNER = { dinh },    PUBLISHER = { Springer Berlin Heidelberg },    TIMESTAMP = { 2015.06.10 },    URL = { http://link.springer.com/article/10.1007%2Fs40745-015-0030-3 },} J
 Stable feature selection for clinical prediction: Exploiting ICD tree structure using Tree-Lasso Kamkar, Iman, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. Journal of Biomedical Informatics (JBI), 53:277-290, Feb. 2015. [ | | pdf] Modern healthcare is getting reshaped by growing Electronic Medical Records (EMR). Recently, these records have been shown of great value towards building clinical prediction models. In EMR data, patients’ diseases and hospital interventions are captured through a set of diagnoses and procedures codes. These codes are usually represented in a tree form (e.g. ICD-10 tree) and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model and an appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test, etc.) consider each variable independently and usually end up having a long feature list. Recently, Lasso and related ℓ1-penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to have problems of selecting one feature of many correlated features randomly. This hinders the clinicians to arrive at a stable feature set, which is crucial for clinical decision making process. In this paper, we solve this problem by using a recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study the stability behavior of Tree-Lasso and compare it with other feature selection methods. Using a synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods e.g. Information Gain, ReliefF and T-test. 
We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our result has implications in identifying stable risk factors for many healthcare problems and therefore can potentially assist clinical decision making for accurate medical prognosis. @ARTICLE { kamkar_gupta_phung_venkatesh_bi15,    AUTHOR = { Kamkar, Iman and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Stable feature selection for clinical prediction: Exploiting \{ICD\} tree structure using Tree-Lasso },    JOURNAL = { Journal of Biomedical Informatics (JBI) },    YEAR = { 2015 },    VOLUME = { 53 },    PAGES = { 277--290 },    MONTH = { Feb. },    ISSN = { 1532-0464 },    ABSTRACT = { Modern healthcare is getting reshaped by growing Electronic Medical Records (EMR). Recently, these records have been shown of great value towards building clinical prediction models. In \{EMR\} data, patients’ diseases and hospital interventions are captured through a set of diagnoses and procedures codes. These codes are usually represented in a tree form (e.g. ICD-10 tree) and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model and an appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test, etc.) consider each variable independently and usually end up having a long feature list. Recently, Lasso and related l 1 -penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to have problems of selecting one feature of many correlated features randomly. 
This hinders the clinicians to arrive at a stable feature set, which is crucial for clinical decision making process. In this paper, we solve this problem by using a recently proposed Tree-Lasso model. Since, the stability behavior of Tree-Lasso is not well understood, we study the stability behavior of Tree-Lasso and compare it with other feature selection methods. Using a synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods e.g. Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our result has implications in identifying stable risk factors for many healthcare problems and therefore can potentially assist clinical decision making for accurate medical prognosis. },    DOI = { http://dx.doi.org/10.1016/j.jbi.2014.11.013 },    FILE = { :kamkar_gupta_phung_venkatesh_bi15 - Stable Feature Selection for Clinical Prediction_ Exploiting ICD Tree Structure Using Tree Lasso.pdf:PDF },    KEYWORDS = { Feature selection, Lasso, Tree-Lasso, Feature stability, Classification },    URL = { http://www.sciencedirect.com/science/article/pii/S1532046414002639 },} J
 Tree-based Iterated Local Search for Markov Random Fields with Applications in Image Analysis Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. Journal of Heuristics, 21(1):25-45, Feb. 2015. [ | | pdf] The maximum a posteriori assignment for general structure Markov random fields is computationally intractable. In this paper, we exploit tree-based methods to efficiently address this problem. Our novel method, named Tree-based Iterated Local Search (T-ILS), takes advantage of the tractability of tree-structures embedded within MRFs to derive strong local search in an ILS framework. The method efficiently explores exponentially large neighborhoods using a limited memory without any requirement on the cost functions. We evaluate the T-ILS on a simulated Ising model and two real-world vision problems: stereo matching and image denoising. Experimental results demonstrate that our methods are competitive against state-of-the-art rivals with significant computational gain. @ARTICLE { tran_phung_venkatesh_jh15,    AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Tree-based Iterated Local Search for {M}arkov {R}andom {F}ields with Applications in Image Analysis },    JOURNAL = { Journal of Heuristics },    YEAR = { 2015 },    VOLUME = { 21 },    NUMBER = { 1 },    PAGES = { 25--45 },    MONTH = { Feb. },    ABSTRACT = { The maximum a posteriori assignment for general structure Markov random fields is computationally intractable. In this paper, we exploit tree-based methods to efficiently address this problem. Our novel method, named Tree-based Iterated Local Search (T-ILS), takes advantage of the tractability of tree-structures embedded within MRFs to derive strong local search in an ILS framework. The method efficiently explores exponentially large neighborhoods using a limited memory without any requirement on the cost functions. 
We evaluate the T-ILS on a simulated Ising model and two real-world vision problems: stereo matching and image denoising. Experimental results demonstrate that our methods are competitive against state-of-the-art rivals with significant computational gain. },    DOI = { 10.1007/s10732-014-9270-1 },    FILE = { :tran_phung_venkatesh_jh15 - Tree Based Iterated Local Search for Markov Random Fields with Applications in Image Analysis.pdf:PDF },    KEYWORDS = { Iterated local search, Strong local search, Belief propagation, Markov random fields, MAP assignment },    OWNER = { tund },    PUBLISHER = { Springer },    TIMESTAMP = { 2014.10.14 },    URL = { http://link.springer.com/article/10.1007%2Fs10732-014-9270-1 },} J
 Tensor-variate Restricted Boltzmann Machines Nguyen, Tu Dinh, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In 29th AAAI Conference on Artificial Intelligence (AAAI), pages 2887-2893, Austin Texas, USA, January 2015. [ | | pdf] Restricted Boltzmann Machines (RBMs) are an important class of latent variable models for representing vector data. An under-explored area is multimode data, where each data point is a matrix or a tensor. Standard RBMs applying to such data would require vectorizing matrices and tensors, thus resulting in unnecessarily high dimensionality and at the same time, destroying the inherent higher-order interaction structures. This paper introduces Tensor-variate Restricted Boltzmann Machines (TvRBMs) which generalize RBMs to capture the multiplicative interaction between data modes and the latent variables. TvRBMs are highly compact in that the number of free parameters grows only linear with the number of modes. We demonstrate the capacity of TvRBMs on three real-world applications: handwritten digit classification, face recognition and EEG-based alcoholic diagnosis. The learnt features of the model are more discriminative than the rivals, resulting in better classification performance. @INPROCEEDINGS { nguyen_tran_phung_venkatesh_aaai15,    AUTHOR = { Nguyen, Tu Dinh and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },    TITLE = { Tensor-variate Restricted {B}oltzmann Machines },    BOOKTITLE = { 29th AAAI Conference on Artificial Intelligence (AAAI) },    YEAR = { 2015 },    PAGES = { 2887--2893 },    ADDRESS = { Austin Texas, USA },    MONTH = { January },    ABSTRACT = { Restricted Boltzmann Machines (RBMs) are an important class of latent variable models for representing vector data. 
An under-explored area is multimode data, where each data point is a matrix or a tensor. Standard RBMs applying to such data would require vectorizing matrices and tensors, thus resulting in unnecessarily high dimensionality and at the same time, destroying the inherent higher-order interaction structures. This paper introduces Tensor-variate Restricted Boltzmann Machines (TvRBMs) which generalize RBMs to capture the multiplicative interaction between data modes and the latent variables. TvRBMs are highly compact in that the number of free parameters grows only linear with the number of modes. We demonstrate the capacity of TvRBMs on three real-world applications: handwritten digit classification, face recognition and EEG-based alcoholic diagnosis. The learnt features of the model are more discriminative than the rivals, resulting in better classification performance. },    FILE = { :nguyen_tran_phung_venkatesh_aaai15 - Tensor Variate Restricted Boltzmann Machines.pdf:PDF },    KEYWORDS = { tensor; rbm; restricted boltzmann machine; tvrbm; multiplicative interaction; eeg; },    OWNER = { ngtu },    TIMESTAMP = { 2015.01.29 },    URL = { http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/download/9371/9956 },} C
 Continuous discovery of co-location contexts from Bluetooth data Nguyen, T., Gupta, S., Venkatesh, S. and Phung, D. Pervasive and Mobile Computing (PMC), 16(B):286-304, Jan. 2015. [ | | pdf] The discovery of context is important for context-aware applications in pervasive computing. This problem is challenging because of the stream nature of data, the complexity and changing nature of contexts. We propose a Bayesian nonparametric model for the detection of co-location contexts from Bluetooth signals. By using an Indian buffet process as the prior distribution, the model can discover the number of contexts automatically. We introduce a novel fixed-lag particle filter that processes data incrementally. This sampling scheme is especially suitable for pervasive computing as the computational requirements remain constant in spite of growing data. We examine our model on a synthetic dataset and two real world datasets. To verify the discovered contexts, we compare them to the communities detected by the Louvain method, showing a strong correlation between the results of the two methods. The fixed-lag particle filter is compared with the Gibbs sampling in terms of the normalized factorization error that shows a close performance between the two inference methods. As the fixed-lag particle filter processes a small chunk of data when it comes and does not need to be restarted, its execution time is significantly shorter than that of the Gibbs sampling. @ARTICLE { nguyen_gupta_venkatesh_phung_pmc15,    AUTHOR = { Nguyen, T. and Gupta, S. and Venkatesh, S. and Phung, D. },    TITLE = { Continuous discovery of co-location contexts from {B}luetooth data },    JOURNAL = { Pervasive and Mobile Computing (PMC) },    YEAR = { 2015 },    VOLUME = { 16 },    NUMBER = { B },    PAGES = { 286--304 },    MONTH = { Jan. },    ISSN = { 1574-1192 },    ABSTRACT = { The discovery of context is important for context-aware applications in pervasive computing. 
This problem is challenging because of the stream nature of data, the complexity and changing nature of contexts. We propose a Bayesian nonparametric model for the detection of co-location contexts from Bluetooth signals. By using an Indian buffet process as the prior distribution, the model can discover the number of contexts automatically. We introduce a novel fixed-lag particle filter that processes data incrementally. This sampling scheme is especially suitable for pervasive computing as the computational requirements remain constant in spite of growing data. We examine our model on a synthetic dataset and two real world datasets. To verify the discovered contexts, we compare them to the communities detected by the Louvain method, showing a strong correlation between the results of the two methods. The fixed-lag particle filter is compared with the Gibbs sampling in terms of the normalized factorization error that shows a close performance between the two inference methods. As the fixed-lag particle filter processes a small chunk of data when it comes and does not need to be restarted, its execution time is significantly shorter than that of the Gibbs sampling. },    DOI = { 10.1016/j.pmcj.2014.12.005 },    FILE = { :nguyen_gupta_venkatesh_phung_pmc15 - Continuous Discovery of Co Location Contexts from Bluetooth Data.pdf:PDF },    KEYWORDS = { Nonparametric, Indian buffet process, Incremental, Particle filter, Co-location context },    OWNER = { Thuong Nguyen },    PUBLISHER = { Elsevier },    TIMESTAMP = { 2014.12.18 },    URL = { http://www.sciencedirect.com/science/article/pii/S1574119214001941 },} J
 Visual Object Clustering via Mixed-Norm Regularization Zhang, Xin, Pham, Duc-Son, Phung, Dinh, Liu, Wanquan, Saha, Budhaditya and Venkatesh, Svetha. In Winter Conference on Applications of Computer Vision (WACV), pages 1030-1037, Jan. 2015. [ | | pdf] Many vision problems deal with high-dimensional data, such as motion segmentation and face clustering. However, these high-dimensional data usually lie in a low-dimensional structure. Sparse representation is a powerful principle for solving a number of clustering problems with high-dimensional data. This principle is motivated from an ideal modeling of data points according to linear algebra theory. However, real data in computer vision are unlikely to follow the ideal model perfectly. In this paper, we exploit the mixed norm regularization for sparse subspace clustering. This regularization term is a convex combination of the ℓ1 norm, which promotes sparsity at the individual level and the block norm ℓ2/1 which promotes group sparsity. Combining these powerful regularization terms will provide a more accurate modeling, subsequently leading to a better solution for the affinity matrix used in sparse subspace clustering. This could help us achieve better performance on motion segmentation and face clustering problems. This formulation also caters for different types of data corruptions. We derive a provably convergent algorithm based on the alternating direction method of multipliers (ADMM) framework, which is computationally efficient, to solve the formulation. We demonstrate that this formulation outperforms other state-of-arts on both motion segmentation and face clustering. 
@INPROCEEDINGS { zhang_pham_phung_liu_budhaditya_venkatesh_wacv15,    AUTHOR = { Zhang, Xin and Pham, Duc-Son and Phung, Dinh and Liu, Wanquan and Saha, Budhaditya and Venkatesh, Svetha },    TITLE = { Visual Object Clustering via Mixed-Norm Regularization },    BOOKTITLE = { Winter Conference on Applications of Computer Vision (WACV) },    YEAR = { 2015 },    PAGES = { 1030--1037 },    MONTH = { Jan. },    ABSTRACT = { Many vision problems deal with high-dimensional data, such as motion segmentation and face clustering. However, these high-dimensional data usually lie in a low-dimensional structure. Sparse representation is a powerful principle for solving a number of clustering problems with high-dimensional data. This principle is motivated from an ideal modeling of data points according to linear algebra theory. However, real data in computer vision are unlikely to follow the ideal model perfectly. In this paper, we exploit the mixed norm regularization for sparse subspace clustering. This regularization term is a convex combination of the ℓ1 norm, which promotes sparsity at the individual level and the block norm ℓ2/1 which promotes group sparsity. Combining these powerful regularization terms will provide a more accurate modeling, subsequently leading to a better solution for the affinity matrix used in sparse subspace clustering. This could help us achieve better performance on motion segmentation and face clustering problems. This formulation also caters for different types of data corruptions. We derive a provably convergent algorithm based on the alternating direction method of multipliers (ADMM) framework, which is computationally efficient, to solve the formulation. We demonstrate that this formulation outperforms other state-of-arts on both motion segmentation and face clustering. 
},    DOI = { 10.1109/WACV.2015.142 },    FILE = { :zhang_pham_phung_liu_budhaditya_venkatesh_wacv15 - Visual Object Clustering Via Mixed Norm Regularization.pdf:PDF },    KEYWORDS = { computer vision;image segmentation;matrix algebra;pattern clustering;alternating direction method of multipliers framework;computer vision;face clustering problems;linear algebra theory;mixed-norm regularization;motion segmentation;sparse representation;sparse subspace clustering;visual object clustering problem;Clustering algorithms;Computer vision;Data models;Educational institutions;Face;Motion segmentation;Sparse matrices },    OWNER = { Dinh },    TIMESTAMP = { 2015.02.03 },    URL = { http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7045996 },} C
 Web search activity data accurately predicts population chronic disease risk in the United States Nguyen, Thin, Tran, Truyen, Luo, Wei, Gupta, Sunil, Rana, Santu, Phung, Dinh, Nichols, Melanie, Millar, Lynne, Venkatesh, Svetha and Allender, Steve. Journal of Epidemiology & Community Health, 69(7):693-699, Jan. 2015. [ | | pdf] Background: The WHO framework for non-communicable disease (NCD) describes risks and outcomes comprising the majority of the global burden of disease. These factors are complex and interact at biological, behavioural, environmental and policy levels presenting challenges for population monitoring and intervention evaluation. This paper explores the utility of machine learning methods applied to population-level web search activity behaviour as a proxy for chronic disease risk factors. Methods: Web activity output for each element of the WHO's Causes of NCD framework was used as a basis for identifying relevant web search activity from 2004 to 2013 for the USA. Multiple linear regression models with regularisation were used to generate predictive algorithms, mapping web search activity to Centers for Disease Control and Prevention (CDC) measured risk factor/disease prevalence. Predictions for subsequent target years not included in the model derivation were tested against CDC data from population surveys using Pearson correlation and Spearman's r. Results: For 2011 and 2012, predicted prevalence was very strongly correlated with measured risk data ranging from fruits and vegetables consumed (r=0.81; 95% CI 0.68 to 0.89) to alcohol consumption (r=0.96; 95% CI 0.93 to 0.98). Mean difference between predicted and measured differences by State ranged from 0.03 to 2.16. 
Spearman's r for state-wise predicted versus measured prevalence varied from 0.82 to 0.93. Conclusions: The high predictive validity of web search activity for NCD risk has potential to provide real-time information on population risk during policy implementation and other population-level NCD prevention efforts. @ARTICLE { nguyen_tran_luo_gupta_rana_phung_nichols_millar_venkatesh_allender_jech15,    AUTHOR = { Nguyen, Thin and Tran, Truyen and Luo, Wei and Gupta, Sunil and Rana, Santu and Phung, Dinh and Nichols, Melanie and Millar, Lynne and Venkatesh, Svetha and Allender, Steve },    TITLE = { Web search activity data accurately predicts population chronic disease risk in the {U}nited {S}tates },    JOURNAL = { Journal of Epidemiology \& Community Health },    YEAR = { 2015 },    VOLUME = { 69 },    NUMBER = { 7 },    PAGES = { 693--699 },    MONTH = { Jan. },    ISSN = { 1949-3045 },    ABSTRACT = { Background: The WHO framework for non-communicable disease (NCD) describes risks and outcomes comprising the majority of the global burden of disease. These factors are complex and interact at biological, behavioural, environmental and policy levels presenting challenges for population monitoring and intervention evaluation. This paper explores the utility of machine learning methods applied to population-level web search activity behaviour as a proxy for chronic disease risk factors. Methods: Web activity output for each element of the WHO's Causes of NCD framework was used as a basis for identifying relevant web search activity from 2004 to 2013 for the USA. Multiple linear regression models with regularisation were used to generate predictive algorithms, mapping web search activity to Centers for Disease Control and Prevention (CDC) measured risk factor/disease prevalence. 
Predictions for subsequent target years not included in the model derivation were tested against CDC data from population surveys using Pearson correlation and Spearman's r. Results: For 2011 and 2012, predicted prevalence was very strongly correlated with measured risk data ranging from fruits and vegetables consumed (r=0.81; 95% CI 0.68 to 0.89) to alcohol consumption (r=0.96; 95% CI 0.93 to 0.98). Mean difference between predicted and measured differences by State ranged from 0.03 to 2.16. Spearman's r for state-wise predicted versus measured prevalence varied from 0.82 to 0.93. Conclusions: The high predictive validity of web search activity for NCD risk has potential to provide real-time information on population risk during policy implementation and other population-level NCD prevention efforts. },    DOI = { 10.1136/jech-2014-204523 },    FILE = { :nguyen_tran_luo_gupta_rana_phung_nichols_millar_venkatesh_allender_jech15 - Web Search Activity Data Accurately Predicts Population Chronic Disease Risk in the United States.pdf:PDF },    OWNER = { thinng },    TIMESTAMP = { 2015.01.28 },    URL = { http://jech.bmj.com/content/69/7/693.abstract },} J

 2014
 A Random Finite Set Model for Data Clustering Phung, D. and Vo, B.N. In Proceedings of International Conference on Fusion (FUSION), Salamanca, Spain, July 2014. [ | | pdf] The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as vector, in many applications each datum is not a vector but a point pattern or a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is not available in advance. This paper proposes a new class of models for data clustering that addresses set-valued data as well as unknown number of clusters, using a Dirichlet Process mixture of Poisson random finite sets. We also develop an efficient Markov Chain Monte Carlo posterior inference technique that can learn the number of clusters and mixture parameters automatically from the data. Numerical studies are presented to demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in data. @CONFERENCE { phung_vo_fusion14,    TITLE = { A Random Finite Set Model for Data Clustering },    AUTHOR = { Phung, D. and Vo, B.N. },    BOOKTITLE = { Proceedings of International Conference on Fusion (FUSION) },    YEAR = { 2014 },    ADDRESS = { Salamanca, Spain },    MONTH = { July },    ABSTRACT = { The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as vector, in many applications each datum is not a vector but a point pattern or a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is not available in advance. 
This paper proposes a new class of models for data clustering that addresses set-valued data as well as unknown number of clusters, using a Dirichlet Process mixture of Poisson random finite sets. We also develop an efficient Markov Chain Monte Carlo posterior inference technique that can learn the number of clusters and mixture parameters automatically from the data. Numerical studies are presented to demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in data. },    OWNER = { dinh },    TIMESTAMP = { 2014.05.16 },    URL = { http://prada-research.net/~dinh/uploads/Main/Publications/phung_vo_fusion14.pdf },} C
 Learning Latent Activities from Social Signals with Hierarchical Dirichlet Process Phung, D., Nguyen, T. C., Gupta, S. and Venkatesh, S. In Handbook on Plan, Activity, and Intent Recognition, pages 149-174. Elsevier, 2014. [ | | pdf | code] Understanding human activities is an important research topic, noticeably in assisted living and health monitoring. Beyond simple forms of activity (e.g., RFID event of entering a building), learning latent activities that are more semantically interpretable, such as sitting at a desk, meeting with people or gathering with friends, remains a challenging problem. Supervised learning has been the typical modeling choice in the past. However, this requires labeled training data, is unable to predict never-seen-before activity and fails to adapt to the continuing growth of data over time. In this chapter, we explore Bayesian nonparametric method, in particular the Hierarchical Dirichlet Process, to infer latent activities from sensor data acquired in a pervasive setting. Our framework is unsupervised, requires no labeled data and is able to discover new activities as data grows. We present experiments on extracting movement and interaction activities from sociometric badge signals and show how to use them for detection of sub-communities. Using the popular Reality Mining dataset, we further demonstrate the extraction of co-location activities and use them to automatically infer the structure of social subgroups. @INCOLLECTION { phung_nguyen_gupta_venkatesh_pair14,    TITLE = { Learning Latent Activities from Social Signals with Hierarchical {D}irichlet Process },    AUTHOR = { Phung, D. and Nguyen, T. C. and Gupta, S. and Venkatesh, S. },    BOOKTITLE = { Handbook on Plan, Activity, and Intent Recognition },    PUBLISHER = { Elsevier },    YEAR = { 2014 },    EDITOR = { Gita Sukthankar and Christopher Geib and David V. Pynadath and Hung Bui and Robert P. 
Goldman },    PAGES = { 149--174 },    ABSTRACT = { Understanding human activities is an important research topic, noticeably in assisted living and health monitoring. Beyond simple forms of activity (e.g., RFID event of entering a building), learning latent activities that are more semantically interpretable, such as sitting at a desk, meeting with people or gathering with friends, remains a challenging problem. Supervised learning has been the typical modeling choice in the past. However, this requires labeled training data, is unable to predict never-seen-before activity and fails to adapt to the continuing growth of data over time. In this chapter, we explore Bayesian nonparametric method, in particular the Hierarchical Dirichlet Process, to infer latent activities from sensor data acquired in a pervasive setting. Our framework is unsupervised, requires no labeled data and is able to discover new activities as data grows. We present experiments on extracting movement and interaction activities from sociometric badge signals and show how to use them for detection of sub-communities. Using the popular Reality Mining dataset, we further demonstrate the extraction of co-location activities and use them to automatically infer the structure of social subgroups. },    CODE = { http://prada-research.net/~dinh/index.php?n=Main.Code#HDP_code },    OWNER = { ctng },    TIMESTAMP = { 2013.07.25 },    URL = { http://prada-research.net/~dinh/uploads/Main/Publications/Phung_etal_pair14.pdf },} BC
 Proceedings of the Sixth Asian Conference on Machine Learning Phung, Dinh and Li, Hang, editor. volume 39 of JMLR Workshop and Conference Proceedings, JMLR, Nov. 2014. [ | | pdf] @PROCEEDINGS { phung_li_acml14proceedings,    TITLE = { Proceedings of the Sixth Asian Conference on Machine Learning },    YEAR = { 2014 },    EDITOR = { Phung, Dinh and Li, Hang },    MONTH = { Nov. },    PUBLISHER = { JMLR },    SERIES = { JMLR Workshop and Conference Proceedings },    VOLUME = { 39 },    LOCATION = { Nha Trang, Vietnam },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2016.04.11 },    URL = { http://jmlr.org/proceedings/papers/v39/ },} P
 Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts Nguyen, V., Phung, D., Venkatesh, S., Nguyen, X.L. and Bui, H. In Proc. of International Conference on Machine Learning (ICML), pages 288-296, Beijing, China, 2014. [ | ] We present a Bayesian nonparametric framework for multilevel clustering which utilizes group-level context information to simultaneously discover low-dimensional structures of the group contents and partitions groups into clusters. Using the Dirichlet process as the building block, our model constructs a product base-measure with a nested structure to accommodate content and context observations at multiple levels. The proposed model possesses properties that link the nested Dirichlet processes (nDP) and the Dirichlet process mixture models (DPM) in an interesting way: integrating out all contents results in the DPM over contexts, whereas integrating out group-specific contexts results in the nDP mixture over content variables. We provide a Polya urn view of the model and an efficient collapsed Gibbs inference procedure. Extensive experiments on real-world datasets demonstrate the advantage of utilizing context information via our model in both text and image domains. @INPROCEEDINGS { nguyen_phung_nguyen_venkatesh_bui_icml14,    TITLE = { {B}ayesian Nonparametric Multilevel Clustering with Group-Level Contexts },    AUTHOR = { Nguyen, V. and Phung, D. and Venkatesh, S. and Nguyen, X.L. and Bui, H. },    BOOKTITLE = { Proc. of International Conference on Machine Learning (ICML) },    YEAR = { 2014 },    ADDRESS = { Beijing, China },    PAGES = { 288--296 },    ABSTRACT = { We present a Bayesian nonparametric framework for multilevel clustering which utilizes group-level context information to simultaneously discover low-dimensional structures of the group contents and partitions groups into clusters. 
Using the Dirichlet process as the building block, our model constructs a product base-measure with a nested structure to accommodate content and context observations at multiple levels. The proposed model possesses properties that link the nested Dirichlet processes (nDP) and the Dirichlet process mixture models (DPM) in an interesting way: integrating out all contents results in the DPM over contexts, whereas integrating out group-specific contexts results in the nDP mixture over content variables. We provide a Polya urn view of the model and an efficient collapsed Gibbs inference procedure. Extensive experiments on real-world datasets demonstrate the advantage of utilizing context information via our model in both text and image domains. },    OWNER = { tvnguye },    TIMESTAMP = { 2013.12.13 },} C
 Labeled Random Finite Sets and the Bayes Multi-target Tracking Filter Vo, B-N, Vo, B-T and Phung, Dinh. IEEE Transactions on Signal Processing, 62(24):6554-6567, 2014. [ | ] @ARTICLE { vo_vo_phung_isp14,    TITLE = { Labeled Random Finite Sets and the Bayes Multi-target Tracking Filter },    AUTHOR = { Vo, B-N and Vo, B-T and Phung, Dinh },    JOURNAL = { IEEE Transactions on Signal Processing },    YEAR = { 2014 },    NUMBER = { 24 },    PAGES = { 6554--6567 },    VOLUME = { 62 },    OWNER = { dinh },    TIMESTAMP = { 2014.07.02 },} J
 Keeping up with Innovation: A Predictive Framework for Modeling Healthcare Data with Evolving Clinical Interventions Gupta, S., Rana, S., Phung, D. and Venkatesh, S.. In Proc. of SIAM Int. Conference on Data Mining (SDM) (accepted), Philadelphia, Pennsylvania, USA, April 2014. [ | ] @INPROCEEDINGS { gupta_rana_phung_venkatesh_sdm14,    TITLE = { Keeping up with Innovation: A Predictive Framework for Modeling Healthcare Data with Evolving Clinical Interventions },    AUTHOR = { Gupta, S. and Rana, S. and Phung, D. and Venkatesh, S. },    BOOKTITLE = { Proc. of SIAM Int. Conference on Data Mining (SDM) (accepted) },    YEAR = { 2014 },    ADDRESS = { Philadelphia, Pennsylvania, USA },    MONTH = { April },    OWNER = { Thuongnc },    TIMESTAMP = { 2014.01.05 },} C
 Stabilized Sparse Ordinal Regression for Medical Risk Stratification Truyen Tran, Dinh Phung, Wei Luo and Svetha Venkatesh. Knowledge and Information Systems (KAIS), 2014. [ | ] The recent wide adoption of Electronic Medical Records (EMR) presents great opportunities and challenges for data mining. The EMR data is largely temporal, often noisy, irregular and high dimensional. This paper constructs a novel ordinal regression framework for predicting medical risk stratification from EMR. First, a conceptual view of EMR as a temporal image is constructed to extract a diverse set of features. Second, ordinal modeling is applied for predicting cumulative or progressive risk. The challenges are building a transparent predictive model that works with a large number of weakly predictive features, and at the same time, is stable against resampling variations. Our solution employs sparsity methods that are stabilized through domain-specific feature interaction networks. We introduce two indices that measure the model stability against data resampling. Feature networks are used to generate two multivariate Gaussian priors with sparse precision matrices (the Laplacian and Random Walk). We apply the framework on a large short-term suicide risk prediction problem and demonstrate that our methods outperform clinicians by a large margin, discover suicide risk factors that conform with mental health knowledge, and produce models with enhanced stability. @ARTICLE { tran_phung_luo_venkatesh_kais14,    TITLE = { Stabilized Sparse Ordinal Regression for Medical Risk Stratification },    AUTHOR = { Truyen Tran and Dinh Phung and Wei Luo and Svetha Venkatesh },    JOURNAL = { Knowledge and Information Systems (KAIS) },    YEAR = { 2014 },    PAGES = { (accepted for publication on 17 Jan 2014) },    ABSTRACT = { The recent wide adoption of Electronic Medical Records (EMR) presents great opportunities and challenges for data mining. 
The EMR data is largely temporal, often noisy, irregular and high dimensional. This paper constructs a novel ordinal regression framework for predicting medical risk stratification from EMR. First, a conceptual view of EMR as a temporal image is constructed to extract a diverse set of features. Second, ordinal modeling is applied for predicting cumulative or progressive risk. The challenges are building a transparent predictive model that works with a large number of weakly predictive features, and at the same time, is stable against resampling variations. Our solution employs sparsity methods that are stabilized through domain-specific feature interaction networks. We introduce two indices that measure the model stability against data resampling. Feature networks are used to generate two multivariate Gaussian priors with sparse precision matrices (the Laplacian and Random Walk). We apply the framework on a large short-term suicide risk prediction problem and demonstrate that our methods outperform clinicians by a large margin, discover suicide risk factors that conform with mental health knowledge, and produce models with enhanced stability. },    OWNER = { dinh },    TIMESTAMP = { 2014.01.28 },} J
 Tree-based Iterated Local Search for Markov Random Fields with Applications in Image Analysis Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. Journal of Heuristics, 2015. [ | | pdf] The maximum a posteriori assignment for general structure Markov random fields is computationally intractable. In this paper, we exploit tree-based methods to efficiently address this problem. Our novel method, named Tree-based Iterated Local Search (T-ILS), takes advantage of the tractability of tree-structures embedded within MRFs to derive strong local search in an ILS framework. The method efficiently explores exponentially large neighborhoods using a limited memory without any requirement on the cost functions. We evaluate the T-ILS on a simulated Ising model and two real-world vision problems: stereo matching and image denoising. Experimental results demonstrate that our methods are competitive against state-of-the-art rivals with significant computational gain. @ARTICLE { tran_phung_venkatesh_jh14,    TITLE = { Tree-based Iterated Local Search for Markov Random Fields with Applications in Image Analysis },    AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },    JOURNAL = { Journal of Heuristics },    YEAR = { 2015 },    PAGES = { accepted on 8 Nov 2014 },    ABSTRACT = { The maximum a posteriori assignment for general structure Markov random fields is computationally intractable. In this paper, we exploit tree-based methods to efficiently address this problem. Our novel method, named Tree-based Iterated Local Search (T-ILS), takes advantage of the tractability of tree-structures embedded within MRFs to derive strong local search in an ILS framework. The method efficiently explores exponentially large neighborhoods using a limited memory without any requirement on the cost functions. We evaluate the T-ILS on a simulated Ising model and two real-world vision problems: stereo matching and image denoising. 
Experimental results demonstrate that our methods are competitive against state-of-the-art rivals with significant computational gain. },    OWNER = { tund },    PUBLISHER = { Springer },    TIMESTAMP = { 2014.10.14 },    URL = { http://link.springer.com/article/10.1007%2Fs10732-014-9270-1 },} J
 Stabilizing Sparse Cox Model using Clinical Structures in Electronic Medical Records Gopakumar, Shivapratap, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Proceedings of the Second International Workshop on Pattern Recognition for Healthcare Analytics, 2014. [ | | pdf] Stability in clinical prediction models is crucial for transferability between studies, yet has received little attention. The problem is paramount in high-dimensional data which invites sparse models with feature selection capability. We introduce an effective method to stabilize a sparse Cox model of time-to-events using clinical structures inherent in Electronic Medical Records (EMR). Model estimation is stabilized using a feature graph derived from two types of EMR structures: temporal structure of disease and intervention recurrences, and hierarchical structure of medical knowledge and practices. We demonstrate the efficacy of the method in predicting time-to-readmission of heart failure patients. On two stability measures – the Jaccard index and the Consistency index – the use of clinical structures significantly increased feature stability without hurting discriminative power. Our model reported a competitive AUC of 0.64 (95% CIs: [0.58,0.69]) for 6 months prediction. @INPROCEEDINGS { gopakumar_tran_phung_venkatesh_icpr_ws14,    TITLE = { Stabilizing Sparse Cox Model using Clinical Structures in Electronic Medical Records },    AUTHOR = { Gopakumar, Shivapratap and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },    BOOKTITLE = { Proceedings of the Second International Workshop on Pattern Recognition for Healthcare Analytics },    YEAR = { 2014 },    ABSTRACT = { Stability in clinical prediction models is crucial for transferability between studies, yet has received little attention. The problem is paramount in high-dimensional data which invites sparse models with feature selection capability. 
We introduce an effective method to stabilize a sparse Cox model of time-to-events using clinical structures inherent in Electronic Medical Records (EMR). Model estimation is stabilized using a feature graph derived from two types of EMR structures: temporal structure of disease and intervention recurrences, and hierarchical structure of medical knowledge and practices. We demonstrate the efficacy of the method in predicting time-to-readmission of heart failure patients. On two stability measures – the Jaccard index and the Consistency index – the use of clinical structures significantly increased feature stability without hurting discriminative power. Our model reported a competitive AUC of 0.64 (95% CIs: [0.58,0.69]) for 6 months prediction. },    URL = { https://sites.google.com/site/iwprha2/proceedings },} C
 Individualized Arrhythmia Detection with ECG Signals from Wearable Devices Nguyen, Thanh-Binh, Luo, Wei, Caelli, Terry, Venkatesh, Svetha and Phung, Dinh. In The 2014 International Conference on Data Science and Advanced Analytics (DSAA2014), Shanghai, China, 2014. [ | ] Low-cost pervasive electrocardiogram (ECG) monitors are changing how sinus arrhythmia is diagnosed among patients with mild symptoms. With the large amount of data generated from long-term monitoring come new data science and analytical challenges. Although traditional rule-based detection algorithms still work on relatively short clinical quality ECG, they are not optimal for pervasive signals collected from wearable devices: they don’t adapt to individual differences and assume accurate identification of ECG fiducial points. To overcome these shortcomings of the rule-based methods, this paper introduces an arrhythmia detection approach for low quality pervasive ECG signals. To achieve the robustness needed, two techniques were applied. First, a set of ECG features with minimal reliance on fiducial point identification was selected. Next, the features were normalized using robust statistics to factor out baseline individual differences and clinically irrelevant temporal drift that is common in pervasive ECG. The proposed method was evaluated using pervasive ECG signals we collected, in combination with clinician validated ECG signals from Physiobank. Empirical evaluation confirms accuracy improvements of the proposed approach over the traditional clinical rules. 
@INPROCEEDINGS { nguyen_luo_caelli_venkatesh_phung_dsaa14,    TITLE = { Individualized Arrhythmia Detection with ECG Signals from Wearable Devices },    AUTHOR = { Nguyen, Thanh-Binh and Luo, Wei and Caelli, Terry and Venkatesh, Svetha and Phung, Dinh },    BOOKTITLE = { The 2014 International Conference on Data Science and Advanced Analytics (DSAA2014) },    YEAR = { 2014 },    ADDRESS = { Shanghai, China },    ABSTRACT = { Low-cost pervasive electrocardiogram (ECG) monitors are changing how sinus arrhythmia is diagnosed among patients with mild symptoms. With the large amount of data generated from long-term monitoring come new data science and analytical challenges. Although traditional rule-based detection algorithms still work on relatively short clinical quality ECG, they are not optimal for pervasive signals collected from wearable devices: they don’t adapt to individual differences and assume accurate identification of ECG fiducial points. To overcome these shortcomings of the rule-based methods, this paper introduces an arrhythmia detection approach for low quality pervasive ECG signals. To achieve the robustness needed, two techniques were applied. First, a set of ECG features with minimal reliance on fiducial point identification was selected. Next, the features were normalized using robust statistics to factor out baseline individual differences and clinically irrelevant temporal drift that is common in pervasive ECG. The proposed method was evaluated using pervasive ECG signals we collected, in combination with clinician validated ECG signals from Physiobank. Empirical evaluation confirms accuracy improvements of the proposed approach over the traditional clinical rules. },    COMMENT = { coauthor },    OWNER = { dbdao },    TIMESTAMP = { 2014.08.21 },} C
 Unsupervised Inference of Significant Locations from WiFi Data for Understanding Human Dynamics Nguyen, Thanh-Binh, Nguyen, Thuong C., Luo, Wei, Venkatesh, Svetha and Phung, Dinh. In The 13th International Conference on Mobile and Ubiquitous Multimedia (MUM2014), pages 232-235, 2014. [ | | pdf] Motion and location are essential to understand human dynamics. This paper presents a method to discover significant locations and daily routines of individuals from WiFi data, which is considered more suitable for study of human dynamics than GPS data. Our method determines significant locations by clustering access points in close proximity using the Affinity Propagation algorithm, which has the advantage of automatically determining the number of locations. We demonstrate our method on the MDC dataset that includes more than 30 million WiFi scans. The experimental results show good clustering performance and also superior temporal coverage in comparison to a multimodal approach on the same dataset. From the discovered location trajectories, we can learn interesting mobility patterns of mobile phone users. The human dynamics of participants is reflected through the entropy of the location distributions which shows interesting correlation with the age and occupations of users. @INPROCEEDINGS { nguyen_nguyen_lou_venkatesh_phung_mum14,    TITLE = { Unsupervised Inference of Significant Locations from WiFi Data for Understanding Human Dynamics },    AUTHOR = { Nguyen, Thanh-Binh and Nguyen, Thuong C. and Luo, Wei and Venkatesh, Svetha and Phung, Dinh },    BOOKTITLE = { The 13th International Conference on Mobile and Ubiquitous Multimedia (MUM2014) },    YEAR = { 2014 },    PAGES = { 232--235 },    ABSTRACT = { Motion and location are essential to understand human dynamics. This paper presents a method to discover significant locations and daily routines of individuals from WiFi data, which is considered more suitable for study of human dynamics than GPS data. 
Our method determines significant locations by clustering access points in close proximity using the Affinity Propagation algorithm, which has the advantage of automatically determining the number of locations. We demonstrate our method on the MDC dataset that includes more than 30 million WiFi scans. The experimental results show good clustering performance and also superior temporal coverage in comparison to a multimodal approach on the same dataset. From the discovered location trajectories, we can learn interesting mobility patterns of mobile phone users. The human dynamics of participants is reflected through the entropy of the location distributions which shows interesting correlation with the age and occupations of users. },    DOI = { 2677972.2677997 },    FILE = { :papers\\activityrecognition\\nguyen_nguyen_lou_venkatesh_phung_mum14.pdf:PDF },    OWNER = { Thanh-Binh Nguyen },    TIMESTAMP = { 2014.10.20 },    URL = { http://dl.acm.org/citation.cfm?id=2677972.2677997&coll=DL&dl=ACM&CFID=590574626&CFTOKEN=81216827 },} C
 Analysis of Circadian Rhythms from Online Communities of Individuals with Affective Disorders Dao, Bo, Nguyen, Thin, Phung, Dinh and Venkatesh, Svetha. In The 2014 International Conference on Data Science and Advanced Analytics (DSAA2014), Shanghai, China, 2014. [ | ] @INPROCEEDINGS { dao_nguyen_phung_venkatesh_dsaa14,    TITLE = { Analysis of Circadian Rhythms from Online Communities of Individuals with Affective Disorders },    AUTHOR = { Dao, Bo and Nguyen, Thin and Phung, Dinh and Venkatesh, Svetha },    BOOKTITLE = { The 2014 International Conference on Data Science and Advanced Analytics (DSAA2014) },    YEAR = { 2014 },    ADDRESS = { Shanghai, China },    COMMENT = { coauthor },    OWNER = { dbdao },    TIMESTAMP = { 2014.08.21 },} C
 Topic Model Kernel Classification With Probabilistically Reduced Features V. Nguyen, D. Phung and S. Venkatesh. Journal of Data Science, 2014. [ | ] @ARTICLE { nguyen_phung_venkatesh_jds14,    TITLE = { Topic Model Kernel Classification With Probabilistically Reduced Features },    AUTHOR = { V. Nguyen and D. Phung and S. Venkatesh },    JOURNAL = { Journal of Data Science },    YEAR = { 2014 },    PAGES = { accepted on 27/10/2014 },    OWNER = { tvnguye },    TIMESTAMP = { 2014.11.03 },} J
 Affective and Content Analysis of Online Depression Communities Thin Nguyen, Dinh Phung, Bo Dao, Svetha Venkatesh and Michael Berk. IEEE Transactions on Affective Computing, 2014. [ | | pdf] A large number of people use online communities to discuss mental health issues, thus offering opportunities for new understanding of these communities. This paper aims to study the characteristics of online depression communities (CLINICAL) in comparison with those joining other online communities (CONTROL). We use machine learning and statistical methods to discriminate online messages between depression and control communities using mood, psycholinguistic processes and content topics extracted from the posts generated by members of these communities. All aspects including mood, the written content and writing style are found to be significantly different between two types of communities. Sentiment analysis shows the clinical group have lower valence than people in the control group. For language styles and topics, statistical tests reject the hypothesis of equality on psycholinguistic processes and topics between two groups. We show good predictive validity in depression classification using topics and psycholinguistic clues as features. Clear discrimination between writing styles and contents, with good predictive power is an important step in understanding social media and its use in mental health. @ARTICLE { nguyen_phung_dao_venkatesh_berk_tac14,    TITLE = { Affective and Content Analysis of Online Depression Communities },    AUTHOR = { Thin Nguyen and Dinh Phung and Bo Dao and Svetha Venkatesh and Michael Berk },    JOURNAL = { IEEE Transactions on Affective Computing },    YEAR = { 2014 },    PAGES = { (to appear) },    ABSTRACT = { A large number of people use online communities to discuss mental health issues, thus offering opportunities for new understanding of these communities. 
This paper aims to study the characteristics of online depression communities (CLINICAL) in comparison with those joining other online communities (CONTROL). We use machine learning and statistical methods to discriminate online messages between depression and control communities using mood, psycholinguistic processes and content topics extracted from the posts generated by members of these communities. All aspects including mood, the written content and writing style are found to be significantly different between two types of communities. Sentiment analysis shows the clinical group have lower valence than people in the control group. For language styles and topics, statistical tests reject the hypothesis of equality on psycholinguistic processes and topics between two groups. We show good predictive validity in depression classification using topics and psycholinguistic clues as features. Clear discrimination between writing styles and contents, with good predictive power is an important step in understanding social media and its use in mental health. },    DOI = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6784326 },    OWNER = { thinng },    TIMESTAMP = { 2014.03.31 },    URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6784326 },} J
 Regularizing Topic Discovery in EMRs with Side Information by Using Hierarchical Bayesian Models Li, C., Rana, S., Phung, D. and Venkatesh, S. In Proceedings of International Conference on Pattern Recognition (ICPR) (accepted), 2014. [ | ] We propose a novel hierarchical Bayesian framework, word-distance-dependent Chinese restaurant franchise (wddCRF) for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application on Electronic Medical Records (EMRs). Typically, an EMR dataset consists of several patients (documents) and each patient contains many diagnosis codes (words). We exploit the side information available in the form of a semantic tree structure among the diagnosis codes for semantically-coherent disease topic discovery. We introduce novel functions to compute word-to-word distances when side information is available in the form of tree structures. We derive an efficient inference method for the wddCRF using an MCMC technique. We evaluate on a real-world medical dataset consisting of about 1000 patients with PolyVascular disease. Compared with the popular topic analysis tool, hierarchical Dirichlet process (HDP), our model discovers topics which are superior in terms of both qualitative and quantitative measures. @INPROCEEDINGS { li_rana_phung_venkatesh_icpr14,    TITLE = { Regularizing Topic Discovery in EMRs with Side Information by Using Hierarchical Bayesian Models },    AUTHOR = { Li, C. and Rana, S. and Phung, D. and Venkatesh, S. },    BOOKTITLE = { Proceedings of International Conference on Pattern Recognition (ICPR) (accepted) },    YEAR = { 2014 },    ABSTRACT = { We propose a novel hierarchical Bayesian framework, word-distance-dependent Chinese restaurant franchise (wddCRF) for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application on Electronic Medical Records (EMRs). 
Typically, an EMR dataset consists of several patients (documents) and each patient contains many diagnosis codes (words). We exploit the side information available in the form of a semantic tree structure among the diagnosis codes for semantically-coherent disease topic discovery. We introduce novel functions to compute word-to-word distances when side information is available in the form of tree structures. We derive an efficient inference method for the wddCRF using an MCMC technique. We evaluate on a real-world medical dataset consisting of about 1000 patients with PolyVascular disease. Compared with the popular topic analysis tool, hierarchical Dirichlet process (HDP), our model discovers topics which are superior in terms of both qualitative and quantitative measures. },    OWNER = { chengl },    TIMESTAMP = { 2014.03.27 },} C
 Effect of Mood, Social Connectivity and Age in Online Depression Community via Topic and Linguistic Analysis Dao, Bo, Nguyen, Thin, Phung, Dinh and Venkatesh, Svetha. In International Conference on Web Information System Engineering (WISE 2014), Thessaloniki, Greece, 2014. [ | | pdf?] Depression afflicts one in four people during their lives. Several studies have shown that for the isolated and mentally ill, the Web and social media provide effective platforms for support and treatment as well as for acquiring scientific, clinical understanding of this mental condition. More and more individuals affected by depression join online communities to seek information, express themselves, share their concerns and look for support [11]. For the first time, we collect and study a large online depression community of more than 12,000 active members from Live Journal. We examine the effect of mood, social connectivity and age on the online messages authored by members in an online depression community. The posts are considered in two aspects: what is written (topic) and how it is written (language style). We use statistical and machine learning methods to discriminate the posts made by bloggers in low versus high valence mood, in different age categories and in different degrees of social connectivity. Using statistical tests, language styles are found to be significantly different between low and high valence cohorts, whilst topics are significantly different between people with different degrees of social connectivity. High performance is achieved for low versus high valence post classification using writing style as features. The finding suggests the potential of using social media in depression screening, especially in online settings. 
@INPROCEEDINGS { dao_nguyen_phung_venkatesh_wise14,    TITLE = { Effect of Mood, Social Connectivity and Age in Online Depression Community via Topic and Linguistic Analysis },    AUTHOR = { Dao, Bo and Nguyen, Thin and Phung, Dinh and Venkatesh, Svetha },    BOOKTITLE = { International Conference on Web Information System Engineering (WISE 2014) },    YEAR = { 2014 },    ADDRESS = { Thessaloniki, Greece },    ABSTRACT = { Depression afflicts one in four people during their lives. Several studies have shown that for the isolated and mentally ill, the Web and social media provide effective platforms for support and treatment as well as for acquiring scientific, clinical understanding of this mental condition. More and more individuals affected by depression join online communities to seek information, express themselves, share their concerns and look for support [11]. For the first time, we collect and study a large online depression community of more than 12,000 active members from Live Journal. We examine the effect of mood, social connectivity and age on the online messages authored by members in an online depression community. The posts are considered in two aspects: what is written (topic) and how it is written (language style). We use statistical and machine learning methods to discriminate the posts made by bloggers in low versus high valence mood, in different age categories and in different degrees of social connectivity. Using statistical tests, language styles are found to be significantly different between low and high valence cohorts, whilst topics are significantly different between people with different degrees of social connectivity. High performance is achieved for low versus high valence post classification using writing style as features. The finding suggests the potential of using social media in depression screening, especially in online settings. 
},    COMMENT = { coauthor },    OWNER = { dbdao },    TIMESTAMP = { 2014.07.11 },    URL = { 2014\conferences\dao_nguyen_phung_venkatesh_wise14.pdf },} C
 Affective, Linguistic and Topic Patterns in Online Autism Communities Nguyen, Thin, Duong, Thi, Phung, Dinh and Venkatesh, Svetha. In International Conference on Web Information System Engineering (WISE 2014), Thessaloniki, Greece, 2014. [ | | pdf?] Online communities offer a platform to support and discuss health issues. They provide a more accessible way to bring together people with the same concerns or interests. This paper aims to study the characteristics of online autism communities (Clinical) in comparison with other online communities (Control) using data from 110 Live Journal weblog communities. Using machine learning techniques, we analyze these online autism communities comprehensively, studying three key aspects expressed in the blog posts made by members of the communities: sentiment, topics and language style. Sentiment analysis shows that the sentiment of the clinical group has lower valence, indicative of poorer moods than people in control. Topics and language style are shown to be good predictors of autism posts. The result shows the potential of social media in medical studies for a broad range of purposes such as screening, monitoring and subsequently providing support for fragile communities. @INPROCEEDINGS { nguyen_duong_phung_venkatesh_wise14,    TITLE = { Affective, Linguistic and Topic Patterns in Online Autism Communities },    AUTHOR = { Nguyen, Thin and Duong, Thi and Phung, Dinh and Venkatesh, Svetha },    BOOKTITLE = { International Conference on Web Information System Engineering (WISE 2014) },    YEAR = { 2014 },    ADDRESS = { Thessaloniki, Greece },    ABSTRACT = { Online communities offer a platform to support and discuss health issues. They provide a more accessible way to bring together people with the same concerns or interests. This paper aims to study the characteristics of online autism communities (Clinical) in comparison with other online communities (Control) using data from 110 Live Journal weblog communities. 
Using machine learning techniques, we analyze these online autism communities comprehensively, studying three key aspects expressed in the blog posts made by members of the communities: sentiment, topics and language style. Sentiment analysis shows that the sentiment of the clinical group has lower valence, indicative of poorer moods than people in control. Topics and language style are shown to be good predictors of autism posts. The result shows the potential of social media in medical studies for a broad range of purposes such as screening, monitoring and subsequently providing support for fragile communities. },    COMMENT = { coauthor },    OWNER = { dbdao },    TIMESTAMP = { 2014.07.11 },    URL = { 2014\conferences\nguyen_duong_phung_venkatesh_wise14.pdf },} C
 A Bayesian Nonparametric Framework for Activity Recognition using Accelerometer Data Nguyen, T.C., Gupta, S., Venkatesh, S. and Phung, D. In Proceedings of 22nd International Conference on Pattern Recognition (ICPR), pages 2017-2022, 2014. [ | ] Monitoring the daily physical activity of humans plays an important role in preventing diseases as well as improving health. In this paper, we demonstrate a framework for monitoring the physical activity level in daily life. We collect the data using accelerometer sensors in a realistic setting without any supervision. The ground truth of activities is provided by the participants themselves using an experience sampling application running on mobile phones. The original data is discretized by the hierarchical Dirichlet process (HDP) into different activity levels and the number of levels is inferred automatically. We validate the accuracy of the extracted patterns by using them for the multi-label classification of activities and demonstrate high performance on various standard evaluation metrics. We further show that the extracted patterns are highly correlated to the daily routine of the users. @INPROCEEDINGS { nguyen_gupta_venkatesh_phung_icpr14,    TITLE = { A {B}ayesian Nonparametric Framework for Activity Recognition using Accelerometer Data },    AUTHOR = { Nguyen, T.C. and Gupta, S. and Venkatesh, S. and Phung, D. },    BOOKTITLE = { Proceedings of 22nd International Conference on Pattern Recognition (ICPR) },    YEAR = { 2014 },    PAGES = { 2017--2022 },    ABSTRACT = { Monitoring the daily physical activity of humans plays an important role in preventing diseases as well as improving health. In this paper, we demonstrate a framework for monitoring the physical activity level in daily life. We collect the data using accelerometer sensors in a realistic setting without any supervision. 
The ground truth of activities is provided by the participants themselves using an experience sampling application running on mobile phones. The original data is discretized by the hierarchical Dirichlet process (HDP) into different activity levels and the number of levels is inferred automatically. We validate the accuracy of the extracted patterns by using them for the multi-label classification of activities and demonstrate high performance on various standard evaluation metrics. We further show that the extracted patterns are highly correlated to the daily routine of the users. },    OWNER = { ctng },    TIMESTAMP = { 2014.02.21 },} C
 Nonparametric Discovery of Learning Patterns and Autism Subgroups from Therapeutic Data Vellanki, P., Duong, T., Venkatesh, S. and Phung, D. In Proceedings of 22nd International Conference on Pattern Recognition (ICPR) (accepted), pages 1829-1833, 2014. [ | ] Autism Spectrum Disorder (ASD) is growing at a staggering rate, but little is known about the cause of this condition. Inferring learning patterns from therapeutic performance data, and subsequently clustering ASD children into subgroups, is important to understand this domain, and more importantly to inform evidence-based intervention. However, this data-driven task was difficult in the past due to the insufficiency of data to perform reliable analysis. For the first time, using data from a recent application for early intervention in autism (TOBY Playpad), whose download count is now exceeding 4500, we present in this paper the automatic discovery of learning patterns across 32 skills in sensory, imitation and language. We use unsupervised learning methods for this task, but a notorious problem with existing methods is correct specification of the number of patterns in advance, which in our case is even more difficult due to the complexity of the data. To this end, we appeal to recent Bayesian nonparametric methods, in particular Bayesian Nonparametric Factor Analysis. This model uses the Indian Buffet Process (IBP) as a prior on a binary matrix of infinite columns to allocate groups of intervention skills to children. The optimal number of learning patterns as well as subgroup assignments are inferred automatically from data. Our experimental results follow an exploratory approach, presenting different newly discovered learning patterns. To provide quantitative results, we also report the clustering evaluation against K-means and NMF. In addition to the novelty of this new problem, we were able to demonstrate the suitability of Bayesian nonparametric models over parametric rivals. 
@INPROCEEDINGS { vellanki_duong_venkatesh_phung_icpr14,    TITLE = { Nonparametric Discovery of Learning Patterns and Autism Subgroups from Therapeutic Data },    AUTHOR = { Vellanki, P. and Duong, T. and Venkatesh, S. and Phung, D. },    BOOKTITLE = { Proceedings of 22nd International Conference on Pattern Recognition (ICPR) (accepted) },    YEAR = { 2014 },    PAGES = { 1829-1833 },    ABSTRACT = { Autism Spectrum Disorder (ASD) is growing at a staggering rate, but little is known about the cause of this condition. Inferring learning patterns from therapeutic performance data, and subsequently clustering ASD children into subgroups, is important to understand this domain, and more importantly to inform evidence-based intervention. However, this data-driven task was difficult in the past due to the insufficiency of data to perform reliable analysis. For the first time, using data from a recent application for early intervention in autism (TOBY Playpad), whose download count now exceeds 4500, we present in this paper the automatic discovery of learning patterns across 32 skills in sensory, imitation and language. We use unsupervised learning methods for this task, but a notorious problem with existing methods is the correct specification of the number of patterns in advance, which in our case is even more difficult due to the complexity of the data. To this end, we appeal to recent Bayesian nonparametric methods, in particular Bayesian Nonparametric Factor Analysis. This model uses the Indian Buffet Process (IBP) as a prior on a binary matrix with infinitely many columns to allocate groups of intervention skills to children. The optimal number of learning patterns as well as subgroup assignments are inferred automatically from data. Our experimental results follow an exploratory approach and present different newly discovered learning patterns. To provide quantitative results, we also report the clustering evaluation against K-means and NMF. 
In addition to the novelty of this new problem, we were able to demonstrate the suitability of Bayesian nonparametric models over parametric rivals. },    OWNER = { pvellank },    TIMESTAMP = { 2014.04.11 },}
 Risk stratification using data from electronic medical records better predicts suicide risks than clinician assessments T. Tran, W. Luo, D. Phung, H. Richard, M. Berk, L. Kennedy and S. Venkatesh. BMC Psychiatry, 14(1):76, 2014. Background: To date, our ability to accurately identify patients at high risk from suicidal behaviour, and thus to target interventions, has been fairly limited. This study examined a large pool of factors that are potentially associated with suicide risk from the comprehensive electronic medical record (EMR) to derive a predictive model for 1–6 month risk. Methods: 7,399 patients undergoing suicide risk assessment were followed up for 180 days. The dataset was divided into derivation and validation cohorts of 4,911 and 2,488 respectively. Clinicians used an 18-point checklist of known risk factors to divide patients into low, medium, or high risk. Their predictive ability was compared with a risk stratification model derived from the EMR data. The model was based on the continuation-ratio ordinal regression method