Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by the authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder. Please also observe the IEEE, ACM and Springer copyright notices.

Preprints and Selected Papers
  • OTLDA: A Geometry-aware Optimal Transport Approach for Topic Modeling
    Viet Huynh, Ethan Zhao and Dinh Phung. In Proc. of the 34th Conference on Neural Information Processing Systems (NeurIPS), Dec 2020. [pdf]
    We present an optimal transport framework for learning topics from textual data. While the celebrated Latent Dirichlet allocation (LDA) topic model and its variants have been applied to many disciplines, they mainly focus on word occurrences and neglect to incorporate semantic regularities in language. Even though recent works have tried to exploit the semantic relationship between words to bridge this gap, these models, which are usually extensions of LDA or the Dirichlet Multinomial mixture (DMM), are tailored to deal effectively with either regular or short documents. The optimal transport distance provides an appealing tool to incorporate the geometry of word semantics into topic modeling. Moreover, recent developments in the efficient computation of optimal transport distances also promote its application in topic modeling. In this paper we build on optimal transport theory to naturally exploit the geometric structures of semantically related words in embedding spaces, which leads to more interpretable learned topics. Comprehensive experiments illustrate that the proposed framework outperforms competitive approaches in terms of topic coherence on assorted text corpora which include both long and short documents. The learned topic representations also lead to better accuracy on downstream classification tasks, which is considered an extrinsic evaluation.
    @INPROCEEDINGS { huynh_etal_nisp20_OTLDA,
        AUTHOR = { Viet Huynh and Ethan Zhao and Dinh Phung },
        BOOKTITLE = { Proc. of the 34th Conference on Neural Information Processing Systems (NeurIPS) },
        TITLE = { {OTLDA}: A Geometry-aware Optimal Transport Approach for Topic Modeling },
        YEAR = { 2020 },
        MONTH = { dec },
        ABSTRACT = { We present an optimal transport framework for learning topics from textual data. While the celebrated Latent Dirichlet allocation (LDA) topic model and its variants have been applied to many disciplines, they mainly focus on word occurrences and neglect to incorporate semantic regularities in language. Even though recent works have tried to exploit the semantic relationship between words to bridge this gap, these models, which are usually extensions of LDA or the Dirichlet Multinomial mixture (DMM), are tailored to deal effectively with either regular or short documents. The optimal transport distance provides an appealing tool to incorporate the geometry of word semantics into topic modeling. Moreover, recent developments in the efficient computation of optimal transport distances also promote its application in topic modeling. In this paper we build on optimal transport theory to naturally exploit the geometric structures of semantically related words in embedding spaces, which leads to more interpretable learned topics. Comprehensive experiments illustrate that the proposed framework outperforms competitive approaches in terms of topic coherence on assorted text corpora which include both long and short documents. The learned topic representations also lead to better accuracy on downstream classification tasks, which is considered an extrinsic evaluation. },
        FILE = { :huynh_etal_nisp20_OTLDA - OTLDA_ a Geometry Aware Optimal Transport Approach for Topic Modeling.pdf:PDF },
        URL = { https://proceedings.neurips.cc/paper/2020/hash/d800149d2f947ad4d64f34668f8b20f6-Abstract.html },
    }
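    A quick illustration of the entropic optimal transport distance at the heart of OTLDA: the Sinkhorn fixed-point iteration below (a minimal NumPy sketch with a toy setup of our own, not the authors' implementation) computes the transport cost between two histograms over a vocabulary whose ground costs come from word-embedding distances.

        import numpy as np

        def sinkhorn_distance(a, b, C, eps=0.05, n_iters=200):
            """Entropic OT cost between histograms a, b under cost matrix C."""
            K = np.exp(-C / eps)               # Gibbs kernel
            u = np.ones_like(a)
            for _ in range(n_iters):
                v = b / (K.T @ u)              # scale to match column marginal b
                u = a / (K @ v)                # scale to match row marginal a
            P = u[:, None] * K * v[None, :]    # transport plan
            return float((P * C).sum())

        # Toy example: squared Euclidean costs between three word embeddings.
        emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
        C = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1)
        a = np.array([0.7, 0.2, 0.1])          # e.g. a document's word distribution
        b = np.array([0.1, 0.3, 0.6])          # e.g. a topic's word distribution
        print(sinkhorn_distance(a, b, C))
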
  • Parameterized Rate-Distortion Stochastic Encoder
    Quan Hoang, Trung Le and Dinh Phung. In Proc. of the 37th International Conference on Machine Learning (ICML), 2020.
    @INPROCEEDINGS { hoang_etal_icml20_parameterized,
        AUTHOR = { Quan Hoang and Trung Le and Dinh Phung },
        BOOKTITLE = { Proc. of the 37th International Conference on Machine Learning (ICML) },
        TITLE = { Parameterized Rate-Distortion Stochastic Encoder },
        YEAR = { 2020 },
    }
  • A Relational Memory-based Embedding Model for Triple Classification and Search Personalization
    Dai Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung. In Proc. of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020. [pdf]
    Knowledge graph embedding methods often suffer from a limitation of memorizing valid triples to predict new ones for triple classification and search personalization problems. To this end, we introduce a novel embedding model, named R-MeN, that explores a relational memory network to encode potential dependencies in relationship triples. R-MeN considers each triple as a sequence of 3 input vectors that recurrently interact with a memory using a transformer self-attention mechanism. Thus R-MeN encodes new information from interactions between the memory and each input vector to return a corresponding vector. Consequently, R-MeN feeds these 3 returned vectors to a convolutional neural network-based decoder to produce a scalar score for the triple. Experimental results show that our proposed R-MeN obtains state-of-the-art results on SEARCH17 for the search personalization task, and on WN11 and FB13 for the triple classification task.
    @INPROCEEDINGS { nguyen_etal_acl9_relational,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dinh Phung },
        BOOKTITLE = { Proc. of the 58th Annual Meeting of the Association for Computational Linguistics (ACL) },
        TITLE = { A Relational Memory-based Embedding Model for Triple Classification and Search Personalization },
        YEAR = { 2020 },
        ABSTRACT = { Knowledge graph embedding methods often suffer from a limitation of memorizing valid triples to predict new ones for triple classification and search personalization problems. To this end, we introduce a novel embedding model, named R-MeN, that explores a relational memory network to encode potential dependencies in relationship triples. R-MeN considers each triple as a sequence of 3 input vectors that recurrently interact with a memory using a transformer self-attention mechanism. Thus R-MeN encodes new information from interactions between the memory and each input vector to return a corresponding vector. Consequently, R-MeN feeds these 3 returned vectors to a convolutional neural network-based decoder to produce a scalar score for the triple. Experimental results show that our proposed R-MeN obtains state-of-the-art results on SEARCH17 for the search personalization task, and on WN11 and FB13 for the triple classification task. },
        FILE = { :nguyen_etal_acl9_relational - A Relational Memory Based Embedding Model for Triple Classification and Search Personalization.PDF:PDF },
        URL = { https://arxiv.org/abs/1907.06080 },
    }
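    To make the pipeline in the abstract concrete, here is a heavily simplified PyTorch sketch of an R-MeN-style scorer: the paper's relational memory core is replaced by ordinary multi-head self-attention over the three-step (s, r, o) sequence, followed by a small convolutional decoder returning a scalar score. All sizes and names are illustrative, not the published architecture.

        import torch
        import torch.nn as nn

        class TripleScorer(nn.Module):
            def __init__(self, n_entities, n_relations, dim=64):
                super().__init__()
                self.ent = nn.Embedding(n_entities, dim)
                self.rel = nn.Embedding(n_relations, dim)
                # Stand-in for the relational memory: plain self-attention.
                self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
                self.conv = nn.Conv1d(dim, 32, kernel_size=3)  # spans the 3 positions
                self.out = nn.Linear(32, 1)

            def forward(self, s, r, o):
                seq = torch.stack([self.ent(s), self.rel(r), self.ent(o)], dim=1)  # (B, 3, d)
                enc, _ = self.attn(seq, seq, seq)                                  # (B, 3, d)
                feat = torch.relu(self.conv(enc.transpose(1, 2))).squeeze(-1)      # (B, 32)
                return self.out(feat).squeeze(-1)                                  # (B,)

        scorer = TripleScorer(n_entities=100, n_relations=10)
        print(scorer(torch.tensor([0]), torch.tensor([1]), torch.tensor([2])))
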
  • Deep Generative Models of Sparse and Overdispersed Discrete Data
    He Zhao, Piyush Rai, Lan Du, Wray Buntine, Dinh Phung and Mingyuan Zhou. In Proc. of the 23rd Int. Conf. on Artificial Intelligence and Statistics (AISTATS), 2020. [pdf]
    In this paper, we propose a variational autoencoder-based framework that generates discrete data, including both count-valued and binary data, via the negative-binomial distribution. We also examine the model’s ability to capture self- and cross-excitations in discrete data, which are critical for modelling overdispersion. We conduct extensive experiments on text analysis and collaborative filtering. Compared with several state-of-the-art baselines, the proposed models achieve significantly better performance on the above problems. By achieving superior modelling performance with a simple yet effective Bayesian extension to VAEs, we demonstrate that it is feasible to adapt the knowledge and experience of Bayesian probabilistic matrix factorisation into newly-developed deep generative models.
    @INPROCEEDINGS { zhao_etal_aistats20_deepgenerative,
        AUTHOR = { He Zhao and Piyush Rai and Lan Du and Wray Buntine and Dinh Phung and Mingyuan Zhou },
        TITLE = { Deep Generative Models of Sparse and Overdispersed Discrete Data },
        BOOKTITLE = { Proc. of the 23rd Int. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2020 },
        ABSTRACT = { In this paper, we propose a variational autoencoder-based framework that generates discrete data, including both count-valued and binary data, via the negative-binomial distribution. We also examine the model’s ability to capture self- and cross-excitations in discrete data, which are critical for modelling overdispersion. We conduct extensive experiments on text analysis and collaborative filtering. Compared with several state-of-the-art baselines, the proposed models achieve significantly better performance on the above problems. By achieving superior modelling performance with a simple yet effective Bayesian extension to VAEs, we demonstrate that it is feasible to adapt the knowledge and experience of Bayesian probabilistic matrix factorisation into newly-developed deep generative models. },
        FILE = { :zhao_etal_aistats20_deepgenerative - Deep Generative Models of Sparse and Overdispersed Discrete Data.pdf:PDF },
        URL = { https://www.semanticscholar.org/paper/Deep-Generative-Models-of-Sparse-and-Overdispersed-Zhao-Rai/8136c46488875b09e15e89c08bf02698901322a1 },
    }
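    The overdispersion argument rests on the negative-binomial likelihood: unlike the Poisson, the NB has a variance strictly larger than its mean. A short sketch of the NB log-pmf (standard formulas; the parameterization and names are ours, not taken from the paper):

        import numpy as np
        from scipy.special import gammaln

        def nb_log_pmf(x, r, p):
            """log NB(x; r, p) with dispersion r and success probability p."""
            x = np.asarray(x, dtype=float)
            return (gammaln(x + r) - gammaln(r) - gammaln(x + 1)
                    + r * np.log1p(-p) + x * np.log(p))

        r, p = 2.0, 0.8
        mean = r * p / (1 - p)        # 8.0
        var = r * p / (1 - p) ** 2    # 40.0 -- overdispersed: variance > mean
        print(mean, var, nb_log_pmf([0, 5, 50], r, p))
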
  • Learning Generative Adversarial Networks from Multiple Data Sources
    Trung Le, Quan Hoang, Hung Vu, Tu Dinh Nguyen, Hung Bui and Dinh Phung. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), pages 2823-2829, July 2019. [pdf]
    Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. For this problem, we enrich both GANs’ formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN’s effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair.
    @INPROCEEDINGS { le_etal_ijcai19_learningGAN,
        AUTHOR = { Trung Le and Quan Hoang and Hung Vu and Tu Dinh Nguyen and Hung Bui and Dinh Phung },
        TITLE = { Learning Generative Adversarial Networks from Multiple Data Sources },
        BOOKTITLE = { Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI) },
        YEAR = { 2019 },
        PAGES = { 2823--2829 },
        MONTH = { July },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        ABSTRACT = { Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. For this problem, we enrich both GANs’ formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN’s effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair. },
        FILE = { :le_etal_ijcai19_learningGAN - Learning Generative Adversarial Networks from Multiple Data Sources.pdf:PDF },
        URL = { https://www.ijcai.org/Proceedings/2019/391 },
    }
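    One plausible reading of the push-and-pull objective described in the abstract, written as a generator loss (an illustrative PyTorch sketch, not the exact P2GAN formulation): one discriminator pulls generated samples toward the primary source while an auxiliary discriminator pushes them away from the auxiliary sources.

        import torch
        import torch.nn.functional as F

        def generator_loss(d_primary_logits, d_aux_logits, lam=0.5):
            # d_primary_logits: primary discriminator on generated samples
            #   (label 1 = "looks like primary data").
            # d_aux_logits: auxiliary discriminator on generated samples
            #   (label 1 = "looks like an auxiliary source").
            ones = torch.ones_like(d_primary_logits)
            zeros = torch.zeros_like(d_aux_logits)
            pull = F.binary_cross_entropy_with_logits(d_primary_logits, ones)
            push = F.binary_cross_entropy_with_logits(d_aux_logits, zeros)
            return pull + lam * push   # lam trades off pulling vs. pushing

        logits_p, logits_a = torch.randn(8), torch.randn(8)
        print(generator_loss(logits_p, logits_a))
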
  • Three-Player Wasserstein GAN via Amortised Duality
    Nhan Dam, Quan Hoang, Trung Le, Tu Dinh Nguyen, Hung Bui and Dinh Phung. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), pages 2202-2208, July 2019. [pdf]
    We propose a new formulation for learning generative adversarial networks (GANs) using optimal transport cost (the general form of Wasserstein distance) as the objective criterion to measure the dissimilarity between the target distribution and the learned distribution. Our formulation is based on the general form of the Kantorovich duality, which is applicable to optimal transport with a wide range of cost functions that are not necessarily a metric. To make optimising this duality form amenable to gradient-based methods, we employ a function that acts as an amortised optimiser for the innermost optimisation problem. Interestingly, the amortised optimiser can be viewed as a mover since it strategically shifts around data points. The resulting formulation is a sequential min-max-min game with 3 players: the generator, the critic, and the mover, where the new player, the mover, attempts to fool the critic by shifting the data around. Despite involving three players, we demonstrate that our proposed formulation can be solved reasonably effectively via a simple alternating gradient learning strategy. Compared with the existing Lipschitz-constrained formulations of Wasserstein GAN on CIFAR-10, our model yields significantly better diversity scores than weight clipping and comparable performance to the gradient penalty method.
    @INPROCEEDINGS { dam_etal_ijcai19_3pwgan,
        AUTHOR = { Nhan Dam and Quan Hoang and Trung Le and Tu Dinh Nguyen and Hung Bui and Dinh Phung },
        BOOKTITLE = { Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI) },
        TITLE = { Three-Player {W}asserstein {GAN} via Amortised Duality },
        YEAR = { 2019 },
        MONTH = { July },
        PAGES = { 2202--2208 },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        ABSTRACT = { We propose a new formulation for learning generative adversarial networks (GANs) using optimal transport cost (the general form of Wasserstein distance) as the objective criterion to measure the dissimilarity between the target distribution and the learned distribution. Our formulation is based on the general form of the Kantorovich duality, which is applicable to optimal transport with a wide range of cost functions that are not necessarily a metric. To make optimising this duality form amenable to gradient-based methods, we employ a function that acts as an amortised optimiser for the innermost optimisation problem. Interestingly, the amortised optimiser can be viewed as a mover since it strategically shifts around data points. The resulting formulation is a sequential min-max-min game with 3 players: the generator, the critic, and the mover, where the new player, the mover, attempts to fool the critic by shifting the data around. Despite involving three players, we demonstrate that our proposed formulation can be solved reasonably effectively via a simple alternating gradient learning strategy. Compared with the existing Lipschitz-constrained formulations of Wasserstein GAN on CIFAR-10, our model yields significantly better diversity scores than weight clipping and comparable performance to the gradient penalty method. },
        FILE = { :dam_etal_ijcai19_3pwgan - Three Player Wasserstein GAN Via Amortised Duality.pdf:PDF },
        URL = { https://www.ijcai.org/Proceedings/2019/305 },
    }
  • Learning How to Active Learn by Dreaming
    Thuy-Trang Vu, Ming Liu, Dinh Phung and Gholamreza Haffari. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy, Jul 2019.
    @INPROCEEDINGS { vu_etal_acl19_learning,
        AUTHOR = { Thuy-Trang Vu and Ming Liu and Dinh Phung and Gholamreza Haffari },
        TITLE = { Learning How to Active Learn by Dreaming },
        BOOKTITLE = { Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL) },
        YEAR = { 2019 },
        ADDRESS = { Florence, Italy },
        MONTH = { jul },
    }
  • A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization
    Dai Quoc Nguyen, Thanh Vu, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL), Minneapolis, USA, Jun 2019. [pdf]
    In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). Our CapsE represents each triple as a 3-column matrix where each column vector represents the embedding of an element in the triple. This 3-column matrix is then fed to a convolution layer where multiple filters are applied to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via a dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains state-of-the-art link prediction results for knowledge graph completion on two benchmark datasets: WN18RR and FB15k-237, and outperforms strong search personalization baselines on the SEARCH17 dataset.
    @INPROCEEDINGS { nguyen_etal_naaclhtl19_acapsule,
        AUTHOR = { Dai Quoc Nguyen and Thanh Vu and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung },
        TITLE = { A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization },
        BOOKTITLE = { Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL) },
        YEAR = { 2019 },
        ADDRESS = { Minneapolis, USA },
        MONTH = { jun },
        ABSTRACT = { In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). Our CapsE represents each triple as a 3-column matrix where each column vector represents the embedding of an element in the triple. This 3-column matrix is then fed to a convolution layer where multiple filters are applied to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via a dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains state-of-the-art link prediction results for knowledge graph completion on two benchmark datasets: WN18RR and FB15k-237, and outperforms strong search personalization baselines on the SEARCH17 dataset. },
        FILE = { :nguyen_etal_naaclhtl19_acapsule - A Capsule Network Based Embedding Model for Knowledge Graph Completion and Search Personalization.pdf:PDF },
        URL = { https://arxiv.org/abs/1808.04122 },
    }
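    A greatly simplified PyTorch sketch of the CapsE scoring pipeline (dynamic routing is omitted and replaced by a crude average, so this mirrors only the data flow described in the abstract, not the published model): convolve the d x 3 triple matrix, regroup the feature maps into capsules, squash, and use the length of the resulting vector as the plausibility score.

        import torch
        import torch.nn as nn

        class CapsEScore(nn.Module):
            def __init__(self, n_entities, n_relations, dim=64, n_filters=8, caps_dim=4):
                super().__init__()
                self.ent = nn.Embedding(n_entities, dim)
                self.rel = nn.Embedding(n_relations, dim)
                self.conv = nn.Conv2d(1, n_filters, kernel_size=(1, 3))
                self.caps_dim = caps_dim

            @staticmethod
            def squash(v):
                n2 = (v ** 2).sum(dim=-1, keepdim=True)
                return (n2 / (1 + n2)) * v / torch.sqrt(n2 + 1e-9)

            def forward(self, s, r, o):
                m = torch.stack([self.ent(s), self.rel(r), self.ent(o)], dim=-1)  # (B, d, 3)
                maps = torch.relu(self.conv(m.unsqueeze(1)))                      # (B, F, d, 1)
                caps = maps.flatten(1).view(m.size(0), -1, self.caps_dim)
                final = self.squash(caps).mean(dim=1)   # crude stand-in for routing
                return final.norm(dim=-1)               # score = capsule length

        model = CapsEScore(n_entities=100, n_relations=10)
        print(model(torch.tensor([0]), torch.tensor([1]), torch.tensor([2])))
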
  • Probabilistic Multilevel Clustering via Composite Transportation Distance
    Viet Huynh, Nhat Ho, Dinh Phung and Michael I. Jordan. In Proc. of the Int. Conf. on Artificial Intelligence and Statistics (AISTATS), Okinawa, Japan, Apr 2019. [pdf]
    We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence. Our method involves solving a joint optimization problem over spaces of probability measures to simultaneously discover grouping structures within groups and among groups. By exploiting the connection of our method to the problem of finding composite transportation barycenters, we develop fast and efficient optimization algorithms even for potentially large-scale multilevel datasets. Finally, we present experimental results with both synthetic and real data to demonstrate the efficiency and scalability of the proposed approach.
    @INPROCEEDINGS { ho_etal_aistats19_probabilistic,
        AUTHOR = { Viet Huynh and Nhat Ho and Dinh Phung and Michael I. Jordan },
        TITLE = { Probabilistic Multilevel Clustering via Composite Transportation Distance },
        BOOKTITLE = { Proc. of the Int. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2019 },
        ADDRESS = { Okinawa, Japan },
        MONTH = { apr },
        ABSTRACT = { We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence. Our method involves solving a joint optimization problem over spaces of probability measures to simultaneously discover grouping structures within groups and among groups. By exploiting the connection of our method to the problem of finding composite transportation barycenters, we develop fast and efficient optimization algorithms even for potentially large-scale multilevel datasets. Finally, we present experimental results with both synthetic and real data to demonstrate the efficiency and scalability of the proposed approach. },
        FILE = { :ho_etal_aistats19_probabilistic - Probabilistic Multilevel Clustering Via Composite Transportation Distance.pdf:PDF },
        URL = { https://arxiv.org/abs/1810.11911 },
    }
  • Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection
    Tue Le, Tuan Nguyen, Trung Le, Dinh Phung, Paul Montague, Olivier De Vel and Lizhen Qu. In International Conference on Learning Representations (ICLR), 2019. [pdf]
    @INPROCEEDINGS { le_etal_iclr18_maximal,
        AUTHOR = { Tue Le and Tuan Nguyen and Trung Le and Dinh Phung and Paul Montague and Olivier De Vel and Lizhen Qu },
        TITLE = { Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection },
        BOOKTITLE = { International Conference on Learning Representations (ICLR) },
        YEAR = { 2019 },
        FILE = { :le_etal_iclr18_maximal - Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection.pdf:PDF },
        URL = { https://openreview.net/forum?id=ByloIiCqYQ },
    }
  • Robust Anomaly Detection in Videos using Multilevel Representations
    Hung Vu, Tu Dinh Nguyen, Trung Le, Wei Luo and Dinh Phung. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), Honolulu, USA, 2019. [pdf]
    @INPROCEEDINGS { vu_etal_aaai19_robustanomaly,
        AUTHOR = { Hung Vu and Tu Dinh Nguyen and Trung Le and Wei Luo and Dinh Phung },
        TITLE = { Robust Anomaly Detection in Videos using Multilevel Representations },
        BOOKTITLE = { Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2019 },
        ADDRESS = { Honolulu, USA },
        FILE = { :vu_etal_aaai19_robustanomaly - Robust Anomaly Detection in Videos Using Multilevel Representations.pdf:PDF },
        GROUPS = { Anomaly Detection },
        URL = { https://github.com/SeaOtter/vad_gan },
    }
  • Robust Bayesian Kernel Machine via Stein Variational Gradient Descent for Big Data
    Khanh Nguyen, Trung Le, Tu Nguyen, Geoff Webb and Dinh Phung. In Proc. of the 24th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), London, UK, Aug 2018.
    Kernel methods are powerful supervised machine learning models owing to their strong generalization ability, especially their capacity to generalize effectively to unseen data even from limited training data. However, most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization, making them infeasible to apply to large-scale datasets. This issue is exacerbated when kernel methods are used in conjunction with a grid search to tune their kernel parameters and hyperparameters, which raises the question of model robustness when applied to real datasets. In this paper, we propose a robust Bayesian Kernel Machine (BKM) – a Bayesian kernel machine that exploits the strengths of both Bayesian modelling and kernel methods. A key challenge for such a formulation is the need for an efficient learning algorithm. To this end, we successfully extended the recent Stein variational theory for Bayesian inference to our proposed model, resulting in fast and efficient learning and prediction algorithms. Importantly, our proposed BKM is resilient to the curse of kernelization, hence making it applicable to large-scale datasets and robust to parameter tuning, avoiding the associated expense and potential pitfalls of current parameter-tuning practice. Our extensive experimental results on 12 benchmark datasets show that our BKM, without tuning any parameter, can achieve comparable predictive performance with the state-of-the-art LIBSVM and significantly outperforms other baselines, while obtaining a significant speedup in total training time compared with its rivals.
    @INPROCEEDINGS { nguyen_etal_kdd18_robustbayesian,
        AUTHOR = { Khanh Nguyen and Trung Le and Tu Nguyen and Geoff Webb and Dinh Phung },
        TITLE = { Robust Bayesian Kernel Machine via Stein Variational Gradient Descent for Big Data },
        BOOKTITLE = { Proc. of the 24th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD) },
        YEAR = { 2018 },
        ADDRESS = { London, UK },
        MONTH = { aug },
        PUBLISHER = { ACM },
        ABSTRACT = { Kernel methods are powerful supervised machine learning models owing to their strong generalization ability, especially their capacity to generalize effectively to unseen data even from limited training data. However, most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization, making them infeasible to apply to large-scale datasets. This issue is exacerbated when kernel methods are used in conjunction with a grid search to tune their kernel parameters and hyperparameters, which raises the question of model robustness when applied to real datasets. In this paper, we propose a robust Bayesian Kernel Machine (BKM) – a Bayesian kernel machine that exploits the strengths of both Bayesian modelling and kernel methods. A key challenge for such a formulation is the need for an efficient learning algorithm. To this end, we successfully extended the recent Stein variational theory for Bayesian inference to our proposed model, resulting in fast and efficient learning and prediction algorithms. Importantly, our proposed BKM is resilient to the curse of kernelization, hence making it applicable to large-scale datasets and robust to parameter tuning, avoiding the associated expense and potential pitfalls of current parameter-tuning practice. Our extensive experimental results on 12 benchmark datasets show that our BKM, without tuning any parameter, can achieve comparable predictive performance with the state-of-the-art LIBSVM and significantly outperforms other baselines, while obtaining a significant speedup in total training time compared with its rivals. },
        FILE = { :nguyen_etal_kdd18_robustbayesian - Robust Bayesian Kernel Machine Via Stein Variational Gradient Descent for Big Data.pdf:PDF },
    }
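    The Stein variational gradient descent update that powers BKM's inference is easy to state: each particle moves along a kernel-smoothed score plus a repulsive term. A minimal NumPy sketch on a toy Gaussian target (standard SVGD with the median-heuristic RBF bandwidth; nothing here is specific to BKM):

        import numpy as np

        def svgd_step(x, grad_log_p, step=0.1):
            """One SVGD update for particles x of shape (n, d)."""
            n = x.shape[0]
            diff = x[:, None, :] - x[None, :, :]
            sq = (diff ** 2).sum(-1)
            h = np.median(sq) / max(np.log(n), 1e-9) + 1e-9   # median heuristic
            K = np.exp(-sq / h)                                # RBF kernel matrix
            g = grad_log_p(x)
            # Driving term K @ g plus repulsion sum_j grad_{x_j} K(x_j, x_i).
            phi = (K @ g + (2.0 / h) * (K.sum(1, keepdims=True) * x - K @ x)) / n
            return x + step * phi

        rng = np.random.default_rng(0)
        x = rng.normal(3.0, 0.5, size=(100, 1))   # particles start far from target
        for _ in range(500):
            x = svgd_step(x, lambda z: -z)        # score of a standard normal
        print(x.mean(), x.std())                  # approaches 0 and 1
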
  • MGAN: Training Generative Adversarial Nets with Multiple Generators
    Quan Hoang, Tu Dinh Nguyen, Trung Le and Dinh Phung. In International Conference on Learning Representations (ICLR), 2018. [pdf]
    We propose in this paper a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapsing problem and delivering state-of-the-art results. A minimax formulation is established among a classifier, a discriminator, and a set of generators in a similar spirit to GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them is randomly selected as the final output, similar to the mechanism of a probabilistic mixture model. We term our method Mixture Generative Adversarial Nets (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators’ distributions and the empirical data distribution is minimal, whilst the JSD among generators’ distributions is maximal, hence effectively avoiding the mode collapsing problem. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over the latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by the generators.
    @INPROCEEDINGS { hoang_etal_iclr18_mgan,
        AUTHOR = { Quan Hoang and Tu Dinh Nguyen and Trung Le and Dinh Phung },
        TITLE = { {MGAN}: Training Generative Adversarial Nets with Multiple Generators },
        BOOKTITLE = { International Conference on Learning Representations (ICLR) },
        YEAR = { 2018 },
        ABSTRACT = { We propose in this paper a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapsing problem and delivering state-of-the-art results. A minimax formulation is established among a classifier, a discriminator, and a set of generators in a similar spirit to GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them is randomly selected as the final output, similar to the mechanism of a probabilistic mixture model. We term our method Mixture Generative Adversarial Nets (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators’ distributions and the empirical data distribution is minimal, whilst the JSD among generators’ distributions is maximal, hence effectively avoiding the mode collapsing problem. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over the latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by the generators. },
        FILE = { :hoang_etal_iclr18_mgan - MGAN_ Training Generative Adversarial Nets with Multiple Generators.pdf:PDF },
        URL = { https://openreview.net/forum?id=rkmu5b0a- },
    }
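    The diversity mechanism of MGAN can be conveyed in a few lines of PyTorch: sample a generator index uniformly, generate, and let a classifier predict which generator produced each sample; keeping the generators identifiable to the classifier keeps them mutually distinct. This toy sketch shows only that term; the usual discriminator loss and the paper's parameter sharing are omitted, and all sizes are illustrative.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        K, z_dim, x_dim = 4, 16, 2
        gens = nn.ModuleList([
            nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))
            for _ in range(K)
        ])
        clf = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, K))

        z = torch.randn(32, z_dim)
        k = torch.randint(0, K, (32,))   # uniformly chosen mixture component
        x = torch.stack([gens[int(i)](z[j]) for j, i in enumerate(k)])
        diversity_loss = F.cross_entropy(clf(x), k)   # low when generators are distinguishable
        print(diversity_loss)
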
  • Geometric enclosing networks
    Trung Le, Hung Vu, Tu Dinh Nguyen and Dinh Phung. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), pages 2355-2361, July 2018.
    Training models to generate data has increasingly attracted research attention and has become important in modern-world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of the minimal enclosing ball to train a generator G(z) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring that generated data also lie on the data manifold learned from the training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely a simple and easy-to-control optimization formulation, avoidance of mode collapse, and efficient learning of data manifold representations in a completely unsupervised manner. We conducted extensive experiments on synthetic and real-world datasets to illustrate the behaviors, strengths and weaknesses of our proposed GEN, in particular its ability to handle multi-modal data and the quality of the generated data.
    @INPROCEEDINGS { le_etal_ijcai18_geometric,
        AUTHOR = { Trung Le and Hung Vu and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { Geometric enclosing networks },
        BOOKTITLE = { Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ({IJCAI-18}) },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        PAGES = { 2355--2361 },
        YEAR = { 2018 },
        MONTH = { July },
        ABSTRACT = { Training models to generate data has increasingly attracted research attention and has become important in modern-world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of the minimal enclosing ball to train a generator G(z) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring that generated data also lie on the data manifold learned from the training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely a simple and easy-to-control optimization formulation, avoidance of mode collapse, and efficient learning of data manifold representations in a completely unsupervised manner. We conducted extensive experiments on synthetic and real-world datasets to illustrate the behaviors, strengths and weaknesses of our proposed GEN, in particular its ability to handle multi-modal data and the quality of the generated data. },
        FILE = { :le_etal_ijcai18_geometric - Geometric Enclosing Networks.pdf:PDF },
    }
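    The enclosing-ball idea can be illustrated with a toy PyTorch sketch (our own illustration under strong simplifications: a fixed random feature map, a crude center/radius estimate, and none of the paper's bijectivity machinery): estimate the smallest sphere containing the training features, then train the generator so its samples' features land inside the same sphere.

        import torch
        import torch.nn as nn

        phi = nn.Sequential(nn.Linear(2, 32), nn.Tanh())   # stand-in feature map
        G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
        opt = torch.optim.Adam(G.parameters(), lr=1e-3)

        data = torch.randn(256, 2)
        with torch.no_grad():
            f = phi(data)
            center = f.mean(0)
            radius2 = ((f - center) ** 2).sum(1).max()     # crude enclosing ball

        for _ in range(200):
            z = torch.randn(64, 8)
            d2 = ((phi(G(z)) - center) ** 2).sum(1)
            loss = torch.relu(d2 - radius2).mean()         # penalize leaving the ball
            opt.zero_grad(); loss.backward(); opt.step()
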
  • A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network
    Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2018. [pdf]
    We introduce a novel embedding method for the knowledge base completion task. Our approach, named ConvKB, advances the state-of-the-art (SOTA) by employing a convolutional neural network (CNN) for the task, which can capture global relationships and transitional characteristics. We represent each triple (head entity, relation, tail entity) as a 3-column matrix which is the input for the convolution layer. Different filters of the same 1x3 shape are convolved over the input matrix to produce different feature maps, which are then concatenated into a single feature vector. This vector is used to return a score for the triple via a dot product. The returned score is used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction results than previous SOTA models on the two current benchmark datasets WN18RR and FB15k-237.
    @INPROCEEDINGS { nguyen_etal_naacl18_anovelembedding,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung },
        TITLE = { A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network },
        BOOKTITLE = { Proc. of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL) },
        YEAR = { 2018 },
        ABSTRACT = { We introduce a novel embedding method for the knowledge base completion task. Our approach, named ConvKB, advances the state-of-the-art (SOTA) by employing a convolutional neural network (CNN) for the task, which can capture global relationships and transitional characteristics. We represent each triple (head entity, relation, tail entity) as a 3-column matrix which is the input for the convolution layer. Different filters of the same 1x3 shape are convolved over the input matrix to produce different feature maps, which are then concatenated into a single feature vector. This vector is used to return a score for the triple via a dot product. The returned score is used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction results than previous SOTA models on the two current benchmark datasets WN18RR and FB15k-237. },
        FILE = { :nguyen_etal_naacl18_anovelembedding - A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network.pdf:PDF },
        URL = { https://arxiv.org/abs/1712.02121 },
    }
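    The abstract describes the scoring function completely, so it can be sketched directly in PyTorch (a minimal reimplementation from the description; hyperparameters are illustrative):

        import torch
        import torch.nn as nn

        class ConvKB(nn.Module):
            def __init__(self, n_entities, n_relations, dim=100, n_filters=3):
                super().__init__()
                self.ent = nn.Embedding(n_entities, dim)
                self.rel = nn.Embedding(n_relations, dim)
                self.conv = nn.Conv2d(1, n_filters, kernel_size=(1, 3))  # 1x3 filters
                self.w = nn.Linear(n_filters * dim, 1, bias=False)       # dot-product scorer

            def forward(self, h, r, t):
                m = torch.stack([self.ent(h), self.rel(r), self.ent(t)], dim=-1)  # (B, d, 3)
                maps = torch.relu(self.conv(m.unsqueeze(1)))                      # (B, F, d, 1)
                return self.w(maps.flatten(1)).squeeze(-1)                        # (B,) scores

        model = ConvKB(n_entities=1000, n_relations=50)
        print(model(torch.tensor([1]), torch.tensor([2]), torch.tensor([3])))
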
  • Learning Graph Representation via Frequent Subgraphs
    Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh and Dinh Phung. In Proc. of SIAM Int. Conf. on Data Mining (SDM), 2018. (Student travel award).
    @INPROCEEDINGS { nguyen_etal_sdm18_learning,
        AUTHOR = { Dang Nguyen and Wei Luo and Tu Dinh Nguyen and Svetha Venkatesh and Dinh Phung },
        TITLE = { Learning Graph Representation via Frequent Subgraphs },
        BOOKTITLE = { Proc. of SIAM Int. Conf. on Data Mining (SDM) },
        YEAR = { 2018 },
        PUBLISHER = { SIAM },
        NOTE = { Student travel award },
        FILE = { :nguyen_etal_sdm18_learning - Learning Graph Representation Via Frequent Subgraphs.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2018.01.12 },
    }
  • Model-Based Learning for Point Pattern Data
    Ba-Ngu Vo, Nhan Dam, Dinh Phung, Quang N. Tran and Ba-Tuong Vo. Pattern Recognition (PR), 84:136-151, 2018. [pdf]
    This article proposes a framework for model-based point pattern learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed.
    @ARTICLE { vo_etal_pr18_modelbased,
        AUTHOR = { Ba-Ngu Vo and Nhan Dam and Dinh Phung and Quang N. Tran and Ba-Tuong Vo },
        JOURNAL = { Pattern Recognition (PR) },
        TITLE = { Model-Based Learning for Point Pattern Data },
        YEAR = { 2018 },
        ISSN = { 0031-3203 },
        PAGES = { 136--151 },
        VOLUME = { 84 },
        ABSTRACT = { This article proposes a framework for model-based point pattern learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed. },
        DOI = { https://doi.org/10.1016/j.patcog.2018.07.008 },
        FILE = { :vo_etal_pr18_modelbased - Model Based Learning for Point Pattern Data.pdf:PDF },
        KEYWORDS = { Point pattern, Point process, Random finite set, Multiple instance learning, Classification, Novelty detection, Clustering },
        PUBLISHER = { Elsevier },
        URL = { http://www.sciencedirect.com/science/article/pii/S0031320318302395 },
    }
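    To see how a point-process likelihood turns set-valued data into a classifier, consider the simplest model in this family: a Poisson point process with expected cardinality rate and feature density p has set density f(X) = exp(-rate) * prod over x in X of rate * p(x). The sketch below is our toy illustration of likelihood-based classification of point patterns, not the paper's models; all names and numbers are invented.

        import numpy as np
        from scipy.stats import multivariate_normal

        def poisson_pp_loglik(X, rate, feature_pdf):
            """Log-likelihood of point pattern X (n, d) under a Poisson point process."""
            X = np.atleast_2d(X)
            return -rate + len(X) * np.log(rate) + np.sum(np.log(feature_pdf(X)))

        classes = {
            "A": (3.0, multivariate_normal(mean=[0, 0]).pdf),   # few points near the origin
            "B": (8.0, multivariate_normal(mean=[2, 2]).pdf),   # more points near (2, 2)
        }
        X = np.random.default_rng(0).normal([2, 2], 1.0, size=(7, 2))
        print(max(classes, key=lambda c: poisson_pp_loglik(X, *classes[c])))  # expect "B"
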
  • Dual Discriminator Generative Adversarial Nets
    Tu Dinh Nguyen, Trung Le, Hung Vu and Dinh Phung. In Proc. of the 31st Int. Conf. on Neural Information Processing Systems (NIPS), pages 2667-2677, USA, 2017. [pdf]
    We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial networks (GANs). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus exploiting the complementary statistical properties of these divergences to effectively diversify the estimated density in capturing multiple modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; together with a generator, it retains the analogy of a minimax game, wherein one discriminator rewards high scores for samples from the data distribution whilst the other discriminator, conversely, favors data from the generator, and the generator produces data to fool both discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both the KL and reverse KL divergences between the data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode collapsing problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach in generating good quality and diverse samples over baselines, and the capability of our method to scale up to the ImageNet database.
    @INPROCEEDINGS { tu_etal_nips17_d2gan,
        AUTHOR = { Tu Dinh Nguyen and Trung Le and Hung Vu and Dinh Phung },
        TITLE = { Dual Discriminator Generative Adversarial Nets },
        BOOKTITLE = { Proc. of the 31st Int. Conf. on Neural Information Processing Systems (NIPS) },
        YEAR = { 2017 },
        SERIES = { NIPS'17 },
        PAGES = { 2667--2677 },
        ADDRESS = { USA },
        PUBLISHER = { Curran Associates Inc. },
        ABSTRACT = { We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial networks (GANs). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus exploiting the complementary statistical properties of these divergences to effectively diversify the estimated density in capturing multiple modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; together with a generator, it retains the analogy of a minimax game, wherein one discriminator rewards high scores for samples from the data distribution whilst the other discriminator, conversely, favors data from the generator, and the generator produces data to fool both discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both the KL and reverse KL divergences between the data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode collapsing problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach in generating good quality and diverse samples over baselines, and the capability of our method to scale up to the ImageNet database. },
        ACMID = { 3295027 },
        FILE = { :tu_etal_nips17_d2gan - Dual Discriminator Generative Adversarial Nets.pdf:PDF },
        ISBN = { 978-1-5108-6096-4 },
        LOCATION = { Long Beach, California, USA },
        NUMPAGES = { 11 },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.09.06 },
        URL = { http://dl.acm.org/citation.cfm?id=3294996.3295027 },
    }
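    The complementarity of the two divergences, which motivates using two discriminators, is easy to verify numerically: forward KL heavily penalizes missing a data mode, whereas reverse KL tolerates it. A tiny NumPy illustration of this asymmetry (it demonstrates the motivation only, not the D2GAN objective):

        import numpy as np

        def kl(p, q):
            return float(np.sum(p * np.log(p / q)))

        p = np.array([0.49, 0.02, 0.49])   # bimodal "data" distribution
        q = np.array([0.90, 0.05, 0.05])   # model covering only one mode
        print("KL(P||Q) =", kl(p, q))      # large: forward KL punishes a dropped mode
        print("KL(Q||P) =", kl(q, p))      # smaller: reverse KL is mode-seeking
        # Pressuring both directions at once pushes the learned distribution to
        # cover all modes without smearing mass over low-density regions.
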
  • GoGP: Fast Online Regression with Gaussian Processes
    Trung Le, Khanh Nguyen, Vu Nguyen, Tu Dinh Nguyen and Dinh Phung. In International Conference on Data Mining (ICDM), 2017.
    One of the most challenging current problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that can scale to massive datasets. Our approach is formulated based on an alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that, with a good convergence rate, our proposed algorithm always produces a (sparse) solution which is close to the true optimum to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately to streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for the online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving an order-of-magnitude computational speedup compared with its rivals under the online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis to be rapid and stable while achieving lower errors.
    @INPROCEEDINGS { le_etal_icdm17_gogp,
        AUTHOR = { Trung Le and Khanh Nguyen and Vu Nguyen and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { {GoGP}: Fast Online Regression with Gaussian Processes },
        BOOKTITLE = { International Conference on Data Mining (ICDM) },
        YEAR = { 2017 },
        ABSTRACT = { One of the most challenging current problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that can scale to massive datasets. Our approach is formulated based on an alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that, with a good convergence rate, our proposed algorithm always produces a (sparse) solution which is close to the true optimum to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately to streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for the online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving an order-of-magnitude computational speedup compared with its rivals under the online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis to be rapid and stable while achieving lower errors. },
        FILE = { :le_etal_icdm17_gogp - GoGP_ Fast Online Regression with Gaussian Processes.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.09.01 },
    }
  • Supervised Restricted Boltzmann Machines
    Tu Dinh Nguyen, Dinh Phung, Viet Huynh and Trung Le. In Proc. of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2017. [pdf]
    We propose in this paper the supervised restricted Boltzmann machine (sRBM), a unified framework which combines the versatility of RBM to simultaneously learn the data representation and to perform supervised learning (i.e., a nonlinear classifier or a nonlinear regressor). Unlike the current state-of-the-art classification formulation proposed for RBM in (Larochelle et al., 2012), our model is a hybrid probabilistic graphical model consisting of a distinguished generative component for data representation and a discriminative component for prediction. While the work of (Larochelle et al., 2012) typically incurs no extra difficulty in inference compared with a standard RBM, our discriminative component, modeled as a directed graphical model, renders MCMC-based inference (e.g., a Gibbs sampler) very slow and impractical for use. To this end, we further develop scalable variational inference for the proposed sRBM for both classification and regression cases. Extensive experiments on real-world datasets show that our sRBM achieves better predictive performance than baseline methods. At the same time, our proposed framework yields learned representations which are more discriminative, hence more interpretable, than those of its counterparts. Besides, our method is probabilistic and capable of generating meaningful data conditioned on specific classes – a topic of great current interest in deep learning, aiming at data generation.
    @INPROCEEDINGS { nguyen_etal_uai17supervised,
        AUTHOR = { Tu Dinh Nguyen and Dinh Phung and Viet Huynh and Trung Le },
        TITLE = { Supervised Restricted Boltzmann Machines },
        BOOKTITLE = { Proc. of the International Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2017 },
        ABSTRACT = { We propose in this paper the supervised restricted Boltzmann machine (sRBM), a unified framework which combines the versatility of RBM to simultaneously learn the data representation and to perform supervised learning (i.e., a nonlinear classifier or a nonlinear regressor). Unlike the current state-of-the-art classification formulation proposed for RBM in (Larochelle et al., 2012), our model is a hybrid probabilistic graphical model consisting of a distinguished generative component for data representation and a discriminative component for prediction. While the work of (Larochelle et al., 2012) typically incurs no extra difficulty in inference compared with a standard RBM, our discriminative component, modeled as a directed graphical model, renders MCMC-based inference (e.g., a Gibbs sampler) very slow and impractical for use. To this end, we further develop scalable variational inference for the proposed sRBM for both classification and regression cases. Extensive experiments on real-world datasets show that our sRBM achieves better predictive performance than baseline methods. At the same time, our proposed framework yields learned representations which are more discriminative, hence more interpretable, than those of its counterparts. Besides, our method is probabilistic and capable of generating meaningful data conditioned on specific classes – a topic of great current interest in deep learning, aiming at data generation. },
        FILE = { :nguyen_etal_uai17supervised - Supervised Restricted Boltzmann Machines.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.08.29 },
        URL = { http://auai.org/uai2017/proceedings/papers/106.pdf },
    }
  • Multilevel clustering via Wasserstein means
    Nhat Ho, XuanLong Nguyen, Mikhail Yurochkin, Hung Bui, Viet Huynh and Dinh Phung. In Proc. of the 34th International Conference on Machine Learning (ICML), pages 1501-1509, 2017. [pdf]
    We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a large hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with the Wasserstein distance metric. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. We also establish consistency properties enjoyed by our estimates of both local and global clusters. Finally, we present experimental results with both synthetic and real data to demonstrate the flexibility and scalability of the proposed approach.
    @INPROCEEDINGS { ho_etal_icml17multilevel,
        AUTHOR = { Nhat Ho and XuanLong Nguyen and Mikhail Yurochkin and Hung Bui and Viet Huynh and Dinh Phung },
        TITLE = { Multilevel clustering via {W}asserstein means },
        BOOKTITLE = { Proc. of the 34th International Conference on Machine Learning (ICML) },
        YEAR = { 2017 },
        VOLUME = { 70 },
        SERIES = { ICML'17 },
        PAGES = { 1501--1509 },
        PUBLISHER = { JMLR.org },
        ABSTRACT = { We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a large hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with the Wasserstein distance metric. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. We also establish consistency properties enjoyed by our estimates of both local and global clusters. Finally, we present experimental results with both synthetic and real data to demonstrate the flexibility and scalability of the proposed approach. },
        ACMID = { 3305536 },
        FILE = { :ho_etal_icml17multilevel - Multilevel Clustering Via Wasserstein Means.pdf:PDF },
        LOCATION = { Sydney, NSW, Australia },
        NUMPAGES = { 9 },
        URL = { http://dl.acm.org/citation.cfm?id=3305381.3305536 },
    }
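    The computational core of this paper is clustering with Wasserstein distances and barycenters. As a hedged illustration only: in one dimension both quantities have closed forms over sorted samples, which makes a K-means-style analogue easy to sketch in numpy. The function names and toy data below are mine, and this ignores the paper's multilevel local/global structure.
        import numpy as np

        def w2_1d(x, y):
            # 2-Wasserstein distance between two 1-D empirical measures
            # with the same number of atoms: match sorted samples.
            return np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))

        def barycenter_1d(groups):
            # 1-D Wasserstein barycenter of equal-sized empirical measures:
            # average the quantile functions, i.e. the sorted samples.
            return np.mean([np.sort(g) for g in groups], axis=0)

        def wasserstein_kmeans(groups, k, iters=20, seed=0):
            rng = np.random.default_rng(seed)
            idx = rng.choice(len(groups), size=k, replace=False)
            centers = [np.sort(groups[i]) for i in idx]
            for _ in range(iters):
                labels = [int(np.argmin([w2_1d(g, c) for c in centers]))
                          for g in groups]
                new_centers = []
                for j in range(k):
                    members = [g for g, l in zip(groups, labels) if l == j]
                    # keep the old center if a cluster empties out
                    new_centers.append(barycenter_1d(members) if members else centers[j])
                centers = new_centers
            return labels, centers

        # toy data: 40 groups drawn around two distinct distributions
        rng = np.random.default_rng(1)
        groups = [rng.normal(0, 1, 100) for _ in range(20)] + \
                 [rng.normal(4, 0.5, 100) for _ in range(20)]
        labels, _ = wasserstein_kmeans(groups, k=2)
        print(labels)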
  • Approximation Vector Machines for Large-scale Online Learning
    Trung Le, Tu Dinh Nguyen, Vu Nguyen and Dinh Q. Phung. Journal of Machine Learning Research (JMLR), 2017. [ | | pdf]
    One of the most challenging problems in kernel online learning is to bound the model size and to promote the model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. In this paper, we propose Approximation Vector Machine (AVM), a model that can simultaneously encourage the sparsity and safeguard its risk in compromising the performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximation and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions including Hinge, smooth Hinge, and Logistic for classification task, and l1, l2, and ϵ-insensitive for regression task. We conducted extensive experiments for classification task in batch and online modes, and regression task in online mode over several benchmark datasets. The results show that our proposed AVM achieved a comparable predictive performance with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM in maintaining the model size.
    @ARTICLE { le_etal_jmlr17approximation,
        AUTHOR = { Trung Le and Tu Dinh Nguyen and Vu Nguyen and Dinh Q. Phung },
        TITLE = { Approximation Vector Machines for Large-scale Online Learning },
        JOURNAL = { Journal of Machine Learning Research (JMLR) },
        YEAR = { 2017 },
        ABSTRACT = { One of the most challenging problems in kernel online learning is to bound the model size and to promote the model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. In this paper, we propose Approximation Vector Machine (AVM), a model that can simultaneously encourage the sparsity and safeguard its risk in compromising the performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximation and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions including Hinge, smooth Hinge, and Logistic for classification task, and l1, l2, and ϵ-insensitive for regression task. We conducted extensive experiments for classification task in batch and online modes, and regression task in online mode over several benchmark datasets. The results show that our proposed AVM achieved a comparable predictive performance with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM in maintaining the model size. },
        FILE = { :le_etal_jmlr17approximation - Approximation Vector Machines for Large Scale Online Learning.pdf:PDF },
        KEYWORDS = { kernel, online learning, large-scale machine learning, sparsity, big data, core set, stochastic gradient descent, convergence analysis },
        URL = { https://arxiv.org/abs/1604.06518 },
    }
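    The key mechanism in the AVM abstract above is easy to picture: an incoming instance only spawns a new support point when no stored point lies within a threshold; otherwise its update is routed to that neighbour, so the model size is bounded by a delta-net of the domain rather than by the stream length. Below is a minimal, hedged sketch of that rule with an RBF kernel and hinge loss (class name, step size and the synthetic stream are mine; the actual AVM adds regularisation and the convergence machinery analysed in the paper).
        import numpy as np

        def rbf(x, y, gamma=0.5):
            return np.exp(-gamma * np.sum((x - y) ** 2))

        class ToyAVM:
            def __init__(self, delta=0.8, eta=0.1, gamma=0.5):
                self.delta, self.eta, self.gamma = delta, eta, gamma
                self.points, self.alphas = [], []   # the bounded "core" set

            def decision(self, x):
                return sum(a * rbf(p, x, self.gamma)
                           for p, a in zip(self.points, self.alphas))

            def partial_fit(self, x, y):            # y in {-1, +1}
                if y * self.decision(x) < 1.0:      # hinge loss is active
                    if self.points:
                        d = [np.linalg.norm(x - p) for p in self.points]
                        j = int(np.argmin(d))
                        if d[j] <= self.delta:      # approximate x by neighbour j
                            self.alphas[j] += self.eta * y
                            return
                    self.points.append(np.array(x)) # otherwise admit a new point
                    self.alphas.append(self.eta * y)

        rng = np.random.default_rng(0)
        model = ToyAVM()
        for _ in range(2000):
            x = rng.uniform(-2, 2, size=2)
            model.partial_fit(x, 1 if x.sum() > 0 else -1)
        print("model size:", len(model.points))    # stays small for a 2000-point stream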
  • Discriminative Bayesian Nonparametric Clustering
    Vu Nguyen, Dinh Phung, Trung Le, Svetha Venkatesh and Hung Bui. In Proc. of International Joint Conference on Artificial Intelligence (IJCAI), 2017. [ | | pdf]
    We propose a general framework for discriminative Bayesian nonparametric clustering to promote the inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels resulting from the probabilistic discriminative model. This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with the traditional generative BNP models.
    @INPROCEEDINGS { nguyen_etal_ijcai17discriminative,
        AUTHOR = { Vu Nguyen and Dinh Phung and Trung Le and Svetha Venkatesh and Hung Bui },
        TITLE = { Discriminative Bayesian Nonparametric Clustering },
        BOOKTITLE = { Proc. of International Joint Conference on Artificial Intelligence (IJCAI) },
        YEAR = { 2017 },
        ABSTRACT = { We propose a general framework for discriminative Bayesian nonparametric clustering to promote the inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels resulting from the probabilistic discriminative model. This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with the traditional generative BNP models. },
        FILE = { :nguyen_etal_ijcai17discriminative - Discriminative Bayesian Nonparametric Clustering.pdf:PDF },
        URL = { https://www.ijcai.org/proceedings/2017/355 },
    }
  • Large-scale Online Kernel Learning with Random Feature Reparameterization
    Tu Dinh Nguyen, Trung Le, Hung Bui and Dinh Phung. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. [ | | pdf]
    A typical online kernel learning method faces two fundamental issues: the complexity in dealing with a huge number of observed data points (a.k.a. the curse of kernelization) and the difficulty in learning kernel parameters, which are often assumed to be fixed. Random Fourier feature is a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bochner’s theorem, and allows the model to be maintained directly in the random feature space with a fixed dimension, hence the model size remains constant w.r.t. data size. We further introduce in this paper the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning to address both aforementioned challenges. Our initial intuition comes from the so-called ‘reparameterization trick’ [Kingma and Welling, 2014] to lift the source of randomness of Fourier components to another space which can be independently sampled, so that the stochastic gradient of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel, and a new tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model under an online learning setting. We then conducted extensive experiments on several large-scale datasets where we demonstrate that our work achieves state-of-the-art performance in both learning efficacy and efficiency.
    @INPROCEEDINGS { tu_etal_ijcai17_rrf,
        AUTHOR = { Tu Dinh Nguyen and Trung Le and Hung Bui and Dinh Phung },
        BOOKTITLE = { Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI) },
        TITLE = { Large-scale Online Kernel Learning with Random Feature Reparameterization },
        YEAR = { 2017 },
        SERIES = { IJCAI'17 },
        ABSTRACT = { A typical online kernel learning method faces two fundamental issues: the complexity in dealing with a huge number of observed data points (a.k.a. the curse of kernelization) and the difficulty in learning kernel parameters, which are often assumed to be fixed. Random Fourier feature is a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bochner’s theorem, and allows the model to be maintained directly in the random feature space with a fixed dimension, hence the model size remains constant w.r.t. data size. We further introduce in this paper the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning to address both aforementioned challenges. Our initial intuition comes from the so-called ‘reparameterization trick’ [Kingma and Welling, 2014] to lift the source of randomness of Fourier components to another space which can be independently sampled, so that the stochastic gradient of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel, and a new tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model under an online learning setting. We then conducted extensive experiments on several large-scale datasets where we demonstrate that our work achieves state-of-the-art performance in both learning efficacy and efficiency. },
        FILE = { :tu_etal_ijcai17_rrf - Large Scale Online Kernel Learning with Random Feature Reparameterization.pdf:PDF },
        LOCATION = { Melbourne, Australia },
        NUMPAGES = { 7 },
        URL = { https://www.ijcai.org/proceedings/2017/354 },
    }
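    The 'reparameterization trick' mentioned above has a compact concrete form for the RBF kernel: draw a fixed noise matrix eps once and express the Fourier frequencies as w = eps / sigma, so that gradients with respect to the kernel parameter sigma flow through a deterministic map while the randomness stays put. A hedged numpy sketch (all names and sizes are mine) that also sanity-checks the kernel approximation:
        import numpy as np

        rng = np.random.default_rng(0)
        D, d = 512, 3                      # number of random features, input dim

        eps = rng.standard_normal((D, d))  # fixed source of randomness
        b = rng.uniform(0, 2 * np.pi, D)
        log_sigma = np.zeros(d)            # kernel parameter to be learned

        def features(X, log_sigma):
            # Reparameterized random Fourier features for an RBF kernel with
            # per-dimension lengthscales sigma: frequencies w = eps / sigma.
            W = eps / np.exp(log_sigma)
            return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

        # sanity check: phi(x).phi(y) should approximate the RBF kernel value
        x, y = rng.standard_normal(d), rng.standard_normal(d)
        approx = features(x[None], log_sigma) @ features(y[None], log_sigma).T
        exact = np.exp(-0.5 * np.sum(((x - y) / np.exp(log_sigma)) ** 2))
        print(approx.item(), exact)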
  • Hierarchical semi-Markov conditional random fields for deep recursive sequential data
    Truyen Tran, Dinh Phung, Hung H. Bui and Svetha Venkatesh. Artificial Intelligence (AIJ), Feb. 2017. [ | | pdf]
    We present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of linear-chain conditional random fields to model deep nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We develop numerical scaling procedures that handle the overflow problem. We show that the HSCRF can be reduced to the semi-Markov conditional random fields. Finally, we demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. The HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases.
    @ARTICLE { tran_etal_aij17hierarchical,
        AUTHOR = { Truyen Tran and Dinh Phung and Hung H. Bui and Svetha Venkatesh },
        TITLE = { Hierarchical semi-Markov conditional random fields for deep recursive sequential data },
        JOURNAL = { Artificial Intelligence (AIJ) },
        YEAR = { 2017 },
        MONTH = { Feb. },
        ABSTRACT = { We present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of linear-chain conditional random fields to model deep nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We develop numerical scaling procedures that handle the overflow problem. We show that the HSCRF can be reduced to the semi-Markov conditional random fields. Finally, we demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. The HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. },
        FILE = { :tran_etal_aij17hierarchical - Hierarchical Semi Markov Conditional Random Fields for Deep Recursive Sequential Data.pdf:PDF },
        KEYWORDS = { Deep nested sequential processes, Hierarchical semi-Markov conditional random field, Partial labelling, Constrained inference, Numerical scaling },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.02.21 },
        URL = { http://www.sciencedirect.com/science/article/pii/S0004370217300231 },
    }
  • See my thesis (chapter 5) for an equivalent directed graphical model, which is the precursor of this work and where I described the Asymmetric Inside-Outside (AIO) algorithm in great detail. A brief version of this for the directed case has also appeared in this AAAI'04 paper. The idea of semi-Markov duration modelling has also been addressed for the directed case in these CVPR05 and AIJ09 papers.
  • Column Networks for Collective Classification
    Pham, Trang, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In The Thirty-First AAAI Conference on Artificial Intelligence (AAAI), 2017. [ | | pdf]
    Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds great promise to produce better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, linear in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all applications, CLN demonstrates a higher accuracy than state-of-the-art rivals.
    @CONFERENCE { pham_etal_aaai17column,
        AUTHOR = { Pham, Trang and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Column Networks for Collective Classification },
        BOOKTITLE = { The Thirty-First AAAI Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2017 },
        ABSTRACT = { Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds great promise to produce better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, linear in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all applications, CLN demonstrates a higher accuracy than state-of-the-art rivals. },
        COMMENT = { Accepted },
        FILE = { :pham_etal_aaai17column - Column Networks for Collective Classification.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.11.14 },
        URL = { https://arxiv.org/abs/1609.04508 },
    }
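    Property (iv) in the abstract (long-range dependencies) comes from stacking layers in which each instance mixes its own features with a mean over its neighbours under every relation, one hop per layer. A hedged numpy sketch of one such layer follows; the parameterisation below is mine and simplified, and the paper's exact layer differs in details such as scaling across relations.
        import numpy as np

        def cln_layer(H, neighbours, W, V, b):
            # One column-network-style layer (a sketch): instance i updates its
            # hidden vector from its own state and the mean state of its
            # neighbours under each relation r.
            n, _ = H.shape
            out = np.empty((n, W.shape[0]))
            for i in range(n):
                z = b + W @ H[i]
                for r, nbrs in enumerate(neighbours[i]):   # neighbours per relation
                    if nbrs:
                        z += V[r] @ np.mean(H[nbrs], axis=0)
                out[i] = np.maximum(z, 0.0)                # ReLU
            return out

        rng = np.random.default_rng(0)
        n, h = 5, 4
        H = rng.standard_normal((n, h))
        # two relations; neighbours[i][r] lists i's neighbours under relation r
        neighbours = [[[1, 2], []], [[0], [3]], [[0], [4]], [[], [1]], [[], [2]]]
        W = rng.standard_normal((h, h))
        V = rng.standard_normal((2, h, h))
        b = np.zeros(h)
        H2 = cln_layer(H, neighbours, W, V, b)  # stack layers for longer-range context
        print(H2.shape)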
  • Dual Space Gradient Descent for Online Learning
    Le, Trung, Nguyen, Tu Dinh, Nguyen, Vu and Phung, Dinh. In Advances in Neural Information Processing (NIPS), December 2016. [ | | pdf]
    One crucial goal in kernel online learning is to bound the model size. Common approaches employ budget maintenance procedures to restrict the model sizes using removal, projection, or merging strategies. Although projection and merging, in the literature, are known to be the most effective strategies, they demand extensive computation whilst the removal strategy fails to retain information of the removed vectors. An alternative way to address the model size problem is to apply random features to approximate the kernel function. This allows the model to be maintained directly in the random feature space, hence effectively resolving the curse of kernelization. However, this approach still suffers from a serious shortcoming as it needs to use a high-dimensional random feature space to achieve a sufficiently accurate kernel approximation. Consequently, it leads to a significant increase in the computational cost. To address all of these aforementioned challenges, we present in this paper the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance. Consequently, our approach permits the budget to be maintained in a simple, direct and elegant way while simultaneously mitigating the impact of the dimensionality issue on learning performance. We further provide convergence analysis and extensively conduct experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with the state-of-the-art baselines.
    @CONFERENCE { le_etal_nips16dual,
        AUTHOR = { Le, Trung and Nguyen, Tu Dinh and Nguyen, Vu and Phung, Dinh },
        TITLE = { Dual Space Gradient Descent for Online Learning },
        BOOKTITLE = { Advances in Neural Information Processing (NIPS) },
        YEAR = { 2016 },
        MONTH = { December },
        ABSTRACT = { One crucial goal in kernel online learning is to bound the model size. Common approaches employ budget maintenance procedures to restrict the model sizes using removal, projection, or merging strategies. Although projection and merging, in the literature, are known to be the most effective strategies, they demand extensive computation whilst the removal strategy fails to retain information of the removed vectors. An alternative way to address the model size problem is to apply random features to approximate the kernel function. This allows the model to be maintained directly in the random feature space, hence effectively resolving the curse of kernelization. However, this approach still suffers from a serious shortcoming as it needs to use a high-dimensional random feature space to achieve a sufficiently accurate kernel approximation. Consequently, it leads to a significant increase in the computational cost. To address all of these aforementioned challenges, we present in this paper the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance. Consequently, our approach permits the budget to be maintained in a simple, direct and elegant way while simultaneously mitigating the impact of the dimensionality issue on learning performance. We further provide convergence analysis and extensively conduct experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with the state-of-the-art baselines. },
        FILE = { :le_etal_nips16dual - Dual Space Gradient Descent for Online Learning.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.16 },
        URL = { https://papers.nips.cc/paper/6560-dual-space-gradient-descent-for-online-learning.pdf },
    }
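    The auxiliary-space idea above can be sketched in a few lines: keep an exact kernel expansion up to a budget, and when a point must be evicted, fold its contribution into a fixed random-feature weight vector instead of discarding it. A hedged numpy sketch, where the eviction policy, step size and kernel choice are mine and the paper's actual updates and analysis are more careful:
        import numpy as np

        rng = np.random.default_rng(0)
        d, D, B = 2, 256, 50               # input dim, random features, budget
        Wrf = rng.standard_normal((D, d))  # frequencies for an RBF kernel
        brf = rng.uniform(0, 2 * np.pi, D)

        def phi(x):                        # random Fourier feature map
            return np.sqrt(2.0 / D) * np.cos(Wrf @ x + brf)

        points, alphas = [], []            # exact kernel part, capped at budget B
        w_rf = np.zeros(D)                 # auxiliary dual space for evicted vectors

        def decision(x):
            kpart = sum(a * np.exp(-0.5 * np.sum((p - x) ** 2))
                        for p, a in zip(points, alphas))
            return kpart + w_rf @ phi(x)

        def update(x, y, eta=0.1):         # a DualSGD-flavoured sketch, y in {-1,+1}
            global w_rf
            if y * decision(x) < 1.0:
                points.append(np.array(x))
                alphas.append(eta * y)
                if len(points) > B:        # budget exceeded: evict the oldest point
                    p, a = points.pop(0), alphas.pop(0)
                    w_rf += a * phi(p)     # but keep its info in feature space

        for _ in range(1000):
            x = rng.uniform(-1, 1, 2)
            update(x, 1 if x[0] > 0 else -1)
        print(len(points))                 # never exceeds the budget B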
  • Scalable Nonparametric Bayesian Multilevel Clustering
    Viet Huynh, Dinh Phung, Svetha Venkatesh, Xuan-Long Nguyen, Matt Hoffman and Hung Bui. In Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI), pages 289-298, June 2016. [ | | pdf]
    @CONFERENCE { huynh_phung_venkatesh_nguyen_hoffman_bui_uai16scalable,
        AUTHOR = { Viet Huynh and Dinh Phung and Svetha Venkatesh and Xuan-Long Nguyen and Matt Hoffman and Hung Bui },
        TITLE = { Scalable Nonparametric {B}ayesian Multilevel Clustering },
        BOOKTITLE = { Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2016 },
        MONTH = { June },
        PUBLISHER = { AUAI Press },
        PAGES = { 289--298 },
        FILE = { :huynh_phung_venkatesh_nguyen_hoffman_bui_uai16scalable - Scalable Nonparametric Bayesian Multilevel Clustering.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.09 },
        URL = { http://auai.org/uai2016/proceedings/papers/262.pdf },
    }
  • Budgeted Semi-supervised Support Vector Machine
    Le, Trung, Duong, Phuong, Dinh, Mi, Nguyen, Tu, Nguyen, Vu and Phung, Dinh. In 32nd Conference on Uncertainty in Artificial Intelligence (UAI), June 2016. [ | | pdf]
    @CONFERENCE { le_duong_dinh_nguyen_nguyen_phung_uai16budgeted,
        AUTHOR = { Le, Trung and Duong, Phuong and Dinh, Mi and Nguyen, Tu and Nguyen, Vu and Phung, Dinh },
        TITLE = { Budgeted Semi-supervised {S}upport {V}ector {M}achine },
        BOOKTITLE = { 32nd Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2016 },
        MONTH = { June },
        FILE = { :le_duong_dinh_nguyen_nguyen_phung_uai16budgeted - Budgeted Semi Supervised Support Vector Machine.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.09 },
        URL = { http://auai.org/uai2016/proceedings/papers/110.pdf },
    }
  • Nonparametric Budgeted Stochastic Gradient Descent
    Le, Trung, Nguyen, Vu, Nguyen, Tu Dinh and Phung, Dinh. In 19th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS), May 2016. [ | | pdf]
    @CONFERENCE { le_nguyen_phung_aistats16nonparametric,
        AUTHOR = { Le, Trung and Nguyen, Vu and Nguyen, Tu Dinh and Phung, Dinh },
        TITLE = { Nonparametric Budgeted Stochastic Gradient Descent },
        BOOKTITLE = { 19th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2016 },
        MONTH = { May },
        FILE = { :le_nguyen_phung_aistats16nonparametric - Nonparametric Budgeted Stochastic Gradient Descent.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://www.jmlr.org/proceedings/papers/v51/le16.pdf },
    }
  • One-Pass Logistic Regression for Label-Drift and Large-Scale Classification on Distributed Systems
    Nguyen, Vu, Nguyen, Tu Dinh, Le, Trung, Phung, Dinh and Venkatesh, Svetha. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 1113-1118, Dec 2016. [ | | pdf | code]
    Logistic regression (LR) for classification is the workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where the class labels are not known in advance, a problem we term label-drift classification. The label-drift classification problem naturally occurs in many applications, especially in the context of streaming settings where the incoming data may contain samples categorized with new classes that have not been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to the expense of their running time. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR), to offer a principled treatment for label-drift and large-scale classifications. To handle large-scale classification for big data, we further extend our OLR to a distributed setting for parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performances of our methods are comparable or better than those of state-of-the-art baselines whilst the execution time is faster by an order of magnitude. In addition, the OLR and Spark-OLR are invariant to data shuffling and have no hyperparameter to tune, which significantly benefits data practitioners and overcomes the curse of big data cross-validation to select optimal hyperparameters.
    @CONFERENCE { nguyen_etal_icdm16onepass,
        AUTHOR = { Nguyen, Vu and Nguyen, Tu Dinh and Le, Trung and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { One-Pass Logistic Regression for Label-Drift and Large-Scale Classification on Distributed Systems },
        BOOKTITLE = { 2016 IEEE 16th International Conference on Data Mining (ICDM) },
        YEAR = { 2016 },
        PAGES = { 1113-1118 },
        MONTH = { Dec },
        ABSTRACT = { Logistic regression (LR) for classification is the workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where the class labels are not known in advance, a problem we term label-drift classification. The label-drift classification problem naturally occurs in many applications, especially in the context of streaming settings where the incoming data may contain samples categorized with new classes that have not been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to the expense of their running time. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR), to offer a principled treatment for label-drift and large-scale classifications. To handle large-scale classification for big data, we further extend our OLR to a distributed setting for parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performances of our methods are comparable or better than those of state-of-the-art baselines whilst the execution time is faster by an order of magnitude. In addition, the OLR and Spark-OLR are invariant to data shuffling and have no hyperparameter to tune, which significantly benefits data practitioners and overcomes the curse of big data cross-validation to select optimal hyperparameters. },
        CODE = { https://github.com/ntienvu/ICDM2016_OLR },
        DOI = { 10.1109/ICDM.2016.0145 },
        FILE = { :nguyen_etal_icdm16onepass - One Pass Logistic Regression for Label Drift and Large Scale Classification on Distributed Systems.pdf:PDF },
        KEYWORDS = { Big Data;distributed processing;pattern classification;regression analysis;Big Data cross-validation;Spark-OLR;class labels;data shuffling;distributed systems;execution time;label-drift classification problem;large-scale classification;large-scale datasets;one-pass logistic regression;optimal hyperparameter selection;sparkling OLR;Bayes methods;Big data;Context;Data models;Estimation;Industries;Logistics;Apache Spark;Logistic regression;distributed system;label-drift;large-scale classification },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.09.10 },
        URL = { http://ieeexplore.ieee.org/document/7837958/ },
    }
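    'Label drift' is the one non-standard ingredient here, and it is easy to make concrete: the classifier grows a new weight row the first time an unseen class label arrives in the stream. The sketch below shows that mechanism with a plain online softmax update; all names are hypothetical, and the paper's OLR replaces the SGD step with a one-pass estimator that has no step size to tune.
        import numpy as np

        class LabelDriftSoftmax:
            # A sketch of the label-drift setting (not the paper's OLR estimator):
            # start with no classes and expand on the fly.
            def __init__(self, d, eta=0.1):
                self.d, self.eta = d, eta
                self.classes, self.W = [], np.zeros((0, d))

            def partial_fit(self, x, label):
                if label not in self.classes:          # label drift: new class appears
                    self.classes.append(label)
                    self.W = np.vstack([self.W, np.zeros(self.d)])
                z = self.W @ x
                p = np.exp(z - z.max())
                p /= p.sum()                           # softmax probabilities
                p[self.classes.index(label)] -= 1.0    # gradient of cross-entropy
                self.W -= self.eta * np.outer(p, x)

            def predict(self, x):
                return self.classes[int(np.argmax(self.W @ x))]

        rng = np.random.default_rng(0)
        clf = LabelDriftSoftmax(d=2)
        means = {"a": (0.0, 0.0), "b": (3.0, 3.0), "c": (-3.0, 3.0)}
        for t in range(3000):
            avail = ["a", "b", "c"][: 1 + t // 1000]   # new labels appear over time
            label = rng.choice(avail)
            x = np.asarray(means[label]) + rng.standard_normal(2)
            clf.partial_fit(x, label)
        print(clf.classes, clf.predict(np.array([3.0, 3.0])))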
  • A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested Dirichlet Process
    Nguyen, T., Nguyen, V., Salim, F.D., Le, D.V. and Phung, D. Pervasive and Mobile Computing (PMC), 2016. [ | | pdf]
    Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as a way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to summarize data at multiple levels. We demonstrate our framework on five datasets where the advantages of the proposed approach are validated.
    @ARTICLE { nguyen_nguyen_flora_le_phung_pmc16simultaneous,
        AUTHOR = { Nguyen, T. and Nguyen, V. and Salim, F.D. and Le, D.V. and Phung, D. },
        TITLE = { A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested {D}irichlet Process },
        JOURNAL = { Pervasive and Mobile Computing (PMC) },
        YEAR = { 2016 },
        ABSTRACT = { Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as a way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to summarize data at multiple levels. We demonstrate our framework on five datasets where the advantages of the proposed approach are validated. },
        DOI = { http://dx.doi.org/10.1016/j.pmcj.2016.08.019 },
        FILE = { :nguyen_nguyen_flora_le_phung_pmc16simultaneous - A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested Dirichlet Process.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.17 },
        URL = { http://www.sciencedirect.com/science/article/pii/S1574119216302097 },
    }
  • Streaming Variational Inference for Dirichlet Process Mixtures
    Huynh, V., Phung, D. and Venkatesh, S. In 7th Asian Conference on Machine Learning (ACML), pages 237-252, Nov. 2015. [ | | pdf]
    Bayesian nonparametric models are theoretically suitable for learning streaming data due to their complexity relaxation to the volume of observed data. However, most of the existing variational inference algorithms are not applicable to streaming applications since they require truncation on variational distributions. In this paper, we present two truncation-free variational algorithms, one for mixed-membership inference called TFVB (truncation-free variational Bayes), and the other for hard clustering inference called TFME (truncation-free maximization expectation). With these algorithms, we further developed a streaming learning framework for the popular Dirichlet process mixture (DPM) models. Our experiments demonstrate the usefulness of our framework on both synthetic and real-world data.
    @INPROCEEDINGS { huynh_phung_venkatesh_15streaming,
        AUTHOR = { Huynh, V. and Phung, D. and Venkatesh, S. },
        TITLE = { Streaming Variational Inference for {D}irichlet {P}rocess {M}ixtures },
        BOOKTITLE = { 7th Asian Conference on Machine Learning (ACML) },
        YEAR = { 2015 },
        PAGES = { 237--252 },
        MONTH = { Nov. },
        ABSTRACT = { Bayesian nonparametric models are theoretically suitable for learning streaming data due to their complexity relaxation to the volume of observed data. However, most of the existing variational inference algorithms are not applicable to streaming applications since they require truncation on variational distributions. In this paper, we present two truncation-free variational algorithms, one for mixed-membership inference called TFVB (truncation-free variational Bayes), and the other for hard clustering inference called TFME (truncation-free maximization expectation). With these algorithms, we further developed a streaming learning framework for the popular Dirichlet process mixture (DPM) models. Our experiments demonstrate the usefulness of our framework on both synthetic and real-world data. },
        FILE = { :huynh_phung_venkatesh_15streaming - Streaming Variational Inference for Dirichlet Process Mixtures.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://www.jmlr.org/proceedings/papers/v45/Huynh15.pdf },
    }
  • Tensor-variate Restricted Boltzmann Machines
    Nguyen, Tu, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Proc. of AAAI Conf. on Artificial Intelligence (AAAI), pages 2887-2893, Austin, Texas, USA, January 2015. [ | | pdf]
    Restricted Boltzmann Machines (RBMs) are an important class of latent variable models for representing vector data. An under-explored area is multimode data, where each data point is a matrix or a tensor. Standard RBMs applying to such data would require vectorizing matrices and tensors, thus resulting in unnecessarily high dimensionality and, at the same time, destroying the inherent higher-order interaction structures. This paper introduces Tensor-variate Restricted Boltzmann Machines (TvRBMs) which generalize RBMs to capture the multiplicative interaction between data modes and the latent variables. TvRBMs are highly compact in that the number of free parameters grows only linearly with the number of modes. We demonstrate the capacity of TvRBMs on three real-world applications: handwritten digit classification, face recognition and EEG-based alcoholic diagnosis. The learnt features of the model are more discriminative than the rivals, resulting in better classification performance.
    @INPROCEEDINGS { tu_truyen_phung_venkatesh_aaai15,
        TITLE = { Tensor-variate Restricted {B}oltzmann Machines },
        AUTHOR = { Nguyen, Tu and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of AAAI Conf. on Artificial Intelligence (AAAI) },
        YEAR = { 2015 },
        ADDRESS = { Austin, Texas, USA },
        MONTH = { January },
        PAGES = { 2887--2893 },
        ABSTRACT = { Restricted Boltzmann Machines (RBMs) are an important class of latent variable models for representing vector data. An under-explored area is multimode data, where each data point is a matrix or a tensor. Standard RBMs applying to such data would require vectorizing matrices and tensors, thus resulting in unnecessarily high dimensionality and, at the same time, destroying the inherent higher-order interaction structures. This paper introduces Tensor-variate Restricted Boltzmann Machines (TvRBMs) which generalize RBMs to capture the multiplicative interaction between data modes and the latent variables. TvRBMs are highly compact in that the number of free parameters grows only linearly with the number of modes. We demonstrate the capacity of TvRBMs on three real-world applications: handwritten digit classification, face recognition and EEG-based alcoholic diagnosis. The learnt features of the model are more discriminative than the rivals, resulting in better classification performance. },
        KEYWORDS = { tensor; rbm; restricted boltzmann machine; tvrbm; multiplicative interaction; eeg; },
        OWNER = { ngtu },
        TIMESTAMP = { 2015.01.29 },
        URL = { http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9371 },
    }
  • Learning Latent Activities from Social Signals with Hierarchical Dirichlet Process
    Phung, Dinh, Nguyen, T. C., Gupta, S. and Venkatesh, Svetha. In Handbook on Plan, Activity, and Intent Recognition, pages 149-174. Elsevier, 2014. [ | | pdf | code]
    Understanding human activities is an important research topic, noticeably in assisted living and health monitoring. Beyond simple forms of activity (e.g., RFID event of entering a building), learning latent activities that are more semantically interpretable, such as sitting at a desk, meeting with people or gathering with friends, remains a challenging problem. Supervised learning has been the typical modeling choice in the past. However, this requires labeled training data, is unable to predict never-seen-before activity and fails to adapt to the continuing growth of data over time. In this chapter, we explore Bayesian nonparametric methods, in particular the Hierarchical Dirichlet Process, to infer latent activities from sensor data acquired in a pervasive setting. Our framework is unsupervised, requires no labeled data and is able to discover new activities as data grows. We present experiments on extracting movement and interaction activities from sociometric badge signals and show how to use them for detection of sub-communities. Using the popular Reality Mining dataset, we further demonstrate the extraction of co-location activities and use them to automatically infer the structure of social subgroups.
    @INCOLLECTION { phung_nguyen_gupta_venkatesh_pair14,
        TITLE = { Learning Latent Activities from Social Signals with Hierarchical {D}irichlet Process },
        AUTHOR = { Phung, Dinh and Nguyen, T. C. and Gupta, S. and Venkatesh, Svetha },
        BOOKTITLE = { Handbook on Plan, Activity, and Intent Recognition },
        PUBLISHER = { Elsevier },
        YEAR = { 2014 },
        EDITOR = { Gita Sukthankar and Christopher Geib and David V. Pynadath and Hung Bui and Robert P. Goldman },
        PAGES = { 149--174 },
        ABSTRACT = { Understanding human activities is an important research topic, noticeably in assisted living and health monitoring. Beyond simple forms of activity (e.g., RFID event of entering a building), learning latent activities that are more semantically interpretable, such as sitting at a desk, meeting with people or gathering with friends, remains a challenging problem. Supervised learning has been the typical modeling choice in the past. However, this requires labeled training data, is unable to predict never-seen-before activity and fails to adapt to the continuing growth of data over time. In this chapter, we explore Bayesian nonparametric methods, in particular the Hierarchical Dirichlet Process, to infer latent activities from sensor data acquired in a pervasive setting. Our framework is unsupervised, requires no labeled data and is able to discover new activities as data grows. We present experiments on extracting movement and interaction activities from sociometric badge signals and show how to use them for detection of sub-communities. Using the popular Reality Mining dataset, we further demonstrate the extraction of co-location activities and use them to automatically infer the structure of social subgroups. },
        CODE = { http://prada-research.net/~dinh/index.php?n=Main.Code#HDP_code },
        OWNER = { ctng },
        TIMESTAMP = { 2013.07.25 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/Phung_etal_pair14.pdf },
    }
  • A Random Finite Set Model for Data Clustering
    Phung, Dinh and Vo, Ba-Ngu. In Proc. of Intl. Conf. on Fusion (FUSION), Salamanca, Spain, July 2014. [ | | pdf]
    The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a point pattern or a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is not available in advance. This paper proposes a new class of models for data clustering that addresses set-valued data as well as unknown number of clusters, using a Dirichlet Process mixture of Poisson random finite sets. We also develop an efficient Markov Chain Monte Carlo posterior inference technique that can learn the number of clusters and mixture parameters automatically from the data. Numerical studies are presented to demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in data.
    @INPROCEEDINGS { phung_vo_fusion14,
        TITLE = { A Random Finite Set Model for Data Clustering },
        AUTHOR = { Phung, Dinh and Vo, Ba-Ngu },
        BOOKTITLE = { Proc. of Intl. Conf. on Fusion (FUSION) },
        YEAR = { 2014 },
        ADDRESS = { Salamanca, Spain },
        MONTH = { July },
        ABSTRACT = { The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a point pattern or a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is not available in advance. This paper proposes a new class of models for data clustering that addresses set-valued data as well as unknown number of clusters, using a Dirichlet Process mixture of Poisson random finite sets. We also develop an efficient Markov Chain Monte Carlo posterior inference technique that can learn the number of clusters and mixture parameters automatically from the data. Numerical studies are presented to demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in data. },
        OWNER = { dinh },
        TIMESTAMP = { 2014.05.16 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/phung_vo_fusion14.pdf },
    }
  • Labeled Random Finite Sets and the Bayes Multi-target Tracking Filter
    Vo, Ba-Ngu, Vo, Ba-Tuong and Phung, Dinh. IEEE Transactions on Signal Processing (TSP), 62(24):6554-6567, 2014. [ | ]
    @ARTICLE { vo_vo_phung_tsp14,
        AUTHOR = { Vo, Ba-Ngu and Vo, Ba-Tuong and Phung, Dinh },
        TITLE = { Labeled Random Finite Sets and the Bayes Multi-target Tracking Filter },
        JOURNAL = { IEEE Transactions on Signal Processing (TSP) },
        YEAR = { 2014 },
        VOLUME = { 62 },
        NUMBER = { 24 },
        PAGES = { 6554--6567 },
        FILE = { :vo_vo_phung_tsp14 - Labeled Random Finite Sets and the Bayes Multi Target Tracking Filter.pdf:PDF },
        OWNER = { dinh },
        TIMESTAMP = { 2014.07.02 },
    }
  • Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts
    Vu Nguyen, Phung, Dinh, XuanLong Nguyen, Venkatesh, Svetha and Hung Bui. In Proc. of Intl. Conf. on Machine Learning (ICML), pages 288-296, 2014. [ | ]
    @INPROCEEDINGS { nguyen_phung_nguyen_venkatesh_bui_icml14,
        TITLE = { Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts },
        AUTHOR = { Vu Nguyen and Phung, Dinh and XuanLong Nguyen and Venkatesh, Svetha and Hung Bui },
        BOOKTITLE = { Proc. of Intl. Conf. on Machine Learning (ICML) },
        YEAR = { 2014 },
        PAGES = { 288--296 },
        OWNER = { tvnguye },
        TIMESTAMP = { 2013.12.13 },
    }
  • Keeping up with Innovation: A Predictive Framework for Modeling Healthcare Data with Evolving Clinical Interventions
    Sunil Kumar Gupta, Santu Rana, Phung, Dinh and Venkatesh, Svetha. In Proc. of SIAM Intl. Conf. on Data Mining (SDM), pages 235-243, 2014. [ | | pdf]
    @INPROCEEDINGS { gupta_rana_phung_venkatesh_sdm14,
        TITLE = { Keeping up with Innovation: A Predictive Framework for Modeling Healthcare Data with Evolving Clinical Interventions },
        AUTHOR = { Sunil Kumar Gupta and Santu Rana and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of SIAM Intl. Conf. on Data Mining (SDM) },
        YEAR = { 2014 },
        PAGES = { 235-243 },
        CHAPTER = { 27 },
        DOI = { 10.1137/1.9781611973440.27 },
        EPRINT = { http://epubs.siam.org/doi/pdf/10.1137/1.9781611973440.27 },
        OWNER = { thinng },
        TIMESTAMP = { 2015.01.28 },
        URL = { http://epubs.siam.org/doi/abs/10.1137/1.9781611973440.27 },
    }
  • An Integrated Framework for Suicide Risk Prediction
    Tran, Truyen, Phung, Dinh, Luo, Wei, Harvey, R., Berk, M. and Venkatesh, Svetha. In Proc. of ACM Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD), Chicago, US, 2013. [ | ]
    @INPROCEEDINGS { tran_phung_luo_harvey_berk_venkatesh_kdd13,
        TITLE = { An Integrated Framework for Suicide Risk Prediction },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Luo, Wei and Harvey, R. and Berk, M. and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of ACM Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD) },
        YEAR = { 2013 },
        ADDRESS = { Chicago, US },
        OWNER = { Dinh },
        TIMESTAMP = { 2013.06.07 },
    }
  • Thurstonian Boltzmann Machines: Learning from Multiple Inequalities
    Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In International Conference on Machine Learning (ICML), Atlanta, USA, June 16-21 2013. [ | ]
    We introduce Thurstonian Boltzmann Machines (TBM), a unified architecture that can naturally incorporate a wide range of data inputs at the same time. It is based on the observation that many data types can be considered as being generated from a subset of underlying continuous variables, and each input value signifies several respective inequalities. Thus learning a TBM is essentially learning to make sense of a set of inequalities. The TBM supports the following types naturally: Gaussian, intervals, censored, binary, categorical, multi-categorical, ordinal, and (in)complete rank with and without ties. We demonstrate the versatility and capacity of the proposed model on three applications of very different natures, namely handwritten digit recognition, collaborative filtering and complex survey analysis.
    @INPROCEEDINGS { tran_phung_venkatesh_icml13,
        TITLE = { {T}hurstonian {B}oltzmann Machines: Learning from Multiple Inequalities },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { International Conference on Machine Learning (ICML) },
        YEAR = { 2013 },
        ADDRESS = { Atlanta, USA },
        MONTH = { June 16-21 },
        ABSTRACT = { We introduce Thurstonian Boltzmann Machines (TBM), a unified architecture that can naturally incorporate a wide range of data inputs at the same time. It is based on the observation that many data types can be considered as being generated from a subset of underlying continuous variables, and each input value signifies several respective inequalities. Thus learning a TBM is essentially learning to make sense of a set of inequalities. The TBM supports the following types naturally: Gaussian, intervals, censored, binary, categorical, multi-categorical, ordinal, and (in)complete rank with and without ties. We demonstrate the versatility and capacity of the proposed model on three applications of very different natures, namely handwritten digit recognition, collaborative filtering and complex survey analysis. },
        OWNER = { dinh },
        TIMESTAMP = { 2013.03.01 },
    }
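    The 'data as inequalities' view above is concrete enough to demo: an ordinal observation with K levels pins a latent Gaussian utility between two thresholds, and inference then works with draws from the resulting truncated normal. A hedged scipy sketch, where the thresholds and parameters are illustrative and the full TBM couples many such variables through a Boltzmann machine:
        import numpy as np
        from scipy.stats import truncnorm

        thresholds = np.array([-np.inf, -1.0, 1.0, np.inf])   # 3 ordinal levels

        def sample_latent(level, mu=0.0, sigma=1.0, size=5):
            # Observing ordinal 'level' asserts the inequality
            # thresholds[level] < z < thresholds[level + 1] on the latent z,
            # so we draw z from the corresponding truncated normal.
            lo, hi = thresholds[level], thresholds[level + 1]
            a, b = (lo - mu) / sigma, (hi - mu) / sigma        # standardized bounds
            return truncnorm.rvs(a, b, loc=mu, scale=sigma, size=size,
                                 random_state=0)

        print(sample_latent(0))   # all draws fall below -1
        print(sample_latent(1))   # all draws fall in (-1, 1)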
  • Factorial Multi-Task Learning : A Bayesian Nonparametric Approach
    Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. In Proceedings of International Conference on Machine Learning (ICML), Atlanta, USA, June 16-21 2013. [ | ]
    Multi-task learning is a paradigm shown to improve the performance of related tasks through their joint learning. However, for real-world data, it is usually difficult to assess the task relatedness and joint learning with unrelated tasks may lead to serious performance degradations. To this end, we propose a framework that groups the tasks based on their relatedness in a low dimensional subspace and allows a varying degree of relatedness among tasks by sharing the subspace bases across the groups. This provides the flexibility of no sharing when two sets of tasks are unrelated and partial/total sharing when the tasks are related. Importantly, the number of task-groups and the subspace dimensionality are automatically inferred from the data. This feature keeps the model beyond a specific set of parameters. To realize our framework, we present a novel Bayesian nonparametric prior that extends the traditional hierarchical beta process prior using a Dirichlet process to permit potentially infinite number of child beta processes. We apply our model for multi-task regression and classification applications. Experimental results using several synthetic and real-world datasets show the superiority of our model to other recent state-of-the-art multi-task learning methods.
    @INPROCEEDINGS { gupta_phung_venkatesh_icml13,
        TITLE = { Factorial Multi-Task Learning : A Bayesian Nonparametric Approach },
        AUTHOR = { Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proceedings of International Conference on Machine Learning (ICML) },
        YEAR = { 2013 },
        ADDRESS = { Atlanta, USA },
        MONTH = { June 16-21 },
        ABSTRACT = { Multi-task learning is a paradigm shown to improve the performance of related tasks through their joint learning. However, for real-world data, it is usually difficult to assess the task relatedness and joint learning with unrelated tasks may lead to serious performance degradations. To this end, we propose a framework that groups the tasks based on their relatedness in a low dimensional subspace and allows a varying degree of relatedness among tasks by sharing the subspace bases across the groups. This provides the flexibility of no sharing when two sets of tasks are unrelated and partial/total sharing when the tasks are related. Importantly, the number of task-groups and the subspace dimensionality are automatically inferred from the data. This feature keeps the model beyond a specific set of parameters. To realize our framework, we present a novel Bayesian nonparametric prior that extends the traditional hierarchical beta process prior using a Dirichlet process to permit potentially infinite number of child beta processes. We apply our model for multi-task regression and classification applications. Experimental results using several synthetic and real-world datasets show the superiority of our model to other recent state-of-the-art multi-task learning methods. },
        OWNER = { Dinh },
        TIMESTAMP = { 2013.04.16 },
    }
  • Sparse Subspace Clustering via Group Sparse Coding
    Saha, B., Pham, D.S., Phung, Dinh and Venkatesh, Svetha. In Proc. of SIAM Intl. Conf. on Data Mining (SDM), pages 130-138, 2013. [ | ]
    Sparse subspace representation is an emerging and powerful approach for clustering of data, whose generative model is a union of subspaces. Existing sparse subspace representation methods are restricted to the single-task setting, which consequently leads to inefficient computation and sub-optimal performance. To address the current limitation, we propose a novel method that regularizes sparse subspace representation by exploiting the structural sharing between tasks and data points. The first regularizer aims at group level where we seek sparsity between groups but dense within group. The second regularizer models the interactions down to data point level via the well-known graph regularization technique. We also derive simple, provably convergent, and extremely computationally efficient algorithms for solving the proposed group formulations. We evaluate the proposed methods over a wide range of large-scale clustering problems: from challenging health care to image and text clustering benchmark datasets, and show that they outperform the state-of-the-art considerably.
    @INPROCEEDINGS { saha_pham_phung_venkatesh_sdm13,
        TITLE = { Sparse Subspace Clustering via Group Sparse Coding },
        AUTHOR = { Saha, B. and Pham, D.S. and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of SIAM Intl. Conf. on Data Mining (SDM) },
        YEAR = { 2013 },
        PAGES = { 130-138 },
        ABSTRACT = { Sparse subspace representation is an emerging and powerful approach for clustering of data, whose generative model is a union of subspaces. Existing sparse subspace representation methods are restricted to the single-task setting, which consequently leads to inefficient computation and sub-optimal performance. To address the current limitation, we propose a novel method that regularizes sparse subspace representation by exploiting the structural sharing between tasks and data points. The first regularizer aims at group level where we seek sparsity between groups but dense within group. The second regularizer models the interactions down to data point level via the well-known graph regularization technique. We also derive simple, provably convergent, and extremely computationally efficient algorithms for solving the proposed group formulations. We evaluate the proposed methods over a wide range of large-scale clustering problems: from challenging health care to image and text clustering benchmark datasets, and show that they outperform the state-of-the-art considerably. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
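    The first regularizer in the abstract ('sparsity between groups but dense within group') is exactly the behaviour of the group-lasso proximal operator, which shrinks each group's coefficient block as a unit. A minimal numpy sketch of that operator, a standard building block for this family of solvers rather than the paper's specific algorithm:
        import numpy as np

        def group_soft_threshold(c, groups, lam):
            # Proximal operator of the group-lasso penalty lam * sum_g ||c_g||_2:
            # each group's block is shrunk toward zero as a whole, so weak groups
            # switch off entirely while surviving groups stay dense inside.
            out = np.zeros_like(c)
            for g in groups:
                norm = np.linalg.norm(c[g])
                if norm > lam:
                    out[g] = (1.0 - lam / norm) * c[g]
            return out

        c = np.array([0.9, -0.8, 0.05, 0.02, 1.5, 1.2])
        groups = [[0, 1], [2, 3], [4, 5]]
        print(group_soft_threshold(c, groups, lam=0.5))
        # the weak middle group is zeroed out entirely; the others shrink slightly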
  • Bayesian Nonparametric Modelling of Correlated Data Sources and Applications (poster)
    Phung, Dinh. In International Conference on Bayesian Nonparametrics, Amsterdam, The Netherlands, June 10-14 2013. [ | | code | poster]
    When one considers realistic multimodal data, covariates are rich, and yet tend to have a natural correlation with one another; for example: tags and their associated multimedia contents; patient's demographic information, medical history and drug usage; social user's profile and friendship network. The presence of rich and naturally correlated covariates calls for the need to model their correlation with nonparametric models, without reverting to making parametric assumptions. This paper presents a full Bayesian nonparametric approach to the problem of jointly clustering data sources and modeling their correlation. In our approach, we view context as distributions over some index space, governed by the topics discovered from the primary data source (content), and model both contents and contexts jointly. We impose a conditional structure in which contents provide the topics, upon which contexts are conditionally distributed. Distributions over topic parameters are modelled according to a Dirichlet process (DP). Stick-breaking representation gives rise to explicit realizations of topic atoms which we use as an indexing mechanism to induce conditional random mixture distributions on the context observation spaces. Loosely speaking, we use a stochastic process, being DP, to conditionally `index' other stochastic processes. The latter can be designed on any suitable family of stochastic processes to suit modelling needs or data types of contexts (such as Beta or Gaussian processes). Dirichlet process is of course an obvious choice and will be again employed in this work. In typical hierarchical Bayesian style, we also provide the model in grouped data setting, where contents and contexts appear in groups (for example, a collection of text documents or images embedded in time or space). Our model can be viewed as a generalization of the hierarchical Dirichlet process (HDP) [2] and the recent nested Dirichlet process (nDP) [1]. We develop an auxiliary conditional Gibbs sampler in which both topic and context atoms are marginalized out. We demonstrate the framework on synthetic datasets and various real large-scale datasets with an emphasis on the application perspective of the models. In particular, we demonstrate a) an application in text modelling for modelling topics which are sensitive to time using the NIPS and PNAS datasets, b) an application of the model in computer vision for inferring local and global movement patterns using the MIT dataset consisting of real video data collected at a traffic scene, c) an application on medical data analysis in which we model latent aspects of diseases, their progression together with the task of re-admission prediction.
    @INPROCEEDINGS { phung_bnp13,
        TITLE = { {B}ayesian Nonparametric Modelling of Correlated Data Sources and Applications (poster) },
        AUTHOR = { Phung, Dinh },
        BOOKTITLE = { International Conference on Bayesian Nonparametrics },
        YEAR = { 2013 },
        ADDRESS = { Amsterdam, The Netherlands },
        MONTH = { June 10-14 },
        ABSTRACT = { When one considers realistic multimodal data, covariates are rich and yet tend to have a natural correlation with one another; for example: tags and their associated multimedia contents; a patient's demographic information, medical history and drug usage; a social user's profile and friendship network. The presence of rich and naturally correlated covariates calls for the need to model their correlation with nonparametric models, without resorting to parametric assumptions. This paper presents a fully Bayesian nonparametric approach to the problem of jointly clustering data sources and modeling their correlation. In our approach, we view context as distributions over some index space, governed by the topics discovered from the primary data source (content), and model both contents and contexts jointly. We impose a conditional structure in which contents provide the topics, upon which contexts are conditionally distributed. Distributions over topic parameters are modelled according to a Dirichlet process (DP). The stick-breaking representation gives rise to explicit realizations of topic atoms, which we use as an indexing mechanism to induce conditional random mixture distributions on the context observation spaces. Loosely speaking, we use a stochastic process, being the DP, to conditionally `index' other stochastic processes. The latter can be designed over any suitable family of stochastic processes to suit the modelling needs or data types of the contexts (such as Beta or Gaussian processes). The Dirichlet process is of course an obvious choice and is again employed in this work. In typical hierarchical Bayesian style, we also provide the model in the grouped data setting, where contents and contexts appear in groups (for example, a collection of text documents or images embedded in time or space). Our model can be viewed as a generalization of the hierarchical Dirichlet process (HDP) [2] and the recent nested Dirichlet process (nDP) [1]. We develop an auxiliary conditional Gibbs sampler in which both topic and context atoms are marginalized out. We demonstrate the framework on synthetic datasets and various real large-scale datasets, with an emphasis on the application perspective of the models. In particular, we demonstrate a) an application in text modelling for modelling topics which are sensitive to time, using the NIPS and PNAS datasets, b) an application of the model in computer vision for inferring local and global movement patterns, using the MIT dataset consisting of real video data collected at a traffic scene, and c) an application in medical data analysis in which we model latent aspects of diseases and their progression, together with the task of re-admission prediction. },
        CODE = { http://prada-research.net/~dinh/index.php?n=Main.Code#HDP_code },
        OWNER = { dinh },
        POSTER = { http://prada-research.net/~dinh/uploads/Main/Publications/A0_poster_BNP13.pdf },
        TIMESTAMP = { 2013.03.01 },
    }
C
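    The entry above builds on the stick-breaking representation of the Dirichlet process. As a point of reference, the following is a minimal sketch (not the authors' code) of the truncated stick-breaking construction; the concentration value and truncation level are illustrative assumptions.

        import numpy as np

        def stick_breaking_weights(alpha, truncation, rng):
            # v_k ~ Beta(1, alpha); pi_k = v_k * prod_{j<k} (1 - v_j)
            v = rng.beta(1.0, alpha, size=truncation)
            leftover = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
            return v * leftover

        rng = np.random.default_rng(0)
        pi = stick_breaking_weights(alpha=2.0, truncation=100, rng=rng)
        print(pi[:5], pi.sum())  # weights decay quickly; the sum approaches 1 as truncation grows

    Atoms drawn i.i.d. from a base measure, paired with these weights, yield an explicit DP realization of the kind the abstract uses as its indexing mechanism.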
  • Connectivity, Online Social Capital and Mood: A Bayesian Nonparametric Analysis
    Phung, Dinh, Gupta, S. K., Nguyen, T. and Venkatesh, Svetha. IEEE Transactions on Multimedia (TMM), 15:1316-1325, 2013. [ | | pdf]
    Social capital, indicative of community interaction and support, is intrinsically linked to mental health. Increasing online presence is now the norm. Whilst social capital and its impact on social networks have been examined, its underlying connection to emotional responses such as mood has not been investigated. This paper studies this phenomenon, revisiting the concept of “online social capital” in social media communities using measurable aspects of social participation and social support. We establish the link between online capital derived from social media and mood, demonstrating results for different cohorts of social capital and social connectivity. We use a novel Bayesian nonparametric factor analysis to extract the shared and individual factors in mood transition across groups of users with different levels of connectivity, quantifying the patterns and degree of mood transitions. Using more than 1.6 million users from LiveJournal, we show quantitatively that groups with lower social capital have fewer positive moods and more negative moods than groups with higher social capital. We show similar effects in mood transitions. We establish a framework for how social media can be used as a barometer for mood. The significance lies in the importance of online social capital to overall mental well-being. In establishing the link between mood and social capital in online communities, this work may suggest the foundation of new systems to monitor online mental well-being.
    @ARTICLE { phung_gupta_nguyen_venkatesh_tmm13,
        TITLE = { Connectivity, Online Social Capital and Mood: A Bayesian Nonparametric Analysis },
        AUTHOR = { Phung, Dinh and Gupta, S. K. and Nguyen, T. and Venkatesh, Svetha },
        JOURNAL = { IEEE Transactions on Multimedia (TMM) },
        YEAR = { 2013 },
        PAGES = { 1316-1325 },
        VOLUME = { 15 },
        ABSTRACT = { Social capital, indicative of community interaction and support, is intrinsically linked to mental health. Increasing online presence is now the norm. Whilst social capital and its impact on social networks have been examined, its underlying connection to emotional responses such as mood has not been investigated. This paper studies this phenomenon, revisiting the concept of “online social capital” in social media communities using measurable aspects of social participation and social support. We establish the link between online capital derived from social media and mood, demonstrating results for different cohorts of social capital and social connectivity. We use a novel Bayesian nonparametric factor analysis to extract the shared and individual factors in mood transition across groups of users with different levels of connectivity, quantifying the patterns and degree of mood transitions. Using more than 1.6 million users from LiveJournal, we show quantitatively that groups with lower social capital have fewer positive moods and more negative moods than groups with higher social capital. We show similar effects in mood transitions. We establish a framework for how social media can be used as a barometer for mood. The significance lies in the importance of online social capital to overall mental well-being. In establishing the link between mood and social capital in online communities, this work may suggest the foundation of new systems to monitor online mental well-being. },
        ISSN = { 1520-9210 },
        LANGUAGE = { English },
        TIMESTAMP = { 2013.04.16 },
        URL = { http://prada-research.net/~dinh/uploads/Main/HomePage/phung_gupta_nguyen_venkatesh_tmm13.pdf },
    }
J
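    The mood-transition analysis above suggests a simple baseline worth stating concretely. The following is a minimal sketch, assuming mood sequences are available as lists of integer mood codes per user: it estimates an empirical mood-transition matrix for a cohort, a crude counterpart of the shared and individual factors that the paper extracts with Bayesian nonparametric factor analysis. All names and data here are hypothetical.

        import numpy as np

        def transition_matrix(sequences, num_moods):
            # count transitions prev -> cur across all users, then row-normalize
            counts = np.zeros((num_moods, num_moods))
            for seq in sequences:
                for prev, cur in zip(seq[:-1], seq[1:]):
                    counts[prev, cur] += 1
            rows = counts.sum(axis=1, keepdims=True)
            return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

        # two toy users; 0 = negative, 1 = neutral, 2 = positive mood
        cohort = [[2, 2, 1, 2, 0], [1, 2, 2, 2, 1]]
        print(transition_matrix(cohort, num_moods=3))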
  • Regularized nonnegative shared subspace learning
    Gupta, Sunil Kumar, Phung, Dinh, Adams, Brett and Venkatesh, Svetha. Data Mining and Knowledge Discovery, 26(1):57-97, 2013. [ | ]
    @ARTICLE { gupta_phung_adams_venkatesh_dami13,
        TITLE = { Regularized nonnegative shared subspace learning },
        AUTHOR = { Gupta, Sunil Kumar and Phung, Dinh and Adams, Brett and Venkatesh, Svetha },
        JOURNAL = { Data Mining and Knowledge Discovery },
        YEAR = { 2013 },
        NUMBER = { 1 },
        PAGES = { 57--97 },
        VOLUME = { 26 },
        OWNER = { thinng },
        PUBLISHER = { Springer },
        TIMESTAMP = { 2015.01.29 },
    }
J
  • A Slice Sampler for Restricted Hierarchical Beta Process with Applications to Shared Subspace Learning
    Gupta, S., Phung, Dinh and Venkatesh, Svetha. In Proc. of Intl. Conf. on Uncertainty in Artificial Intelligence (UAI), pages 316-325, 2012. [ | ]
    @INPROCEEDINGS { gupta_phung_venkatesh_uai12,
        TITLE = { A Slice Sampler for Restricted Hierarchical {B}eta Process with Applications to Shared Subspace Learning },
        AUTHOR = { Gupta, S. and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of Intl. Conf. on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2012 },
        PAGES = { 316--325 },
        OWNER = { dinh },
        TIMESTAMP = { 2012.05.24 },
    }
C
  • A Bayesian Nonparametric Joint Factor Model for Learning Shared and Individual Subspaces from Multiple Data Sources
    Gupta, S., Phung, Dinh and Venkatesh, Svetha. In Proc. of SIAM Intl. Conf. on Data Mining (SDM), pages 200-211, 2012. [ | ]
    Joint analysis of multiple data sources is becoming increasingly popular in transfer learning, multi-task learning and cross-domain data mining. One promising approach to model the data jointly is through learning the shared and individual factor subspaces. However, the performance of this approach depends on the subspace dimensionalities and the level of sharing, which need to be specified a priori. To address this, we propose a nonparametric joint factor analysis framework for modeling multiple related data sources. Our model utilizes the hierarchical beta process as a nonparametric prior to automatically infer the number of shared and individual factors. For posterior inference, we provide a Gibbs sampling scheme using auxiliary variables. The effectiveness of the proposed framework is validated through its application on two real-world problems -- transfer learning in text and image retrieval.
    @INPROCEEDINGS { gupta_phung_venkatesh_sdm12,
        TITLE = { A {B}ayesian Nonparametric Joint Factor Model for Learning Shared and Individual Subspaces from Multiple Data Sources },
        AUTHOR = { Gupta, S. and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of SIAM Intl. Conf. on Data Mining (SDM) },
        YEAR = { 2012 },
        PAGES = { 200--211 },
        ABSTRACT = { Joint analysis of multiple data sources is becoming increasingly popular in transfer learning, multi-task learning and cross-domain data mining. One promising approach to model the data jointly is through learning the shared and individual factor subspaces. However, the performance of this approach depends on the subspace dimensionalities and the level of sharing, which need to be specified a priori. To address this, we propose a nonparametric joint factor analysis framework for modeling multiple related data sources. Our model utilizes the hierarchical beta process as a nonparametric prior to automatically infer the number of shared and individual factors. For posterior inference, we provide a Gibbs sampling scheme using auxiliary variables. The effectiveness of the proposed framework is validated through its application on two real-world problems -- transfer learning in text and image retrieval. },
    }
C
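    To make the role of the hierarchical beta process above concrete, here is a minimal sketch, not the paper's inference code: a finite beta-Bernoulli approximation in which each source switches factors on independently given shared usage probabilities, so factors active in both sources play the role of shared factors and the rest are individual. Hyperparameter values are illustrative.

        import numpy as np

        rng = np.random.default_rng(1)
        K, a, b = 20, 2.0, 2.0                         # truncation level and beta hyperparameters
        pi = rng.beta(a / K, b * (K - 1) / K, size=K)  # factor usage probabilities
        z = rng.random((2, K)) < pi                    # source-specific factor activations
        print("shared factors:", np.flatnonzero(z[0] & z[1]))
        print("individual to source 0:", np.flatnonzero(z[0] & ~z[1]))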
  • A Sequential Decision Approach to Ordinal Preferences in Recommender Systems
    Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Proc. of AAAI Conf. on Artificial Intelligence (AAAI), pages 676-682, 2012. [ | ]
    We propose a novel sequential decision approach to modeling ordinal ratings in collaborative filtering problems. The rating process is assumed to start from the lowest level, evaluate against the latent utility at the corresponding level, and move up until a suitable ordinal level is found. Crucial to this generative process are the underlying utility random variables that govern the generation of ratings and their modelling choices. To this end, we make novel use of generalised extreme value distributions, which are found to be particularly suitable for our modeling tasks and, at the same time, facilitate our inference and learning procedure. The proposed approach is flexible enough to incorporate features from both the user and the item. We evaluate the proposed framework on three well-known datasets: MovieLens, Dating Agency and Netflix. In all cases, it is demonstrated that the proposed work is competitive against state-of-the-art collaborative filtering methods.
    @INPROCEEDINGS { truyen_phung_venkatesh_aaai12,
        TITLE = { A Sequential Decision Approach to Ordinal Preferences in Recommender Systems },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of AAAI Conf. on Artificial Intelligence (AAAI) },
        YEAR = { 2012 },
        PAGES = { 676--682 },
        ABSTRACT = { We propose a novel sequential decision approach to modeling ordinal ratings in collaborative filtering problems. The rating process is assumed to start from the lowest level, evaluate against the latent utility at the corresponding level, and move up until a suitable ordinal level is found. Crucial to this generative process are the underlying utility random variables that govern the generation of ratings and their modelling choices. To this end, we make novel use of generalised extreme value distributions, which are found to be particularly suitable for our modeling tasks and, at the same time, facilitate our inference and learning procedure. The proposed approach is flexible enough to incorporate features from both the user and the item. We evaluate the proposed framework on three well-known datasets: MovieLens, Dating Agency and Netflix. In all cases, it is demonstrated that the proposed work is competitive against state-of-the-art collaborative filtering methods. },
        TIMESTAMP = { 2012.04.11 },
    }
C
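    The stagewise rating process above can be illustrated with a short generative sampler: start at the lowest ordinal level and move up while the noisy latent utility keeps exceeding that level's threshold. Gumbel noise is used below because it is a member of the generalised extreme value family; the paper's exact parameterisation and thresholds are not reproduced, so treat this as a hedged sketch.

        import numpy as np

        def sample_rating(mean_utility, thresholds, rng):
            level = 0
            while level < len(thresholds):
                utility = mean_utility + rng.gumbel()
                if utility <= thresholds[level]:  # not enough utility to pass this level
                    break
                level += 1
            return level                          # rating in {0, ..., len(thresholds)}

        rng = np.random.default_rng(0)
        print([sample_rating(1.5, [0.0, 1.0, 2.0, 3.0], rng) for _ in range(5)])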
  • Improved Subspace Clustering via Exploitation of Spatial Constraints
    Pham, S., Budhaditya, Saha, Phung, Dinh and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 550-557, 2012. [ | ]
    We present a novel approach to improving subspace clustering by exploiting spatial constraints. The new method encourages the sparse solution to be consistent with the spatial geometry of the tracked points by embedding weights into the sparse formulation. By doing so, we are able to correct sparse representations in a principled manner without introducing much additional computational cost. We discuss alternative ways to treat missing and corrupted data using the latest theory in robust lasso regression and suggest numerical algorithms to solve the proposed formulation. Experiments on the benchmark Johns Hopkins 155 dataset demonstrate that exploiting spatial constraints significantly improves motion segmentation.
    @INPROCEEDINGS { pham_budhaditya_phung_venkatesh_cvpr12,
        TITLE = { Improved Subspace Clustering via Exploitation of Spatial Constraints },
        AUTHOR = { Pham, S. and Budhaditya, Saha and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR) },
        YEAR = { 2012 },
        PAGES = { 550--557 },
        ABSTRACT = { We present a novel approach to improving subspace clustering by exploiting spatial constraints. The new method encourages the sparse solution to be consistent with the spatial geometry of the tracked points by embedding weights into the sparse formulation. By doing so, we are able to correct sparse representations in a principled manner without introducing much additional computational cost. We discuss alternative ways to treat missing and corrupted data using the latest theory in robust lasso regression and suggest numerical algorithms to solve the proposed formulation. Experiments on the benchmark Johns Hopkins 155 dataset demonstrate that exploiting spatial constraints significantly improves motion segmentation. },
        OWNER = { thinng },
        TIMESTAMP = { 2012.04.11 },
    }
C
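    The key device in the entry above is a weighted sparse formulation. Below is a minimal sketch on synthetic data: a weighted-l1 sparse coding step solved with ISTA, where per-coefficient weights bias which dictionary columns are selected. The random weights here are placeholders, not the paper's spatial weights.

        import numpy as np

        def weighted_lasso_ista(D, x, weights, lam=0.1, iters=500):
            step = 1.0 / np.linalg.norm(D, 2) ** 2     # 1 / Lipschitz constant of the gradient
            c = np.zeros(D.shape[1])
            for _ in range(iters):
                u = c - step * (D.T @ (D @ c - x))     # gradient step on the quadratic term
                thresh = step * lam * weights          # per-coefficient soft threshold
                c = np.sign(u) * np.maximum(np.abs(u) - thresh, 0.0)
            return c

        rng = np.random.default_rng(0)
        D = rng.standard_normal((30, 60))
        x = D[:, :3] @ np.array([1.0, -0.5, 0.8])      # x lies in the span of three columns
        w = rng.uniform(0.5, 1.5, size=60)
        print(np.flatnonzero(np.abs(weighted_lasso_ista(D, x, w)) > 1e-3))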
  • Sparse Subspace Representation for Spectral Document Clustering
    Saha, B., Phung, Dinh, Pham, D.S. and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Data Mining (ICDM), pages 1092-1097, 2012. [ | ]
    @INPROCEEDINGS { saha_phung_pham_venkatesh_icdm12,
        TITLE = { Sparse Subspace Representation for Spectral Document Clustering },
        AUTHOR = { Saha, B. and Phung, Dinh and Pham, D.S. and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of IEEE Intl. Conf. on Data Mining (ICDM) },
        YEAR = { 2012 },
        PAGES = { 1092--1097 },
        OWNER = { dinh },
        TIMESTAMP = { 2012.10.31 },
    }
C
  • Detection of Cross-Channel Anomalies
    Pham, S., Budhaditya, Saha, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 35(1):33-59, 2013. [ | ]
    The data deluge has created a great challenge for data mining applications wherein the rare topics of interest are often buried in the flood of major headlines. We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual-channel anomalies and are often portents of significant events. Central to this new problem is the development of a theoretical foundation and methodology. Using a spectral approach, we propose a two-stage detection method: anomaly detection at the single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. We also derive an extension of the proposed detection method to an online setting, which automatically adapts to changes in the data over time at low computational complexity using incremental algorithms. Our mathematical analysis shows that our method is likely to reduce the false alarm rate, by establishing theoretical results on the reduction of an impurity index. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in large-scale video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis.
    @ARTICLE { pham_budhaditya_phung_venkatesh_kais13,
        TITLE = { Detection of Cross-Channel Anomalies },
        AUTHOR = { Pham, S. and Budhaditya, Saha and Phung, Dinh and Venkatesh, Svetha },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2013 },
        NUMBER = { 1 },
        PAGES = { 33--59 },
        VOLUME = { 35 },
        ABSTRACT = { The data deluge has created a great challenge for data mining applications wherein the rare topics of interest are often buried in the flood of major headlines. We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual-channel anomalies and are often portents of significant events. Central to this new problem is the development of a theoretical foundation and methodology. Using a spectral approach, we propose a two-stage detection method: anomaly detection at the single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. We also derive an extension of the proposed detection method to an online setting, which automatically adapts to changes in the data over time at low computational complexity using incremental algorithms. Our mathematical analysis shows that our method is likely to reduce the false alarm rate, by establishing theoretical results on the reduction of an impurity index. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in large-scale video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis. },
    }
J
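    The two-stage structure described above is easy to prototype. The following minimal sketch uses PCA residual energy as a single-channel spectral anomaly score and intersects the per-channel detections to obtain cross-channel anomalies; the clean reference data, threshold and component count are illustrative assumptions, not the paper's exact construction.

        import numpy as np

        def pca_model(X, rank=2):
            mean = X.mean(axis=0)
            _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
            return mean, Vt[:rank]

        def residual_scores(X, mean, basis):
            Xc = X - mean
            return np.linalg.norm(Xc - Xc @ basis.T @ basis, axis=1)  # energy off the subspace

        rng = np.random.default_rng(0)
        models = [pca_model(rng.standard_normal((200, 10))) for _ in range(2)]  # per-channel reference
        tests = [rng.standard_normal((50, 10)) for _ in range(2)]
        for X in tests:
            X[7] += 3.0                                # the same event appears in both channels
        flags = [residual_scores(X, m, V) > 5.0 for X, (m, V) in zip(tests, models)]
        print("cross-channel anomalies:", np.flatnonzero(flags[0] & flags[1]))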
  • Detection of Cross-Channel Anomalies From Multiple Data Channels
    Pham, S., Budhaditya, Saha, Phung, Dinh and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Data Mining (ICDM), Vancouver, Canada, December 2011. [ | ]
    We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual-channel anomalies and are often portents of significant events. Using spectral approaches, we propose a two-stage detection method: anomaly detection at the single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. Our mathematical analysis shows that our method is likely to reduce the false alarm rate. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis.
    @INPROCEEDINGS { pham_budhaditya_phung_venkatesh_icdm11,
        TITLE = { Detection of Cross-Channel Anomalies From Multiple Data Channels },
        AUTHOR = { Pham, S. and Budhaditya, Saha and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of IEEE Intl. Conf. on Data Mining (ICDM) },
        YEAR = { 2011 },
        ADDRESS = { Vancouver, Canada },
        MONTH = { December },
        ABSTRACT = { We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual-channel anomalies and are often portents of significant events. Using spectral approaches, we propose a two-stage detection method: anomaly detection at the single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. Our mathematical analysis shows that our method is likely to reduce the false alarm rate. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis. },
        COMMENT = { coauthor },
        OWNER = { thinng },
        TIMESTAMP = { 2012.04.11 },
    }
C
  • Probabilistic Models over Ordered Partitions with Applications in Document Ranking and Collaborative Filtering
    Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Proc. of SIAM Intl. Conf. on Data Mining (SDM), Arizona, USA, April 2011. [ | | pdf]
    Ranking is an important task for handling large amounts of content. Ideally, training data for supervised ranking would include a complete ranking of documents (or other objects such as images or videos) for a particular query. However, this is only possible for small sets of documents. In practice, one often resorts to document rating, in which a subset of documents is assigned a small number indicating the degree of relevance. This poses a general problem of modelling and learning rank data with ties. In this paper, we propose a probabilistic generative model that models the process as permutations over partitions. This results in a super-exponential combinatorial state space with unknown numbers of partitions and unknown ordering among them. We approach the problem from discrete choice theory, where subsets are chosen in a stagewise manner, reducing the state space at each stage significantly. Further, we show that with suitable parameterisation, we can still learn the models in linear time. We evaluate the proposed models on two application areas: (i) document ranking with data from the recently held Yahoo! challenge, and (ii) collaborative filtering with movie data. The results demonstrate that the models are competitive against well-known rivals.
    @INPROCEEDINGS { truyen_phung_venkatesh_sdm11,
        TITLE = { Probabilistic Models over Ordered Partitions with Applications in Document Ranking and Collaborative Filtering },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of SIAM Intl. Conf. on Data Mining (SDM) },
        YEAR = { 2011 },
        ADDRESS = { Arizona, USA },
        MONTH = { April },
        ABSTRACT = { Ranking is an important task for handling large amounts of content. Ideally, training data for supervised ranking would include a complete ranking of documents (or other objects such as images or videos) for a particular query. However, this is only possible for small sets of documents. In practice, one often resorts to document rating, in which a subset of documents is assigned a small number indicating the degree of relevance. This poses a general problem of modelling and learning rank data with ties. In this paper, we propose a probabilistic generative model that models the process as permutations over partitions. This results in a super-exponential combinatorial state space with unknown numbers of partitions and unknown ordering among them. We approach the problem from discrete choice theory, where subsets are chosen in a stagewise manner, reducing the state space at each stage significantly. Further, we show that with suitable parameterisation, we can still learn the models in linear time. We evaluate the proposed models on two application areas: (i) document ranking with data from the recently held Yahoo! challenge, and (ii) collaborative filtering with movie data. The results demonstrate that the models are competitive against well-known rivals. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\truyen_phung_venkatesh_sdm11.pdf:PDF },
        OWNER = { 184698H },
        TIMESTAMP = { 2011.02.07 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Truyen_etal_sdm11.pdf },
    }
C
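    The stagewise choice idea above has a familiar untied special case. Here is a minimal Plackett-Luce-style sampler that builds a ranking by repeatedly choosing the next document proportionally to exp(utility) among those remaining; the paper generalises this to ordered partitions with ties, which this sketch deliberately omits.

        import numpy as np

        def plackett_luce_sample(utilities, rng):
            remaining = list(range(len(utilities)))
            order = []
            while remaining:                           # one stagewise choice per rank position
                w = np.exp([utilities[i] for i in remaining])
                pick = rng.choice(len(remaining), p=w / w.sum())
                order.append(remaining.pop(pick))
            return order

        rng = np.random.default_rng(0)
        print(plackett_luce_sample(np.array([2.0, 0.5, 1.0, -1.0]), rng))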
  • Nonnegative Shared Subspace Learning and Its Application to Social Media Retrieval
    Gupta, Sunil, Phung, Dinh, Adams, Brett, Tran, Truyen and Venkatesh, Svetha. In Proc. of ACM Intl. Conf. on Knowledge Discovery and Data Mining (SIGKDD), Washington DC, USA, July 2010. [ | | pdf]
    Although tagging has become increasingly popular in online image and video sharing systems, tags are known to be noisy, ambiguous, incomplete and subjective. These factors can seriously affect the precision of a social tag-based web retrieval system. Therefore, improving the precision of these social tag-based web retrieval systems has become an increasingly important research topic. To this end, we propose a shared subspace learning framework that leverages a secondary source to improve retrieval performance from a primary dataset. This is achieved by learning a shared subspace between the two sources under a joint Nonnegative Matrix Factorization in which the level of subspace sharing can be explicitly controlled. We derive an efficient algorithm for learning the factorization, analyze its complexity, and provide proof of convergence. We validate the framework on image and video retrieval tasks in which tags from the LabelMe dataset are used to improve image retrieval performance from a Flickr dataset and video retrieval performance from a YouTube dataset. This has implications for how to exploit and transfer knowledge from readily available auxiliary tagging resources to improve another social web retrieval system. Our shared subspace learning framework is applicable to a range of problems where one needs to exploit the complementary strengths of multiple, heterogeneous datasets.
    @INPROCEEDINGS { gupta_phung_adams_truyen_venkatesh_sigkdd10,
        TITLE = { Nonnegative Shared Subspace Learning and Its Application to Social Media Retrieval },
        AUTHOR = { Gupta, Sunil and Phung, Dinh and Adams, Brett and Tran, Truyen and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of ACM Intl. Conf. on Knowledge Discovery and Data Mining (SIGKDD) },
        YEAR = { 2010 },
        ADDRESS = { Washington DC, USA },
        MONTH = { July },
        ABSTRACT = { Although tagging has become increasingly popular in online image and video sharing systems, tags are known to be noisy, ambiguous, incomplete and subjective. These factors can seriously affect the precision of a social tag-based web retrieval system. Therefore, improving the precision of these social tag-based web retrieval systems has become an increasingly important research topic. To this end, we propose a shared subspace learning framework that leverages a secondary source to improve retrieval performance from a primary dataset. This is achieved by learning a shared subspace between the two sources under a joint Nonnegative Matrix Factorization in which the level of subspace sharing can be explicitly controlled. We derive an efficient algorithm for learning the factorization, analyze its complexity, and provide proof of convergence. We validate the framework on image and video retrieval tasks in which tags from the LabelMe dataset are used to improve image retrieval performance from a Flickr dataset and video retrieval performance from a YouTube dataset. This has implications for how to exploit and transfer knowledge from readily available auxiliary tagging resources to improve another social web retrieval system. Our shared subspace learning framework is applicable to a range of problems where one needs to exploit the complementary strengths of multiple, heterogeneous datasets. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\gupta_phung_adams_truyen_venkatesh_sigkdd10.pdf:PDF },
        OWNER = { Dinh Phung },
        TIMESTAMP = { 2010.06.29 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Gupta_etal_sigkdd10.pdf },
    }
C
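    The joint factorization with a controllable shared subspace described above can be sketched directly. The following is a minimal sketch, not the paper's algorithm verbatim: joint NMF on two nonnegative matrices with an explicitly shared block of basis vectors, where the shared block Ws receives multiplicative-update mass from both sources while U1 and U2 stay source-specific. Dimensions, iteration count and epsilon are illustrative choices.

        import numpy as np

        def joint_nmf(X1, X2, ks, k1, k2, iters=200, eps=1e-9, seed=0):
            rng = np.random.default_rng(seed)
            m = X1.shape[0]
            Ws, U1, U2 = (rng.random((m, k)) for k in (ks, k1, k2))
            H1 = rng.random((ks + k1, X1.shape[1]))
            H2 = rng.random((ks + k2, X2.shape[1]))
            for _ in range(iters):                     # blockwise multiplicative updates
                W1, W2 = np.hstack([Ws, U1]), np.hstack([Ws, U2])
                H1 *= (W1.T @ X1) / (W1.T @ W1 @ H1 + eps)
                H2 *= (W2.T @ X2) / (W2.T @ W2 @ H2 + eps)
                num = X1 @ H1[:ks].T + X2 @ H2[:ks].T  # shared block pools both sources
                den = W1 @ H1 @ H1[:ks].T + W2 @ H2 @ H2[:ks].T + eps
                Ws *= num / den
                U1 *= (X1 @ H1[ks:].T) / (W1 @ H1 @ H1[ks:].T + eps)
                U2 *= (X2 @ H2[ks:].T) / (W2 @ H2 @ H2[ks:].T + eps)
            return Ws, U1, U2, H1, H2

        rng = np.random.default_rng(0)
        X1, X2 = rng.random((40, 30)), rng.random((40, 25))
        Ws, U1, U2, H1, H2 = joint_nmf(X1, X2, ks=3, k1=2, k2=2)
        print(np.linalg.norm(X1 - np.hstack([Ws, U1]) @ H1))  # reconstruction error, source 1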
  • Efficient duration and hierarchical modeling for human activity recognition
    Duong, Thi, Phung, Dinh, Bui, Hung and Venkatesh, Svetha. Artificial Intelligence (AIJ), 173(7-8):830-856, 2009. [ | | pdf | code]
    A challenge in building pervasive and smart spaces is to learn and recognize human activities of daily living (ADLs). In this paper, we address this problem and argue that in dealing with ADLs, it is beneficial to exploit both their typical duration patterns and inherent hierarchical structures. We exploit efficient duration modeling using the novel Coxian distribution to form the Coxian hidden semi-Markov model (CxHSMM) and apply it to the problem of learning and recognizing ADLs with complex temporal dependencies. The Coxian duration model has several advantages over existing duration parameterizations using multinomial or exponential family distributions, including its denseness in the space of non-negative distributions, low number of parameters, computational efficiency and the existence of closed-form estimation solutions. Further, we combine both hierarchical and duration extensions of the hidden Markov model (HMM) to form the novel switching hidden semi-Markov model (SHSMM), and empirically compare its performance with existing models. The model can learn what an occupant normally does during the day from unsegmented training data and then perform online activity classification, segmentation and abnormality detection. Experimental results show that Coxian modeling outperforms a range of baseline models for the task of activity segmentation. We also achieve a recognition accuracy competitive with the current state-of-the-art multinomial duration model, whilst gaining a significant reduction in computation. Furthermore, cross-validation model selection on the number of phases K in the Coxian indicates that only a small K is required to achieve optimal performance. Finally, our models are further tested in a more challenging setting in which the tracking is often lost and the set of activities considerably overlap. With a small amount of labels supplied during training in a partially supervised learning mode, our models are again able to deliver reliable performance, again with a small number of phases, making our proposed framework an attractive choice for activity modeling.
    @ARTICLE { duong_phung_bui_venkatesh_aij09,
        AUTHOR = { Duong, Thi and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },
        TITLE = { Efficient duration and hierarchical modeling for human activity recognition },
        JOURNAL = { Artificial Intelligence (AIJ) },
        YEAR = { 2009 },
        VOLUME = { 173 },
        NUMBER = { 7-8 },
        PAGES = { 830--856 },
        ABSTRACT = { A challenge in building pervasive and smart spaces is to learn and recognize human activities of daily living (ADLs). In this paper, we address this problem and argue that in dealing with ADLs, it is beneficial to exploit both their typical duration patterns and inherent hierarchical structures. We exploit efficient duration modeling using the novel Coxian distribution to form the Coxian hidden semi-Markov model (CxHSMM) and apply it to the problem of learning and recognizing ADLs with complex temporal dependencies. The Coxian duration model has several advantages over existing duration parameterizations using multinomial or exponential family distributions, including its denseness in the space of non-negative distributions, low number of parameters, computational efficiency and the existence of closed-form estimation solutions. Further, we combine both hierarchical and duration extensions of the hidden Markov model (HMM) to form the novel switching hidden semi-Markov model (SHSMM), and empirically compare its performance with existing models. The model can learn what an occupant normally does during the day from unsegmented training data and then perform online activity classification, segmentation and abnormality detection. Experimental results show that Coxian modeling outperforms a range of baseline models for the task of activity segmentation. We also achieve a recognition accuracy competitive with the current state-of-the-art multinomial duration model, whilst gaining a significant reduction in computation. Furthermore, cross-validation model selection on the number of phases K in the Coxian indicates that only a small K is required to achieve optimal performance. Finally, our models are further tested in a more challenging setting in which the tracking is often lost and the set of activities considerably overlap. With a small amount of labels supplied during training in a partially supervised learning mode, our models are again able to deliver reliable performance, again with a small number of phases, making our proposed framework an attractive choice for activity modeling. },
        CODE = { https://github.com/DASCIMAL/CxHSMM },
        COMMENT = { coauthor },
        DOI = { http://dx.doi.org/10.1016/j.artint.2008.12.005 },
        FILE = { :duong_phung_bui_venkatesh_aij09 - Efficient Duration and Hierarchical Modeling for Human Activity Recognition.pdf:PDF },
        KEYWORDS = { activity, recognition, duration modeling, Coxian, Hidden semi-Markov model, HSMM , smart surveillance },
        OWNER = { 184698H },
        PUBLISHER = { Elsevier },
        TIMESTAMP = { 2010.08.11 },
        URL = { http://www.sciencedirect.com/science/article/pii/S0004370208002142 },
    }
J
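    The discrete Coxian duration model above admits a compact sampler: geometric sojourns in a chain of phases with early absorption. The parameterisation below is one common convention, stated as an assumption rather than the paper's exact definition.

        import numpy as np

        def sample_coxian_duration(leave_probs, absorb_probs, rng):
            # leave_probs[i]: chance of leaving phase i at each time step
            # absorb_probs[i]: chance the duration ends upon leaving phase i
            # (the final phase absorbs with probability 1)
            duration, phase = 0, 0
            while True:
                duration += 1
                if rng.random() < leave_probs[phase]:
                    if phase == len(leave_probs) - 1 or rng.random() < absorb_probs[phase]:
                        return duration
                    phase += 1

        rng = np.random.default_rng(0)
        print([sample_coxian_duration([0.5, 0.3, 0.2], [0.4, 0.4, 1.0], rng) for _ in range(5)])

    A small number of phases already yields flexible duration shapes, which is the computational appeal the abstract highlights.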
  • MCMC for Hierarchical Semi-Markov Conditional Random Fields
    Tran, Truyen, Phung, Dinh, Bui, Hung and Venkatesh, Svetha. In Proc. of Workshop on Deep Learning for Speech Recognition and Related Applications, in conjunction with the Neural Information Processing Systems Conference (NIPS), Whistler, BC, Canada, December 2009. [ | ]
    Deep architectures such as hierarchical semi-Markov models are an important class of models for nested sequential data. Current exact inference schemes either cost cubic time in sequence length or exponential time in model depth. These costs are prohibitive for large-scale problems with arbitrary length and depth. In this contribution, we propose a new approximation technique that may have the potential to achieve sub-cubic time complexity in length and linear time in depth, at the cost of some loss of quality. The idea is based on two well-known methods: Gibbs sampling and Rao-Blackwellisation. We provide a simulation-based evaluation of the quality of the RGBS with respect to run time and sequence length.
    @INPROCEEDINGS { truyen_phung_bui_venkatesh_nips09,
        TITLE = { {MCMC} for Hierarchical Semi-Markov Conditional Random Fields },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of Workshop on Deep Learning for Speech Recognition and Related Applications, in conjunction with the Neural Information Processing Systems Conference (NIPS) },
        YEAR = { 2009 },
        ADDRESS = { Whistler, BC, Canada },
        MONTH = { December },
        ABSTRACT = { Deep architectures such as hierarchical semi-Markov models are an important class of models for nested sequential data. Current exact inference schemes either cost cubic time in sequence length or exponential time in model depth. These costs are prohibitive for large-scale problems with arbitrary length and depth. In this contribution, we propose a new approximation technique that may have the potential to achieve sub-cubic time complexity in length and linear time in depth, at the cost of some loss of quality. The idea is based on two well-known methods: Gibbs sampling and Rao-Blackwellisation. We provide a simulation-based evaluation of the quality of the RGBS with respect to run time and sequence length. },
        COMMENT = { coauthor },
        OWNER = { Dinh Phung },
        TIMESTAMP = { 2010.06.29 },
    }
C
  • Ordinal Boltzmann Machines for Collaborative Filtering
    Truyen Tran, Dinh Phung and Svetha Venkatesh. In Proc. of the 25th Conference on Uncertainty in Artificial Intelligence (UAI), pages 548-556, Arlington, Virginia, United States, June 2009. (Runner-up Best Paper Award). [ | | pdf]
    Collaborative filtering is an effective recommendation technique wherein the preference of an individual can potentially be predicted based on the preferences of other members. Early algorithms often relied on strong locality in the preference data; that is, it is enough to predict the preference of a user for a particular item based on a small subset of other users with similar tastes or of other items with similar properties. More recently, dimensionality reduction techniques have proved to be equally competitive, and these are based on co-occurrence patterns rather than locality. This paper explores and extends a probabilistic model known as the Boltzmann Machine for collaborative filtering tasks. It seamlessly integrates both similarity and co-occurrence in a principled manner. In particular, we study parameterisation options to deal with the ordinal nature of the preferences, and propose a joint modelling of both the user-based and item-based processes. Experiments on moderate and large-scale movie recommendation show that our framework rivals existing well-known methods.
    @INPROCEEDINGS { truyen_phung_venkatesh_uai09,
        AUTHOR = { Truyen Tran and Dinh Phung and Svetha Venkatesh },
        TITLE = { Ordinal Boltzmann Machines for Collaborative Filtering },
        BOOKTITLE = { Proc. of the 25th Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2009 },
        SERIES = { UAI '09 },
        PAGES = { 548--556 },
        ADDRESS = { Arlington, Virginia, United States },
        MONTH = { June },
        PUBLISHER = { AUAI Press },
        NOTE = { Runner-up Best Paper Award },
        ABSTRACT = { Collaborative filtering is an effective recommendation technique wherein the preference of an individual can potentially be predicted based on the preferences of other members. Early algorithms often relied on strong locality in the preference data; that is, it is enough to predict the preference of a user for a particular item based on a small subset of other users with similar tastes or of other items with similar properties. More recently, dimensionality reduction techniques have proved to be equally competitive, and these are based on co-occurrence patterns rather than locality. This paper explores and extends a probabilistic model known as the Boltzmann Machine for collaborative filtering tasks. It seamlessly integrates both similarity and co-occurrence in a principled manner. In particular, we study parameterisation options to deal with the ordinal nature of the preferences, and propose a joint modelling of both the user-based and item-based processes. Experiments on moderate and large-scale movie recommendation show that our framework rivals existing well-known methods. },
        ACMID = { 1795178 },
        COMMENT = { coauthor },
        FILE = { :truyen_phung_venkatesh_uai09 - Ordinal Boltzmann Machines for Collaborative Filtering.pdf:PDF },
        ISBN = { 978-0-9749039-5-8 },
        LOCATION = { Montreal, Quebec, Canada },
        NUMPAGES = { 9 },
        OWNER = { Dinh Phung },
        TIMESTAMP = { 2009.09.22 },
        URL = { http://dl.acm.org/citation.cfm?id=1795114.1795178 },
    }
C
  • The Hidden Permutation Model and Location-Based Activity Recognition
    Bui, Hung, Phung, Dinh, Venkatesh, Svetha and Phan, Hai. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 1345-1350, Chicago, USA, July 2008. [ | | pdf]
    Permutation modeling is challenging because of the combinatorial nature of the problem. However, such modeling is often required in many real-world applications, including activity recognition where subactivities are often permuted and partially ordered. This paper introduces a novel Hidden Permutation Model (HPM) that can learn the partial ordering constraints in permuted state sequences. The HPM is parameterized as an exponential family distribution and is flexible so that it can encode constraints via different feature functions. A chain-flipping Metropolis-Hastings Markov chain Monte Carlo (MCMC) sampler is employed for inference to overcome the O(n!) complexity. Gradient-based maximum likelihood parameter learning is presented for two cases: when the permutation is known and when it is hidden. The HPM is evaluated using both simulated and real data from a location-based activity recognition domain. Experimental results indicate that the HPM performs far better than other baseline models, including the naive Bayes classifier, the HMM classifier, and Kirshner's multinomial permutation model. The presented HPM is generic and can potentially be utilized in any problem where the modeling of permuted states from noisy data is needed.
    @INPROCEEDINGS { bui_phung_venkatesh_phan_aaai08,
        TITLE = { The Hidden Permutation Model and Location-Based Activity Recognition },
        AUTHOR = { Bui, Hung and Phung, Dinh and Venkatesh, Svetha and Phan, Hai },
        BOOKTITLE = { Proceedings of the National Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2008 },
        ADDRESS = { Chicago, USA },
        MONTH = { July },
        PAGES = { 1345--1350 },
        VOLUME = { 8 },
        ABSTRACT = { Permutation modeling is challenging because of the combinatorial nature of the problem. However, such modeling is often required in many real-world applications, including activity recognition where subactivities are often permuted and partially ordered. This paper introduces a novel Hidden Permutation Model (HPM) that can learn the partial ordering constraints in permuted state sequences. The HPM is parameterized as an exponential family distribution and is flexible so that it can encode constraints via different feature functions. A chain-flipping Metropolis-Hastings Markov chain Monte Carlo (MCMC) sampler is employed for inference to overcome the O(n!) complexity. Gradient-based maximum likelihood parameter learning is presented for two cases: when the permutation is known and when it is hidden. The HPM is evaluated using both simulated and real data from a location-based activity recognition domain. Experimental results indicate that the HPM performs far better than other baseline models, including the naive Bayes classifier, the HMM classifier, and Kirshner's multinomial permutation model. The presented HPM is generic and can potentially be utilized in any problem where the modeling of permuted states from noisy data is needed. },
        FILE = { :papers\\phung\\bui_phung_venkatesh_phan_aaai08.pdf:PDF },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { http://www.aaai.org/Papers/AAAI/2008/AAAI08-213.pdf },
    }
C
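    The chain-flipping MCMC idea above can be illustrated with a generic Metropolis-Hastings sampler over permutations using transposition proposals, which sidesteps the O(n!) state space. The energy function below is a hypothetical score over adjacent pairs, not the HPM's actual exponential-family features.

        import numpy as np

        def mh_permutation(score, n, steps, rng):
            perm = rng.permutation(n)
            for _ in range(steps):
                i, j = rng.choice(n, size=2, replace=False)
                prop = perm.copy()
                prop[i], prop[j] = prop[j], prop[i]    # symmetric transposition proposal
                if rng.random() < np.exp(score(prop) - score(perm)):
                    perm = prop
            return perm

        def score(perm):                               # toy score: reward ascending adjacent pairs
            return sum(1.0 for a, b in zip(perm[:-1], perm[1:]) if a < b)

        rng = np.random.default_rng(0)
        print(mh_permutation(score, n=6, steps=2000, rng=rng))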
  • Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data
    Tran, Truyen, Phung, Dinh, Bui, Hung and Venkatesh, Svetha. Advances in Neural Information Processing Systems (NIPS), December 2008. [ | | ]
    Inspired by hierarchical hidden Markov models (HHMMs), we present the hierarchical conditional random field (HCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial-time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We demonstrate the HCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases.
    @ARTICLE { truyen_phung_bui_venkatesh_nips08,
        TITLE = { Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },
        JOURNAL = { Advances in Neural Information Processing Systems (NIPS) },
        YEAR = { 2008 },
        MONTH = { December },
        ABSTRACT = { Inspired by hierarchical hidden Markov models (HHMMs), we present the hierarchical conditional random field (HCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial-time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We demonstrate the HCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. },
        ADDRESS = { Vancouver, Canada },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { 2008/conferences/truyen_phung_bui_venkatesh_nips08.pdf },
    }
J
  • AdaBoost.MRF: Boosted Markov Random Forests and Application to Multilevel Activity Recognition
    Tran, Truyen, Phung, Dinh, Bui, Hung and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 1686-1693, New York, USA, June 2006. [ | ]
    Activity recognition is an important issue in building intelligent monitoring systems. We address the recognition of multilevel activities in this paper via a conditional Markov random field (MRF), known as the dynamic conditional random field (DCRF). Parameter estimation in general MRFs using maximum likelihood is known to be computationally challenging (except for extreme cases), and thus we propose an efficient boosting-based algorithm, AdaBoost.MRF, for this task. Distinct from most existing work, our algorithm can handle hidden variables (missing labels) and is particularly attractive for smarthouse domains where reliable labels are often sparsely observed. Furthermore, our method works exclusively on trees and thus is guaranteed to converge. We apply the AdaBoost.MRF algorithm to a home video surveillance application and demonstrate its efficacy.
    @INPROCEEDINGS { truyen_phung_bui_venkatesh_cvpr06,
        TITLE = { {AdaBoost.MRF}: Boosted {M}arkov Random Forests and Application to Multilevel Activity Recognition },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR) },
        YEAR = { 2006 },
        ADDRESS = { New York, USA },
        MONTH = { June },
        PAGES = { 1686-1693 },
        ABSTRACT = { Activity recognition is an important issue in building intelligent monitoring systems. We address the recognition of multilevel activities in this paper via a conditional Markov random field (MRF), known as the dynamic conditional random field (DCRF). Parameter estimation in general MRFs using maximum likelihood is known to be computationally challenging (except for extreme cases), and thus we propose an efficient boosting-based algorithm, AdaBoost.MRF, for this task. Distinct from most existing work, our algorithm can handle hidden variables (missing labels) and is particularly attractive for smarthouse domains where reliable labels are often sparsely observed. Furthermore, our method works exclusively on trees and thus is guaranteed to converge. We apply the AdaBoost.MRF algorithm to a home video surveillance application and demonstrate its efficacy. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Activity Recognition and Abnormality Detection with the Switching Hidden Semi-Markov Model
    Duong, Thi, Bui, Hung, Phung, Dinh and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 838-845, San Diego, 20-26 June 2005. [ | ]
    This paper addresses the problem of learning and recognizing human activities of daily living (ADL), which is an important research issue in building a pervasive and smart environment. In dealing with ADL, we argue that it is beneficial to exploit both the inherent hierarchical organization of the activities and their typical duration. To this end, we introduce the Switching Hidden Semi-Markov Model (S-HSMM), a two-layered extension of the hidden semi-Markov model (HSMM) for the modeling task. Activities are modeled in the S-HSMM in two ways: the bottom layer represents atomic activities and their duration using HSMMs; the top layer represents a sequence of high-level activities where each high-level activity is made of a sequence of atomic activities. We consider two methods for modeling duration: the classic explicit duration model using the multinomial distribution, and the novel use of the discrete Coxian distribution. In addition, we propose an effective scheme to detect abnormality without the need for training on abnormal data. Experimental results show that the S-HSMM performs better than existing models, including the flat HSMM and the hierarchical hidden Markov model, in both classification and abnormality detection tasks, alleviating the need for presegmented training data. Furthermore, our discrete Coxian duration model yields better computation time and generalization error than the classic explicit duration model.
    @INPROCEEDINGS { duong_bui_phung_venkatesh_cvpr05,
        TITLE = { Activity Recognition and Abnormality Detection with the {S}witching {H}idden {S}emi-{M}arkov {M}odel },
        AUTHOR = { Duong, Thi and Bui, Hung and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR) },
        YEAR = { 2005 },
        ADDRESS = { San Diego },
        MONTH = { 20-26 June },
        PAGES = { 838--845 },
        PUBLISHER = { IEEE Computer Society },
        VOLUME = { 1 },
        ABSTRACT = { This paper addresses the problem of learning and recognizing human activities of daily living (ADL), which is an important research issue in building a pervasive and smart environment. In dealing with ADL, we argue that it is beneficial to exploit both the inherent hierarchical organization of the activities and their typical duration. To this end, we introduce the Switching Hidden Semi-Markov Model (S-HSMM), a two-layered extension of the hidden semi-Markov model (HSMM) for the modeling task. Activities are modeled in the S-HSMM in two ways: the bottom layer represents atomic activities and their duration using HSMMs; the top layer represents a sequence of high-level activities where each high-level activity is made of a sequence of atomic activities. We consider two methods for modeling duration: the classic explicit duration model using the multinomial distribution, and the novel use of the discrete Coxian distribution. In addition, we propose an effective scheme to detect abnormality without the need for training on abnormal data. Experimental results show that the S-HSMM performs better than existing models, including the flat HSMM and the hierarchical hidden Markov model, in both classification and abnormality detection tasks, alleviating the need for presegmented training data. Furthermore, our discrete Coxian duration model yields better computation time and generalization error than the classic explicit duration model. },
        KEYWORDS = { Activity Recognition, Abnormality detection, semi-Markov, hierarchical HSMM },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Model
    Nguyen, N., Phung, Dinh, Bui, Hung and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 955-960, San Diego, 2005. [ | ]
    Directly modeling the inherent hierarchy and shared structures of human behaviors, we present an application of the hierarchical hidden Markov model (HHMM) to the problem of activity recognition. We argue that to robustly model and recognize complex human activities, it is crucial to exploit both the natural hierarchical decomposition and the shared semantics embedded in the movement trajectories. To this end, we propose the use of the HHMM, a rich stochastic model that has been recently extended to handle shared structures, for representing and recognizing a set of complex indoor activities. Furthermore, given the need for real-time recognition, we propose a Rao-Blackwellised particle filter (RBPF) that efficiently computes the filtering distribution at a constant time complexity for each new observation arrival. The main contributions of this paper lie in the application of the shared-structure HHMM, the estimation of the model's parameters at all levels simultaneously, and the construction of an RBPF approximate inference scheme. The experimental results in a real-world environment have confirmed our belief that directly modeling shared structures not only reduces computational cost but also improves recognition accuracy when compared with the tree HHMM and the flat HMM.
    @INPROCEEDINGS { nguyen_phung_bui_venkatesh_cvpr05,
        TITLE = { Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Model },
        AUTHOR = { Nguyen, N. and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR) },
        YEAR = { 2005 },
        ADDRESS = { San Diego },
        PAGES = { 955--960 },
        PUBLISHER = { IEEE Computer Society },
        VOLUME = { 1 },
        ABSTRACT = { Directly modeling the inherent hierarchy and shared structures of human behaviors, we present an application of the hierarchical hidden Markov model (HHMM) to the problem of activity recognition. We argue that to robustly model and recognize complex human activities, it is crucial to exploit both the natural hierarchical decomposition and the shared semantics embedded in the movement trajectories. To this end, we propose the use of the HHMM, a rich stochastic model that has been recently extended to handle shared structures, for representing and recognizing a set of complex indoor activities. Furthermore, given the need for real-time recognition, we propose a Rao-Blackwellised particle filter (RBPF) that efficiently computes the filtering distribution at a constant time complexity for each new observation arrival. The main contributions of this paper lie in the application of the shared-structure HHMM, the estimation of the model's parameters at all levels simultaneously, and the construction of an RBPF approximate inference scheme. The experimental results in a real-world environment have confirmed our belief that directly modeling shared structures not only reduces computational cost but also improves recognition accuracy when compared with the tree HHMM and the flat HMM. },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
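    A note on the filtering step described in the entry above: the RBPF updates the filtering distribution at constant cost per observation. As a point of reference, the sketch below shows the analogous constant-time recursive update for a flat HMM, the paper's baseline model; all matrices and numbers are toy values, not taken from the paper.

        import numpy as np

        # Illustrative constant-time filtering update for a flat HMM (the
        # baseline model in the paper). All parameters below are toy values.

        def hmm_filter_step(belief, A, likelihood):
            # Predict with the transition matrix, weight by the observation
            # likelihood, then renormalise: p(x_t | y_1..t).
            predicted = A.T @ belief
            posterior = predicted * likelihood
            return posterior / posterior.sum()

        A = np.array([[0.80, 0.15, 0.05],
                      [0.10, 0.80, 0.10],
                      [0.05, 0.15, 0.80]])   # transitions over 3 hidden states
        B = np.array([[0.9, 0.1],
                      [0.5, 0.5],
                      [0.1, 0.9]])           # emissions over 2 observation symbols

        belief = np.full(3, 1.0 / 3.0)       # uniform initial belief
        for y in [0, 0, 1, 1, 1]:            # a short observation stream
            belief = hmm_filter_step(belief, A, B[:, y])
        print(belief)                        # filtering distribution after 5 steps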
  • Topic Transition Detection Using Hierarchical Hidden Markov and Semi-Markov Models
    Phung, Dinh, Duong, Thi, Bui, Hung and Venkatesh, Svetha. In Proc. of ACM Intl. Conf. on Multimedia (ACM-MM), Singapore, 6--11 Nov. 2005. [ | ]
    In this paper we introduce a probabilistic framework that exploits hierarchy, structure sharing, and duration information for topic transition detection in videos. Our detection framework combines a shot classification step with a detection phase using hierarchical probabilistic models. We consider two models in this paper: the extended Hierarchical Hidden Markov Model (HHMM) and the Coxian Switching Hidden semi-Markov Model (S-HSMM), because they allow the natural decomposition of semantics in videos, including shared structures, to be modeled directly, and thus enable efficient inference and reduce the sample complexity in learning. Additionally, the S-HSMM allows duration information to be incorporated; consequently, the modeling of long-term dependencies in videos is enriched through both hierarchical and duration modeling. Furthermore, the use of the Coxian distribution in the S-HSMM makes it tractable to deal with long video sequences. Our experiments with the proposed framework on twelve educational and training videos show that both models outperform the baseline cases (flat HMM and HSMM) as well as results reported in earlier work on topic detection. The superior performance of the S-HSMM over the HHMM confirms our belief that duration information is an important factor in video content modeling.
    @INPROCEEDINGS { phung_duong_bui_venkatesh_acmmm05,
        TITLE = { Topic Transition Detection Using Hierarchical Hidden Markov and Semi-Markov Models },
        AUTHOR = { Phung, Dinh and Duong, Thi and Bui, Hung and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of ACM Intl. Conf. on Multimedia (ACM-MM) },
        YEAR = { 2005 },
        ADDRESS = { Singapore },
        MONTH = { 6--11 Nov. },
        ABSTRACT = { In this paper we introduce a probabilistic framework that exploits hierarchy, structure sharing, and duration information for topic transition detection in videos. Our detection framework combines a shot classification step with a detection phase using hierarchical probabilistic models. We consider two models in this paper: the extended Hierarchical Hidden Markov Model (HHMM) and the Coxian Switching Hidden semi-Markov Model (S-HSMM), because they allow the natural decomposition of semantics in videos, including shared structures, to be modeled directly, and thus enable efficient inference and reduce the sample complexity in learning. Additionally, the S-HSMM allows duration information to be incorporated; consequently, the modeling of long-term dependencies in videos is enriched through both hierarchical and duration modeling. Furthermore, the use of the Coxian distribution in the S-HSMM makes it tractable to deal with long video sequences. Our experiments with the proposed framework on twelve educational and training videos show that both models outperform the baseline cases (flat HMM and HSMM) as well as results reported in earlier work on topic detection. The superior performance of the S-HSMM over the HHMM confirms our belief that duration information is an important factor in video content modeling. },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
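    The tractability claim in the entry above rests on the Coxian duration model: a state's duration is the time to absorption in a short chain of geometric phases, so only a handful of parameters describe a flexible duration distribution. The sketch below samples durations from a discrete Coxian; the phase parameters are illustrative assumptions, not values from the paper.

        import numpy as np

        # Illustrative discrete Coxian (phase-type) duration sampler, of the
        # kind used to model state durations in an S-HSMM. Toy parameters only.

        rng = np.random.default_rng(0)

        def sample_coxian_duration(leave, advance):
            # leave[i]   : per-step probability of leaving phase i, so the
            #              time spent in phase i is geometric.
            # advance[i] : probability of moving on to phase i+1 (rather than
            #              finishing) when phase i ends; last entry must be 0.
            duration, phase = 0, 0
            while True:
                duration += rng.geometric(leave[phase])  # time in this phase
                if rng.random() < advance[phase]:
                    phase += 1                           # continue to next phase
                else:
                    return duration                      # absorb: duration done

        leave = np.array([0.5, 0.3, 0.2])    # phase exit probabilities
        advance = np.array([0.6, 0.4, 0.0])  # chance of continuing onward
        samples = [sample_coxian_duration(leave, advance) for _ in range(10000)]
        print(np.mean(samples))              # mean duration under this Coxian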
  • Hierarchical Hidden Markov Models with General State Hierarchy
    Bui, Hung, Phung, Dinh and Venkatesh, Svetha. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 324-329, San Jose, California, USA, 2004. [ | | pdf]
    The hierarchical hidden Markov model (HHMM) is an extension of the hidden Markov model that includes a hierarchy of hidden states. This form of hierarchical modeling has been found useful in applications such as handwriting recognition, behavior recognition, video indexing, and text retrieval. Nevertheless, the state hierarchy in the original HHMM is restricted to a tree structure. This prohibits two different states from having the same child, and thus does not allow common substructures to be shared in the model. In this paper, we present a general HHMM in which the state hierarchy can be a lattice, allowing arbitrary sharing of substructures. Furthermore, we provide a method for numerical scaling to avoid underflow, an important issue when dealing with long observation sequences. We demonstrate our method in a simulated environment where a hierarchical behavioral model is automatically learned and later used for recognition.
    @INPROCEEDINGS { bui_phung_venkatesh_aaai04,
        TITLE = { Hierarchical Hidden Markov Models with General State Hierarchy },
        AUTHOR = { Bui, Hung and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proceedings of the National Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2004 },
        ADDRESS = { San Jose, California, USA },
        EDITOR = { McGuinness, Deborah L. and Ferguson, George },
        PAGES = { 324--329 },
        PUBLISHER = { MIT Press },
        ABSTRACT = { The hierarchical hidden Markov model (HHMM) is an extension of the hidden Markov model that includes a hierarchy of hidden states. This form of hierarchical modeling has been found useful in applications such as handwriting recognition, behavior recognition, video indexing, and text retrieval. Nevertheless, the state hierarchy in the original HHMM is restricted to a tree structure. This prohibits two different states from having the same child, and thus does not allow common substructures to be shared in the model. In this paper, we present a general HHMM in which the state hierarchy can be a lattice, allowing arbitrary sharing of substructures. Furthermore, we provide a method for numerical scaling to avoid underflow, an important issue when dealing with long observation sequences. We demonstrate our method in a simulated environment where a hierarchical behavioral model is automatically learned and later used for recognition. },
        FILE = { :papers\\phung\\bui_phung_venkatesh_aaai04.pdf:PDF },
        GROUP = { Statistics, Hierarchical Hidden Markov Models (HMM,HHMM) },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { http://www.aaai.org/Papers/AAAI/2004/AAAI04-052.pdf },
    }
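    The numerical-scaling issue mentioned in the entry above is the classic one: unscaled forward probabilities shrink geometrically with sequence length and underflow double precision. The sketch below shows the standard per-step rescaling for a flat HMM forward pass, the same idea the paper extends to the HHMM; the model parameters are illustrative, not from the paper.

        import numpy as np

        # Illustrative scaled forward algorithm for a flat HMM: normalise the
        # forward variables at every step and accumulate the log scaling
        # factors, so long sequences never underflow.

        def scaled_forward(pi, A, B, obs):
            alpha = pi * B[:, obs[0]]
            c = alpha.sum()
            alpha /= c                   # rescale so alpha sums to 1
            loglik = np.log(c)
            for y in obs[1:]:
                alpha = (A.T @ alpha) * B[:, y]
                c = alpha.sum()
                alpha /= c
                loglik += np.log(c)      # log p(obs) = sum of log scalings
            return loglik

        pi = np.array([0.6, 0.4])                # initial state distribution
        A = np.array([[0.7, 0.3], [0.4, 0.6]])   # transitions
        B = np.array([[0.9, 0.1], [0.2, 0.8]])   # emissions
        obs = np.random.default_rng(1).integers(0, 2, size=10000)
        print(scaled_forward(pi, A, B, obs))     # finite even for long sequences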