Data and Codes

NBVAE (Negative-Binomial VAE) is the implementation of a state-of-the-art Variational AutoEncoder (VAE) for modelling discrete data such as text or relational data. NBVAE achieves improved performance on multiple tasks, including text analysis, collaborative filtering, and multi-label classification. It is implemented in TensorFlow and runs efficiently on GPUs.
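
A minimal sketch of the key ingredient, assuming nothing about the actual NBVAE code: the negative-binomial reconstruction log-likelihood that a decoder producing a dispersion r and a probability p per vocabulary entry would maximise for a count vector (negative-binomial parameterization conventions vary).

    import numpy as np
    from scipy.special import gammaln

    def nb_log_likelihood(x, r, p, eps=1e-8):
        """Sum of log NB(x; r, p) over the vocabulary, with
        NB(x; r, p) = C(x + r - 1, x) * p**x * (1 - p)**r."""
        p = np.clip(p, eps, 1.0 - eps)
        ll = (gammaln(x + r) - gammaln(r) - gammaln(x + 1.0)
              + r * np.log(1.0 - p) + x * np.log(p))
        return ll.sum()

    # Illustrative values only: a bag-of-words count vector and decoder outputs.
    x = np.array([3.0, 0.0, 1.0])
    r = np.array([2.0, 2.0, 2.0])   # dispersion, e.g. softplus of a decoder layer
    p = np.array([0.4, 0.1, 0.3])   # probability, e.g. sigmoid of a decoder layer
    print(nb_log_likelihood(x, r, p))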

This program provides the Python implementation of our KG embedding model R-MeN, as described in our ACL 2020 paper. R-MeN combines transformer self-attention-based memory interactions with a CNN decoder to effectively capture potential dependencies among relations and entities for triple classification and search personalization.
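
A heavily simplified, single-head sketch of that idea (the actual R-MeN uses a gated multi-head relational memory core and a proper CNN decoder; all names and shapes below are illustrative): each of the three triple embeddings attends over a memory matrix, and the three attended vectors are scored with 1x3 convolution filters.

    import numpy as np

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def rmen_score(h, r, t, memory, filters, w):
        """h, r, t: (d,) embeddings; memory: (m, d); filters: (n_f, 3); w: (n_f * d,)."""
        outs = []
        for v in (h, r, t):
            att = softmax(memory @ v / np.sqrt(v.size))  # attention over memory slots
            outs.append(att @ memory)                    # attended read vector, (d,)
        x = np.stack(outs, axis=1)                       # (d, 3) matrix of returned vectors
        maps = x @ filters.T                             # each 1x3 filter yields a (d,) map
        return w @ maps.reshape(-1)                      # scalar plausibility score

    rng = np.random.default_rng(0)
    d, m, n_f = 8, 4, 2
    print(rmen_score(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d),
                     rng.normal(size=(m, d)), rng.normal(size=(n_f, 3)),
                     rng.normal(size=n_f * d)))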

  • A Relational Memory-based Embedding Model for Triple Classification and Search Personalization
    Dai Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung. In Proc. of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020. [pdf]
    Knowledge graph embedding methods often suffer from a limitation of memorizing valid triples to predict new ones for triple classification and search personalization problems. To this end, we introduce a novel embedding model, named R-MeN, that explores a relational memory network to encode potential dependencies in relationship triples. R-MeN considers each triple as a sequence of 3 input vectors that recurrently interact with a memory using a transformer self-attention mechanism. Thus R-MeN encodes new information from interactions between the memory and each input vector to return a corresponding vector. Consequently, R-MeN feeds these 3 returned vectors to a convolutional neural network-based decoder to produce a scalar score for the triple. Experimental results show that our proposed R-MeN obtains state-of-the-art results on SEARCH17 for the search personalization task, and on WN11 and FB13 for the triple classification task.
    @INPROCEEDINGS { nguyen_etal_acl9_relational,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dinh Phung },
        BOOKTITLE = { Proc. of the 58th Annual Meeting of the Association for Computational Linguistics (ACL) },
        TITLE = { A Relational Memory-based Embedding Model for Triple Classification and Search Personalization },
        YEAR = { 2020 },
        ABSTRACT = { Knowledge graph embedding methods often suffer from a limitation of memorizing valid triples to predict new ones for triple classification and search personalization problems. To this end, we introduce a novel embedding model, named R-MeN, that explores a relational memory network to encode potential dependencies in relationship triples. R-MeN considers each triple as a sequence of 3 input vectors that recurrently interact with a memory using a transformer self-attention mechanism. Thus R-MeN encodes new information from interactions between the memory and each input vector to return a corresponding vector. Consequently, R-MeN feeds these 3 returned vectors to a convolutional neural network-based decoder to produce a scalar score for the triple. Experimental results show that our proposed R-MeN obtains state-of-the-art results on SEARCH17 for the search personalization task, and on WN11 and FB13 for the triple classification task. },
        FILE = { :nguyen_etal_acl9_relational - A Relational Memory Based Embedding Model for Triple Classification and Search Personalization.PDF:PDF },
        URL = { https://arxiv.org/abs/1907.06080 },
    }

This is a Python implementation of the Deep Cost-sensitive Kernel Machine (DCKM) model as described in the paper "Deep Cost-sensitive Kernel Machine for Binary Software Vulnerability Detection". DCKM combines several diverse techniques, including deep learning, kernel methods, and a cost-sensitive approach, to efficiently detect potential vulnerabilities in binary software. The model is trained on two binary datasets: NDSS18 and 6-open-source, a new real-world binary dataset whose source code was collected from six open-source projects.
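
A minimal sketch of the cost-sensitive ingredient only, not of the full DCKM model: a class-weighted logistic loss that charges a missed vulnerability more than a false alarm. The cost values are illustrative assumptions.

    import numpy as np

    def cost_sensitive_logistic_loss(scores, labels, c_fn=5.0, c_fp=1.0):
        """scores: raw model outputs; labels: 1 = vulnerable, 0 = benign."""
        p = 1.0 / (1.0 + np.exp(-scores))
        # Missing a vulnerable sample (label 1) costs c_fn; a false alarm costs c_fp.
        return -np.mean(c_fn * labels * np.log(p + 1e-12)
                        + c_fp * (1 - labels) * np.log(1.0 - p + 1e-12))

    print(cost_sensitive_logistic_loss(np.array([2.0, -1.0]), np.array([1, 0])))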

  • Deep Cost-sensitive Kernel Machine for Binary Software Vulnerability Detection
    Tuan Nguyen, Trung Le, Khanh Nguyen, Olivier de Vel, Paul Montague, John C. Grundy and Dinh Phung. In Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2020.
    Owing to the sharp rise in the severity of the threats imposed by software vulnerabilities, software vulnerability detection has become an important concern in the software industry, such as the embedded systems industry, and in the field of computer security. Software vulnerability detection can be carried out at the source code or binary level. However, the latter is more impactful and practical since when using commercial software, we usually only possess binary software. In this paper, we leverage deep learning and kernel methods to propose the Deep Cost-sensitive Kernel Machine, a method that inherits the advantages of deep learning methods in efficiently tackling structural data and kernel methods in learning the characteristic of vulnerable binary examples with high generalization capacity. We conduct experiments on two real-world binary datasets. The experimental results have shown a convincing outperformance of our proposed method over the baselines.
    @INPROCEEDINGS { nguyen_etal_pakdd20_deepcost,
        AUTHOR = { Tuan Nguyen and Trung Le and Khanh Nguyen and Olivier de Vel and Paul Montague and John C Grundy and Dinh Phung },
        BOOKTITLE = { Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        TITLE = { Deep Cost-sensitive Kernel Machine for Binary Software Vulnerability Detection },
        YEAR = { 2020 },
        ABSTRACT = { Owing to the sharp rise in the severity of the threats imposed by software vulnerabilities, software vulnerability detection has become an important concern in the software industry, such as the embedded systems industry, and in the field of computer security. Software vulnerability detection can be carried out at the source code or binary level. However, the latter is more impactful and practical since when using commercial software, we usually only possess binary software. In this paper, we leverage deep learning and kernel methods to propose the Deep Cost-sensitive Kernel Machine, a method that inherits the advantages of deep learning methods in efficiently tackling structural data and kernel methods in learning the characteristic of vulnerable binary examples with high generalization capacity. We conduct experiments on two real-world binary datasets. The experimental results have shown a convincing outperformance of our proposed method over the baselines. },
        FILE = { :nguyen_etal_pakdd20_deepcost - Deep Cost Sensitive Kernel Machine for Binary Software Vulnerability Detection.pdf:PDF },
    }

Code to reproduce the results in the paper 'Stein variational gradient descent with variance reduction', IJCNN 2020.
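
For reference, a plain (non-variance-reduced) SVGD step is sketched below; the paper's actual contribution, reducing the variance of the mini-batch gradient estimator, is not reproduced here. Particles follow the kernelised gradient of the log density.

    import numpy as np

    def svgd_step(x, grad_logp, h=1.0, step=0.1):
        """x: (n, d) particles; grad_logp: function returning (n, d) gradients."""
        diff = x[:, None, :] - x[None, :, :]               # (n, n, d), x_i - x_j
        k = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * h))  # RBF kernel matrix (n, n)
        grad_k = -diff / h * k[:, :, None]                 # kernel gradients
        phi = (k @ grad_logp(x) + grad_k.sum(axis=0)) / x.shape[0]
        return x + step * phi

    # Toy target: standard normal, so grad log p(x) = -x.
    x = np.random.default_rng(0).normal(loc=5.0, size=(50, 1))
    for _ in range(200):
        x = svgd_step(x, lambda x: -x)
    print(x.mean(), x.std())   # particles drift towards N(0, 1)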

  • Stein variational gradient descent with variance reduction
    Nhan Dam, Trung Le, Viet Huynh and Dinh Phung. In Proc. of the 2020 Int. Joint Conference on Neural Networks (IJCNN), July 2020.
    Probabilistic inference is a common and important task in statistical machine learning. The recently proposed Stein variational gradient descent (SVGD) is a generic Bayesian inference method that has been shown to be successfully applied in a wide range of contexts, especially in dealing with large datasets, where existing probabilistic inference methods have been known to be ineffective. In a large-scale data setting, SVGD employs the mini-batch strategy but its mini-batch estimator has large variance, hence compromising its estimation quality in practice. To this end, we propose in this paper a generic SVGD-based inference method that can significantly reduce the variance of mini-batch estimator when working with large datasets. Our experiments on 14 datasets show that the proposed method enjoys substantial and consistent improvements compared with baseline methods in binary classification task and its pseudo-online learning setting, and regression task. Furthermore, our framework is generic and applicable to a wide range of probabilistic inference problems such as in Bayesian neural networks and Markov random fields.
    @INPROCEEDINGS { dam_etal_ijcnn20_steinvariational,
        AUTHOR = { Nhan Dam and Trung Le and Viet Huynh and Dinh Phung },
        BOOKTITLE = { Proc. of the 2020 Int. Joint Conference on Neural Networks (IJCNN) },
        TITLE = { Stein variational gradient descent with variance reduction },
        YEAR = { 2020 },
        MONTH = { jul },
        ABSTRACT = { Probabilistic inference is a common and important task in statistical machine learning. The recently proposed Stein variational gradient descent (SVGD) is a generic Bayesian inference method that has been shown to be successfully applied in a wide range of contexts, especially in dealing with large datasets, where existing probabilistic inference methods have been known to be ineffective. In a large-scale data setting, SVGD employs the mini-batch strategy but its mini-batch estimator has large variance, hence compromising its estimation quality in practice. To this end, we propose in this paper a generic SVGD-based inference method that can significantly reduce the variance of mini-batch estimator when working with large datasets. Our experiments on 14 datasets show that the proposed method enjoys substantial and consistent improvements compared with baseline methods in binary classification task and its pseudo-online learning setting, and regression task. Furthermore, our framework is generic and applicable to a wide range of probabilistic inference problems such as in Bayesian neural networks and Markov random fields. },
        FILE = { :dam_etal_ijcnn20_steinvariational - Stein Variational Gradient Descent with Variance Reduction.pdf:PDF },
    }

This program provides the Python implementation of the capsule network-based model CapsE for knowledge graph embeddings, published at NAACL 2019.
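
A toy sketch of the capsule scoring idea described in the abstract below, with dynamic routing omitted and all shapes illustrative: feature maps from 1x3 convolutions are squashed into capsules, and the length of the final capsule's output vector serves as the plausibility score.

    import numpy as np

    def squash(v, eps=1e-9):
        n2 = np.sum(v ** 2)
        return (n2 / (1.0 + n2)) * v / np.sqrt(n2 + eps)

    def capse_score_toy(h, r, t, filters, W):
        """h, r, t: (d,); filters: (n_f, 3); W: weights of the final capsule."""
        x = np.stack([h, r, t], axis=1)             # (d, 3) triple matrix
        maps = x @ filters.T                        # (d, n_f) feature maps
        caps = np.array([squash(c) for c in maps])  # first capsule layer
        u = W @ caps.reshape(-1)                    # final capsule input (routing omitted)
        return np.linalg.norm(squash(u))            # vector length = plausibility

    rng = np.random.default_rng(1)
    d, n_f, out = 8, 2, 4
    print(capse_score_toy(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d),
                          rng.normal(size=(n_f, 3)), rng.normal(size=(out, d * n_f))))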

  • A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization
    Dai Quoc Nguyen, Thanh Vu, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL), Minneapolis, USA, June 2019. [pdf]
    In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). Our CapsE represents each triple as a 3-column matrix where each column vector represents the embedding of an element in the triple. This 3-column matrix is then fed to a convolution layer where multiple filters are operated to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains state-of-the-art link prediction results for knowledge graph completion on two benchmark datasets: WN18RR and FB15k-237, and outperforms strong search personalization baselines on SEARCH17 dataset.
    @INPROCEEDINGS { nguyen_etal_naaclhtl19_acapsule,
        AUTHOR = { Dai Quoc Nguyen and Thanh Vu and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung },
        TITLE = { A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization },
        BOOKTITLE = { Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL) },
        YEAR = { 2019 },
        ADDRESS = { Minneapolis, USA },
        MONTH = { jun },
        ABSTRACT = { In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). Our CapsE represents each triple as a 3-column matrix where each column vector represents the embedding of an element in the triple. This 3-column matrix is then fed to a convolution layer where multiple filters are operated to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains state-of-the-art link prediction results for knowledge graph completion on two benchmark datasets: WN18RR and FB15k-237, and outperforms strong search personalization baselines on SEARCH17 dataset. },
        FILE = { :nguyen_etal_naaclhtl19_acapsule - A Capsule Network Based Embedding Model for Knowledge Graph Completion and Search Personalization.pdf:PDF },
        URL = { https://arxiv.org/abs/1808.04122 },
    }

This repository holds the original Matlab code for the paper "Frequency-splitting dynamic MRI reconstruction using multi-scale 3D convolutional sparse coding and automatic parameter selection". The implemented method recovers high-frequency information using a shared 3D convolution-based dictionary built progressively during the reconstruction process in an unsupervised manner, while low-frequency information is recovered using a total-variation-based energy minimization method that leverages temporal coherence in dynamic MRI.
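
A minimal sketch of the frequency-splitting step alone (the convolutional dictionary learning and total-variation minimisation are not reproduced): a circular low-pass mask in k-space separates the two components that the method reconstructs with different priors. The mask radius is an illustrative parameter.

    import numpy as np

    def split_frequencies(image, radius=8):
        """Return (low, high) with low + high == image."""
        k = np.fft.fftshift(np.fft.fft2(image))   # centred k-space
        ny, nx = image.shape
        yy, xx = np.mgrid[:ny, :nx]
        mask = (yy - ny // 2) ** 2 + (xx - nx // 2) ** 2 <= radius ** 2
        low = np.real(np.fft.ifft2(np.fft.ifftshift(k * mask)))
        return low, image - low

    img = np.random.default_rng(2).normal(size=(64, 64))
    low, high = split_frequencies(img)
    print(low.std(), high.std())   # energy split between the two bands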

This program provides the Python implementation of our U2GNN as described in our paper "Universal Self-Attention Network for Graph Classification", where we use a transformer self-attention network to learn node and graph embeddings. Both our supervised and unsupervised U2GNN models set new state-of-the-art accuracies on most of the benchmark datasets.
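
A single-layer, single-head sketch of the aggregation idea (the actual U2GNN stacks several transformer self-attention steps and pools node embeddings into a graph embedding): a node is updated by attending over itself and its sampled neighbours.

    import numpy as np

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def attend(node_vec, neighbour_vecs):
        """Update one node from the set {node} + neighbours via dot-product attention."""
        ctx = np.vstack([node_vec[None, :], neighbour_vecs])    # (1 + n, d)
        att = softmax(ctx @ node_vec / np.sqrt(node_vec.size))  # (1 + n,) weights
        return att @ ctx                                        # weighted sum, (d,)

    rng = np.random.default_rng(3)
    print(attend(rng.normal(size=4), rng.normal(size=(3, 4))))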

This program provides the Python implementation of our unsupervised node embedding model Caps2NE, where we use a capsule network to learn, without supervision, embeddings of nodes appearing in generated random walks.
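
A minimal generator for the random walks such a model consumes (the capsule encoder itself is not sketched; the parameters are illustrative):

    import random

    def random_walks(adj, walk_length=6, walks_per_node=2, seed=0):
        """adj: dict mapping node -> list of neighbours."""
        rng = random.Random(seed)
        walks = []
        for start in adj:
            for _ in range(walks_per_node):
                walk, node = [start], start
                while len(walk) < walk_length:
                    neighbours = adj.get(node, [])
                    if not neighbours:
                        break
                    node = rng.choice(neighbours)
                    walk.append(node)
                walks.append(walk)
        return walks

    print(random_walks({0: [1, 2], 1: [0], 2: [0, 1]}))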

This is the Python implementation of our SeqVAE as described in our paper "Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection", together with some baselines (Vanilla RNN, Para2Vec, VulDeePecker) for comparison. Our labeled dataset for binary code vulnerability detection is also provided in the GitHub repository.

  • Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection
    Tue Le, Tuan Nguyen, Trung Le, Dinh Phung, Paul Montague, Olivier De Vel and Lizhen Qu. In International Conference on Learning Representations (ICLR), 2019. [pdf]
    @INPROCEEDINGS { le_etal_iclr18_maximal,
        AUTHOR = { Tue Le and Tuan Nguyen and Trung Le and Dinh Phung and Paul Montague and Olivier De Vel and Lizhen Qu },
        TITLE = { Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection },
        BOOKTITLE = { International Conference on Learning Representations (ICLR) },
        YEAR = { 2019 },
        FILE = { :le_etal_iclr18_maximal - Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection.pdf:PDF },
        URL = { https://openreview.net/forum?id=ByloIiCqYQ },
    }

This program provides the Python implementation of the CNN-based model ConvKB for the knowledge base completion task. ConvKB obtains new state-of-the-art results on two standard datasets, WN18RR and FB15k-237, as described in the NAACL 2018 paper.
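
The scoring function is simple enough to sketch directly from the abstract below (the real model applies a non-linearity to the feature maps, omitted here): 1x3 filters slide over the [h; r; t] matrix, the feature maps are concatenated, and a dot product gives the triple score.

    import numpy as np

    def convkb_score(h, r, t, filters, w):
        """h, r, t: (d,) embeddings; filters: (n_f, 3); w: (n_f * d,) weight vector."""
        x = np.stack([h, r, t], axis=1)         # (d, 3): one row per embedding dimension
        maps = x @ filters.T                    # (d, n_f): each column is one feature map
        return w @ maps.reshape(-1, order="F")  # concatenate the maps, then dot product

    rng = np.random.default_rng(4)
    d, n_f = 8, 3
    print(convkb_score(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d),
                       rng.normal(size=(n_f, 3)), rng.normal(size=n_f * d)))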

  • A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network
    Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2018. [pdf]
    We introduce a novel embedding method for knowledge base completion task. Our approach advances state-of-the-art (SOTA) by employing a convolutional neural network (CNN) for the task which can capture global relationships and transitional characteristics. We represent each triple (head entity, relation, tail entity) as a 3-column matrix which is the input for the convolution layer. Different filters having a same shape of 1x3 are operated over the input matrix to produce different feature maps which are then concatenated into a single feature vector. This vector is used to return a score for the triple via a dot product. The returned score is used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction results than previous SOTA models on two current benchmark datasets WN18RR and FB15k-237.
    @INPROCEEDINGS { nguyen_etal_naacl18_anovelembedding,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung },
        TITLE = { A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network },
        BOOKTITLE = { Proc. of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL) },
        YEAR = { 2018 },
        ABSTRACT = { We introduce a novel embedding method for knowledge base completion task. Our approach advances state-of-the-art (SOTA) by employing a convolutional neural network (CNN) for the task which can capture global relationships and transitional characteristics. We represent each triple (head entity, relation, tail entity) as a 3-column matrix which is the input for the convolution layer. Different filters having a same shape of 1x3 are operated over the input matrix to produce different feature maps which are then concatenated into a single feature vector. This vector is used to return a score for the triple via a dot product. The returned score is used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction results than previous SOTA models on two current benchmark datasets WN18RR and FB15k-237. },
        FILE = { :nguyen_etal_naacl18_anovelembedding - A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network.pdf:PDF },
        URL = { https://arxiv.org/abs/1712.02121 },
    }

  • MGAN: Training Generative Adversarial Nets with Multiple Generators
    Quan Hoang, Tu Dinh Nguyen, Trung Le and Dinh Phung. In International Conference on Learning Representations (ICLR), 2018. [pdf]
    We propose in this paper a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapsing problem and delivering state-of-the-art results. A minimax formulation was able to establish among a classifier, a discriminator, and a set of generators in a similar spirit with GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them will be randomly selected as final output similar to the mechanism of a probabilistic mixture model. We term our method Mixture Generative Adversarial Nets (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators’ distributions and the empirical data distribution is minimal, whilst the JSD among generators’ distributions is maximal, hence effectively avoiding the mode collapsing problem. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by the generators.
    @INPROCEEDINGS { hoang_etal_iclr18_mgan,
        AUTHOR = { Quan Hoang and Tu Dinh Nguyen and Trung Le and Dinh Phung },
        TITLE = { {MGAN}: Training Generative Adversarial Nets with Multiple Generators },
        BOOKTITLE = { International Conference on Learning Representations (ICLR) },
        YEAR = { 2018 },
        ABSTRACT = { We propose in this paper a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapsing problem and delivering state-of-the-art results. A minimax formulation was able to establish among a classifier, a discriminator, and a set of generators in a similar spirit with GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them will be randomly selected as final output similar to the mechanism of a probabilistic mixture model. We term our method Mixture Generative Adversarial Nets (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators’ distributions and the empirical data distribution is minimal, whilst the JSD among generators’ distributions is maximal, hence effectively avoiding the mode collapsing problem. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by the generators. },
        FILE = { :hoang_etal_iclr18_mgan - MGAN_ Training Generative Adversarial Nets with Multiple Generators.pdf:PDF },
        URL = { https://openreview.net/forum?id=rkmu5b0a- },
    }

  • GoGP: Fast Online Regression with Gaussian Processes
    Trung Le, Khanh Nguyen, Vu Nguyen, Tu Dinh Nguyen and Dinh Phung. In International Conference on Data Mining (ICDM), 2017.
    One of the most current challenging problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that could scale with massive datasets. Our approach is formulated based on alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that with a good convergence rate our proposed algorithm always produces a (sparse) solution which is close to the true optima to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared with its rivals under the online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors.
    @INPROCEEDINGS { le_etal_icdm17_gogp,
        AUTHOR = { Trung Le and Khanh Nguyen and Vu Nguyen and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { {GoGP}: Fast Online Regression with Gaussian Processes },
        BOOKTITLE = { International Conference on Data Mining (ICDM) },
        YEAR = { 2017 },
        ABSTRACT = { One of the most current challenging problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that could scale with massive datasets. Our approach is formulated based on alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that with a good convergence rate our proposed algorithm always produces a (sparse) solution which is close to the true optima to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared with its rivals under the online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors. },
        FILE = { :le_etal_icdm17_gogp - GoGP_ Fast Online Regression with Gaussian Processes.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.09.01 },
    }

  • Multilevel clustering via Wasserstein means
    Nhat Ho, XuanLong Nguyen, Mikhail Yurochkin, Hung Bui, Viet Huynh and Dinh Phung. In Proc. of the 34th International Conference on Machine Learning (ICML), pages 1501-1509, 2017. [pdf]
    We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a large hierarchically structural corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with the Wasserstein distance metric. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. We also establish consistency properties enjoyed by our estimates of both local and global clusters. Finally, we present experiment results with both synthetic and real data to demonstrate the flexibility and scalability of the proposed approach.
    @INPROCEEDINGS { ho_etal_icml17multilevel,
        AUTHOR = { Nhat Ho and XuanLong Nguyen and Mikhail Yurochkin and Hung Bui and Viet Huynh and Dinh Phung },
        TITLE = { Multilevel clustering via {W}asserstein means },
        BOOKTITLE = { Proc. of the 34th International Conference on Machine Learning (ICML) },
        YEAR = { 2017 },
        VOLUME = { 70 },
        SERIES = { ICML'17 },
        PAGES = { 1501--1509 },
        PUBLISHER = { JMLR.org },
        ABSTRACT = { We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a large hierarchically structural corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with the Wasserstein distance metric. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. We also establish consistency properties enjoyed by our estimates of both local and global clusters. Finally, we present experiment results with both synthetic and real data to demonstrate the flexibility and scalability of the proposed approach. },
        ACMID = { 3305536 },
        FILE = { :ho_etal_icml17multilevel - Multilevel Clustering Via Wasserstein Means.pdf:PDF },
        LOCATION = { Sydney, NSW, Australia },
        NUMPAGES = { 9 },
        URL = { http://dl.acm.org/citation.cfm?id=3305381.3305536 },
    }

This is the Matlab code for our proposed method, Nonparametric Budgeted SGD.
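
A bare-bones sketch of budgeted kernel SGD under assumptions of our own (hinge loss, RBF kernel, and a crude "drop the smallest coefficient" budget-maintenance rule; the paper's strategy is more principled):

    import numpy as np

    def rbf(a, b, gamma=1.0):
        return np.exp(-gamma * np.sum((a - b) ** 2))

    def budgeted_sgd(stream, budget=20, step=0.1, lam=0.01):
        support, alphas = [], []
        for x, y in stream:                                    # y in {-1, +1}
            f = sum(a * rbf(s, x) for s, a in zip(support, alphas))
            alphas = [a * (1.0 - step * lam) for a in alphas]  # regularisation decay
            if y * f < 1.0:                                    # hinge-loss margin violation
                support.append(x)
                alphas.append(step * y)
            if len(support) > budget:                          # budget maintenance
                i = int(np.argmin(np.abs(alphas)))
                support.pop(i)
                alphas.pop(i)
        return support, alphas

    rng = np.random.default_rng(5)
    stream = [(rng.normal(size=2) + (y, y), y) for y in rng.choice([-1, 1], size=200)]
    support, alphas = budgeted_sgd(stream)
    print(len(support))   # never exceeds the budget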

  • Nonparametric Budgeted Stochastic Gradient Descent
    Le, Trung, Nguyen, Vu, Nguyen, Tu Dinh and Phung, Dinh. In 19th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS), May 2016. [pdf]
    @CONFERENCE { le_nguyen_phung_aistats16nonparametric,
        AUTHOR = { Le, Trung and Nguyen, Vu and Nguyen, Tu Dinh and Phung, Dinh },
        TITLE = { Nonparametric Budgeted Stochastic Gradient Descent },
        BOOKTITLE = { 19th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2016 },
        MONTH = { May },
        FILE = { :le_nguyen_phung_aistats16nonparametric - Nonparametric Budgeted Stochastic Gradient Descent.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://www.jmlr.org/proceedings/papers/v51/le16.pdf },
    }

  • One-Pass Logistic Regression for Label-Drift and Large-Scale Classification on Distributed Systems
    Nguyen, Vu, Nguyen, Tu Dinh, Le, Trung, Phung, Dinh and Venkatesh, Svetha. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 1113-1118, Dec 2016. [pdf | code]
    Logistic regression (LR) for classification is the workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where the class labels are not known in advance, a problem we term label-drift classification. Label-drift classification problem naturally occurs in many applications, especially in the context of streaming settings where the incoming data may contain samples categorized with new classes that have not been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to their expense of running time. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR) to offer a principled treatment for label-drift and large-scale classifications. To handle large-scale classification for big data, we further extend our OLR to a distributed setting for parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performances of our methods are comparable or better than those of state-of-the-art baselines whilst the execution time is much faster at an order of magnitude. In addition, the OLR and Spark-OLR are invariant to data shuffling and have no hyperparameter to tune that significantly benefits data practitioners and overcomes the curse of big data cross-validation to select optimal hyperparameters.
    @CONFERENCE { nguyen_etal_icdm16onepass,
        AUTHOR = { Nguyen, Vu and Nguyen, Tu Dinh and Le, Trung and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { One-Pass Logistic Regression for Label-Drift and Large-Scale Classification on Distributed Systems },
        BOOKTITLE = { 2016 IEEE 16th International Conference on Data Mining (ICDM) },
        YEAR = { 2016 },
        PAGES = { 1113-1118 },
        MONTH = { Dec },
        ABSTRACT = { Logistic regression (LR) for classification is the workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where the class labels are not known in advance, a problem we term label-drift classification. Label-drift classification problem naturally occurs in many applications, especially in the context of streaming settings where the incoming data may contain samples categorized with new classes that have not been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to their expense of running time. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR) to offer a principled treatment for label-drift and large-scale classifications. To handle large-scale classification for big data, we further extend our OLR to a distributed setting for parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performances of our methods are comparable or better than those of state-of-the-art baselines whilst the execution time is much faster at an order of magnitude. In addition, the OLR and Spark-OLR are invariant to data shuffling and have no hyperparameter to tune that significantly benefits data practitioners and overcomes the curse of big data cross-validation to select optimal hyperparameters. },
        CODE = { https://github.com/ntienvu/ICDM2016_OLR },
        DOI = { 10.1109/ICDM.2016.0145 },
        FILE = { :nguyen_etal_icdm16onepass - One Pass Logistic Regression for Label Drift and Large Scale Classification on Distributed Systems.pdf:PDF },
        KEYWORDS = { Big Data;distributed processing;pattern classification;regression analysis;Big Data cross-validation;Spark-OLR;class labels;data shuffling;distributed systems;execution time;label-drift classification problem;large-scale classification;large-scale datasets;one-pass logistic regression;optimal hyperparameter selection;sparkling OLR;Bayes methods;Big data;Context;Data models;Estimation;Industries;Logistics;Apache Spark;Logistic regression;distributed system;label-drift;large-scale classification },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.09.10 },
        URL = { http://ieeexplore.ieee.org/document/7837958/ },
    }

This code provides a fast Python implementation for the following paper. It is a multilevel clustering package for grouped data, such as collections of text documents or images. Beyond the usual setting, it allows each document to carry contextual or meta information such as date, tags, or geo-coded information. It is nonparametric in the sense that the number of clusters is learned automatically based on the principle of Dirichlet processes. Typical sampling-based methods are slow and usually cannot scale up; this implementation uses deterministic structured variational inference and hence scales well to large datasets.
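
A tiny stick-breaking sketch of the nonparametric principle only (not of the package's context-aware model): a truncated Dirichlet process puts most of its mass on a small, automatically determined number of components, which variational inference can then prune.

    import numpy as np

    def stick_breaking(alpha, truncation, rng):
        betas = rng.beta(1.0, alpha, size=truncation)
        remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
        return betas * remaining   # component weights (summing to just under 1)

    rng = np.random.default_rng(6)
    w = stick_breaking(alpha=2.0, truncation=20, rng=rng)
    print((w > 0.01).sum(), "components carry almost all of the mass")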

  • Scalable Nonparametric Bayesian Multilevel Clustering
    Viet Huynh, Dinh Phung, Svetha Venkatesh, Xuan-Long Nguyen, Matt Hoffman and Hung Bui. In Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI), pages 289-298, June 2016. [pdf]
    @CONFERENCE { huynh_phung_venkatesh_nguyen_hoffman_bui_uai16scalable,
        AUTHOR = { Viet Huynh and Dinh Phung and Svetha Venkatesh and Xuan-Long Nguyen and Matt Hoffman and Hung Bui },
        TITLE = { Scalable Nonparametric {B}ayesian Multilevel Clustering },
        BOOKTITLE = { Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2016 },
        MONTH = { June },
        PUBLISHER = { AUAI Press },
        PAGES = { 289--298 },
        FILE = { :huynh_phung_venkatesh_nguyen_hoffman_bui_uai16scalable - Scalable Nonparametric Bayesian Multilevel Clustering.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.09 },
        URL = { http://auai.org/uai2016/proceedings/papers/262.pdf },
    }

  • Bayesian Nonparametric Approaches to Abnormality Detection in Video Surveillance
    Nguyen, Vu, Phung, Dinh, Pham, Duc-Son and Venkatesh, Svetha. Annals of Data Science (AoDS), 2(1):21-41, March 2015. [pdf]
    In data science, anomaly detection is the process of identifying the items, events or observations which do not conform to expected patterns in a dataset. As widely acknowledged in the computer vision community and security management, discovering suspicious events is the key issue for abnormal detection in video surveillance. The important steps in identifying such events include stream data segmentation and hidden patterns discovery. However, the crucial challenge in stream data segmentation and hidden patterns discovery are the number of coherent segments in surveillance stream and the number of traffic patterns are unknown and hard to specify. Therefore, in this paper we revisit the abnormality detection problem through the lens of Bayesian nonparametric (BNP) and develop a novel usage of BNP methods for this problem. In particular, we employ the Infinite Hidden Markov Model and Bayesian Nonparametric Factor Analysis for stream data segmentation and pattern discovery. In addition, we introduce an interactive system allowing users to inspect and browse suspicious events.
    @ARTICLE { nguyen_phung_pham_venkatesh_aods15bayesian,
        AUTHOR = { Nguyen, Vu and Phung, Dinh and Pham, Duc-Son and Venkatesh, Svetha },
        TITLE = { {B}ayesian Nonparametric Approaches to Abnormality Detection in Video Surveillance },
        JOURNAL = { Annals of Data Science (AoDS) },
        YEAR = { 2015 },
        VOLUME = { 2 },
        NUMBER = { 1 },
        PAGES = { 21--41 },
        MONTH = { March },
        ABSTRACT = { In data science, anomaly detection is the process of identifying the items, events or observations which do not conform to expected patterns in a dataset. As widely acknowledged in the computer vision community and security management, discovering suspicious events is the key issue for abnormal detection in video surveillance. The important steps in identifying such events include stream data segmentation and hidden patterns discovery. However, the crucial challenge in stream data segmentation and hidden patterns discovery are the number of coherent segments in surveillance stream and the number of traffic patterns are unknown and hard to specify. Therefore, in this paper we revisit the abnormality detection problem through the lens of Bayesian nonparametric (BNP) and develop a novel usage of BNP methods for this problem. In particular, we employ the Infinite Hidden Markov Model and Bayesian Nonparametric Factor Analysis for stream data segmentation and pattern discovery. In addition, we introduce an interactive system allowing users to inspect and browse suspicious events. },
        DOI = { 10.1007/s40745-015-0030-3 },
        FILE = { :nguyen_phung_pham_venkatesh_aods15bayesian - Bayesian Nonparametric Approaches to Abnormality Detection in Video Surveillance.pdf:PDF },
        KEYWORDS = { Abnormal detection Bayesian nonparametric User interface Multilevel data structure Video segmentation Spatio-temporal browsing },
        OWNER = { dinh },
        PUBLISHER = { Springer Berlin Heidelberg },
        TIMESTAMP = { 2015.06.10 },
        URL = { http://link.springer.com/article/10.1007%2Fs40745-015-0030-3 },
    }

  • Interactive Browsing System for Anomaly Video Surveillance
    Nguyen, Vu, Phung, Dinh, Sunil, Gupta and Venkatesh, Svetha. In IEEE Intl. Conf. on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), pages 384-389, 2013.
    Existing anomaly detection methods in video surveillance exhibit lack of congruence between rare events detected by algorithms and what is considered anomalous by users. This paper introduces a novel browsing model to address this issue, allowing users to interactively examine rare events in an intuitive manner. Introducing a novel way to compute rare motion patterns, we estimate latent factors of foreground motion patterns through Bayesian Nonparametric Factor analysis. Each factor corresponds to a typical motion pattern. A rarity score for each factor is computed, and ordered in decreasing order of rarity, permitting users to browse events using any proportion of rare factors. Rare events correspond to frames that contain the rare factors chosen. We present the user with an interface to inspect events that incorporate these rarest factors in a spatial-temporal manner. We demonstrate the system on a public video data set, showing key aspects of the browsing paradigm.
    @INPROCEEDINGS { nguyen_phung_gupta_venkatesh_issnip13,
        AUTHOR = { Nguyen, Vu and Phung, Dinh and Sunil, Gupta and Venkatesh, Svetha },
        TITLE = { Interactive Browsing System for Anomaly Video Surveillance },
        BOOKTITLE = { IEEE Intl. Conf. on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP) },
        YEAR = { 2013 },
        PAGES = { 384--389 },
        ABSTRACT = { Existing anomaly detection methods in video surveillance exhibit lack of congruence between rare events detected by algorithms and what is considered anomalous by users. This paper introduces a novel browsing model to address this issue, allowing users to interactively examine rare events in an intuitive manner. Introducing a novel way to compute rare motion patterns, we estimate latent factors of foreground motion patterns through Bayesian Nonparametric Factor analysis. Each factor corresponds to a typical motion pattern. A rarity score for each factor is computed, and ordered in decreasing order of rarity, permitting users to browse events using any proportion of rare factors. Rare events correspond to frames that contain the rare factors chosen. We present the user with an interface to inspect events that incorporate these rarest factors in a spatial-temporal manner. We demonstrate the system on a public video data set, showing key aspects of the browsing paradigm. },
        FILE = { :nguyen_phung_gupta_venkatesh_issnip13 - Interactive Browsing System for Anomaly Video Surveillance.pdf:PDF },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }

  • Multi-modal Abnormality Detection in Video with Unknown Data Segmentation
    Nguyen, Tien Vu, Phung, Dinh, Rana, Santu, Pham, Duc Son and Venkatesh, Svetha. In Intl. Conf. on Pattern Recognition (ICPR), pages 1322-1325, Tsukuba, Japan. IEEE, November 2012.
    This paper examines a new problem in large scale stream data: abnormality detection which is localised to a data segmentation process. Unlike traditional abnormality detection methods which typically build one unified model across data stream, we propose that building multiple detection models focused on different coherent sections of the video stream would result in better detection performance. One key challenge is to segment the data into coherent sections as the number of segments is not known in advance and can vary greatly across cameras; and a principled approach is required. To this end, we first employ the recently proposed infinite HMM and collapsed Gibbs inference to automatically infer data segmentation followed by constructing abnormality detection models which are localised to each segmentation. We demonstrate the superior performance of the proposed framework on real-world surveillance camera data over 14 days.
    @INPROCEEDINGS { nguyen_phung_rana_pham_venkatesh_icpr12,
        AUTHOR = { Nguyen, Tien Vu and Phung, Dinh and Rana, Santu and Pham, Duc Son and Venkatesh, Svetha },
        TITLE = { Multi-modal Abnormality Detection in Video with Unknown Data Segmentation },
        BOOKTITLE = { Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2012 },
        PAGES = { 1322--1325 },
        ADDRESS = { Tsukuba, Japan },
        MONTH = { November },
        ORGANIZATION = { IEEE },
        ABSTRACT = { This paper examines a new problem in large scale stream data: abnormality detection which is localised to a data segmentation process. Unlike traditional abnormality detection methods which typically build one unified model across data stream, we propose that building multiple detection models focused on different coherent sections of the video stream would result in better detection performance. One key challenge is to segment the data into coherent sections as the number of segments is not known in advance and can vary greatly across cameras; and a principled approach is required. To this end, we first employ the recently proposed infinite HMM and collapsed Gibbs inference to automatically infer data segmentation followed by constructing abnormality detection models which are localised to each segmentation. We demonstrate the superior performance of the proposed framework on real-world surveillance camera data over 14 days. },
        FILE = { :nguyen_phung_rana_pham_venkatesh_icpr12 - Multi Modal Abnormality Detection in Video with Unknown Data Segmentation.pdf:PDF },
        OWNER = { dinh },
        TIMESTAMP = { 2012.06.26 },
    }

This Matlab package implements the Coxian Hidden Semi-Markov Model (CxHSMM) described in the following papers (a toy sketch of the Coxian duration model follows the list):

  • Activity Recognition and Abnormality Detection with the Switching Hidden Semi-Markov Model
    Duong, T., Bui, H., Phung, D. and Venkatesh, S. In IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 838-845, San Diego, 20-26 June 2005.
    This paper addresses the problem of learning and recognizing human activities of daily living (ADL), which is an important research issue in building a pervasive and smart environment. In dealing with ADL, we argue that it is beneficial to exploit both the inherent hierarchical organization of the activities and their typical duration. To this end, we introduce the Switching Hidden Semi-Markov Model (S-HSMM), a two-layered extension of the hidden semi-Markov model (HSMM) for the modeling task. Activities are modeled in the S-HSMM in two ways: the bottom layer represents atomic activities and their duration using HSMMs; the top layer represents a sequence of high-level activities where each high-level activity is made of a sequence of atomic activities. We consider two methods for modeling duration: the classic explicit duration model using multinomial distribution, and the novel use of the discrete Coxian distribution. In addition, we propose an effective scheme to detect abnormality without the need for training on abnormal data. Experimental results show that the S-HSMM performs better than existing models including the flat HSMM and the hierarchical hidden Markov model in both classification and abnormality detection tasks, alleviating the need for presegmented training data. Furthermore, our discrete Coxian duration model yields better computation time and generalization error than the classic explicit duration model.
    @INPROCEEDINGS { duong_bui_phung_venkatesh_cvpr05,
        TITLE = { Activity Recognition and Abnormality Detection with the {S}witching {H}idden {S}emi-{M}arkov {M}odel },
        AUTHOR = { Duong, T. and Bui, H. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR) },
        YEAR = { 2005 },
        ADDRESS = { San Diego },
        MONTH = { 20-26 June },
        PAGES = { 838--845 },
        PUBLISHER = { IEEE Computer Society },
        VOLUME = { 1 },
        ABSTRACT = { This paper addresses the problem of learning and recognizing human activities of daily living (ADL), which is an important research issue in building a pervasive and smart environment. In dealing with ADL, we argue that it is beneficial to exploit both the inherent hierarchical organization of the activities and their typical duration. To this end, we introduce the Switching Hidden Semi-Markov Model (S-HSMM), a two-layered extension of the hidden semi-Markov model (HSMM) for the modeling task. Activities are modeled in the S-HSMM in two ways: the bottom layer represents atomic activities and their duration using HSMMs; the top layer represents a sequence of high-level activities where each high-level activity is made of a sequence of atomic activities. We consider two methods for modeling duration: the classic explicit duration model using multinomial distribution, and the novel use of the discrete Coxian distribution. In addition, we propose an effective scheme to detect abnormality without the need for training on abnormal data. Experimental results show that the S-HSMM performs better than existing models including the flat HSMM and the hierarchical hidden Markov model in both classification and abnormality detection tasks, alleviating the need for presegmented training data. Furthermore, our discrete Coxian duration model yields better computation time and generalization error than the classic explicit duration model. },
        KEYWORDS = { Activity Recognition, Abnormality detection, semi-Markov, hierarchical HSMM },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
  • Efficient duration and hierarchical modeling for human activity recognition
    Duong, Thi, Phung, Dinh, Bui, Hung and Venkatesh, Svetha. Artificial Intelligence (AIJ), 173(7-8):830-856, 2009. [pdf | code]
    A challenge in building pervasive and smart spaces is to learn and recognize human activities of daily living (ADLs). In this paper, we address this problem and argue that in dealing with ADLs, it is beneficial to exploit both their typical duration patterns and inherent hierarchical structures. We exploit efficient duration modeling using the novel Coxian distribution to form the Coxian hidden semi-Markov model (CxHSMM) and apply it to the problem of learning and recognizing ADLs with complex temporal dependencies. The Coxian duration model has several advantages over existing duration parameterization using multinomial or exponential family distributions, including its denseness in the space of non-negative distributions, low number of parameters, computational efficiency and the existence of closed-form estimation solutions. Further we combine both hierarchical and duration extensions of the hidden Markov model (HMM) to form the novel switching hidden semi-Markov model (SHSMM), and empirically compare its performance with existing models. The model can learn what an occupant normally does during the day from unsegmented training data and then perform online activity classification, segmentation and abnormality detection. Experimental results show that Coxian modeling outperform a range of baseline models for the task of activity segmentation. We also achieve a recognition accuracy competitive to the current state-of-the-art multinomial duration model, whilst gain a significant reduction in computation. Furthermore, cross-validation model selection on the number of phases K in the Coxian indicates that only a small K is required to achieve the optimal performance. Finally, our models are further tested in a more challenging setting in which the tracking is often lost and the set of activities considerably overlap. With a small amount of labels supplied during training in a partially supervised learning mode, our models are again able to deliver reliable performance, again with a small number of phases, making our proposed framework an attractive choice for activity modeling.
    @ARTICLE { duong_phung_bui_venkatesh_aij09,
        AUTHOR = { Duong, Thi and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },
        TITLE = { Efficient duration and hierarchical modeling for human activity recognition },
        JOURNAL = { Artificial Intelligence (AIJ) },
        YEAR = { 2009 },
        VOLUME = { 173 },
        NUMBER = { 7-8 },
        PAGES = { 830--856 },
        ABSTRACT = { A challenge in building pervasive and smart spaces is to learn and recognize human activities of daily living (ADLs). In this paper, we address this problem and argue that in dealing with ADLs, it is beneficial to exploit both their typical duration patterns and inherent hierarchical structures. We exploit efficient duration modeling using the novel Coxian distribution to form the Coxian hidden semi-Markov model (CxHSMM) and apply it to the problem of learning and recognizing ADLs with complex temporal dependencies. The Coxian duration model has several advantages over existing duration parameterization using multinomial or exponential family distributions, including its denseness in the space of non-negative distributions, low number of parameters, computational efficiency and the existence of closed-form estimation solutions. Further we combine both hierarchical and duration extensions of the hidden Markov model (HMM) to form the novel switching hidden semi-Markov model (SHSMM), and empirically compare its performance with existing models. The model can learn what an occupant normally does during the day from unsegmented training data and then perform online activity classification, segmentation and abnormality detection. Experimental results show that Coxian modeling outperform a range of baseline models for the task of activity segmentation. We also achieve a recognition accuracy competitive to the current state-of-the-art multinomial duration model, whilst gain a significant reduction in computation. Furthermore, cross-validation model selection on the number of phases K in the Coxian indicates that only a small K is required to achieve the optimal performance. Finally, our models are further tested in a more challenging setting in which the tracking is often lost and the set of activities considerably overlap. With a small amount of labels supplied during training in a partially supervised learning mode, our models are again able to deliver reliable performance, again with a small number of phases, making our proposed framework an attractive choice for activity modeling. },
        CODE = { https://github.com/DASCIMAL/CxHSMM },
        COMMENT = { coauthor },
        DOI = { http://dx.doi.org/10.1016/j.artint.2008.12.005 },
        FILE = { :duong_phung_bui_venkatesh_aij09 - Efficient Duration and Hierarchical Modeling for Human Activity Recognition.pdf:PDF },
        KEYWORDS = { activity, recognition, duration modeling, Coxian, Hidden semi-Markov model, HSMM , smart surveillance },
        OWNER = { 184698H },
        PUBLISHER = { Elsevier },
        TIMESTAMP = { 2010.08.11 },
        URL = { http://www.sciencedirect.com/science/article/pii/S0004370208002142 },
    }
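
Below is the toy sketch of a discrete Coxian (phase-type) duration model referenced above; parameter conventions are illustrative. The duration is the absorption time of a Markov chain that moves left-to-right through K phases, so the pmf follows from powers of the transient transition matrix.

    import numpy as np

    def coxian_pmf(stay, advance, horizon):
        """stay[i]: prob. of remaining in phase i; advance[i]: prob. of moving to
        phase i+1; the remaining mass absorbs, i.e. the duration ends."""
        K = len(stay)
        T = np.zeros((K, K))
        for i in range(K):
            T[i, i] = stay[i]
            if i + 1 < K:
                T[i, i + 1] = advance[i]
        absorb = 1.0 - T.sum(axis=1)   # per-phase exit probability
        state = np.zeros(K)
        state[0] = 1.0                 # always start in phase 1
        pmf = []
        for _ in range(horizon):       # P(D = t) = start @ T^(t-1) @ absorb
            pmf.append(state @ absorb)
            state = state @ T
        return np.array(pmf)

    pmf = coxian_pmf(stay=[0.6, 0.5], advance=[0.3, 0.0], horizon=30)
    print(pmf.sum())   # approaches 1 as the horizon grows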

Other related papers are:

  • Topic Transition Detection Using Hierarchical Hidden Markov and Semi-Markov Models
    Phung, D., Duong, T., Bui, H. and Venkatesh, S. In ACM Int. Conf. on Multimedia (ACM-MM), Singapore, 6-11 Nov. 2005.
    In this paper we introduce a probabilistic framework to exploit hierarchy, structure sharing and duration information for topic transition detection in videos. Our probabilistic detection framework is a combination of a shot classification step and a detection phase using hierarchical probabilistic models. We consider two models in this paper: the extended Hierarchical Hidden Markov Model (HHMM) and the Coxian Switching Hidden semi-Markov Model (S-HSMM) because they allow the natural decomposition of semantics in videos, including shared structures, to be modeled directly, and thus enable efficient inference and reduce the sample complexity in learning. Additionally, the S-HSMM allows the duration information to be incorporated, consequently the modeling of long-term dependencies in videos is enriched through both hierarchical and duration modeling. Furthermore, the use of Coxian distribution in the S-HSMM makes it tractable to deal with long sequences in video. Our experimentation of the proposed framework on twelve educational and training videos shows that both models outperform the baseline cases (flat HMM and HSMM) and performances reported in earlier work in topic detection. The superior performance of the S-HSMM over the HHMM verifies our belief that the duration information is an important factor in video content modeling.
    @INPROCEEDINGS { phung_duong_bui_venkatesh_acmmm05,
        TITLE = { Topic Transition Detection Using Hierarchical Hidden Markov and Semi-Markov Models },
        AUTHOR = { Phung, D. and Duong, T. and Bui, H. and Venkatesh, S. },
        BOOKTITLE = { ACM Int. Conf. on Multimedia (ACM-MM) },
        YEAR = { 2005 },
        ADDRESS = { Singapore },
        MONTH = { 6--11 Nov. },
        ABSTRACT = { In this paper we introduce a probabilistic framework to exploit hierarchy, structure sharing and duration information for topic transition detection in videos. Our probabilistic detection framework is a combination of a shot classification step and a detection phase using hierarchical probabilistic models. We consider two models in this paper: the extended Hierarchical Hidden Markov Model (HHMM) and the Coxian Switching Hidden semi-Markov Model (S-HSMM) because they allow the natural decomposition of semantics in videos, including shared structures, to be modeled directly, and thus enable efficient inference and reduce the sample complexity in learning. Additionally, the S-HSMM allows the duration information to be incorporated, consequently the modeling of long-term dependencies in videos is enriched through both hierarchical and duration modeling. Furthermore, the use of Coxian distribution in the S-HSMM makes it tractable to deal with long sequences in video. Our experimentation of the proposed framework on twelve educational and training videos shows that both models outperform the baseline cases (flat HMM and HSMM) and performances reported in earlier work in topic detection. The superior performance of the S-HSMM over the HHMM verifies our belief that the duration information is an important factor in video content modeling. },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }